HDF5 2.0.0.2ad0391
File image operations allow users to work with HDF5 files in memory in the same ways that users currently work with HDF5 files on disk. Disk I/O is not required when file images are opened, created, read from, or written to.
An HDF5 file image is an HDF5 file that is held in a buffer in main memory. Setting up a file image in memory involves using either a buffer in the file access property list or a buffer in the Memory (aka Core) file driver. The advantage of working with a file in memory is faster access to the data.
The challenge of working with files in memory buffers is maximizing performance and minimizing memory footprint while working within the constraints of the property list mechanism. This should be a non-issue for small file images, but may be a major issue for large images.
If invoked with the appropriate flags, the H5LTopen_file_image high-level library call should deal with these challenges in most cases. However, some applications may require the programmer to address these issues directly.
Functions used in file image operations are listed below.
C Function | Purpose |
---|---|
H5Pset_file_image | Allows an application to specify an initial file image. For more information, see section FI211. |
H5Pget_file_image | Allows an application to retrieve a copy of the file image designated for a VFD to use as the initial contents of a file. For more information, see section FI212. |
H5Pset_file_image_callbacks | Allows an application to manage file image buffer allocation, copying, reallocation, and release. For more information, see section FI213. |
H5Pget_file_image_callbacks | Allows an application to obtain the current file image callbacks from a file access property list. For more information, see section FI214. |
H5Fget_file_image | Provides a simple way to retrieve a copy of the image of an existing, open file. For more information, see section FI216. |
H5LTopen_file_image | Provides a convenient way to open an initial file image with the Core VFD. For more information, see section FI221. |
Abbreviation | This abbreviation is short for |
---|---|
FAPL or fapl | File Access Property List. In code samples, fapl is used. |
VFD | Virtual File Driver |
VFL | Virtual File Layer |
Developers who use the file image operations described in this document should be proficient and experienced users of the HDF5 C Library APIs. More specifically, developers should have a working knowledge of property lists, callbacks, and virtual file drivers.
See the following for more information.
The Alternate File Storage Layouts and Low-level File Drivers section is in The HDF5 File chapter of the HDF5 User Guide.
The H5Pset_fapl_core function call can be used to modify the file access property list so that the Memory virtual file driver, H5FD_CORE, is used. The Memory file driver is also known as the Core file driver.
Links to the HDF5 Virtual File Layer and List of Functions documents can be found in the HDF5 Technical Notes.
The C API function calls described in this chapter fall into two categories: low-level routines that are part of the main HDF5 C Library and one high-level routine that is part of the “lite” API in the high-level wrapper library. The high-level routine uses the low-level routines and presents frequently requested functionality conveniently packaged for application developers’ use.
The purpose of this section is to describe the low-level C API routines that support file image operations. These routines allow an in-memory image of an HDF5 file to be opened without requiring file system I/O.
The basic approach to opening an in-memory image of an HDF5 file is to pass the image to the Core file driver, and then tell the Core file driver to open the file. We do this by using the H5Pget_file_image/H5Pset_file_image calls. These calls allow the user to specify an initial file image.
A potential problem with the H5Pget_file_image/H5Pset_file_image calls is the overhead of allocating and copying of large file image buffers. The callback routines enable application programs to avoid this problem. However, the use of these callbacks is complex and potentially hazardous: the particulars are discussed in the semantics and examples chapters below (see section File Image Callback Semantics and section Reading an In-memory HDF5 File Image respectively). Fortunately, use of the file image callbacks should seldom be necessary: the H5LTopen_file_image call should address most use cases.
The property list facility in HDF5 is employed in file image operations. This facility was designed for passing data, not consumable resources, into API calls. The peculiar ways in which the file image allocation callbacks may be used allow us to avoid extending the property list structure to handle consumable resources cleanly, and to avoid constructing a new facility for the purpose.
The sub-sections below describe the low-level C APIs that are used with file image operations.
The H5Pset_file_image routine allows an application to provide an image for a file driver to use as the initial contents of the file. This call was designed initially for use with the Core VFD, but it can be used with any VFD that supports using an initial file image when opening a file. See the FI215 section for more information. Calling this routine makes a copy of the provided file image buffer. See the FI213 section for more information.
The signature of H5Pset_file_image is defined as follows:
The parameters of H5Pset_file_image are defined as follows:
Given the tight interaction between the file image callbacks and the file image, the file image callbacks in a property list cannot be changed while a file image is defined.
With properly constructed file image callbacks, it is possible to avoid actually copying the file image. The particulars of this are discussed in greater detail in the C API Call Semantics chapter and in the Examples chapter.
The H5Pget_file_image routine allows an application to retrieve a copy of the file image designated for a VFD to use as the initial contents of a file. This routine uses the file image callbacks (if defined) when allocating and loading the buffer to return to the application, or it uses malloc and memcpy if the callbacks are undefined. When malloc and memcpy are used, it will be the caller’s responsibility to discard the returned buffer via a call to free.
The signature of H5Pget_file_image is defined as follows:
The parameters of H5Pget_file_image are defined as follows:
The H5Pset_file_image_callbacks API call exists to allow an application to control the management of file image buffers through user-defined callbacks. These callbacks will be used in the management of file image buffers in property lists and in select file drivers. These routines are invoked when a new file image buffer is allocated, when an existing file image buffer is copied or resized, or when a file image buffer is released from use. From the perspective of the HDF5 Library, the operations of the image_malloc, image_memcpy, image_realloc, and image_free callbacks must be identical to those of the corresponding C standard library calls (malloc, memcpy, realloc, and free). While the operations must be identical, the file image callbacks have more parameters. The callbacks and their parameters are described below. The return values of image_malloc and image_realloc are identical to the return values of malloc and realloc. However, the return values of image_memcpy and image_free differ from those of memcpy and free: they can also indicate failure. See the C API Call Semantics section for more information.
The signature of H5Pset_file_image_callbacks is defined as follows:
The parameters of H5Pset_file_image_callbacks are defined as follows:
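The fields of the H5FD_file_image_callbacks_t structure are defined as follows. Each of the four image_* callbacks also receives a file_image_op argument of type H5FD_file_image_op_t that identifies the operation in progress; its values are listed below.
File image operation | Description |
---|---|
H5FD_FILE_IMAGE_OP_PROPERTY_LIST_SET | This value is passed to the image_malloc and image_memcpy callbacks when an image buffer is being copied while being set in a FAPL. |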
H5FD_FILE_IMAGE_OP_PROPERTY_LIST_COPY | This value is passed to the image_malloc and image_memcpy callbacks when an image buffer is being copied when a FAPL is copied. |
H5FD_FILE_IMAGE_OP_PROPERTY_LIST_GET | This value is passed to the image_malloc and image_memcpy callbacks when an image buffer is being copied while being retrieved from a FAPL. |
H5FD_FILE_IMAGE_OP_PROPERTY_LIST_CLOSE | This value is passed to the image_free callback when an image buffer is being released during a FAPL close operation. |
H5FD_FILE_IMAGE_OP_FILE_OPEN | This value is passed to the image_malloc and image_memcpy callbacks when an image buffer is copied during a file open operation. While the image being opened will typically be copied from a FAPL, this need not always be the case. An example of an exception is when the Core file driver takes its initial image from a file. |
H5FD_FILE_IMAGE_OP_FILE_RESIZE | This value is passed to the image_realloc callback when a file driver needs to resize an image buffer. |
H5FD_FILE_IMAGE_OP_FILE_CLOSE | This value is passed to the image_free callback when an image buffer is being released during a file close operation. |
In closing our discussion of H5Pset_file_image_callbacks, we note the interaction between this call and the H5Pget_file_image/H5Pset_file_image calls above: since the malloc, memcpy, and free callbacks defined in the instance of H5FD_file_image_callbacks_t are used by H5Pget_file_image/H5Pset_file_image, H5Pset_file_image_callbacks will fail if a file image is already set in the target property list.
To have the Core file driver write the file image to disk, set the backing_store parameter of H5Pset_fapl_core. See the H5Pset_fapl_core entry in the HDF5 Reference Manual for more information.
The H5Pget_file_image_callbacks routine is designed to obtain the current file image callbacks from a file access property list.
The signature of H5Pget_file_image_callbacks() is defined as follows:
The parameters of H5Pget_file_image_callbacks are defined as follows:
Upon successful return, the fields of callbacks_ptr shall contain values as defined below:
Implementation of the H5Pget_file_image_callbacks/H5Pset_file_image_callbacks and H5Pget_file_image/H5Pset_file_image function calls requires a pair of virtual file driver feature flags. The flags are H5FD_FEAT_ALLOW_FILE_IMAGE and H5FD_FEAT_CAN_USE_FILE_IMAGE_CALLBACKS. Both of these are defined in H5FDpublic.h.
The first flag, H5FD_FEAT_ALLOW_FILE_IMAGE, allows a file driver to indicate whether or not it supports file images. A VFD that sets this flag when its ‘query’ callback is invoked indicates that the file image set in the FAPL will be used as the initial contents of a file. Support for setting an initial file image is designed primarily for use with the Core VFD. However, any VFD can indicate support for this feature by setting the flag and copying the image in an appropriate way for the VFD (possibly by writing the image to a file and then opening the file). Such a VFD need not employ the file image after file open time; in that case, the VFD will not make an in-memory copy of the file image and will not employ the file image callbacks.
File drivers that maintain a copy of the file in memory (only the Core file driver at present) can be constructed to use the file image callbacks (if defined). Those that do must set the second flag, H5FD_FEAT_CAN_USE_FILE_IMAGE_CALLBACKS, when their ‘query’ callbacks are invoked.
Thus file drivers that set the H5FD_FEAT_ALLOW_FILE_IMAGE flag but not the H5FD_FEAT_CAN_USE_FILE_IMAGE_CALLBACKS flag may read the supplied image from the property list (if present) and use it to initialize the contents of the file. However, they will not discard the image when done, nor will they make any use of any file image callbacks (if defined).
If an initial file image appears in a file access property list that is used in an H5Fopen() call, and if the underlying file driver does not set the H5FD_FEAT_ALLOW_FILE_IMAGE flag, then the open will fail.
If a driver sets both the H5FD_FEAT_ALLOW_FILE_IMAGE flag and the H5FD_FEAT_CAN_USE_FILE_IMAGE_CALLBACKS flag, then that driver will allocate a buffer of the required size, copy the contents of the initial image buffer from the file access property list, and then open the copy as if it had just loaded it from file. If the file image allocation callbacks are defined, the driver shall use them for all memory management tasks. Otherwise it will use the standard malloc, memcpy, realloc, and free C library calls for this purpose.
If the VFD sets the H5FD_FEAT_ALLOW_FILE_IMAGE flag, and an initial file image is defined by an application, the VFD should ensure that file creation operations (as opposed to file open operations) bypass use of the file image, and create a new, empty file.
Finally, it is logically possible that a file driver would set the H5FD_FEAT_CAN_USE_FILE_IMAGE_CALLBACKS flag, but not the H5FD_FEAT_ALLOW_FILE_IMAGE flag. While it is hard to think of a situation in which this would be desirable, setting the flags this way will not cause any problems: the two capabilities are logically distinct.
The purpose of the H5Fget_file_image routine is to provide a simple way to retrieve a copy of the image of an existing, open file. This routine can be used with files opened using the SEC2 (aka POSIX), STDIO, and Core (aka Memory) VFDs.
The signature of H5Fget_file_image is defined as follows:
The parameters of H5Fget_file_image are defined as follows:
The current file size can be obtained via a call to H5Fget_filesize. Note that this function returns the value of the end of file (EOF) and not the end of address space (EOA). While these values are frequently the same, it is possible for the EOF to be larger than the EOA. Since H5Fget_file_image only copies the file from the beginning of the superblock to the EOA, it is best to determine the size of the buffer required to contain the image by calling H5Fget_file_image itself with a NULL buffer pointer, which returns the required size without copying anything.
Here are some other notes regarding the design and implementation of H5Fget_file_image.
The H5Fget_file_image call should be part of the high-level library. However, a file driver agnostic implementation of the routine requires access to data structures that are hidden within the HDF5 Library. We chose to implement the call in the library proper rather than expose those data structures.
There is no reason why the H5Fget_file_image API call could not work on files opened with any file driver. However, the Family, Multi, and Split file drivers have issues that make the call problematic. At present, files opened with the Family file driver are marked as being created with that file driver in the superblock, and the HDF5 Library refuses to open files so marked with any other file driver. This negates the purpose of the H5Fget_file_image call. While this mark can be removed from the image, the necessary code is not trivial.
Thus we will not support the Family file driver in H5Fget_file_image unless there is demand for it. Files created with the Multi and Split file drivers are also marked in the superblock. In addition, they typically use a very sparse address space. A sparse address space would require the use of an impractically large buffer for an image, and most of the buffer would be empty. So, we see no point in supporting the Multi and Split file drivers in H5Fget_file_image under any foreseeable circumstances.
The H5LTopen_file_image high-level routine encapsulates the capabilities of routines in the main HDF5 Library with conveniently accessible abstractions.
The H5LTopen_file_image routine is designed to provide an easier way to open an initial file image with the Core VFD. Flags to H5LTopen_file_image allow for various file image buffer ownership policies to be requested. See the HDF5 Reference Manual for more information on high-level APIs.
The signature of H5LTopen_file_image is defined as follows:
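The parameters of H5LTopen_file_image are defined as follows. The flags parameter is a bitwise OR of zero or more of the following values:
Flag | Description |
---|---|
H5LT_FILE_IMAGE_OPEN_RW | Indicates that the HDF5 Library should open the image read/write instead of the default read-only. |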
H5LT_FILE_IMAGE_DONT_COPY | Indicates that the HDF5 Library should not copy the file image buffer provided, but should use it directly. The HDF5 Library will release the file image when finished. The supplied buffer must have been allocated via a call to the standard C library malloc() or calloc() routines. The HDF5 Library will call free() to release the buffer. In the absence of this flag, the HDF5 Library will copy the buffer provided. The H5LT_FILE_IMAGE_DONT_COPY flag provides an application with the ability to “give ownership” of a file image buffer to the HDF5 Library. The HDF5 Library will modify the buffer on write if the image is opened read/write and the H5LT_FILE_IMAGE_DONT_COPY flag is set. The H5LT_FILE_IMAGE_DONT_RELEASE flag, see below, is invalid unless the H5LT_FILE_IMAGE_DONT_COPY flag is set. |
H5LT_FILE_IMAGE_DONT_RELEASE | Indicates that the HDF5 Library should not attempt to release the buffer when the file is closed. This implies that the application will tend to this detail and that the application will not discard the buffer until after the file image is closed. Since there is no way to return a changed buffer base address to the application, and since realloc can change this value, calls to realloc() must be barred when this flag is set. As a result, any write that requires an increased buffer size will fail. This flag is invalid unless the H5LT_FILE_IMAGE_DONT_COPY flag, see above, is set. If the H5LT_FILE_IMAGE_DONT_COPY flag is set and this flag is not set, the HDF5 Library will release the file image buffer after the file is closed using the standard C library free() routine. Using this flag and the H5LT_FILE_IMAGE_DONT_COPY flag provides a way for the application to specify a buffer that the HDF5 Library can use for opening and accessing as a file image while letting the application retain ownership of the buffer. |
The following table is intended to summarize the semantics of the H5LT_FILE_IMAGE_DONT_COPY and H5LT_FILE_IMAGE_DONT_RELEASE flags (shown as “Don’t Copy Flag” and “Don’t Release Flag” respectively in the table):
Don’t Copy Flag | Don’t Release Flag | Make Copy of User Supplied Buffer | Pass User Supplied Buffer to File Driver | Release User Supplied Buffer When Done | Permit realloc of Buffer Used by File Driver |
---|---|---|---|---|---|
False | Don’t care | True | False | False | True |
True | False | False | True | True | True |
True | True | False | True | False | False |
The return value of H5LTopen_file_image will be a file ID on success or a negative value on failure. The file ID returned should be closed with H5Fclose.
Note that there is no way currently to specify a “backing store” file name in this definition of H5LTopen_file_image.
The purpose of this chapter is to describe some issues that developers should consider when using file image buffers, property lists, and callback APIs.
The H5Pget_file_image_callbacks/H5Pset_file_image_callbacks API calls allow an application to hook the memory management operations used when allocating, duplicating, and discarding file images in the property list, in the Core file driver, and potentially in any in-memory file driver developed in the future.
From the perspective of the HDF5 Library, the supplied image_malloc(), image_memcpy(), image_realloc(), and image_free() callback routines must function identically to the C standard library malloc(), memcpy(), realloc(), and free() calls. What happens on the application side can be much more nuanced, particularly with the ability to pass user data to the callbacks. However, whatever the application does with these calls, it must maintain the illusion that the calls have had the expected effect. Maintaining this illusion requires some understanding of how the property list structure works, and what HDF5 will do with the initial images passed to it.
At the beginning of this document, we talked about the need to work within the constraints of the property list mechanism. When we said “from the perspective of the HDF5 Library…” in the paragraph above, we were referring to this point.
The property list mechanism was developed as a way to add parameters to functions without changing the parameter list and breaking existing code. However, it was designed to use only “call by value” semantics, not “call by reference”. The decision to use “call by value” semantics requires that the values of supplied variables be copied into the property list. This has the advantage of simplifying the copying and deletion of property lists. However, if the value to be copied is large (say a 2 GB file image), the overhead can be unacceptable.
The usual solution to this problem is to use “call by reference” where only a pointer to an object is placed in a parameter list rather than a copy of the object itself. However, use of “call by reference” semantics would greatly complicate the property list mechanism: at a minimum, it would be necessary to maintain reference counts to dynamically allocated objects so that the owner of the object would know when it was safe to free the object.
After much discussion, we decided that the file image operations calls were sufficiently specialized that it made no sense to rework the property list mechanism to support “call by reference.” Instead we provided the file image callback mechanism to allow the user to implement some version of “call by reference” when needed. It should be noted that we expect this mechanism to be used rarely if at all. For small file images, the copying overhead should be negligible, and for large images, most use cases should be addressed by the H5LTopen_file_image call.
In the (hopefully) rare event that use of the file image callbacks is necessary, the fundamental point to remember is that the callbacks must be constructed and used in such a way as to maintain the library’s illusion that it is using “call by value” semantics.
Thus the property list mechanism must think that it is allocating a new buffer and copying the supplied buffer into it when the file image property is set. Similarly, it must think that it is allocating a new buffer and copying the contents of the existing buffer into it when it copies a property list that contains a file image. Likewise, it must think it is de-allocating a buffer when it discards a property list that contains a file image.
Similar illusions must be maintained when a file image buffer is copied into the Core file driver (or any future driver that uses the file image callbacks), when the file driver resizes the buffer containing the image, and finally when the driver discards the buffer.
The owner of a file image in a buffer is the party that has the responsibility to discard the file image buffer when it is no longer needed. In this context, the owner is either the HDF5 Library or the application program.
We implemented the image_* callback facility to allow efficient management of large file images. These facilities can be used to allow sharing of file image buffers between the application and the HDF5 library, and also transfer of ownership in either direction. In such operations, care must be taken to ensure that ownership is clear and that file image buffers are not discarded before all references to them are discarded by the non-owning party.
Ownership of a file image buffer will only be passed to the application program if the file image callbacks are designed to do this. In such cases, the application program must refrain from freeing the buffer until the library has deleted all references to it. This in turn will happen after all property lists (if any) that refer to the buffer have been discarded, and the file driver (if any) that used the buffer has closed the file and thinks it has discarded the buffer.
As mentioned above, the HDF5 property lists are a mechanism for passing values into HDF5 Library calls. They were created to allow calls to be extended with new parameters without changing the actual API or breaking existing code. They were designed based on the assumption that all new parameters would be “call by value” and not “call by reference.” Having “call by value” parameters means property lists can be copied, reused, and discarded with ease.
Suppose an application wished to share a file image buffer with the HDF5 Library. This means the library would be allowed to read the file image, but not free it. The file image callbacks might be constructed as follows to share a buffer:
As the property list code will never resize a buffer, we do not discuss the image_realloc() call here. The behavior of image_realloc() in this scenario depends on what the application wants to do with the file image after it has been opened; we discuss this issue in the next section. Note also that the operation code passed into the file image callbacks allows the callbacks to behave differently depending on the context in which they are used.
For more information on user-defined data, see the File Image Callback Semantics section.
When a file image is opened by a driver that sets both the H5FD_FEAT_ALLOW_FILE_IMAGE and the H5FD_FEAT_CAN_USE_FILE_IMAGE_CALLBACKS flags, the driver will allocate a buffer large enough for the initial file image and then copy the image from the property list into this buffer. As processing progresses, the driver will reallocate the image as necessary to increase its size and will eventually discard the image at file close. If defined, the driver will use the file image callbacks for these operations; otherwise, the driver will use the standard C library calls. See the File Image Callback Semantics section for more information.
As described above, the file image callbacks can be constructed so as to avoid the overhead of buffer allocations and copies while allowing the HDF5 Library to maintain its illusions on the subject. There are two possible complications involving the file driver. The complications are the possibility of reallocation calls from the driver and the possibility of the continued existence of property lists containing references to the buffer.
Suppose an application wishes to share a file image buffer with the HDF5 Library. The application allows the library to read (and possibly write) the image, but not free it. We must first decide whether the image is to be opened read-only or read/write.
If the image will be opened read-only (or if we know that any writes will not change the size of the image), the image_realloc() call should never be invoked. Thus the image_realloc() routine can be constructed so as to always fail, and the image_malloc(), image_memcpy(), and image_free() routines can be constructed as described in the section above.
Suppose, however, that the file image will be opened read/write and may grow during the computation. We must now allow for the base address of the buffer to change due to reallocation calls, and we must employ the user data structure to communicate any change in the buffer base address and size to the application. We pass buffer changes to the application so that the application will be able to eventually free the buffer. To this end, we might define a user data structure as shown in the example below:
Using a user data structure to communicate with an application
We initialize an instance of the structure so that init_ptr points to the buffer to be shared, init_size contains the initial size of the buffer, and all other fields are initialized to either NULL or 0 as indicated by their type. We then pass a pointer to the instance of the user data structure to the HDF5 Library along with allocation callback functions constructed as follows:
In image_free(), whether the buffer is being released by a property list or by the file driver, check whether both the init_ref_count and mod_ref_count fields have dropped to zero; if so, notify the application that the HDF5 Library is done with the buffer. If the mod_ptr or mod_size fields have been modified, pass these values on to the application as well.
One can argue whether creating a file with an initial file image is closer to creating a file or opening a file. The consensus seems to be that it is closer to a file open, and thus we shall require that the initial image only be used for calls to H5Fopen.
Whatever our convention, from an internal perspective, opening a file with an initial file image is a bit of both creating a file and opening a file. Conceptually, we will create a file on disk, write the supplied image to the file, close the file, open the file as an HDF5 file, and then proceed as usual (of course, the Core VFD will not write to the file system unless it is configured to do so). This process is similar to a file create: we are creating a file that did not exist on disk to begin with and writing data to it. Also, we must verify that no file of the supplied name is open. However, this process is also similar to a file open: we must read the superblock and handle the usual file open tasks.
Implementing the above sequence of actions has a number of implications for the behavior of the H5Fopen call when an initial file image is supplied:
See the FI215 section for more information.
As we indicated earlier, if an initial file image appears in the property list of an H5Fcreate call, it is ignored.
While the above section on the semantics of the file image callbacks may seem rather gloomy, we get the payback here. The above says everything that needs to be said about initial file image semantics in general. The sub-section below has a few more observations on the Core file driver.
At present, the Core file driver uses the open() and read() system calls to load an HDF5 file image from the file system into RAM. Further, if the backing_store flag is set in the FAPL entry specifying the use of the Core file driver, the Core file driver’s internal image will be used to overwrite the source file on either flush or close. See the H5Pset_fapl_core entry in the HDF5 Reference Manual for more information.
This results in the following observations. In all cases assume that use of the Core file driver has been specified in the FAPL.
Thus a call to H5Fopen can result in the creation of a new HDF5 file in the file system.
The purpose of this chapter is to provide examples of how to read or build an in-memory HDF5 file image.
The H5Pset_file_image function call allows the Core file driver to be initialized from an application provided buffer. The following pseudo code illustrates its use:
Example 2. Using H5Pset_file_image to initialize the Core file driver
This solution is easy to code, but the supplied buffer is copied twice. The first copy is made in the call to H5Pset_file_image, when the image is duplicated and the duplicate inserted into the property list. The second copy is made when the file is opened: the image is copied from the property list into the initial buffer allocated by the Core file driver. This is a non-issue for small images, but it could become a significant performance hit for large images.
If we want to avoid the extra malloc and memcpy calls, we must decide whether the application should retain ownership of the buffer or pass ownership to the HDF5 Library.
The following pseudo code illustrates opening the image read-only using the H5LTopen_file_image() routine. In this example, the application retains ownership of the buffer and avoids extra buffer allocations and memcpy calls.
Example 3. Using H5LTopen_file_image to open a read-only file image where the application retains ownership of the buffer
If the application wants to transfer ownership of the buffer to the HDF5 Library, and the standard C library routine free is an acceptable way of discarding it, the above example can be modified as follows:
Example 4. Using H5LTopen_file_image to open a read-only file image where the application transfers ownership of the buffer
Again, file access is read-only. Read/write access can be obtained via the H5LTopen_file_image call, but we will explore that in the section below.
Before the implementation of file image operations, HDF5 supported construction of an image of an HDF5 file in memory with the Core file driver. The H5Fget_file_image function call allows an application to access the file image without first writing it to disk. See the following code fragment:
Example 5. Accessing the image of a file in memory
The use of H5Fget_file_image may be acceptable for small images. For large images, the cost of the malloc() and memcpy() operations may be excessive. To address this issue, the H5Pset_file_image_callbacks call allows an application to manage dynamic memory allocation for file images and memory-based file drivers (only the Core file driver at present). The following code fragment illustrates its use. Note that most error checking is omitted for simplicity and that H5Pset_file_image is not used to set the initial file image.
Example 6. Using H5Pset_file_image_callbacks to improve memory allocation
The above code fragment gives the application full ownership of the buffer used by the Core file driver after the file is closed, and it notifies the application that the HDF5 Library is done with the buffer by setting udata.image_ptr to something other than NULL. If read access to the buffer is sufficient, the H5Fget_vfd_handle call can be used as an alternate solution to get access to the base address of the Core file driver’s buffer.
The above solution avoids some unnecessary malloc and memcpy calls and should be quite adequate if an image of an HDF5 file is constructed only occasionally. However, if an HDF5 file image must be constructed regularly, and if we can put a strong and tight upper bound on the size of the necessary buffer, then the following pseudo code demonstrates a method of avoiding memory allocation completely. The downside, however, is that the buffer is allocated statically. Again, much error checking is omitted for clarity.
Example 7. Using H5Pset_file_image_callbacks with a static buffer
If we can further arrange matters so that only the contents of the datasets in the HDF5 file image change, but not the structure of the file itself, we can optimize still further by re-using the image and changing only the contents of the datasets after the initial write to the buffer. The following pseudo code shows how this might be done. Note that the code assumes that buf already contains the image of the HDF5 file whose dataset contents are to be overwritten. Again, much error checking is omitted for clarity. Also, observe that the file image callbacks do not support the H5Pget_file_image call.
Example 8. Using H5Pset_file_image_callbacks where only the datasets change
Before we go on, we should note that the above pseudo code can be written more compactly, albeit with fewer sanity checks, using the H5LTopen_file_image call. See the example below:
Example 9. Using H5LTopen_file_image where only the datasets change
While the scenario above is plausible, we will finish this section with a more general scenario. In the pseudo code below, we assume sufficient RAM to retain the HDF5 file image between uses, but we do not assume that the HDF5 file structure remains constant or that we can place a hard upper bound on the image size.
Since we must use malloc, realloc, and free in this example, and since realloc can change the base address of a buffer, we must maintain two ptr, size, and ref_count triples in the udata structure. The first triple is for the property list (which will never change the buffer), and the second triple is for the file driver. As shall be seen, this complicates the file image callbacks considerably. Note also that while we do not use H5Pget_file_image() in this example, we do include support for it in the file image callbacks. As usual, much error checking is omitted in favor of clarity.
Example 10. Using H5LTopen_file_image where only the datasets change and where the file structure and image size might not be constant
The above pseudo code shows how a buffer can be passed back and forth between the application and the HDF5 Library. The code also shows the application having control of the actual allocation, reallocation, and freeing of the buffer.
Using the file image operations described in this document, we can bundle up data in an image of an HDF5 file on one process, transmit the image to a second process, and then open and read the image on the second process without any mandatory file system I/O.
We have already demonstrated the construction and reading of such buffers above, but it may be useful to offer an example of the full operation. We do so in the example below using as simple a set of calls as possible. The set of calls in the example has extra buffer allocations. To reduce extra buffer allocations, see the sections above.
In the following example, we construct an HDF5 file image on process A and then transmit the image to process B where we then open the image and extract the desired data. Note that no file system I/O is performed: all the processing is done in memory with the Core file driver.
Example 11. Building and passing a file image from one process to another
Process A:

```
<Open and construct the desired file with the Core file driver>
H5Fflush(fid, H5F_SCOPE_GLOBAL);
size = H5Fget_file_image(fid, NULL, 0);
buffer_ptr = malloc(size);
H5Fget_file_image(fid, buffer_ptr, size);
<transmit size>
<transmit *buffer_ptr>
free(buffer_ptr);
<close core file>
```

Process B:

```
hid_t file_id;
<receive size>
buffer_ptr = malloc(size);
<receive image in *buffer_ptr>
file_id = H5LTopen_file_image(buffer_ptr, size,
                              H5LT_FILE_IMAGE_DONT_COPY);
<read data from file, then close.
 Note that the Core file driver
 will discard the buffer on close>
```
After the above examples, an example of the use of a template file might seem anticlimactic. A template file might be used to enforce consistency on file structure between files or, in parallel HDF5, to avoid long sequences of collective operations to create the desired groups, datatypes, and possibly datasets. The following pseudo code outlines a potential use:
Example 12. Using a template file
Observe that the above pseudo code includes an unnecessary buffer allocation and copy in the call to H5Pset_file_image. As we have already discussed ways of avoiding this, we will not address that issue here.
What is interesting in this case is to consider why the application would find this use case attractive.
In the serial case, at first glance there seems little reason to use the initial image facility at all. It is easy enough to use standard C calls to duplicate a template file, rename it as desired, and then open it as an HDF5 file.
However, this assumes that the template file will always be available and in the expected place. This is a questionable assumption for an application that will be widely distributed. Thus, we can at least make an argument either for keeping an image of the template file in the executable or for including code that writes the desired standard definitions to new HDF5 files.
Assuming the image is relatively small, we can further make an argument for the image in place of the code, as, quite simply, the image should be easier to maintain and modify with an HDF5 file editor.
However, there remains the question of why one should pass the image to the HDF5 Library instead of writing it directly with standard C calls and then using HDF5 to open it. Other than convenience and a slight reduction in code size, we are hard pressed to offer a reason.
In the parallel case, the argument is stronger, since group, datatype, and dataset creations are all expensive collective operations. At the same time, the argument is weaker: in the HPC context, simply copying an existing template file and opening it should lose many of its disadvantages, although it is presumably always useful to reduce the number of files in a deployment.
In closing, we would like to consider one last point. In the parallel case, we would expect template files to be quite large. Parallel HDF5 requires eager space allocation for chunked datasets. For similar reasons, we would expect template files in this context to contain long sequences of zeros with a scattering of metadata here and there. Such files would compress well, and the compressed images would be cheap to distribute across the available processes if necessary. Once distributed, each process could uncompress the image and write to the file those sections that contain actual data and lie within the section of the file assigned to that process. This approach might be significantly faster than a simple copy, as it would allow sparse writes, and thus it might provide a compelling use case for template files. However, this approach would require extending our current API to allow compressed images. We would also have to add the H5Pget_image_decompression_callback/H5Pset_image_decompression_callback API calls. We see no problem in doing this. However, it is beyond the scope of the current effort, and thus we will not pursue the matter further unless there is interest in our doing so.
Java function call signatures for the file image operation APIs have not yet been implemented, and there are no immediate plans for implementation.
Fortran function call signatures for the file image operation APIs are described in this section.
The Fortran low-level APIs make use of Fortran 2003's ISO_C_BINDING module in order to achieve portable and standard-conforming interoperability with the C APIs. The C pointer (C_PTR) and function pointer (C_FUNPTR) types are returned from the intrinsic procedures C_LOC(X) and C_FUNLOC(X), respectively, defined in the ISO_C_BINDING module. The argument X is the data or function to which the C pointer points, and it must have the TARGET attribute in the calling program. Note that the variable name lengths of the Fortran equivalents of the predefined C constants were shortened to fewer than 31 characters in order to be Fortran standard compliant.
h5pget_file_image_f

```fortran
SUBROUTINE h5pget_file_image_f(fapl_id, buf_ptr, buf_len_ptr, hdferr)
  IMPLICIT NONE
  INTEGER(HID_T) , INTENT(IN) :: fapl_id
  TYPE(C_PTR)    , INTENT(IN), DIMENSION(*) :: buf_ptr
  INTEGER(SIZE_T), INTENT(OUT) :: buf_len_ptr
  INTEGER        , INTENT(OUT) :: hdferr
```

h5pset_file_image_f

```fortran
SUBROUTINE h5pset_file_image_f(fapl_id, buf_ptr, buf_len, hdferr)
  IMPLICIT NONE
  INTEGER(HID_T) , INTENT(IN) :: fapl_id
  TYPE(C_PTR)    , INTENT(IN) :: buf_ptr
  INTEGER(SIZE_T), INTENT(IN) :: buf_len
  INTEGER        , INTENT(OUT) :: hdferr
```

h5fget_file_image_f

```fortran
SUBROUTINE h5fget_file_image_f(file_id, buf_ptr, buf_len, hdferr, buf_size)
  IMPLICIT NONE
  INTEGER(HID_T) , INTENT(IN) :: file_id
  TYPE(C_PTR)    , INTENT(INOUT) :: buf_ptr
  INTEGER(SIZE_T), INTENT(IN) :: buf_len
  INTEGER        , INTENT(OUT) :: hdferr
  INTEGER(SIZE_T), INTENT(OUT), OPTIONAL :: buf_size
```
Fortran function call signatures for the high-level file image operation APIs (H5LTopen_file_image) have not yet been implemented.