HDF5 supports compression of data through a stackable pipeline of filters that can be applied when datasets are written and read, both at runtime and in a post-processing step. These filters are supported as dynamically loadable plugins, and users can implement custom filters of their own design.
This section describes the programming model for an application that uses a third-party HDF5 filter plugin to write or read data. For simplicity of presentation, it is assumed that the HDF5 filter plugin is available on the system in a default location. The HDF5 filter plugin is discussed in detail in the Programming Model for HDF5 Filter Plugins section.
A third-party filter can be added to the HDF5 filter pipeline with the same H5Pset_filter call that applications have always used for the built-in filters. The filter's identification number and parameters must be available to the application. For example, if the application intends to apply the HDF5 bzip2 compression filter, which was registered with The HDF Group and has the identification number 307 (see Registered Filters), it would follow the steps outlined below:
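A minimal sketch of those steps is shown below. The file name, dataset name, chunk sizes, and the single cd_values element (carrying the bzip2 block size) are illustrative assumptions rather than part of the original example:

\code
#include "hdf5.h"

#define H5Z_FILTER_BZIP2 307                 /* registered bzip2 filter id */

int main(void)
{
    int          data[32][32] = {{0}};       /* fill with application data */
    hsize_t      dims[2]      = {32, 32};
    hsize_t      chunk[2]     = {16, 16};
    unsigned int cd_values[1] = {9};         /* bzip2 block size (assumed) */

    hid_t file  = H5Fcreate("example_bzip2.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(2, dims, NULL);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);

    H5Pset_chunk(dcpl, 2, chunk);            /* filters require a chunked layout */
    /* Request the third-party filter by its registered identification number */
    H5Pset_filter(dcpl, H5Z_FILTER_BZIP2, H5Z_FLAG_MANDATORY, 1, cd_values);

    hid_t dset = H5Dcreate(file, "dset_bzip2", H5T_NATIVE_INT, space,
                           H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}
\endcode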
An application does not need to do anything special to read data that has a third-party filter applied. For example, to read the data written in the previous example, the usual steps are taken:
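A sketch of such a read, continuing the file and dataset names assumed above; provided the bzip2 plugin can be located at run time, the library applies it transparently:

\code
#include "hdf5.h"

int main(void)
{
    int data[32][32];

    hid_t file = H5Fopen("example_bzip2.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset = H5Dopen(file, "dset_bzip2", H5P_DEFAULT);

    /* The library loads and applies the bzip2 filter plugin automatically */
    H5Dread(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

    H5Dclose(dset);
    H5Fclose(file);
    return 0;
}
\endcode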
The command-line utility h5dump, for example, will read and display the data as shown:
If the filter cannot be loaded, h5dump shows the following:
Data goes through the HDF5 filter pipeline only when it is written to the file or read from the file into the application's memory space. For example, the I/O operation is triggered with a call to H5Fflush, or when a data item (HDF5 metadata or a raw data chunk) is evicted from or brought into the cache. Note that H5Dread/H5Dwrite calls on chunked datasets do not necessarily trigger I/O, since the HDF5 Library uses a separate chunk cache.
A data item may remain in the cache until the HDF5 Library is closed. If the HDF5 filter plugin that has to be applied to the data item becomes unavailable before the file and all objects in the file are closed, an error occurs. The following example demonstrates the issue; note the position of the H5Zunregister call:
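A minimal sketch of the problematic ordering is shown below; it continues the bzip2 write example above and assumes the filter (identification number 307) was available when the data was written:

\code
/* ... a chunked dataset using filter 307 has been created and written ... */

/* PROBLEM: the filter becomes unavailable while the dataset and the file are
 * still open and their caches may still hold data that needs the filter.    */
H5Zunregister(H5Z_FILTER_BZIP2);

/* With this ordering an error occurs: cached data items still have to be
 * pushed through a filter that is no longer available.                      */
H5Dclose(dset);
H5Fclose(file);
\endcode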
Here is an error stack produced by the program:
To avoid the problem, make sure to close all objects to which the filter is applied and to flush them with H5Fflush before unregistering the filter, as sketched below:
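A sketch of the safe ordering, continuing the same names:

\code
H5Dclose(dset);                    /* close every object that uses the filter */
H5Fflush(file, H5F_SCOPE_GLOBAL);  /* push any cached data through the filter */
H5Fclose(file);

H5Zunregister(H5Z_FILTER_BZIP2);   /* safe: the filter is no longer needed    */
\endcode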
This section describes how to create an HDF5 filter, an HDF5 filter plugin, and how to install the HDF5 plugin on the system.
The filter function for a dynamically loaded filter is written in the same way as any custom HDF5 filter. The following example shows how to define and register a simple filter that adds a checksum capability to the data stream.
The function that acts as the filter always returns zero (failure) if the md5() function was not detected at configuration time (left as an exercise for the reader). Otherwise the function is broken down into an input half and an output half. The output half calculates a checksum, increases the size of the output buffer if necessary, and appends the checksum to the end of the buffer. The input half calculates the checksum on the first part of the buffer and compares it to the checksum already stored at the end of the buffer. If the two differ, zero (failure) is returned; otherwise the buffer size is reduced to exclude the checksum.

\code
size_t
md5_filter(unsigned int flags, size_t cd_nelmts, const unsigned int cd_values[],
           size_t nbytes, size_t *buf_size, void **buf)
{
#ifdef HAVE_MD5
    unsigned char cksum[16];

    if (flags & H5Z_FLAG_REVERSE) {
        /* Input */
        assert(nbytes >= 16);
        md5(nbytes - 16, *buf, cksum);

        /* Compare */
        if (memcmp(cksum, (char *)(*buf) + nbytes - 16, 16)) {
            return 0; /* fail */
        }

        /* Strip off checksum */
        return nbytes - 16;
    }
    else {
        /* Output */
        md5(nbytes, *buf, cksum);

        /* Increase buffer size if necessary */
        if (nbytes + 16 > *buf_size) {
            *buf_size = nbytes + 16;
            *buf      = realloc(*buf, *buf_size);
        }

        /* Append checksum */
        memcpy((char *)(*buf) + nbytes, cksum, 16);
        return nbytes + 16;
    }
#else
    return 0; /* fail */
#endif
}
\endcode
Once the filter function is defined, it must be registered so the HDF5 Library knows about it. Since we are testing this filter, we choose one of the H5Z_filter_t numbers from the range reserved for testing; we will arbitrarily choose 305.
\code
#define FILTER_MD5 305

const H5Z_class2_t H5Z_MD5 = {
    H5Z_CLASS_T_VERS, FILTER_MD5, 1, 1, "md5 checksum", NULL, NULL, md5_filter};

herr_t status = H5Zregister(&H5Z_MD5);
\endcode
Now we can use the filter in a pipeline. We could have added the filter to the pipeline before defining or registering it, as long as it was defined and registered by the time we tried to use it. (If the filter is marked as optional, we could even have used it without defining it; the library would automatically remove it from the pipeline for each chunk written before the filter was defined and registered.)
\code
hid_t   dcpl = H5Pcreate(H5P_DATASET_CREATE);
hsize_t chunk_size[3] = {10, 10, 10};

H5Pset_chunk(dcpl, 3, chunk_size);
H5Pset_filter(dcpl, FILTER_MD5, 0, 0, NULL);

hid_t dset = H5Dcreate(file, "dset", H5T_NATIVE_DOUBLE, space,
                       H5P_DEFAULT, dcpl, H5P_DEFAULT);
\endcode
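If the filter should be treated as optional, as discussed above, the H5Z_FLAG_OPTIONAL flag is passed instead of 0:

\code
H5Pset_filter(dcpl, FILTER_MD5, H5Z_FLAG_OPTIONAL, 0, NULL);
\endcode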
See a more sophisticated example, the HDF5 bzip2 filter function, in the Building an HDF5 bzip2 Plugin Example section. The HDF5 bzip2 filter function is also available for download from the Filter Plugin Repository.
The user has to remember a few things when writing an HDF5 filter function.
The signature of the HDF5 filter function and the accompanying filter structure (see the section below) are described in the HDF5 Reference Manual under H5Z_filter_t.
If you are writing a filter that will be used by others, it would be a good idea to request a filter identification number and register it with The HDF Group. Please follow the procedure described at Registered Filters.
The HDF Group anticipates that developers of HDF5 filter plugins will not only register new filters, but will also provide links to the source code and/or binaries for the corresponding HDF5 filter plugins.
It is very important for the users of the filter that developers provide filter information in the “name” field of the filter structure, for example:
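As a sketch, the filter structure of the bzip2 plugin carries such a string in its “name” field. The layout follows H5Z_class2_t; the exact string and the H5Z_filter_bzip2 function name are illustrative, modeled on the publicly available bzip2 plugin:

\code
#define H5Z_FILTER_BZIP2 307

const H5Z_class2_t H5Z_BZIP2[1] = {{
    H5Z_CLASS_T_VERS,               /* H5Z_class_t version                      */
    (H5Z_filter_t)H5Z_FILTER_BZIP2, /* Filter identification number             */
    1,                              /* encoder_present flag                     */
    1,                              /* decoder_present flag                     */
    "HDF5 bzip2 filter found at http://www.hdfgroup.org/services/contributions.html",
                                    /* Filter name for debugging (illustrative) */
    NULL,                           /* The "can apply" callback                 */
    NULL,                           /* The "set local" callback                 */
    (H5Z_func_t)H5Z_filter_bzip2,   /* The actual filter function               */
}};
\endcode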
The HDF5 Library and command-line tools have access to the “name” field. An application can use the H5Pget_filter<*> functions to retrieve information about the filters.
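For instance, a program could list the filters applied to a dataset, including their names, through the dataset creation property list. A short sketch using H5Pget_filter2 (the open dataset handle dset is assumed, and hdf5.h and stdio.h are assumed to be included):

\code
hid_t dcpl     = H5Dget_create_plist(dset);
int   nfilters = H5Pget_nfilters(dcpl);

for (int i = 0; i < nfilters; i++) {
    char         name[256];
    unsigned int flags, filter_config;
    unsigned int cd_values[8];
    size_t       cd_nelmts = 8;     /* in: size of cd_values; out: values used */

    H5Z_filter_t filter_id = H5Pget_filter2(dcpl, (unsigned)i, &flags, &cd_nelmts,
                                            cd_values, sizeof(name), name, &filter_config);
    printf("filter %d: id = %d, name = \"%s\"\n", i, (int)filter_id, name);
}
H5Pclose(dcpl);
\endcode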
Using the structure in the example above, the h5dump tool will print the string “HDF5 bzip2 filter found at …”, pointing users to the applied filter (see the example in the Reading Data with an Applied Third-party Filter section) and thus answering the question of the filter’s origin.
The HDF5 filter plugin source should include the two functions below, which the HDF5 Library calls to determine the plugin type and to obtain the filter structure:

\code
H5PL_type_t H5PLget_plugin_type(void);
const void *H5PLget_plugin_info(void);
\endcode

For a filter plugin such as the bzip2 example, they are implemented as:

\code
H5PL_type_t H5PLget_plugin_type(void) { return H5PL_TYPE_FILTER; }
const void *H5PLget_plugin_info(void) { return H5Z_BZIP2; }
\endcode
Build the HDF5 filter plugin as a shared library. The following steps should be taken:
The default directory for an HDF5 filter plugin library is defined on UNIX-like systems as
and on Windows systems as
The default path can be overridden by setting the HDF5_PLUGIN_PATH environment variable. Several directories can be specified in the search path, using “:” as the path separator on UNIX-like systems and “;” on Windows.
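For example, an application (or its launch script) could point the library at additional plugin directories before any dataset that needs the plugin is accessed. In the sketch below both directory names are hypothetical; setenv is the POSIX call, and H5PLprepend is the corresponding HDF5 API for adjusting the search path from within a program:

\code
/* Two hypothetical plugin directories, separated by ":" on a UNIX-like system */
setenv("HDF5_PLUGIN_PATH", "/opt/myapp/hdf5-plugins:/home/user/hdf5/plugins", 1);

/* Alternatively, adjust the search path through the H5PL API */
H5PLprepend("/opt/myapp/hdf5-plugins");
\endcode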
Readers are encouraged to try the example in the “Building an HDF5 bzip2 Plugin Example” section.
Dynamic loading of an HDF5 filter plugin (or filter library) is triggered by only two events: when an application calls H5Pset_filter to set the filter for the first time, or when the data to which the filter is applied is read for the first time.
The HDF Group provides a repository of the HDF5 bzip2 filter plugin, which can be checked out from BZIP2 Filter Plugin.
It contains the source code for the bzip2 plugin library and an example that uses the plugin. It requires an HDF5 Library built with the dynamically loaded filter feature and the bzip2 library to be available on the system. The plugin and the example can be built with configure or CMake. For instructions on how to build with CMake, see the README.txt file in the source code distribution. The bzip2 library that can be built with CMake is available from:
See the documentation in the hdf5_plugins/docs folder, in particular INSTALL_With_CMake and USING_HDF5_AND_CMake.