HDF5 2.0.0.50d6458 API Reference
The HDF Group will drop support for HDF5 1.8.* releases in the summer of 2020. We strongly recommend that our users start migrating their applications as soon as possible, and we ask that any problems encountered be reported to The HDF Group. Problems can be reported by sending email to help@hdfgroup.org, submitting a request to The HDF Group's Help Desk, or posting to the HDF Forum.
Please follow these steps for moving your HDF5 application to HDF5 1.10:
If you have concerns about the files created by the rebuilt application (or software package), you may wish to consider several verification steps to assure that the files can be read by applications built with HDF5 1.8 releases.
If a tool fails, rerun it with the --enable-error-stack flag and report the issue. If you want to learn more, please see the frequently asked questions (FAQ) below.
Many new features (e.g., SWMR, VDS, paged allocation, etc.) that required extensions to the HDF5 file format were added to HDF5 1.10.0. For more information please see the Release Specific Information pages.
HDF5 1.8 will not be supported after the May 2020 release, i.e., there will be no more public releases of HDF5 1.8 with security patches, bug fixes, performance improvements and support for OSs and compilers.
In addition, applications originally written for use with HDF5 Release 1.8.x can be linked against the HDF5 Release 1.10.x library, thus taking advantage of performance improvements in 1.10. Users who want to stay current with HDF5 are encouraged to move to the latest releases of HDF5 1.10 or to HDF5 1.12.0 (coming out in the summer of 2019).
If you are not planning to upgrade your system environment, and a version of HDF5 1.8.* works for you, then there is no reason to upgrade to the latest HDF5. However, if you regularly update your software to use the latest HDF5 1.8 maintenance release, then you need to plan a transition to HDF5 1.10 after the HDF5 1.8 May 2020 release.
The HDF5 1.10.* binaries are not ABI compatible with the HDF5 1.8.* binaries due to changes in the public header files and APIs; applications must be rebuilt against the HDF5 1.10.* library. The HDF Group tries hard to maintain ABI compatibility for minor maintenance releases, for example when moving from 1.8.21 to 1.8.22, or 1.10.5 to 1.10.6, but this is not the case when migrating from one major release to another, for example, from 1.8.21 to 1.10.5. If you want to learn more about HDF5 versioning, please see HDF5 Library Release Version Numbers.
Yes, use the -DH5_USE_16_API compiler flag. For more information see the API Compatibility Macros.
If an application built on HDF5 Release 1.10 avoids the new features and does not request the latest format, applications built on HDF5 Release 1.8.x will be able to read the files it creates.
Unfortunately, no. However, we provide a few tools that will help you to “downgrade” the file, so it can be opened and used with tools and applications built with the 1.8 versions of HDF5.
If your application uses SWMR, then the h5format_convert tool can be used to “downgrade” the file to the HDF5 1.8 compatible file format without rewriting raw data.
The h5repack tool with the -l flag can be used to repack VDS source datasets into the HDF5 file using contiguous, chunked, or compact storage. The tool can also be used to rewrite the file in the HDF5 1.8 format by specifying the --high=H5F_LIBVER_V18 flag.
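As a sketch, the two downgrade paths above might look like the following; the file names are hypothetical:

```shell
# SWMR files: rewrite the chunk indexes in place to the
# 1.8-compatible format without rewriting raw data.
h5format_convert swmr_file.h5

# VDS files: first repack the virtual dataset's data into contiguous
# storage, then rewrite the result using the 1.8 file format.
h5repack -l CONTI vds_file.h5 repacked.h5
h5repack --high=H5F_LIBVER_V18 repacked.h5 compat18.h5
```

Both tools ship with the HDF5 distribution; run them with --help for the full option list.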
HDF5 1.10 introduces several new features in the HDF5 library. These new features were added in the first three releases of HDF5-1.10. For a brief description of each new feature see:
This release includes changes in the HDF5 storage format. For detailed information on the changes, see: Changes to the File Format Specification
Note: HDF5-1.8 cannot read files created with the new features described below that are marked with *.
These changes come into play when one or more of the new features is used or when an application requests use of the latest storage format (H5Pset_libver_bounds). See Setting Bounds for Object Creation in HDF5 1.10.0 for more details.
Due to the requirements of some of the new features, the format of a 1.10.x HDF5 file is likely to be different from that of a 1.8.x HDF5 file. This means that tools and applications built to read 1.10.x files will be able to read a 1.8.x file, but tools built to read 1.8.x files may not be able to read a 1.10.x file.
If an application built on HDF5 Release 1.10 avoids the new features and does not request the latest format, applications built on HDF5 Release 1.8.x will be able to read the files it creates. In addition, applications originally written for use with HDF5 Release 1.8.x can be linked against a suitably configured HDF5 Release 1.10.x library, thus taking advantage of performance improvements in 1.10.
The following important new features and changes were introduced in HDF5-1.10.8. For complete details see the Release Notes and the Software Changes from Release to Release in HDF5 1.10 page.
The following important new features and changes were introduced in HDF5-1.10.7. For complete details see the Release Notes and the Software Changes from Release to Release in HDF5 1.10 page.
Addition of AEC (open-source SZip) compression library: HDF5 now supports building with the AEC library as a replacement for SZip.
Addition of the Splitter and Mirror VFDs: Two VFDs were added in this release:
Improvements to performance: Performance has continued to improve in this release. Please see the images under Compatibility and Performance Issues on the Software Changes from Release to Release in HDF5 1.10 page.
Addition of hyperslab selection functions: Several hyperslab selection routines introduced in HDF5-1.12 were ported to 1.10. See the Software Changes from Release to Release in HDF5 1.10 page for details.
The following important new features and changes were introduced in HDF5-1.10.6. For complete details see the Release Notes and the Software Changes from Release to Release in HDF5 1.10 page:
Several improvements were added to the CMake support, including:
Two Virtual File Drivers (VFDs) have been introduced in 1.10.6:
See the Virtual File Drivers - S3 and HDFS page for more information.
Performance was improved when creating a large number of small datasets.
The following important new features were added in HDF5-1.10.5. Please see the release announcement and Software Changes from Release to Release in HDF5 1.10 page for more details regarding these features:
The ability to minimize dataset object headers was added to reduce the file bloat caused by extra space in the dataset object header, which can occur when creating many very small datasets. See the Release Notes and Dataset Object Header Size for more details regarding this issue.
The following APIs were introduced to support this feature:
The default behavior in parallel HDF5 was changed for reads of an entire dataset (i.e., an H5S_ALL selection) performed collectively by all processes. The dataset must be contiguous, smaller than 2 GB, and of an atomic datatype. The library now uses an MPI_Bcast to pass the data read from disk by the root process to the remaining processes in the MPI communicator associated with the HDF5 file.
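The optimization described above can be sketched in plain MPI. This is illustrative only, not the library's actual code, and the file name is hypothetical:

```c
#include <stdio.h>
#include <mpi.h>

/* Illustrative only: rank 0 reads the whole (small, contiguous,
 * atomic-typed) dataset from disk; every other rank receives the
 * bytes via a single MPI_Bcast instead of issuing its own read. */
static void read_and_bcast(const char *path, void *buf, int nbytes,
                           MPI_Comm comm)
{
    int rank;
    MPI_Comm_rank(comm, &rank);
    if (rank == 0) {
        FILE *f = fopen(path, "rb");   /* stand-in for the HDF5 read */
        if (f != NULL) {
            size_t got = fread(buf, 1, (size_t)nbytes, f);
            (void)got;
            fclose(f);
        }
    }
    /* One collective broadcast replaces nprocs independent reads. */
    MPI_Bcast(buf, nbytes, MPI_BYTE, 0, comm);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    char buf[256] = {0};
    read_and_bcast("dataset.bin", buf, (int)sizeof buf, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
```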
A CFD application was used to benchmark CGNS with:
These results were reported by Greg Sjaardema from Sandia National Laboratories.
(image missing)

- Series 1 is the read-proc0-and-bcast solution.
- Series 2 is a single MPI_Bcast.
- Series 3 uses multiple MPI_Bcast calls totaling 2 MiB of data, 64 bytes at a time (IIRC).
- Series 4 is unmodified CGNS develop.
- Compact, Compact 192, and Compact 384 all use compact storage; the last three "compact" curves are three different batch jobs on 192, 384, and 552 nodes (with 36 cores/node).

The Series 2 and 3 curves are not related to the CGNS benchmark, but give a qualitative indication of the scaling behavior of MPI_Bcast. Both read-proc0-and-bcast and compact storage follow MPI_Bcast's trend, which makes sense since both methods rely on MPI_Bcast. See MS 3.2 – Addressing Scalability: Scalability of open, close, flush CASE STUDY: CGNS Hotspot analysis of CGNS cgp_open for better resolution.
Support for OpenMPI was added. For known problems and issues please see OpenMPI Build Issues. To better support OpenMPI, all MPI-1 API calls were replaced by MPI-2 equivalents.
New functions were added to find locations, sizes and filters applied to chunks of a dataset. This functionality is useful for applications that need to read chunks directly from the file, bypassing the HDF5 library.
See Chunk query functionality in HDF5 for more details.
This release was incorrectly developed and should not be used.
This release was incorrectly developed and should not be used.
Several important features and changes were added to HDF5 1.10.2. See the release announcement and blog for complete details. Following are the major new features:
In HDF5 1.8.0, the H5Pset_libver_bounds function was introduced for specifying the earliest ("low") and latest ("high") versions of the library to use when writing objects. HDF5 1.10.2 introduces new values for "low" and "high": H5F_LIBVER_V18 was added, and H5F_LIBVER_LATEST is now mapped to H5F_LIBVER_V110. See the H5Pset_libver_bounds function for details.
Optimizations were introduced to parallel HDF5 for improving the performance of open, close and flush operations at scale.
HDF5 parallel applications can now write data using compression (and other filters such as the Fletcher32 checksum filter).
HDF5 metadata is typically small, and scattered throughout the HDF5 file. This can affect performance, particularly on large HPC systems. The Metadata Cache Image feature can improve performance by writing the metadata cache in a single block on file close, and then populating the cache with the contents of this block on file open, thus avoiding the many small I/O operations that would otherwise be required on file open and close.
See Metadata Cache Image for more details.
The HDF5 library's metadata cache is fairly conservative about holding on to HDF5 object metadata (object headers, chunk index structures, etc.), which can cause the cache size to grow, resulting in memory pressure on an application or system. The "evict on close" property causes all metadata for an object to be evicted from the cache when the object is closed, as long as that metadata is not referenced by any other open object.
The current HDF5 file space allocation accumulates small pieces of metadata and raw data in aggregator blocks that are not page-aligned and vary widely in size. The paged aggregation feature was implemented to provide efficient paged access to these small pieces of metadata and raw data.
See HDF5 File Space Management: Paged Aggregation and HDF5 File Space Management for more details.
Small and random I/O accesses on parallel file systems result in poor performance for applications. Page buffering, in conjunction with paged aggregation, can improve performance by giving an application control over the granularity and alignment of HDF5 I/O requests, thereby minimizing small accesses.
See Page Buffering for more details.
Data acquisition and computer modeling systems often need to analyze and visualize data while it is being written. It is not unusual, for example, for an application to produce results in the middle of a run that suggest some basic parameters be changed, sensors be adjusted, or the run be scrapped entirely.
To enable users to check on such systems, we have been developing a concurrent read/write file access pattern we call SWMR (pronounced swimmer). SWMR is short for single-writer/multiple-reader. SWMR functionality allows a writer process to add data to a file while multiple reader processes read from the file.
The orderly operation of the metadata cache is crucial to SWMR functioning. A number of APIs have been developed to handle requests from writer and reader processes and to give applications the control of the metadata cache they might need. However, the metadata cache APIs can also be used when SWMR is not in use, so these functions are described separately.
Calls for HDF5 metadata can result in many small reads and writes. On metadata reads, collective metadata I/O can improve performance by allowing the library to perform optimizations when reading the metadata: one rank reads the data and broadcasts it to all other ranks.
Collective metadata I/O improves metadata write performance through the construction of an MPI derived datatype that is then written collectively in a single call.
Usage patterns when working with an HDF5 file sometimes result in wasted space within the file. This can also impair access times when working with the resulting files. The new file space management feature provides strategies for managing space in a file to improve performance in both of these areas.
With a growing amount of data in HDF5, the need has emerged to access data stored across multiple HDF5 files using standard HDF5 objects, such as groups and datasets, without rewriting or rearranging the data. The new virtual dataset (VDS) feature enables an application to draw on multiple datasets and files to create virtual datasets without moving or rewriting any data.
New options for the storage and filtering of partial edge chunks in a dataset provide a tool for tuning I/O speed and file size in cases where the dataset size may not be a multiple of the chunk size.
In addition to the features described above, several additional new functions, a new struct, and new macros have been introduced or newly versioned in this release.
The file format of the HDF5 library has been changed to support the new features in HDF5-1.10.
See the HDF5 File Format Specification Version 1.1 for complete details on the changes. This specification describes how the bytes in an HDF5 file are organized on the storage media where the file is kept. In other words, when a file is written to disk, it will be written according to the information described in this specification. The following sections have been added or changed:
HDF5-1.8 cannot read files created with the new features described on this page that are marked with *.
This page describes various new functions, a new struct, and new macros that are either unrelated to new features described elsewhere or have aspects that are unrelated to the feature where they are otherwise described. The page includes the following sections:
RFCs for the new features in HDF5-1.10: