HDF5 2.0.0.2ad0391
What is this document about?
This document attempts to supplement the flow charts describing the flow of control for raw data I/O in the library. The following figures provide the main information:
This section provides notes to augment the information in the accompanying figures.
Validate Parameters - Resolve any H5S_ALL parameters for dataspace selections to actual dataspaces, allocate conversion buffers, etc.
Space Allocated in File? - Space may not have been allocated in the file to store the dataset data, if "late allocation" was chosen for the allocation time when the dataset was created.
Allocate & Fill Space - These operations allocate space in the file for both contiguous and chunked datasets. Allocating space for a chunked dataset iterates through all the chunks of the dataset and allocates both the B-tree information and the raw data in the file. Because of the way filters work, fill values are written out for chunked datasets as the chunks are allocated, instead of as a separate step. In parallel I/O, chunked dataset allocation can be time-consuming, since all the raw data in the dataset is allocated from one process.
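The allocate-and-fill pass described above can be sketched as follows. This is only an illustration under stated assumptions, not the library's code: the "file" is a plain in-memory array, the dataset is one-dimensional, every chunk (including the last, partial one) occupies a full chunk-sized region, and the function name allocate_and_fill is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: walk every chunk of a 1-D dataset, claim a
 * chunk-sized region of the "file" for it, and write the fill value
 * into that region in the same step.
 * Returns the number of chunks allocated. */
size_t allocate_and_fill(int32_t *file, size_t file_capacity,
                         size_t dataset_elems, size_t chunk_elems,
                         int32_t fill_value)
{
    size_t n_chunks, c, i;

    assert(chunk_elems > 0);

    /* Round up: a partial trailing chunk still needs a whole chunk. */
    n_chunks = (dataset_elems + chunk_elems - 1) / chunk_elems;
    assert(n_chunks * chunk_elems <= file_capacity);

    for (c = 0; c < n_chunks; c++) {
        /* "Allocate": take the next chunk-sized region of the file. */
        int32_t *chunk = file + c * chunk_elems;

        /* "Fill": written as part of allocation, not as a later pass. */
        for (i = 0; i < chunk_elems; i++)
            chunk[i] = fill_value;
    }
    return n_chunks;
}
```

The single loop over all chunks is the part that becomes expensive when one process performs it for a large dataset.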
Datatype Conversion Needed? - This currently is the deciding factor between doing "direct I/O" (in serial or parallel) and needing to perform gather/convert/scatter operations. I believe that MPI is capable of performing a limited range of type conversions and if so, we should add support to detect when they can be used. This will allow more I/O operations to be performed collectively.
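A gather/convert/scatter pass can be sketched roughly as below. The sketch assumes a strided selection of 32-bit integers converted to doubles through a small fixed-size conversion buffer; the function name, the buffer size CONV_BUF_ELEMS, and the stride-based selection are all illustrative assumptions rather than the library's actual interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CONV_BUF_ELEMS 4 /* deliberately tiny so the batching is visible */

/* Hypothetical sketch: move n selected int32 elements from src to dst
 * as doubles, one conversion-buffer-sized batch at a time.
 * Returns the number of batches that were needed. */
size_t gather_convert_scatter(const int32_t *src, size_t src_stride,
                              double *dst, size_t dst_stride, size_t n)
{
    int32_t gathered[CONV_BUF_ELEMS];
    double  converted[CONV_BUF_ELEMS];
    size_t  done = 0, batches = 0;

    assert(src_stride > 0 && dst_stride > 0);

    while (done < n) {
        size_t left  = n - done;
        size_t batch = left < CONV_BUF_ELEMS ? left : CONV_BUF_ELEMS;
        size_t i;

        /* Gather: pack the selected source elements contiguously. */
        for (i = 0; i < batch; i++)
            gathered[i] = src[(done + i) * src_stride];

        /* Convert: change the datatype of the packed elements. */
        for (i = 0; i < batch; i++)
            converted[i] = (double)gathered[i];

        /* Scatter: place the converted elements at their destinations. */
        for (i = 0; i < batch; i++)
            dst[(done + i) * dst_stride] = converted[i];

        done += batch;
        batches++;
    }
    return batches;
}
```

Each element is copied through two intermediate buffers here, which is the overhead that "direct I/O" avoids when no conversion is needed.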
Collective I/O Requested/Allowed? - The user must request collective I/O, and the I/O operation must also meet the requirements that the library sets for supporting collective parallel I/O:
Build "chunk map" - This step still has some scalability issues, as it creates a data structure whose size is proportional to the number of chunks that will be written to, which could be very large. Building the "chunk map" information incrementally is also on the "to do" list.
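To give a feel for how quickly the chunk map can grow, the following sketch counts the chunks touched by a single rectangular block selection. It is an illustration only: the function name chunks_touched is hypothetical, and the selection is assumed to be one block described by per-dimension start and count arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: number of chunks intersected by a rectangular
 * block selection, i.e. the number of entries a chunk map built for
 * that selection would need. */
uint64_t chunks_touched(size_t rank, const uint64_t *start,
                        const uint64_t *count, const uint64_t *chunk_dims)
{
    uint64_t total = 1;

    for (size_t d = 0; d < rank; d++) {
        assert(chunk_dims[d] > 0);

        if (count[d] == 0)
            return 0; /* empty selection touches no chunks */

        /* Index of the first and last chunk covered in this dimension. */
        uint64_t first = start[d] / chunk_dims[d];
        uint64_t last  = (start[d] + count[d] - 1) / chunk_dims[d];

        total *= last - first + 1;
    }
    return total;
}
```

For example, a 10 x 100 block starting at (5, 0) in a dataset with 4 x 10 chunks touches 3 x 10 = 30 chunks, so the map needs 30 entries for that one operation.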
Perform Chunked I/O - As the figure shows, there is no support for collective parallel I/O on chunked datasets currently. As noted earlier, this is on the "to do" list.
Perform "Direct" Serial I/O - "Direct" serial I/O writes data from the application's buffer, without any intervening buffer or memory copies. For maximum efficiency and performance, the elements in the selections should be adjoining, so that they form a contiguous run.
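One way to check whether the elements of a selection are adjoining is sketched below. It assumes a single rectangular block selection in a row-major (C order) array; the function name selection_is_contiguous is hypothetical and this is not the test the library itself performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: returns 1 if a rectangular block selection in a
 * row-major array covers one contiguous run of elements, 0 otherwise. */
int selection_is_contiguous(size_t rank, const uint64_t *dims,
                            const uint64_t *start, const uint64_t *count)
{
    size_t d = 0;

    /* The block must be non-empty and lie inside the array. */
    for (size_t i = 0; i < rank; i++)
        assert(count[i] >= 1 && start[i] + count[i] <= dims[i]);

    /* Leading dimensions that select a single index do not break
     * contiguity. */
    while (d < rank && count[d] == 1)
        d++;

    /* The first dimension selecting more than one index may be partial. */
    if (d < rank)
        d++;

    /* Every faster-varying dimension must be selected in full. */
    for (; d < rank; d++)
        if (count[d] != dims[d])
            return 0;

    return 1;
}
```

In a 4 x 10 array, selecting two full rows passes this check, while selecting half of each of two rows does not, because the second case leaves gaps between the selected runs.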