TECHNOTES
Hierarchical Data Format
An emerging standard for scientific data exchange

By Joy A. Colucci

Working with scientific data can have its problems. First, there's the need to move it around. For instance, a scientist might generate a dataset by running a simulation on a Cray supercomputer, and then want to transfer the data to a workstation for analysis and visualization, or to a Mac or PC for use with a publishing application. Interdisciplinary research also requires data formats that are not specific to just one type of scientific data. And it's useful to annotate the raw data with text descriptors and to attach other types of ancillary files. These were the very problems that computer scientists at the National Center for Supercomputing Applications (NCSA) set out to solve when they developed the Hierarchical Data Format (HDF).
    "The development of HDF was driven by the need to share scientific data in a heterogeneous environment," says Mike Folk, Project Manager in the Science Data Technologies Group at NCSA. "NCSA had a range of different types of computers - from PC to supercomputers, and everything in between - and we needed to move data between them," explains Folk. There are data conversion routines built into the HDF software, so that an HDF file created on one computer can be read on a different system without modification.
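    To make the portability idea concrete, here is a minimal Python sketch of one common approach: write numbers in a single agreed byte order, then convert on read, so the bytes mean the same thing on every machine. This is an illustration only and is not HDF's actual on-disk layout or its conversion routines; the function names `encode_array` and `decode_array` and the simple count-plus-values layout are assumptions made for the example.

```python
import struct
import sys


def encode_array(values):
    """Pack floats as a 4-byte count followed by 64-bit floats.

    The ">" prefix forces big-endian byte order, so the bytes are the
    same no matter which kind of machine writes them. (Hypothetical
    layout for illustration; not the HDF file format.)
    """
    header = struct.pack(">I", len(values))
    body = struct.pack(">%dd" % len(values), *values)
    return header + body


def decode_array(blob):
    """Read back the count and the floats, converting to native values."""
    (count,) = struct.unpack_from(">I", blob, 0)
    return list(struct.unpack_from(">%dd" % count, blob, 4))


samples = [1.5, -2.25, 1024.0]
blob = encode_array(samples)
restored = decode_array(blob)

# The host byte order may differ between machines; the decoded values do not.
print("host byte order:", sys.byteorder)
print("restored values:", restored)
```

    The reader never has to know what kind of computer wrote the file, which is the property Folk describes. A real library also has to handle differences in number representation and word size, which this sketch leaves out.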
    HDF has steadily gained popularity since it was released in 1988, especially in the area of earth remote sensing. It was selected as the data standard for NASA's Mission to Planet Earth project, an $8 billion effort to monitor the Earth's environment that will generate a million-gigabyte data archive. Today, over thirty organizations have adopted HDF (see sidebar), and with the launch of NASA's AM-1 spacecraft later this year the volume of data available in HDF will skyrocket.

More Than Just Another Data Format
The power of HDF lies not only in its portability to different platforms, but also in its ability to encapsulate several different types of data in a single file. It was designed specifically to support all of the data structures used by scientists, such as images, text, numeric tables, and multi-dimensional numeric arrays. With HDF, you can mix and match any number of these related data objects together in one file and then access them as a group or as individual objects. Users can also create their own grouping structures using an HDF feature called vgroups. This means that anything from simple images, to multi-gigabyte remote sensing data, to whole research projects consisting of data, images, and documentation can be stored using HDF.
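    The grouping idea can be pictured with a small in-memory model. The Python sketch below is only an analogy for how mixed data objects might be collected into nested, user-defined groups and then reached either through the group or one at a time. It does not use the HDF library or reproduce the vgroup interface; the class names `DataObject` and `Group`, the sample project, and its contents are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Union


@dataclass
class DataObject:
    """One stored item: an image, text note, table, or numeric array."""
    name: str
    kind: str      # e.g. "image", "text", "table", "array"
    payload: Any


@dataclass
class Group:
    """A named collection of data objects and nested groups."""
    name: str
    members: List[Union["Group", DataObject]] = field(default_factory=list)

    def add(self, member):
        """Attach an object or sub-group and return it for chaining."""
        self.members.append(member)
        return member

    def find(self, name):
        """Return the first member with this name, searching depth-first."""
        for m in self.members:
            if m.name == name:
                return m
            if isinstance(m, Group):
                hit = m.find(name)
                if hit is not None:
                    return hit
        return None

    def listing(self, depth=0):
        """Return indented lines describing everything under this group."""
        lines = ["  " * depth + self.name + "/"]
        for m in self.members:
            if isinstance(m, Group):
                lines.extend(m.listing(depth + 1))
            else:
                lines.append("  " * (depth + 1) + f"{m.name} [{m.kind}]")
        return lines


# Hypothetical research project mixing text, numeric, and image data.
project = Group("ocean_study")
project.add(DataObject("readme", "text", "Sea surface temperature, gridded."))
scene = project.add(Group("scene_001"))
scene.add(DataObject("sst", "array", [[271.3, 272.0], [273.1, 274.8]]))
scene.add(DataObject("quicklook", "image", bytes([0, 64, 128, 255])))

# Access as a group (the whole listing) or as an individual object.
print("\n".join(project.listing()))
print(project.find("sst").payload[1][0])
```

    Walking the structure with `listing()` is also a rough picture of what a file browser does when it shows the hierarchy of objects in a file whose contents are not known in advance.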
    HDF also comprises a library of subroutines for reading, writing, and organizing data and metadata. The HDF software is available at no charge from NCSA. The distribution consists of the HDF library, the HDF command line utilities, a test suite (source code only), a Java interface, and the Java-based HDF viewer (JHV).

HDF for Everyone
If a discussion of file formats, data objects, and subroutine libraries leaves you cold, take heart. An increasing number of HDF tools are becoming available for those of us who just want to read the files we receive without writing a program to do so. By far the most comprehensive package available for the manipulation of HDF files is Fortner Software's Noesys. Developed specifically for HDF, Noesys allows users to access, process, organize, and visualize large amounts of HDF data by using simple pull-down menus and icons. It's true that reading data from an HDF file is straightforward if you already know the name, definition, and other parameters of the data object you want. However, users are typically handed an HDF file without specific information about all the objects in the file or how the data is organized. Noesys leaves researchers free to concentrate on analyzing data without having to write custom data access programs.
    "I think Fortner is definitely on to something - I especially like the idea of using the Noesys navigational and annotation facilities to let me keep track of things when building up a complicated HDF file of my own," says Jim Frew, associate professor at the University of California, Santa Barbara. The easy access and organization features of Noesys are what set this software apart from other visualization packages that support HDF.
    Other commercial data visualization packages that currently support HDF include Research Systems' IDL, Advanced Visual Systems' AVS, IBM's Data Explorer, MathWorks' MATLAB, and Visual Numerics' PV-WAVE. In the remote sensing community, PCI's EASI/PACE and ER Mapping's ER Mapper also provide basic access to HDF's raster data types. In addition to the commercial software packages, NCSA has a collection of HDF utilities and tools on its Web site. Fortner Software also provides a variety of free HDF tools that can be downloaded from its Web site, including a converter that translates data from netCDF to HDF, and a browser that lets users view the hierarchical directory structure of the objects in an HDF file.

About the Author:
Joy A. Colucci, Ph.D., is a freelance science writer. She also writes for TerraComm, a scientific communications company in Mountain View, California. She may be reached by phone at 650-961-3917 or by e-mail at [email protected].
