Data Management in DGP

DGP manages and interacts with several forms of data. Imported raw data (GPS or gravity) is read from comma-separated value (CSV) files and maintained internally as a pandas.DataFrame or pandas.Series. The ingestion process casts column types, fills or interpolates missing values, and creates or converts the time index, producing a ready-to-process DataFrame.
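The ingestion steps described above can be sketched roughly as follows. This is a minimal illustration, not DGP's actual ingestion code: the function name ingest_csv, the column names, and the sample records are all hypothetical, and linear interpolation is assumed as the gap-filling strategy.

```python
import io

import pandas as pd

# Hypothetical raw gravity records; the column names are illustrative only.
RAW_CSV = """\
unix_time,gravity,long_accel
1500000000,9.80,0.01
1500000001,,0.02
1500000002,9.82,0.03
"""


def ingest_csv(source) -> pd.DataFrame:
    """Read raw CSV text into a time-indexed DataFrame with gaps filled."""
    df = pd.read_csv(source)
    # Time index creation: convert epoch seconds to timestamps.
    times = pd.to_datetime(df.pop("unix_time"), unit="s")
    # Type-cast: force the remaining data columns to float64.
    df = df.astype("float64")
    df.index = pd.DatetimeIndex(times, name="time")
    # Fill missing values by linear interpolation between neighbours.
    return df.interpolate(method="linear")


frame = ingest_csv(io.StringIO(RAW_CSV))
```

The resulting frame has a DatetimeIndex and no missing values, so later processing steps do not need to handle gaps or unparsed time columns.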

These DataFrames are then stored in the project’s HDF5 data file. pandas, through PyTables, natively supports storing DataFrames and Series in HDF5 and retrieving them again.
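As a rough illustration of that storage path, the sketch below writes a DataFrame to an HDF5 file and reads it back using only pandas' own to_hdf and read_hdf. The file name, the key "gravity/raw", and the helper names roundtrip and demo are assumptions made for this example and are not taken from DGP. PyTables must be installed for the HDF5 calls to work.

```python
import os
import tempfile

import pandas as pd


def roundtrip(frame: pd.DataFrame, path: str, key: str) -> pd.DataFrame:
    """Write a DataFrame to an HDF5 file and read it back."""
    # pandas delegates the HDF5 I/O to PyTables.
    frame.to_hdf(path, key=key, mode="a", format="fixed")
    return pd.read_hdf(path, key=key)


def demo() -> bool:
    """Round-trip a small time-indexed frame through a temporary file."""
    frame = pd.DataFrame(
        {"gravity": [9.80, 9.81]},
        index=pd.to_datetime(["2018-01-01 00:00:00", "2018-01-01 00:00:01"]),
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "project.hdf5")  # hypothetical file name
        restored = roundtrip(frame, path, "gravity/raw")
    # Compare both the values and the time index of the restored frame.
    same_values = restored["gravity"].tolist() == frame["gravity"].tolist()
    same_index = list(restored.index) == list(frame.index)
    return same_values and same_index
```

The key acts as a path inside the file, so related datasets can be grouped under a common prefix.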

To facilitate storage and retrieval of data within the project, the HDF5Manager class provides an easy-to-use wrapper around pandas.HDFStore, along with utility methods for getting and setting metadata attributes on nodes.
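To show the kind of wrapper described here, the following is a minimal sketch built directly on pandas.HDFStore. It is not the HDF5Manager implementation: the class name FrameStore and its method names are hypothetical, and it assumes metadata is kept as PyTables attributes on the node that holds each DataFrame.

```python
import pandas as pd


class FrameStore:
    """Hypothetical stand-in for a manager class; not DGP's HDF5Manager."""

    def __init__(self, path: str):
        self.path = path

    def save(self, key: str, frame: pd.DataFrame) -> None:
        """Store a DataFrame under the given key."""
        with pd.HDFStore(self.path, mode="a") as store:
            store.put(key, frame, format="fixed")

    def load(self, key: str) -> pd.DataFrame:
        """Retrieve the DataFrame stored under the given key."""
        with pd.HDFStore(self.path, mode="r") as store:
            return store.get(key)

    def set_attr(self, key: str, name: str, value) -> None:
        """Set a metadata attribute on the node holding the data."""
        with pd.HDFStore(self.path, mode="a") as store:
            # The storer exposes the PyTables attribute set of the node.
            setattr(store.get_storer(key).attrs, name, value)

    def get_attr(self, key: str, name: str, default=None):
        """Read a metadata attribute, or return default if it is not set."""
        with pd.HDFStore(self.path, mode="r") as store:
            return getattr(store.get_storer(key).attrs, name, default)
```

A caller would create the object with a file path, call save("gps/raw", frame), and then tag the node with set_attr("gps/raw", "sensor", "some-label"). Opening the store inside each method keeps the file handle short-lived, at the cost of reopening the file on every call.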