Dealing with large profiler datasets in ERDDAP

If you have been working with profiler datasets at http://erddap.dataexplorer.oceanobservatories.org/, you may have noticed that the results are very sparse (the resulting table is full of empty rows) and can take a long time to load.

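This is because the underlying data is sparse when viewed in 2D across the time and depth axes. A profiler occupies only one depth at any given time, so most cells in the time-by-depth grid are empty: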

[Figure: Salinity data from a shallow profiler mooring]

As it stands now, individual profiles are not tagged in OOI profiler data, so the only option is to plot the data versus time and depth.

We are updating these ERDDAP datasets to make them more efficient. To do this, we are converting the underlying NetCDF files from multidimensional files with dims=(time,z) to “flat” files with row as the only dimension, and removing any empty rows. Since ERDDAP treats all data as tabular anyway, this makes the dataset more efficient without compromising functionality. The dataset metadata is unchanged. Using this approach, we can reduce the file size by almost 90% in many cases, and the processing and download times drop correspondingly.
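The flattening step can be illustrated with a minimal sketch. This is not the actual conversion code used for these datasets. The function name, the variable names, and the tiny sample grid are all hypothetical, and plain Python lists stand in for the NetCDF variables:

```python
import math

def flatten_profiles(times, depths, grid):
    """Turn a (time, z) grid into flat rows, skipping cells with no data.

    grid[i][j] holds the value at times[i] and depths[j];
    missing cells are None or NaN.
    """
    rows = []
    for i, t in enumerate(times):
        for j, z in enumerate(depths):
            value = grid[i][j]
            if value is None or math.isnan(value):
                continue  # empty cell: no row is written for it
            rows.append((t, z, value))
    return rows

# Hypothetical example: the profiler is at one depth per time step,
# so only the diagonal of the grid holds data.
nan = float("nan")
times = [0, 60, 120]          # seconds, made-up values
depths = [10.0, 20.0, 30.0]   # meters, made-up values
salinity = [
    [33.1, nan, nan],
    [nan, 33.4, nan],
    [nan, nan, 33.9],
]

flat = flatten_profiles(times, depths, salinity)
print(len(flat), "of", len(times) * len(depths), "cells kept")
```

Each output row carries its own time and depth, so no information is lost by dropping the empty cells. Only the grid structure goes away.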

As of this writing, this approach has been rolled out to over a dozen of the largest profiler datasets, and we will continue to update datasets over the next few weeks.

In the meantime, if you encounter a sparse dataset, one workaround is to download the NetCDF files directly. Since the raw files have not gone through ERDDAP, they are compressed and thus much smaller. Keep in mind that metadata is added at the ERDDAP dataset level, so the metadata in these raw files is very limited.

To get to the NetCDF files: anywhere you see a Download button in the Data Explorer, click it, and then click Dataset next to ERDDAP. Then, from the ERDDAP dataset page, click Files in the header to get to a list of files.
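If you already know a dataset ID, the same file listing can usually be reached by URL, since ERDDAP serves a dataset's source files under a files/ path. The sketch below only builds that URL. It assumes the server uses the conventional /erddap base path, and the dataset ID shown is a made-up placeholder, so check the result against what the Files link shows in your browser:

```python
from urllib.parse import quote

# Assumed base URL: the host from this post plus the conventional /erddap path.
ERDDAP_BASE = "http://erddap.dataexplorer.oceanobservatories.org/erddap"

def files_url(dataset_id, base=ERDDAP_BASE):
    """Build the URL of the ERDDAP file listing for one dataset."""
    return f"{base.rstrip('/')}/files/{quote(dataset_id, safe='')}/"

# "example-dataset-id" is a placeholder, not a real dataset ID.
print(files_url("example-dataset-id"))
```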