TL;DR Suggestion: Add calibration information to the OPTAA netCDF4 files and split the data into groups to make it more accessible.
I’ve recently been working with OPTAA netCDFs and have spent some time trying to identify the appropriate pre/post calibration information and apply it. As an end-user, I’ve found the current system cumbersome to navigate (Alfresco, document naming convention). Some users may want to redo the processing with the OOI-acquired pre- and post-deployment calibrations. At minimum, I would like to suggest hosting the ACS dev and pre/post cal files on a platform like the OOI Raw Data archive so that they can be easily searched for and acquired, both manually and programmatically. I would then suggest adding attributes to the netCDFs that contain hyperlinks to the cal files applicable to the data within the file. Having cal files accessible via the raw data archive would be particularly useful for users who want to re-process data on the OOI JupyterHub.
However, my preferred suggestion is that the cal data be made available directly within the recovered/streamed OPTAA netCDFs so that it does not need to be acquired separately. I would also like to suggest that absorption and attenuation data be split into their own groups. This may help build end-user confidence that they are using the appropriate calibration information and data. I believe this would make the data coming out of the OPTAAs much more accessible.
I’ve put together an example file from CE02SHSP-D13, which includes the addition of the dev and pre-cal groups (see output_example.nc, ooi_acs_converter - Google Drive). I recommend looking at this file in Panoply (see attached images below). The example structure of the file is also below. I’ve also changed the dimensions of the absorption and attenuation data to be time and actual wavelength (rather than the index-style wavelength dimension originally assigned). This makes interpolation to common wavelength bins and 3D plots possible in only a few lines of code (at least in Python). Redundant variables have been removed and added as single-value attributes (e.g. num_wavelengths, preferred_timestamp). Attributes can also contain arrays, so I have used that to reflect lat/lon/time ranges in the root attributes.
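To illustrate the point about interpolation, here is a minimal sketch in Python with NumPy. It assumes the spectra are already loaded as a (time, wavelength) array alongside a 1-D array of actual wavelengths in increasing order. The function name, variable names, and numbers are hypothetical and are not taken from the example file.

```python
import numpy as np


def interpolate_to_common_bins(wavelengths, spectra, common_bins):
    """Linearly interpolate each spectrum (one row per time step) onto common_bins.

    wavelengths : 1-D array of actual wavelengths in nm, assumed increasing
    spectra     : 2-D array with dimensions (time, wavelength)
    common_bins : 1-D array of target wavelengths in nm
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    spectra = np.asarray(spectra, dtype=float)
    common_bins = np.asarray(common_bins, dtype=float)
    # np.interp works on one 1-D series at a time, so loop over the time axis.
    return np.vstack([np.interp(common_bins, wavelengths, row) for row in spectra])


# Hypothetical values purely for illustration (not real OPTAA data).
wavelengths = [400.0, 410.0, 420.0]
spectra = [[0.0, 1.0, 2.0],
           [2.0, 4.0, 6.0]]
common_bins = [405.0, 415.0]
binned = interpolate_to_common_bins(wavelengths, spectra, common_bins)
```

The same idea should carry over to xarray if the wavelength is a proper coordinate, but I have kept this to plain NumPy so the assumptions are easy to see.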
In the example file I created, the file size increased by ~10 MB (~5%). Since the ancillary, metadata, absorption, and attenuation groups share some of the same coordinates, this can likely be reduced with careful planning and documentation. I estimate that the addition of the dev and .dat data accounts for roughly 3 MB.
The Google Drive link above also includes the test data I used and a Python notebook describing how I reorganized the netCDF. I hope that the addition of cal data and the separation of data into groups will be considered in the future.
Example Structure
- Root
  - Group: Sensor Metadata
  - Group: Ancillary Data
  - Group: Absorption Data
  - Group: Attenuation Data
  - Group: Factory Calibration Data (.dev)
  - Group: OOI Pre-Deployment Calibration Data
    - Group: A1.dat
    - Group: A2.dat
    - Group: C1.dat
    - Group: C2.dat
  - Group: OOI Post-Deployment Calibration Data
    - Group: A1.dat
    - Group: A2.dat
    - Group: C1.dat
    - Group: C2.dat
Root Group
Contains no variables. Only contains attributes that are specific to OOI site/platform metadata and the file processing (e.g. institution, date_modified).
Sensor Metadata Group
Contains variables such as port_timestamp, suspect_timestamp, on_seconds, etc.
Ancillary Data Group
Contains variables that are not logged directly by the ACS but are used in processing, such as sea_water_temperature.
Absorption Data Group
Contains variables related to processing of absorption data (the ACS ‘a’ side).
Attenuation Data Group
Contains variables related to processing of attenuation data (the ACS ‘c’ side).
Factory Calibration Data Group
Contains appropriately sized arrays to facilitate re-application of processing algorithms by advanced end-users.
Pre-Deployment Calibration Data Group
Contains appropriately sized arrays to facilitate calculation of DIW offsets and re-application of processing algorithms by advanced end-users.
Post-Deployment Calibration Data Group
Contains appropriately sized arrays to facilitate calculation of DIW offsets and re-application of processing algorithms by advanced end-users.
[Attached Panoply screenshots: Meta Data Group, Data Groups, Cal Groups]