Mismatch between OpenDAP and HTTPServer Datasets

I am attempting to use CE02SHSP OPTAA Deployment 13 from the GoldCopy catalog and have found that the values for external_temp_raw, internal_temp_raw, c_signal_counts, a_signal_counts, a_reference_counts, beam_attenuation, optical_absorption, and c_reference_counts do not match between the data accessed via OpenDAP and the file downloaded via HTTP.

I accessed data from the OpenDAP and HTTP Server URLs provided here: Catalog Services

In the case of external_temp_raw and internal_temp_raw, it is apparent that the OpenDAP dataset is incorrect, because these values are negative.
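For reference, the kind of check that exposes the problem looks roughly like this (the URL below is a placeholder, not the actual catalog address):

import xarray as xr

# Placeholder URL; substitute the OpenDAP address listed in the GoldCopy catalog
url = 'https://example.com/thredds/dodsC/deployment0013_CE02SHSP_OPTAA.nc'
ds_dap = xr.open_dataset(url)

# Raw counts should never be negative, yet via OpenDAP these print True
print((ds_dap.external_temp_raw < 0).any().item())
print((ds_dap.internal_temp_raw < 0).any().item())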

Here is a link to a Jupyter Notebook that highlights this issue.

Ian,

The variables you are referencing are all stored in the NetCDF files as unsigned integers. In the chain from the NetCDF files, through OpenDAP, to a file object and then an xarray.Dataset (your OpenDAP data), those unsigned integers are converted to a signed float and occasionally mis-cast, producing negative values. When you go directly from the NetCDF files to a file object and then an xarray.Dataset (your file data), the values are still converted to a signed float, but they are cast correctly and there are no negative values.
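If you did want to repair the OpenDAP data in place, a rough, untested sketch of such a recast might look like the following. It assumes the affected counts are 16-bit unsigned integers that wrapped negative during the cast; check the variable attributes in the source files before trusting the bit width:

import numpy as np

# Hypothetical helper, not part of the OOI tooling: undo a signed/unsigned
# wraparound for count variables that should never be negative. The 16-bit
# width is an assumption; adjust nbits to match the source dtype.
def recast_unsigned(ds, names, nbits=16):
    for name in names:
        values = ds[name].values
        ds[name].values = np.where(values < 0, values + 2**nbits, values)
    return ds

# e.g., applied to the dataset opened from the OpenDAP URL
ds = recast_unsigned(ds, ['external_temp_raw', 'internal_temp_raw'])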

There are a couple of ways to get around this, but they would require you to write code that recasts the variables based on their attributes (along the lines of the sketch above). Or…don’t use OpenDAP at all, and change your file code to use a couple of additional settings:

import xarray as xr

ds = xr.open_dataset(filename, decode_cf=False, mask_and_scale=False)

The OOI NetCDF files are not CF compliant. You need to tell xarray this explicitly; otherwise it will assume the data are CF compliant and interpret them accordingly, which can and will bite you at random. You then need a small amount of additional code to convert the time values back into datetime64 values, e.g.:

import re
import numpy as np

# Explicitly set the time variables to datetime64 values
time_pattern = re.compile(r'^seconds since 1900-01-01.*$')
ntp_date = np.datetime64('1900-01-01')
for v in ds.variables:
    if 'units' in ds[v].attrs:
        if isinstance(ds[v].attrs['units'], str):  # because some units use non-standard characters
            if time_pattern.match(ds[v].attrs['units']):
                ds[v].attrs.pop('_FillValue', None)  # no fill values for time!
                del ds[v].attrs['units']             # time units are set via the encoding
                np_time = ntp_date + (ds[v] * 1e9).astype('timedelta64[ns]')
                ds[v] = np_time
                # set the encoding after reassignment so it isn't dropped
                ds[v].encoding = {'_FillValue': None, 'units': 'seconds since 1900-01-01T00:00:00.000Z'}

# sort by time
ds = ds.sortby('time')
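
As a quick sanity check (not part of the original workflow), you can confirm the data now behave as expected:

# raw counts should be non-negative and the time axis monotonic
print((ds.external_temp_raw < 0).any().item())     # expect False
print((ds.internal_temp_raw < 0).any().item())     # expect False
print(ds.time.to_index().is_monotonic_increasing)  # expect True after sortby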

Hopefully that helps.

Cheers,
Chris