All (or most) glider data for 2014 through 2018 appears to be missing from the IOOS Glider DAC. Where is it?
Hi Robert,
What access point are you using to view the datasets?
I see all of our expected deployments on the ERDDAP page:
https://gliders.ioos.us/erddap/info/index.html?page=1&itemsPerPage=1000
Thanks,
Collin
NSF OOI was not submitting glider data until well into 2018. I don’t know if there is a plan to backfill the NOAA IOOS DAC archive with NSF OOI data, but I agree that would be a nice service to provide to the community … a single point of contact for “all” (most) glider deployments in the U.S. EEZ.
Hi Collin,
Thanks for the reply!
I see that most or all of the Coastal Pioneer glider profiles for 2014 are in the ERDDAP dataset, indeed. Thanks.
But they’re not available in the map page. I routinely search for glider files by using:
IOOS Gliders Map - Catalog
None of the cp glider tracks for 2014 through 2018 show up here. When I last downloaded them ca. 18 months ago I could see the glider tracks. Where did they go?
Also, the cp glider files currently in the ERDDAP page you sent do not have the QARTOD quality flags included, and I know these were updated sometime in the last 18 months to include the latest QARTOD QA/QC protocol. Where are these data?
Thanks,
Bob
Hi Bob,
Building off of Collin’s ERDDAP answer, here is some Python code that will list all of the glider datasets that occur near the New England Pioneer Array. You can modify the lat, lon, and time bounding limits to find deployments of interest. Note that this lists ALL datasets from the GliderDAC, not just the Pioneer Array. If you want to whittle it down, PA datasets have the ‘cp_’ prefix.
In the linked example I search for glider deployments that occurred between 2014-01-01 and 2018-12-31 and within 38 to 41 N, -72 to -69 E. I then specify and download ‘cp_336-20180127T1620-delayed’, which is roughly 1.1GB in size. Let me know if you would like me to write a Python script that will download all of the datasets for you.
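For readers without access to the linked example, the same bounding-box search can be sketched with the ERDDAP Advanced Search URL. This is a minimal sketch, not the linked code: the helper names `build_search_url` and `pioneer_only` are hypothetical, and the query parameter names follow ERDDAP's Advanced Search form as I understand it, so check them against the server before relying on them.

```python
from urllib.parse import urlencode

# Server named earlier in this thread.
ERDDAP_BASE = "https://gliders.ioos.us/erddap"


def build_search_url(min_lat, max_lat, min_lon, max_lon,
                     min_time, max_time, search_for="", response="csv"):
    """Build an ERDDAP Advanced Search URL for a lat/lon/time box.

    Hypothetical helper; parameter names are assumed to match the
    Advanced Search form on the ERDDAP server.
    """
    params = {
        "page": 1,
        "itemsPerPage": 1000,
        "searchFor": search_for,
        "minLat": min_lat,
        "maxLat": max_lat,
        "minLon": min_lon,
        "maxLon": max_lon,
        "minTime": min_time,
        "maxTime": max_time,
    }
    return f"{ERDDAP_BASE}/search/advanced.{response}?{urlencode(params)}"


def pioneer_only(dataset_ids):
    """Keep only dataset IDs that carry the 'cp_' (Pioneer Array) prefix."""
    return [d for d in dataset_ids if d.startswith("cp_")]


# Same limits as described above: 2014-2018, 38 to 41 N, -72 to -69 E.
url = build_search_url(38, 41, -72, -69,
                       "2014-01-01T00:00:00Z", "2018-12-31T23:59:59Z")
print(url)
```

Opening the printed URL in a browser (or reading it with a CSV reader) should return the matching dataset list, which can then be narrowed with `pioneer_only`.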
It looks like there is an rerddap package on CRAN. I can take a crack at working up a similar script sometime next week if you are looking to stick with R.
Cheers,
- Ian
Hi Ian,
If I had an R-script to download all the PA glider files it would greatly speed up my work! As you know I don’t work in python, so I would definitely take you up on your kind offer to write it in R. I intend to download all 9 years of cp glider files from 2014 to 2022, including all available data products.
Here’s another question I’ll throw out there for you or anyone else. The ‘precise’ times and lat/lons are interpolated between the times the glider is at the surface and getting a real gps fix, right? I notice that these lat/lons are expressed to 8 decimal places, which is an accuracy of about one millimeter, which seems rather unrealistic. Does anyone know what the level of accuracy actually is for these ‘precise’ lat/lon values?
thanks,
Bob
Hi John,
As of 2021 we have backfilled the DAC with all archived OOI deployments.
Thanks!
Collin
Hi Bob,
I just searched the map page and I do not see Pioneer Array deployments for 2014-2018 either. I submitted a ticket to the DAC support to see if something changed on their end that is preventing them from showing up.
As for the QARTOD flags, we do not currently submit any QARTOD flags with our DAC submissions. The GDAC processing provides QARTOD flags for the CTD data, but it is often incomplete and not very useful yet. I gathered this information from Stuart Pearce, who wrote the code for our GDAC submissions. He would be able to provide more information but is out of office for the next week.
Thanks,
Collin
Hi Bob,
Here is an R-script that will download ALL PA glider deployments to date from the GliderDAC (165 deployments). You will need to change the save_directory to a place of your choosing. Let me know if you run into any issues.
Note that a ‘Download Progress’ window may appear depending on which download method your system supports, but the progress bar may not move.
I was able to download a couple of files though and confirmed they opened with Panoply and that they contained all of the parameters listed in ERDDAP.
One issue I ran into was automatic timeout of the download.file function. Download through this method is slow (not sure why), so I set it to time out after 30 minutes. You may have to increase it. I would recommend running this overnight or over the long weekend since you will probably be downloading over 50GB. You can validate whether a file is actually downloading by periodically checking the file size in the file properties. I would actually recommend that you do this on the OOI JupyterHub, which supports R.
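For anyone following along in Python rather than R, the timeout idea can be sketched roughly as below. This is not the R script from this post: `dataset_url` and `download_dataset` are hypothetical names, and the tabledap URL pattern with a trailing "?" is assumed from the ERDDAP links discussed in this thread. Note that the `timeout` passed to `urlopen` applies to each blocking socket operation, not to the whole transfer.

```python
import os
import shutil
import urllib.request

# Server named earlier in this thread.
ERDDAP_BASE = "https://gliders.ioos.us/erddap"


def dataset_url(dataset_id, file_type="nc"):
    """Build a tabledap download URL (assumed pattern).

    The trailing '?' with nothing after it is assumed to request
    every variable in the dataset.
    """
    return f"{ERDDAP_BASE}/tabledap/{dataset_id}.{file_type}?"


def download_dataset(dataset_id, save_directory, file_type="nc", timeout=1800):
    """Stream one dataset to disk and return the saved path.

    timeout is in seconds and limits each blocking socket operation.
    """
    os.makedirs(save_directory, exist_ok=True)
    target = os.path.join(save_directory, f"{dataset_id}.{file_type}")
    partial = target + ".part"  # written first so a failed run is obvious
    url = dataset_url(dataset_id, file_type)
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(partial, "wb") as out:
        # Copy in 1 MB chunks so the partial file grows as data arrives.
        shutil.copyfileobj(response, out, length=1024 * 1024)
    os.replace(partial, target)
    return target


# Example call (not run here because it downloads roughly 1.1GB):
# download_dataset("cp_336-20180127T1620-delayed", "glider_data")
```

Writing to a `.part` file first makes it easy to see which downloads were interrupted, and the growing file size gives the same progress check described above.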
Some Info About “Precise”
Here is the comment for the ‘precise_lat’ variable.
comment = “Interpolated latitude at each point in the time-series”
If I had to guess, your assumption is correct in that the times, distance travelled, etc, are used to estimate the lat/lon of the sample.
There is also a ‘latitude’ variable, which has the comment:
comment = “Value is interpolated to provide an estimate of the latitude at the mid-point of the profile.”
I am probably wrong, but I think ‘precise’ in this context is more of an estimate of the spatial location of the glider sample based on the recent call-in and directional information. If I recall, when I would do anchor surveys for the OOI moorings I was required to resolve lat/lon to the fifth decimal (nearest meter). If you were going to do comparisons against OOI mooring locations, you could probably safely round to that point.
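To put numbers on the decimal places, here is a small sketch that converts a number of decimal places in degrees to an approximate ground distance. It assumes a spherical Earth with a mean radius of 6,371 km, and the function name `decimal_place_resolution_m` is just for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def decimal_place_resolution_m(decimals, latitude_deg=0.0):
    """Ground distance (meters) of one unit in the last decimal place.

    Returns (north_south, east_west); the east-west value shrinks
    with the cosine of latitude.
    """
    step_deg = 10.0 ** -decimals
    meters_per_degree = math.pi * EARTH_RADIUS_M / 180.0
    north_south = step_deg * meters_per_degree
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


for decimals in (5, 8):
    ns, ew = decimal_place_resolution_m(decimals, latitude_deg=40.0)
    print(f"{decimals} decimals: {ns:.6f} m north-south, {ew:.6f} m east-west")
```

Under that approximation the fifth decimal works out to roughly one meter and the eighth to roughly one millimeter, which matches the figures quoted in this thread. That is the resolution of the stored number, not the accuracy of the interpolated position.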
It might take some effort, but you could recreate the required QARTOD tests. Technically the only tests that really require the ‘real-time’ aspect are the syntax, gap, and location tests. I would say that if a sample doesn’t have any fill values in the file, then you can say it passed the syntax and gap tests. The location test can be re-run by verifying that the lat/lon are real (-90 to 90 N and -180 to 180 E, or within your own defined bounding box).
You could then recreate the gross range test by using the sensor manufacturer threshold or values of your choice.
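A rough sketch of the location and gross range tests might look like the following. It uses the standard QARTOD flag values (1 pass, 3 suspect, 4 fail, 9 missing), but the function names are hypothetical and the temperature thresholds in the example are placeholders, not manufacturer specifications. Treating any position outside the bounding box as a fail is also a simplifying assumption.

```python
import math

# Standard QARTOD flag values.
PASS, NOT_EVALUATED, SUSPECT, FAIL, MISSING = 1, 2, 3, 4, 9


def _is_missing(value):
    """True for None or NaN (e.g. a fill value already converted to NaN)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def location_test(lat, lon, bbox=(-90.0, 90.0, -180.0, 180.0)):
    """Flag a position; bbox is (min_lat, max_lat, min_lon, max_lon)."""
    if _is_missing(lat) or _is_missing(lon):
        return MISSING
    min_lat, max_lat, min_lon, max_lon = bbox
    inside = min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
    return PASS if inside else FAIL


def gross_range_test(value, fail_span, suspect_span=None):
    """Fail outside fail_span, suspect outside suspect_span, else pass."""
    if _is_missing(value):
        return MISSING
    if not fail_span[0] <= value <= fail_span[1]:
        return FAIL
    if suspect_span and not suspect_span[0] <= value <= suspect_span[1]:
        return SUSPECT
    return PASS


# Placeholder limits for temperature in degrees C (assumed, not sensor specs).
TEMP_FAIL_SPAN = (-5.0, 45.0)
TEMP_SUSPECT_SPAN = (0.0, 30.0)

print(location_test(39.5, -70.5, bbox=(38.0, 41.0, -72.0, -69.0)))
print(gross_range_test(12.0, TEMP_FAIL_SPAN, TEMP_SUSPECT_SPAN))
```

Swap in the sensor manufacturer's range for `TEMP_FAIL_SPAN` and a local expected range for `TEMP_SUSPECT_SPAN` to match your own application.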
The tests that would be the most difficult to recreate would be the climatological tests for temperature and salinity because those would require a 3D (lat, lon, depth) seasonal lookup. Alternatively, you could set the limits for this test to values of your choosing for each season. The language in these tests makes them fairly flexible.
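The simpler seasonal-limits alternative could be sketched as below. The season boundaries (meteorological seasons) and every limit in `SEASONAL_TEMPERATURE_LIMITS_C` are placeholder assumptions for illustration only; you would substitute values appropriate to your region and depth range. Values outside the seasonal window are flagged suspect rather than fail, which is also a choice you may want to change.

```python
from datetime import datetime

# Standard QARTOD flag values used here.
PASS, SUSPECT, MISSING = 1, 3, 9

# Meteorological seasons by month (an assumption; adjust as needed).
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}

# Placeholder temperature limits in degrees C, NOT a real climatology.
SEASONAL_TEMPERATURE_LIMITS_C = {
    "winter": (2.0, 16.0),
    "spring": (3.0, 18.0),
    "summer": (5.0, 28.0),
    "fall": (5.0, 24.0),
}


def climatology_test(value, when, limits):
    """Flag a value against user-chosen limits for the sample's season."""
    if value is None or value != value:  # value != value is True for NaN
        return MISSING
    low, high = limits[SEASON_BY_MONTH[when.month]]
    return PASS if low <= value <= high else SUSPECT


sample_time = datetime(2018, 1, 27, 16, 20)
print(climatology_test(10.0, sample_time, SEASONAL_TEMPERATURE_LIMITS_C))
```

A depth-dependent version would key the limits on (season, depth bin) instead of season alone.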
Cheers,
- Ian
Thanks Collin. I was confused by the missing glider data in the map.
About the QARTOD quality flags, they were in the PA glider datafiles (for T and S only) that we downloaded in September 2023, but I see that the same data files downloaded now are missing these flags. So they were added, then removed?
-Bob.
Hi Ian,
Thanks for this. I assume if I want to download the glider files as *.csv files then I change all occurrences of *.nc to *.csv, and sep = ‘’ to sep = ‘,’ right?
-Bob.
Hi Bob,
I reached out to the folks at IOOS support with your questions about the deployments missing from the interactive map and the QARTOD flags. Here’s what they shared:
Regarding the map, they mentioned that only real-time datasets are included due to data density issues. Typically, delayed datasets do not make it onto the map, with a few exceptions. I noticed a few delayed datasets on the map (e.g., cp_388-20190618T2258-delayed). They mentioned that there might be a few leftovers on the map, but the majority of delayed datasets will not appear.
The timeframe of Pioneer gliders that you reported missing from the map (2014 through 2018) aligns with our submission history, as we have delayed datasets only for those early deployments.
Regarding the QARTOD flags, IOOS support confirmed that they are added after the files are submitted to the DAC. Normally, they can be accessed via ERDDAP under variable names qartod_*. However, they report that there have been some issues with their QC tests that they are working to resolve. So it’s entirely possible that some datasets that previously had the flags in the dataset are missing them now. They hope to resolve these issues soon so that they are available to users for all datasets in the near future.
Thank you,
Collin
Thanks Collin for getting back to me. Quick question: what does the ‘delayed’ notation mean when appended to a data file name?
-Bob.
Also, in line 14, should there be a question mark following the “ .nc “ ?
-Bob.
Hi Bob,
In this context the “delayed” notation means that the dataset is comprised of recovered (full resolution) data. When “delayed” is not included, the dataset is realtime (telemetered) and decimated.
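If you want to sort datasets by type in a script, the ID can be split apart as in this sketch. It assumes the naming pattern seen in the IDs quoted in this thread (glider name, deployment start time, optional “-delayed” suffix); `parse_dataset_id` is a hypothetical helper and other datasets may not follow the same pattern.

```python
from datetime import datetime


def parse_dataset_id(dataset_id):
    """Split an ID like 'cp_336-20180127T1620-delayed' into its parts.

    Assumed pattern: <glider>-<YYYYmmddTHHMM>[-delayed]
    """
    delayed = dataset_id.endswith("-delayed")
    base = dataset_id[: -len("-delayed")] if delayed else dataset_id
    # rsplit so a hyphen inside the glider name would not break the parse
    glider, stamp = base.rsplit("-", 1)
    return {
        "glider": glider,
        "start": datetime.strptime(stamp, "%Y%m%dT%H%M"),
        "delayed": delayed,  # True: recovered data; False: realtime
    }


info = parse_dataset_id("cp_336-20180127T1620-delayed")
print(info["glider"], info["start"].isoformat(), info["delayed"])
```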
-Collin
Hi Bob,
Here is a script that should work for CSV files. I just updated it because I had accidentally left some old test code.
The sep parameter is just used to build the save filepath not for specifying the file separator, no need to change it.
I believe the question mark at the end is used to mark parameter options in the request url. Since the space after the question mark is empty, then all parameters are kept in the file.
The other thing to do is to change the mode parameter in download.file to ‘w’ instead of ‘wb’, since CSV files are text files.
Cheers,
Ian
Thanks, that makes sense.
Hi Bob, just a word of caution. As I’m sure you know, CSV files tend to be huge, on the order of at least 10x what a netcdf file would be. So if the full dataset is 50GB, as Ian roughly estimated, you might need 500GB of space if you request csv files.
That said, if you don’t need all the variables that Erddap provides, you could tweak the URLs after the ? to just include the variable you really need, thus saving a lot of local space.
You can play with the data request page in Erddap to figure out the URL syntax you need.
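As a sketch of what such a tweaked URL could look like, the helper below puts a comma-separated variable list after the “?” and appends optional constraints. The name `subset_url` is hypothetical, and the variable names in the example are assumptions, so confirm them on the dataset's data request page before using them.

```python
from urllib.parse import quote

# Server named earlier in this thread.
ERDDAP_BASE = "https://gliders.ioos.us/erddap"


def subset_url(dataset_id, variables, constraints=(), file_type="csv"):
    """Build a tabledap URL that requests only the listed variables.

    constraints are strings such as 'time>=2018-02-01T00:00:00Z';
    special characters in them are percent-encoded.
    """
    query = ",".join(variables)
    for constraint in constraints:
        # Keep '=' readable; encode '>', '<', ':' and similar characters.
        query += "&" + quote(constraint, safe="=")
    return f"{ERDDAP_BASE}/tabledap/{dataset_id}.{file_type}?{query}"


url = subset_url(
    "cp_336-20180127T1620-delayed",
    ["time", "latitude", "longitude", "depth", "temperature", "salinity"],
    constraints=["time>=2018-02-01T00:00:00Z"],
)
print(url)
```

Requesting six variables instead of the full set should cut the file size considerably, whichever format you choose.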
Thanks Sage, I wasn’t aware the netcdf files were so much smaller. I will consider working with this format instead.
-Bob.
So I just did a quick test on CP387 which was recovered this month after ~40 days… the full dataset in CSV is 21.9MB while the .nc is 12.1MB. I’m not quite sure why there is only a 2x difference, but I’m unfamiliar with how Erddap handles compression. Bottom line, you do save a bit of space. But you can save a bunch more if you don’t request all the variables.
Also, in case you’re not familiar, Erddap’s Advanced Search page is a nice way to find specific datasets (or glider deployments in this case) by title, date range and/or lat/lon box. Here’s an example search that simply finds all the datasets mentioning “OOI.” You could add “Pioneer” or “CP_” accordingly to just get those arrays, or use the date and location filters. The scripts Ian shared basically use the API version of the form.