Connected Data Products Hackdays Overview

Trey Stafford
Robyn Marowitz
Luis López

2026-05-13

Connected Data Products Hackdays

  • Berkeley, CA, April 13-15
  • Co-hosted by Englacial and the Berkeley Institute for Data Science (BIDS).
  • Focused on bringing together members of the cryospheric data science community to explore connected and living data products.
  • April 13: Presentations/demos and idea formation
  • April 14-15: Hack time!

The group

Image courtesy of Englacial

Demos/Presentations

Hack projects

  • earthaccess v1.0 updates
  • Virtualization of datasets for efficient cloud access (e.g., radar sounder data).
  • Analysis of velocity changes at grounding zones.
  • Analysis of radar echo sounding crossovers with ICESat-2 crossovers.
  • Geostatistics/Interpolation of data with GStatSim.
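Crossover analysis compares measurements where two survey tracks intersect. A minimal sketch of the geometric core follows. It assumes both tracks are lists of (x, y, value) points in one shared projected coordinate system. The function and variable names are hypothetical, and this is not the hack team's code:

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]  # (x, y, value) in a shared projected CRS


def segment_intersection(p1: Point, p2: Point, q1: Point, q2: Point) -> Optional[Tuple[float, float]]:
    """Return fractions (t, u) along segments p1->p2 and q1->q2 where they cross, or None."""
    dxp, dyp = p2[0] - p1[0], p2[1] - p1[1]
    dxq, dyq = q2[0] - q1[0], q2[1] - q1[1]
    denom = dxp * dyq - dyp * dxq  # 2D cross product of the two directions
    if denom == 0:
        return None  # parallel or collinear: not treated as a crossover here
    rx, ry = q1[0] - p1[0], q1[1] - p1[1]
    t = (rx * dyq - ry * dxq) / denom
    u = (rx * dyp - ry * dxp) / denom
    if 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0:
        return t, u
    return None


def crossovers(track_a: List[Point], track_b: List[Point]) -> List[Tuple[float, float, float]]:
    """Return (x, y, value_a - value_b) at every crossing of two polyline tracks.

    Values are linearly interpolated along each segment to the crossing point.
    Brute force over all segment pairs; a crossing exactly at a shared vertex
    may be reported twice. Large datasets would need a spatial index.
    """
    found = []
    for a1, a2 in zip(track_a, track_a[1:]):
        for b1, b2 in zip(track_b, track_b[1:]):
            hit = segment_intersection(a1, a2, b1, b2)
            if hit is None:
                continue
            t, u = hit
            x = a1[0] + t * (a2[0] - a1[0])
            y = a1[1] + t * (a2[1] - a1[1])
            value_a = a1[2] + t * (a2[2] - a1[2])
            value_b = b1[2] + u * (b2[2] - b1[2])
            found.append((x, y, value_a - value_b))
    return found


if __name__ == "__main__":
    radar = [(0.0, 0.0, 100.0), (10.0, 0.0, 110.0)]  # hypothetical track A
    laser = [(5.0, -5.0, 90.0), (5.0, 5.0, 100.0)]   # hypothetical track B
    print(crossovers(radar, laser))  # [(5.0, 0.0, 10.0)]
```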

Englacial posted a nice writeup of the hackdays projects here:

https://englacial.org/posts/20240428-hackdays-projects/index.html

earthaccess v1 hacking

  • Team: Trey Stafford, Joe Kennedy (ASF), Julia Lober (graduate student at Boise State), Matt Fisher (DSE), and Jessica Scheick (University of New Hampshire)
  • The project is working toward a v1 release, with several milestones and breaking changes to complete before v1 is finished.
  • We focused on updates to the results API, adding convenience methods and properties to the Results class (e.g., export to a GeoDataFrame) and working toward attaching query provenance to results.
  • Associated PR: https://github.com/earthaccess-dev/earthaccess/pull/1298
  • Many participants expressed interest in this work and were excited about the changes. Almost all of the projects (possibly all) used earthaccess.
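The idea of exporting search results to a geospatial table with the query attached can be sketched without earthaccess itself. This is an illustration under assumptions, not the API added in the linked PR. The records use a hypothetical flat bounding-box schema, and the query provenance rides along as a foreign member of a GeoJSON FeatureCollection. If geopandas is installed, `geopandas.GeoDataFrame.from_features(fc["features"], crs="EPSG:4326")` would turn the features into a GeoDataFrame.

```python
from typing import Any, Dict, List


def results_to_feature_collection(records: List[Dict[str, Any]], query: Dict[str, Any]) -> Dict[str, Any]:
    """Build a GeoJSON FeatureCollection from bounding-box records, carrying the query along.

    `records` uses a hypothetical flat schema {"id", "west", "south", "east", "north"}
    in degrees; real earthaccess granules carry much richer UMM-G metadata.
    """
    features = []
    for rec in records:
        w, s, e, n = rec["west"], rec["south"], rec["east"], rec["north"]
        # Closed exterior ring, counter-clockwise per the RFC 7946 right-hand rule.
        ring = [[w, s], [e, s], [e, n], [w, n], [w, s]]
        features.append({
            "type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": [ring]},
            "properties": {"id": rec["id"]},
        })
    return {
        "type": "FeatureCollection",
        "features": features,
        # Hypothetical provenance field, stored as a GeoJSON "foreign member".
        "provenance": {"query": dict(query)},
    }


if __name__ == "__main__":
    fc = results_to_feature_collection(
        [{"id": "granule-1", "west": -10.0, "south": -80.0, "east": 10.0, "north": -70.0}],
        {"short_name": "EXAMPLE_PRODUCT", "temporal": ("2020-01-01", "2020-12-31")},
    )
    print(len(fc["features"]), fc["provenance"]["query"]["short_name"])  # 1 EXAMPLE_PRODUCT
```

Keeping provenance next to the features, rather than in a separate log, means the query travels with any file the collection is written to.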

GStatSim: New Applications for Large Radar Datasets

  • Team: Robyn Marowitz, Eliza Dawson (NOAA C&GC Postdoc Fellow, Georgia Tech), Joseph Rotondo (UW grad student), Lindsay Stark (Boise State grad student), Emmy Muniz (Stanford undergraduate student)
  • Expanded the GStatSim Jupyter Book documentation with two new real-world application notebooks:
    • HiCARS radar sounder: an end-to-end workflow that downloads 860 granules from NSIDC via earthaccess, parses flight-line text files into a single GeoParquet file (9.3M points, EPSG:3031), and fits variogram models for kriging and geostatistical simulation
    • Arctic pole hole: filling the gap in satellite sea-ice coverage near the pole using geostatistical interpolation of sea-ice albedo
  • Together, these notebooks demonstrate real-world approaches to connecting datasets through an open-source package to produce a new data product
  • Repo: https://github.com/elizadawson/gstatsim_largedatasets
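Fitting a variogram model, as in the HiCARS notebook, starts from the empirical semivariogram gamma(h) = sum((z_i - z_j)^2) / (2 N(h)), taken over the N(h) point pairs whose separation falls in the lag bin around h. The following is a standard-library-only sketch. It assumes a spherical model with a fixed nugget and a simple grid search over the range. It is not GStatSim's API, and at 9.3M points the all-pairs loop would need subsampling first.

```python
import math
from typing import List, Sequence, Tuple

Bin = Tuple[float, float, int]  # (lag centre, semivariance, number of pairs)


def empirical_variogram(xs: Sequence[float], ys: Sequence[float], zs: Sequence[float],
                        lag_width: float, n_lags: int) -> List[Bin]:
    """Classical (Matheron) estimator: gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)) per lag bin.

    Loops over all pairs (O(n^2)), so large surveys should be subsampled first.
    Empty bins are omitted.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(zs)):
        for j in range(i + 1, len(zs)):
            k = int(math.hypot(xs[i] - xs[j], ys[i] - ys[j]) // lag_width)
            if k < n_lags:
                sums[k] += (zs[i] - zs[j]) ** 2
                counts[k] += 1
    return [((k + 0.5) * lag_width, sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_lags) if counts[k]]


def spherical(h: float, nugget: float, psill: float, vrange: float) -> float:
    """Spherical model; total sill = nugget + psill, reached at h = vrange."""
    if h == 0:
        return 0.0
    if h >= vrange:
        return nugget + psill
    r = h / vrange
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)


def fit_spherical(bins: List[Bin], candidate_ranges: Sequence[float], nugget: float = 0.0) -> Tuple[float, float]:
    """Grid-search the range; for each, solve pair-count-weighted least squares for the partial sill."""
    best = None
    for vrange in candidate_ranges:
        shape = [spherical(h, 0.0, 1.0, vrange) for h, _, _ in bins]
        den = sum(n * f * f for f, (_, _, n) in zip(shape, bins))
        if den == 0:
            continue
        num = sum(n * f * (g - nugget) for f, (_, g, n) in zip(shape, bins))
        psill = num / den
        err = sum(n * (g - nugget - psill * f) ** 2 for f, (_, g, n) in zip(shape, bins))
        if best is None or err < best[0]:
            best = (err, vrange, psill)
    if best is None:
        raise ValueError("no candidate range produced a valid fit")
    return best[1], best[2]
```

Given the range, the model is linear in the partial sill, so each candidate range needs only a closed-form solve. A kriging or simulation step would then evaluate the fitted model between data points.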

Common themes

  • Data archived in the cloud is not always “cloud-optimized” or “AI-ready”. There was significant interest in data virtualization techniques (e.g., VirtualiZarr, Icechunk)
  • Importance of living/executable documentation (e.g., via MyST).
  • Discussions about how to maintain citations for products that receive updates from a rotating cast of contributors.
  • The role of AI in scientific software/product development (e.g., Jupyter AI).
  • Reproducibility in science is more than just being open. Can AI be used as a “reproducibility checker”?
  • Challenges with scaling notebooks to production
  • Challenges with co-locating and co-analyzing data that were not “designed” to work together natively.
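Data virtualization leaves data in its original files and records where each chunk lives, so clients can read only the byte ranges they need. The following is a toy illustration of that idea. Its manifest layout (chunk key to path, offset, length) is hypothetical and is not the actual VirtualiZarr or Icechunk format:

```python
import os
import struct
import tempfile
from typing import Dict, Tuple

# Hypothetical manifest layout: chunk key -> (file path, byte offset, byte length).
Manifest = Dict[str, Tuple[str, int, int]]


def read_chunk(manifest: Manifest, key: str) -> bytes:
    """Fetch one chunk by byte range; a local stand-in for an HTTP range request to cloud storage."""
    path, offset, length = manifest[key]
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    if len(data) != length:
        raise OSError(f"short read for chunk {key!r}")
    return data


def write_demo_file() -> Tuple[str, Manifest]:
    """Write a toy 'archival' file (opaque header + two float64 chunks) and index it without copying data."""
    header = b"LEGACYFMT v1\x00\x00\x00\x00"  # 16-byte opaque header
    chunk0 = struct.pack("<4d", 1.0, 2.0, 3.0, 4.0)
    chunk1 = struct.pack("<4d", 5.0, 6.0, 7.0, 8.0)
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(header + chunk0 + chunk1)
    manifest = {
        "data/0": (path, len(header), len(chunk0)),
        "data/1": (path, len(header) + len(chunk0), len(chunk1)),
    }
    return path, manifest


if __name__ == "__main__":
    path, manifest = write_demo_file()
    print(struct.unpack("<4d", read_chunk(manifest, "data/1")))  # (5.0, 6.0, 7.0, 8.0)
    os.remove(path)
```

The original file is never rewritten. Only the small manifest is new, which is what makes this approach attractive for large archives that are not cloud-optimized.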

Questions?