Greetings from the Big Geodata Newsletter!
In this issue you will find information on the RaQuet specification for storing and querying raster data efficiently, discover TerraTorch for fine-tuning geospatial foundation models, learn how Zarr-Python 3.0 improves performance in Xarray, meet the GeoParquet Downloader QGIS plugin for faster access to geospatial data, and read about GEDTM30, a novel global digital terrain model.
Happy reading!
You can access the previous issues of the newsletter on our web portal. If you find the newsletter useful, please share the subscription link below with your network.
RaQuet: Efficient Raster Data Management
Image credits: CARTO, 2025
RaQuet is an open-source specification developed by CARTO, designed to efficiently store and query raster geospatial data using the Apache Parquet format. By employing a tile-based data organization with Web Mercator tile identifiers (Quadbin), RaQuet simplifies data indexing, enabling quick access to specific raster tiles. Each dataset is stored with pixels arranged in row-major order across per-band columns, accommodating the multiple spectral bands commonly found in remote sensing and satellite imagery. The specification supports optional gzip compression, which reduces storage requirements while keeping data access reasonably fast. RaQuet’s columnar approach enhances analytical performance, which is especially beneficial when handling extensive raster datasets. This structure also promotes interoperability with various data processing tools, simplifying the integration of large-scale geospatial data into existing workflows.
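To make the storage layout more concrete, here is a minimal sketch of how one band of one tile could be packed and unpacked: pixels flattened row by row, then gzip-compressed into a single binary value. It uses only the Python standard library. The helper names, the 8-bit pixel type, and the tiny 3x2 tile are assumptions made for illustration and are not taken from the RaQuet specification, which defines the actual block sizes and data types in its metadata.

```python
import gzip
from array import array


def encode_band(pixels_2d, typecode="B"):
    """Flatten a 2D tile row by row (row-major) and gzip the raw bytes.

    typecode "B" (unsigned 8-bit) is an assumption for this sketch.
    """
    flat = array(typecode)
    for row in pixels_2d:
        flat.extend(row)
    return gzip.compress(flat.tobytes())


def decode_band(blob, width, height, typecode="B"):
    """Reverse encode_band: decompress, then cut the flat buffer into rows."""
    flat = array(typecode)
    flat.frombytes(gzip.decompress(blob))
    if len(flat) != width * height:
        raise ValueError("decoded pixel count does not match tile size")
    # In row-major order, pixel (row, col) sits at index row * width + col.
    return [list(flat[r * width:(r + 1) * width]) for r in range(height)]


# Hypothetical 3x2 tile for one band.
tile = [[0, 1, 2],
        [3, 4, 5]]
blob = encode_band(tile)
decoded = decode_band(blob, width=3, height=2)
print(decoded)
```

In a Parquet file, each such compressed value would sit in a per-band column next to the tile identifier, so a query for one tile only has to read and decompress the bands it needs.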
Explore RaQuet and learn how to integrate efficient raster data management into your workflow here.
TerraTorch: Streamlining Fine-tuning of Geospatial Foundation Models
Image credits: TerraTorch, 2025
TerraTorch is an open-source Python toolkit developed by IBM to simplify the fine-tuning of Geospatial Foundation Models (GFMs). Built upon PyTorch Lightning and the TorchGeo domain library, TerraTorch offers a flexible framework for adapting pre-trained geospatial models to various downstream tasks. The toolkit provides access to a range of pre-trained GFMs, including Prithvi, SatMAE, and ScaleMAE, as well as backbones from the timm and SMP libraries. It supports tasks such as image segmentation, classification, and pixel-wise regression, allowing users to launch fine-tuning processes through intuitive configuration files or Jupyter notebooks. TerraTorch also integrates with datasets from GEO-Bench and TorchGeo, facilitating experimentation and benchmarking. By automating data preparation and model training, TerraTorch reduces the expertise required to adapt GFMs for specific applications. Its design emphasizes modularity, enabling users to interact with the framework at different abstraction levels, from high-level configuration to low-level customization. For those seeking a more guided experience, IBM's Geospatial Studio built on TerraTorch provides a user-friendly interface to assist researchers and developers in building geospatial AI models.
Explore TerraTorch and enhance your geospatial AI workflows here. Access the source code and contribute to the project here.
Accelerating Xarray: Harnessing Zarr-Python 3.0 for Enhanced Performance
Image credits: Earthmover, 2025
The integration of Zarr-Python 3.0 has significantly improved Xarray's performance in handling large, complex datasets. This latest release addresses longstanding issues in Zarr-Python 2, particularly concerning the management of extensive or deeply nested hierarchies. For instance, opening the ARCO ERA5 dataset—a Zarr group containing 277 arrays—previously took over a minute; with Zarr-Python 3, this operation completes in under 15 seconds. The key enhancement lies in the asynchronous I/O operations introduced in Zarr-Python 3. Unlike its predecessor, which processed storage requests serially, the new version leverages Python's asyncio to handle multiple requests concurrently. This change dramatically reduces the time spent waiting for storage operations, especially when interacting with high-latency cloud storage systems like Google Cloud Storage (GCS). These advancements not only expedite data loading but also enhance the overall efficiency of data analysis workflows involving Xarray and Zarr.
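The difference between serial and concurrent storage requests can be illustrated with a small standard-library sketch. It does not use Zarr or Xarray: `fetch_metadata` is a hypothetical stand-in that simulates a high-latency request with `asyncio.sleep`, and the latency value and key names are made up for the example.

```python
import asyncio
import time

LATENCY_S = 0.02  # hypothetical round-trip time per storage request


async def fetch_metadata(key):
    """Stand-in for one high-latency request to cloud storage."""
    await asyncio.sleep(LATENCY_S)
    return key, {"name": key}


async def open_serial(keys):
    # Each request waits for the previous one to finish.
    return [await fetch_metadata(k) for k in keys]


async def open_concurrent(keys):
    # All requests are in flight at once; total time is roughly one round trip.
    return await asyncio.gather(*(fetch_metadata(k) for k in keys))


def timed(coro_fn, keys):
    start = time.perf_counter()
    result = asyncio.run(coro_fn(keys))
    return list(result), time.perf_counter() - start


keys = [f"array_{i}" for i in range(10)]
serial_result, serial_s = timed(open_serial, keys)
concurrent_result, concurrent_s = timed(open_concurrent, keys)
print(f"serial: {serial_s:.3f}s, concurrent: {concurrent_s:.3f}s")
```

With ten simulated requests, the serial version pays the latency ten times while the concurrent version pays it roughly once. The same reasoning explains why a group with hundreds of arrays benefits so much on object storage.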
Learn more about the improvements in Xarray's performance with Zarr-Python 3 here.
GeoParquet Downloader: Time-saving Cloud-Based Geospatial Data Access in QGIS
Image credits: GitHub / Chris Holmes, 2025
GeoParquet Downloader is a QGIS plugin that facilitates efficient downloading of GeoParquet data from cloud sources directly into the QGIS environment. Developed by Chris Holmes, this tool enables users to access and integrate geospatial datasets from platforms like Overture Maps and Source Cooperative, as well as custom URLs pointing to online GeoParquet files or partitions. By focusing on the user's current viewport, the plugin ensures that only the necessary data is downloaded, optimizing both performance and resource utilization. The retrieved data can be saved in various formats, including GeoParquet, GeoPackage, DuckDB, FlatGeobuf, or GeoJSON, providing flexibility based on user preferences and project requirements. Installation is straightforward via the QGIS Plugin Manager. The plugin automatically manages dependencies such as DuckDB, simplifying the setup process. Its design emphasizes user-friendliness, allowing seamless integration of cloud-based geospatial data into QGIS workflows without extensive configuration.
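The idea of restricting a download to the current viewport comes down to a bounding-box intersection test. The sketch below is not the plugin's code; the `BBox` type, the function names, and the sample coordinates are assumptions chosen to show the principle in plain Python.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: BBox, b: BBox) -> bool:
    """True when the two boxes overlap or touch."""
    return (a.xmin <= b.xmax and a.xmax >= b.xmin
            and a.ymin <= b.ymax and a.ymax >= b.ymin)


def features_in_viewport(features, viewport):
    """Keep only the features whose bounding box intersects the viewport."""
    return [name for name, bbox in features if intersects(bbox, viewport)]


# Hypothetical viewport and features (longitude/latitude in degrees).
viewport = BBox(6.80, 52.20, 6.95, 52.30)
features = [
    ("building_a", BBox(6.85, 52.22, 6.86, 52.23)),  # inside the viewport
    ("building_b", BBox(4.88, 52.36, 4.90, 52.38)),  # far to the west
    ("road_c", BBox(6.94, 52.29, 7.10, 52.35)),      # partly overlapping
]
selected = features_in_viewport(features, viewport)
print(selected)
```

When a GeoParquet file stores per-feature bounding boxes, a query engine can apply this kind of test before reading geometries, so that only the matching parts of a remote file need to be transferred.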
Check the GeoParquet Downloader plugin and enhance your QGIS data integration capabilities here. Access the source code and contribute to the project's development here.
Upcoming EVENTS
- OSCT-DCC Hackathon: UT Research Software Template
  ITC, Enschede, 7 April
- AWS Summit
  Amsterdam, 16 April
- Cloud Native Geospatial Conference 2025
  Utah, USA, 30 April - 2 May
- CRIB Training: Introduction to Docker
  ITC, 14 May
- SURF Research Day
  Hilversum, 20 May
- Open and FAIR in NES
  Utrecht, 22 - 23 May
The "Big" Picture
Image credits: OpenLandMap, 2025
GEDTM30 is a newly developed global digital terrain model (DTM) at 30-meter resolution, created using an advanced data fusion approach. Combining multisource datasets, including ICESat-2, GEDI, and global digital surface models like Copernicus DEM and ALOS World3D, GEDTM30 provides a comprehensive and accurate terrain representation. Leveraging over 30 billion lidar points, researchers employed a global-to-local transfer learning approach to create a globally consistent, locally optimized terrain model. The model significantly reduces elevation errors compared to other global datasets. Specifically, it achieves approximately 25.4% lower RMSE in urban areas, 10% in moderately forested areas, and 27.3% in densely vegetated regions compared to the Copernicus DEM. GEDTM30 also outperforms other state-of-the-art DTMs (MERIT DEM, FABDEM, FathomDEM) in validation tests using independent lidar and GNSS data. The entire dataset is openly accessible as Cloud-Optimized GeoTIFFs via Zenodo and the OpenLandMap STAC platform, fostering greater use in environmental modeling, geomorphology, and hydrology.
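For readers less familiar with the metric, the sketch below shows how an RMSE and a relative RMSE reduction can be computed, assuming the reported percentages are relative reductions against a baseline model. The elevation values are invented for illustration only and have nothing to do with the GEDTM30 validation data.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between two equally long sequences."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline_rmse, candidate_rmse):
    """Percentage by which the candidate RMSE is lower than the baseline."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse


# Made-up elevations in meters, for illustration only.
reference = [10.0, 12.0, 15.0, 11.0]       # e.g. independent ground heights
baseline_model = [13.0, 15.0, 18.0, 14.0]  # every value 3 m too high
candidate_model = [12.0, 14.0, 17.0, 13.0]  # every value 2 m too high

baseline_rmse = rmse(baseline_model, reference)
candidate_rmse = rmse(candidate_model, reference)
reduction = relative_reduction(baseline_rmse, candidate_rmse)
print(f"baseline {baseline_rmse:.2f} m, candidate {candidate_rmse:.2f} m, "
      f"reduction {reduction:.1f}%")
```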
Read the GEDTM30 preprint here to find out more. Access the complete documentation and open data here.
Ho, Y., Grohmann, C. H., Lindsay, J., Reuter, H. I., Parente, L., Witjes, M., & Hengl, T. (2025). Global Ensemble Digital Terrain modeling and parametrization at 30 m resolution (GEDTM30): a data fusion approach based on ICESat-2, GEDI and multisource data. Preprint. doi:10.21203/rs.3.rs-6280607/v1