The Big Geodata Newsletter provides a quick monthly update on recent news and developments in the big geodata domain. We try to keep it concise, informative and interesting. If you want to be informed about developments in the rapidly changing landscape of big geodata, please subscribe:
In this issue you will find information on XDGGS for planetary-scale data cube computations with Discrete Global Grid Systems, Arkouda for large-scale geocomputing using the Pangeo stack, DMR++ for easy access to HDF4/5 data in the cloud without reformatting, accelerated data analytics workflows with RAPIDS cuDF, and Vector Data Cubes for geospatial insights from spatiotemporal vector data.
In this issue you will find information on Icechunk for transactional cloud-native data storage, Marimo reactive notebooks with dynamic code cell updates, Coiled’s call for community input on a geospatial benchmark suite, STAC GeoParquet for efficient cloud-native geospatial data handling, and Fields of the World - a multinational dataset for agricultural boundary segmentation!
In this issue you will find information on how millions of Dask nodes are managed in production, cloud-native access to NetCDF datasets using Kerchunk, insights into EuroCrops - an open-source dataset for European crop analysis, and advancements in geospatial foundation models for image analysis, particularly in enhancing the domain adaptability of NASA-IBM Prithvi!
In this issue you will find information on the 10th anniversary of PDOK and its new services, recent developments in the efficient creation of multi-scale Zarr pyramids to boost big data storage and access, a global urban green space dataset covering more than 1000 cities, and the GeoAI challenges to accelerate the implementation and monitoring of the SDGs.
In this issue you will find information on recent foundation models leveraging EO datasets, new meta-learning frameworks, a global 30 m land-cover product, and an interesting challenge on crop yield prediction!
In this issue you will find information on HyperCoast, a new Python package for hyperspectral data; Coiled’s benchmarking of DataFrame technologies; news on the retirement of Microsoft’s Planetary Computer Hub; and on SIRCLE and SWAG, two new models that tackle the challenges of processing petabyte-scale EO data.
In this issue you will find information on Fiboa - a project standardising agricultural field boundary data, a Sentinel-2 super-resolution model, Cubed - an alternative backend for Xarray, and CPU and GPU optimizations for LiDAR data processing.
In this issue you will find information on the new ML-ready dataset standards Croissant and MajorTOM, GPU-accelerated data analytics with NVIDIA RAPIDS cuDF, and a new analysis-ready platform for Earth Video Cubes.
In this issue you will find information on PMTiles - a novel archive format for tiled data visualization, DiffusionSAT - a large generative model trained on high-resolution remote sensing datasets, the One Billion Row Challenge, and a MOOC on Cubes & Clouds.
In this issue you will find information on utilizing GDAL with AWS EMR-Serverless, GridMesa: an adaptive grid approximation model to handle large spatial data, and on creating EO data cubes with Cubo and XEE.
In this issue you will find information on a new method to visualise in-memory raster data using Leafmap, the combined Google-Microsoft building footprints dataset, the cloud-optimised GeoZarr format, and a state-of-the-art global high-resolution canopy height model.
We wish all readers a very Happy New Year! In this issue you will find information on the new Spatial Extension for DuckDB, recent query optimization efforts for Dask Expressions, the Copernicus Data Space Ecosystem, and a state-of-the-art global database of 2 million training points to map land cover change! We also introduce you to Indupriya Mydur, who recently joined CRIB as a student assistant.
We are happy to restart the Big Geodata Newsletter as a precursor to an exciting new year! In this issue you will find information on NASA Earthdata Cloud, the Coiled cloud platform for distributed computing, Lonboard - a Python library for rendering big vector data, and GraphCast - an AI model for global medium-range weather forecasting. Do not miss the promo code for accessing courses from the NVIDIA Deep Learning Institute for free!
In this issue you will find information on OCRE research funding for Earth Observation services, the Copernicus Jupyter Notebook Competition, cuNumeric - a GPU-enabled drop-in replacement for NumPy at scale, xcube - an xarray-based EO data cube toolkit, and a method for deploying user-defined EO algorithms for large-scale data analysis in the cloud using Data Cube Resilient Distributed Datasets (DRDDs).
In this issue you will find information on Pyjion - a JIT compiler for Python, MAAP - the NASA/ESA Multi-Mission Algorithm and Analysis Platform, Radiant MLHub - an open library for EO machine learning, TorchGeo - deep learning datasets, transforms, samplers, and pre-trained models for geospatial data, and GISD30 - a global 30 m impervious-surface dynamic dataset.
In this issue you will find information on the release of two datasets - TimeSpec4LULC and ESA WorldCover 10m, open call opportunities from C-SCALE, a “big picture” on the machine learning market for Earth Observation, and one more thing: Dask replacing Spark. You are also welcome to join us at the Big Geodata Talk on openEO and the first Geospatial Computing Platform User Meeting!
In this issue you will find information on the SpatioTemporal Asset Catalog (STAC) specification, LotusSQL, an SQL engine for high-performance big data systems, the GEE Timeseries Explorer for QGIS, and a “big picture” on using video compression methods to store high-dimensional spatiotemporal data! Our regular upcoming events, recent releases, and CRIB news sections are here as well. We also have a short survey on our upcoming JupyterLab use cases workshop!
In this issue you will find information on Microsoft's Planetary Computer, which is currently in private preview, how to process continental-scale Sentinel-2 data with Dask, a Python package (geemap) for using Google Earth Engine within Jupyter-based environments, and a "big picture" from China on the spatiotemporal distribution of aquaculture activities! Our regular upcoming events, recent releases, and CRIB news sections are here as well. We also have a short survey on big geospatial datasets!
Normally, in this introduction we summarize the newsletter content by listing the news items. This time we have only one: our new Geospatial Data Analysis Platform. It required significant effort, but we now have a state-of-the-art interactive computing platform featuring GPU-backed and distributed data analysis and visualization capabilities, directly accessible from your home or office (or home-office) computer. This first issue of the new year is devoted to the platform and provides information on its main features and capabilities. Please log in to the system and have a closer look. We hope you will like it!
In this issue you will find information on the European Commission's Workshop on Big Data and Artificial Intelligence for Earth Observation, UT's new Virtual Research Environment service, parallel R in a nutshell, Intel Geospatial - Intel's new cloud-based geospatial analytics platform, and the Atlas of Global Surface Water Dynamics with beautiful maps and images based on big data analysis! Our regular upcoming events, recent releases, and CRIB news sections are also included. We also have a short survey on big data software needs!
In this issue you will find information on Φ-week - ESA's annual event dedicated to innovation in EO, cuGraph - large-scale graph analytics on GPUs, Thrill - a high-performance C++ framework for distributed computing, and a new fine-resolution settlement dataset for the U.S. that goes back to the 1800s! Our regular upcoming events, recent releases, and CRIB news sections are also included!
In this issue you will find information on the new Apache Sedona (formerly known as GeoSpark), upcoming research funding calls by Open Clouds for Research Environments (OCRE) for cloud and digital EO services, deck.gl - a state-of-the-art framework for visualising large-scale spatial datasets on the web, openEO - an open API to connect different cloud-based EO back-ends in a unified way, and a new wetland dataset produced with open EO data and cloud-based methods. In addition to our regular section on recent software releases, we also have a new section on upcoming events!
In this issue you will find information on applications of spatial data cubes, the eScience Center-ITC collaboration on large-scale phenological modelling, recent developments in GPU-accelerated distributed computing, our new web portal, a tool to extend the spatial analysis capabilities of key-value databases, and a new cloud-based method to fill gaps in Earth observation data.
Greetings from the Big Geodata Newsletter!
In addition to developing a common infrastructure, providing assistance with your projects, and organizing trainings, we also want to keep you informed about developments in the rapidly changing landscape of big geodata.