Waleed Alzuhair, flickr

Big Geodata Newsletter, June 2024

Become a high-skilled geospatial professional

Greetings from the Big Geodata Newsletter!

In this issue you will find information on fiboa - a project standardising agricultural field boundary data, a Sentinel-2 super-resolution model, Cubed - an alternative backend for Xarray, and CPU and GPU optimizations for LiDAR data processing.

OpenEO offers tools and an open API that allow users to explore and analyze vast amounts of EO data across various cloud backends. Join the CRIB training workshop to discover OpenEO’s capabilities for creating and managing interoperable EO data processing workflows, and put them to use in hands-on coding sessions! 6 June 2024, 9:00 - 16:00, ITC. Register now!

Nishit Patel MSc., ITC graduate in 2023, shared his experience of using our Geospatial Computing Platform to accomplish his MSc. Thesis project on Downscaling Land Surface Temperature using SAR images: A Machine Learning Framework under the supervision of Dr. ing. H. Aghababaei and Dr. F.B. Osei (Frank) from the Department of Earth Observation Science (EOS). Don't miss the Big Geodata Story!

Happy reading! 

You can access the previous issues of the newsletter on our web portal. If you find the newsletter useful, please share the subscription link below with your network.

Fiboa: a new initiative for agricultural data interoperability


Image credits: DigiFarm, 2023

The Cloud-Native Geospatial Foundation and the Taylor Geospatial Engine have introduced fiboa (Field Boundaries for Agriculture), a project to standardize field boundary data. Fiboa is an ecosystem of data schema specifications, tools and data that improves interoperability between field boundaries and related agricultural and other data. The initiative aims to enhance data sharing and innovation in agriculture by creating a standard schema. The core data schema of fiboa is quite simple by design and represents field boundary data in GeoJSON / GeoParquet, with several optional extensions to specify additional attributes in a standard manner. Leveraging AI and satellite imagery, fiboa aims to enable rapid and cost-effective global data generation, promoting sustainable farming.

For more information, follow the article by the Cloud-Native Geospatial Foundation or dive into fiboa with one of the tutorials.
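As a rough illustration of what a field boundary record in GeoJSON might look like, here is a minimal sketch using only the Python standard library. The identifier, coordinates and property names (`area`, `determination_datetime`) are assumptions made up for this example and are not copied from the fiboa specification; please consult the specification for the actual schema.

```python
import json

# Hypothetical field boundary record. The property names and values below
# are illustrative assumptions, not taken from the fiboa specification.
field = {
    "type": "Feature",
    "id": "field-001",
    "geometry": {
        "type": "Polygon",
        # One closed ring of [longitude, latitude] pairs.
        "coordinates": [[
            [6.885, 52.223],
            [6.887, 52.223],
            [6.887, 52.224],
            [6.885, 52.224],
            [6.885, 52.223],
        ]],
    },
    "properties": {
        "area": 1.5,  # assumed unit: hectares
        "determination_datetime": "2024-06-01T00:00:00Z",
    },
}


def is_closed_polygon(feature):
    """Return True if every ring of the polygon starts and ends at the same point."""
    geometry = feature["geometry"]
    if geometry["type"] != "Polygon":
        return False
    return all(
        len(ring) >= 4 and ring[0] == ring[-1]
        for ring in geometry["coordinates"]
    )


# Wrap the record in a FeatureCollection and serialise it as GeoJSON text.
collection = {"type": "FeatureCollection", "features": [field]}
geojson_text = json.dumps(collection, indent=2)
```

A shared schema matters because a check like `is_closed_polygon` or a lookup of `area` can then be written once and applied to boundaries from any provider.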

Sentinel-2 super-resolution datasets


Image credits: Yosef Akhtman, 2023

Super Resolution is a technique that uses the pixel values of an original image to reconstruct the image with higher spatial and textural detail. The Sentinel-2 Deep Resolution 3.0 (S2DR3) is one such recent model. It uses an artificial neural network architecture similar to SISR (Single-Image Super Resolution) to upscale all 12 spectral bands of a Sentinel-2 L2A scene from the original 10, 20 and 60 m/pixel spatial resolution to 1 m/pixel!
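To get a feeling for the scale of this upscaling, the sketch below resamples a toy 10 m/pixel band to 1 m/pixel with plain nearest-neighbour repetition. This is only a naive baseline written for illustration, not the S2DR3 model: it produces 100 output pixels for every input pixel but adds no new detail, which is the gap a super-resolution model tries to fill. The function name and the reflectance values are made up for this example.

```python
def upsample_nearest(band, factor):
    """Repeat every pixel `factor` times along rows and columns."""
    upsampled = []
    for row in band:
        # Stretch the row horizontally.
        wide_row = [value for value in row for _ in range(factor)]
        # Repeat the stretched row vertically (copy so rows are independent).
        upsampled.extend(list(wide_row) for _ in range(factor))
    return upsampled


# Toy 2 x 2 "band" with made-up reflectance values at 10 m/pixel.
band_10m = [[0.1, 0.2],
            [0.3, 0.4]]

# Going from 10 m to 1 m corresponds to a factor of 10 in each direction.
band_1m = upsample_nearest(band_10m, factor=10)
```

For the 20 m and 60 m bands the same idea would use factors of 20 and 60 respectively.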

As explained by Yosef Akhtman from Gamma Earth on the Satellite Image Deep Learning podcast, there is no real training data that has the desired resolution and spectral characteristics similar to the S2-MSI sensor. Hence, the training data used in the model was synthesised from high-resolution satellite data, taking into account the design of the MSI push-broom sensor. The model is able to tease out the cross-correlations of neighbouring pixels in all bands and estimate a higher “resolution” pixel for each band. Although the technology is commercially used mainly for high-precision field boundary delineation, such super-resolution EO data can potentially be used for other applications as well.

This Colab notebook can be used to test the model, and some examples are also presented here. Follow the links for discussion on the use cases and on other Super Resolution for EO models.

Cubed - an alternative parallel computation backend for Xarray

Image credits: Tom Nicholas and Tom White, 2023

Xarray integrates with Dask under the hood to process array-like datasets that are larger than memory. However, the ability to process arbitrarily large arrays still depends on the infrastructure available. Researchers from the Climate Data Science Lab at Columbia University developed a bounded-memory, serverless alternative to Dask called Cubed, which integrates with Xarray by following the common Python Array API standard. Mainly designed to process cloud-native formats like Zarr, it can be used with Xarray without many code changes.

One of the key differences between Dask and Cubed is the way task graphs are built. Cubed builds task graphs with multiple parallel "blockwise" or "rechunking" operations on Zarr storage objects. The amount of memory required to process the operations can be predicted in advance and limited accordingly. As each chunk is processed in parallel with known memory utilization, Cubed is able to abstract these tasks as serverless functions that can be run on cloud services like AWS Lambda, through frameworks like Lithops, without a need for a cluster.
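The idea of predicting memory before running anything can be sketched in a few lines of plain Python. The helper below is hypothetical and is not part of Cubed's API; it assumes a simple elementwise operation in which each task holds its input chunks and one output chunk in memory, which is a much cruder accounting than the one a real framework would use.

```python
import math


def plan_blockwise(shape, chunks, itemsize, allowed_mem, n_inputs=2):
    """Plan an elementwise operation over chunks and check memory up front.

    Hypothetical helper for illustration only; not part of Cubed's API.
    """
    chunk_bytes = math.prod(chunks) * itemsize
    # Assumption: a task holds all of its input chunks plus one output chunk.
    projected_mem = chunk_bytes * (n_inputs + 1)
    if projected_mem > allowed_mem:
        # Fail at planning time, before any data is read or computed.
        raise MemoryError(
            f"projected {projected_mem} bytes per task exceeds "
            f"the allowed {allowed_mem} bytes"
        )
    # One independent task per output chunk.
    n_tasks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
    return {"tasks": n_tasks, "projected_mem": projected_mem}


# Adding two 10,000 x 10,000 float64 arrays in 1,000 x 1,000 chunks
# with a budget of 100 MB per task.
plan = plan_blockwise(
    shape=(10_000, 10_000),
    chunks=(1_000, 1_000),
    itemsize=8,
    allowed_mem=100_000_000,
)
```

Because every task is independent and its memory need is known beforehand, each one could in principle be handed to a separate serverless function. Choosing chunks that are too large for the budget makes the plan fail early instead of crashing a worker halfway through the computation.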

Upcoming Meetings

Recent Releases

The "Big" Picture

Image credits: Oregon State University (background) and Muñoz et al., 2024 (comparison graph)

To optimize LiDAR data processing for Digital Terrain Models (DTM), researchers have made significant advancements by porting the Overlap Window Method (OWM) from R to C++ and incorporating parallel computing frameworks like OpenMP, TBB, SYCL, and CUDA. The study focused on enhancing data structures and reducing memory access, resulting in processing speeds up to 19 times faster on CPUs and 83 times faster on GPUs compared to traditional methods. The optimizations were tailored to the specific capabilities of different devices. For instance, memoization and granularity tuning boosted CPU performance, while adjustments to the data structure and enhanced parallelism were more impactful on GPU performance. SYCL emerged as a promising tool, enabling a unified code base that can run across various devices. The research team plans to explore hybrid CPU-GPU implementations and consider the performance of these optimizations on low-power devices that can be integrated directly with LiDAR sensors.

Muñoz, F., Asenjo, R., Navarro, A., and Cabaleiro, J. C. (2024). CPU and GPU oriented optimizations for LiDAR data processing. Journal of Computational Science, 102317, doi:10.1016/j.jocs.2024.102317.

CRIB News
New OpenEO backends

Aligned with the roadmap to improve our Centre's computing infrastructure, we will be launching several OpenEO backends on the Geospatial Computing Platform, widening the current scope of our on-premise cloud computing capabilities. OpenEO is an open-source alternative to proprietary computing platforms to process Earth Observation data at scale. If you want to learn more about OpenEO, don't miss our OpenEO training workshop on 6 June 2024!