Greetings from the Big Geodata Newsletter!
In this issue you will find information on OCRE research funding for Earth Observation (EO) services, the Copernicus Jupyter Notebook Competition, cuNumeric - a GPU-enabled drop-in replacement for NumPy at scale, xcube - an xarray-based EO data cube toolkit, and a method of deploying user-defined EO algorithms for large-scale data analysis on the cloud by using Data Cube Resilient Distributed Datasets (DRDDs). Our regular upcoming events and recent releases sections are here as well.
Happy reading!
You can access the previous issues of the newsletter on our web portal. If you find the newsletter useful, please share the subscription link below with your network.
OCRE research funding for Earth Observation services
Image credits: OCRE, 2022
The EC-funded Open Clouds for Research Environments (OCRE) project has opened its final funding call, which will distribute €6.5 million to research projects for the use of cloud and Earth Observation (EO) services. The application procedure differs depending on the type of service requested. In the case of EO, OCRE will provide funded services from its catalogue of EO suppliers to the value of €200,000 (the minimum request is €100,000) to the awarded projects, which are selected based on their relevance and their ability to demonstrate the impact of these services on research activities and outcomes. The EO catalogue currently includes 32 service providers offering services such as data analytics, EO data processing, interactive algorithm development, user algorithm hosting, and value-added products. You can submit your proposal until 10 July by using a guided application form. For more information on the OCRE project and current adoption funding opportunities, you can read their flyer or watch their recent webinar.
Copernicus Jupyter Notebook Competition
Image credits: WEkEO, 2022
As a subscriber to the newsletter, you are probably familiar with tools like Jupyter Notebooks, which are web-based interactive documents that make it easy to interact with data and visualize the results. (Big) EO data is no exception. To attract new users and drive innovation with Copernicus data and information, WEkEO is currently running the Copernicus Jupyter Notebook Competition. Participants can choose one of the four available tracks, each coupled with land, marine, climate, or air quality thematic data. The submissions will be evaluated by a panel of independent judges, and the winning teams will be awarded cash prizes. The ultimate goal of the competition is to build a community-driven resource of notebooks on Copernicus data. For more information, you can check https://notebook.wekeo.eu/
Besides allowing you to discover the vast range of Copernicus data, the competition can help you to advance your interactive computing skills and also showcase your expertise to a wider community. Don't miss this opportunity!
cuNumeric: a GPU-enabled drop-in replacement for NumPy at scale
Image credits: NVIDIA Legate, 2022
NumPy is the de facto standard Python math and matrix library for scientific applications, and it provides a simple and easy-to-use programming model. It is the foundation for many of the most widely used data science and machine learning frameworks, especially in the geospatial and EO domains. cuNumeric is a library that aims to provide a distributed and GPU-accelerated drop-in replacement for the NumPy API, so that programs with very large arrays that cannot fit in the memory of a single GPU or a single node can span multiple nodes and GPUs easily - without changing the program code! Benchmarks show that good weak scaling with little drop in throughput is achievable while scaling up to 2048 A100 GPUs. The library is currently a work in progress, and support for additional NumPy operators is being added gradually. A complete list of available features is provided in the API reference.
cuNumeric is part of the Legate Project, which aims to let any programmer run code on a machine of any scale without needing to be an expert in parallel programming and distributed systems. Check their documentation for more details!
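To give a feeling for what "drop-in replacement" means in practice, here is a minimal sketch. It assumes cuNumeric is installed and imported under the usual `np` alias, and it falls back to plain NumPy otherwise, so the same script also runs on a laptop. The `ndvi` function and the constant reflectance bands are hypothetical placeholders, not part of either library.

```python
# Assumption: cuNumeric is installed. If it is not, fall back to NumPy so
# the very same code still runs (only the import line differs).
try:
    import cunumeric as np
except ImportError:
    import numpy as np


def ndvi(nir, red):
    """Normalized Difference Vegetation Index, computed element-wise."""
    return (nir - red) / (nir + red)


# Hypothetical reflectance bands with constant values; a real workflow
# would load these from satellite imagery instead.
nir = np.ones((1000, 1000)) * 0.6
red = np.ones((1000, 1000)) * 0.2

index = ndvi(nir, red)
mean_ndvi = float(index.mean())
print(index.shape, mean_ndvi)
```

Nothing below the import block refers to the backend, which is the point of the drop-in design: scaling out becomes a deployment decision rather than a code change. Whether a given NumPy function is already covered should be checked against the cuNumeric API reference.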
Upcoming Meetings
- FOSS4G 2022
22-28 August 2022, Firenze, Italy
(Call for papers)
- ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems 2022
1-4 November 2022, Seattle, USA
(Call for papers)
- AI for Good Summit
10-11 November 2022, Seattle, USA
(Registration)
- IEEE International Conference on Big Data 2022
17-29 December 2022, Osaka, Japan
(Call for papers)
xcube: an xarray-based EO data cube toolkit
Source: xcube, 2022
xcube is an open-source Python package that can be used to convert Earth Observation and other geographical data into data cubes that can then be published. xcube is built upon a big data ecosystem of popular Python packages such as xarray, dask, and zarr. Datasets from popular providers like Sentinel Hub or ESA’s CCI Open Data Portal can be used for cube generation through APIs or other plugins. Once a cube has been generated from an external source and the resulting xcube dataset has been optimized, researchers can access, analyze, transform, and visualize the data for specific use cases. The package also supports extracting data points and resampling the input data in time to generate temporal aggregations. To facilitate data exploration, xcube provides a lightweight viewer app that runs as a single webpage and allows users to visualize their data cubes. You can watch the video showcasing xcube's features to learn more about its capabilities, which are under active development. Detailed documentation, including user and developer guides as well as the xcube Dataset Specification, is also available.
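The two operations mentioned above, temporal aggregation and point extraction, can be illustrated on a toy cube. This is only a sketch in plain xarray, the library xcube builds on, and it does not use xcube's own API. The variable name `chl`, the coordinates, and the values are made up for the example.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily cube covering January and February 2022 (59 days)
# on a tiny 2 x 3 latitude/longitude grid. Each time step holds its own
# index (0..58) so the aggregated values are easy to check by hand.
time = pd.date_range("2022-01-01", periods=59, freq="D")
values = np.arange(59, dtype="float64")[:, None, None] * np.ones((1, 2, 3))

cube = xr.Dataset(
    {"chl": (("time", "lat", "lon"), values)},
    coords={"time": time, "lat": [52.0, 52.5], "lon": [6.0, 6.5, 7.0]},
)

# Temporal aggregation: average all time steps that fall in the same month.
monthly = cube["chl"].groupby("time.month").mean("time")

# Point extraction: time series at the grid cell nearest to a location.
point = cube["chl"].sel(lat=52.4, lon=6.9, method="nearest")

print(monthly.sel(month=1).values)
print(point.sizes["time"], float(point.lat), float(point.lon))
```

Because an xcube dataset is an xarray dataset that follows additional conventions, operations of this kind should carry over to cubes produced by xcube, but the exact workflow is best taken from the xcube documentation.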
Recent Releases
- cuSpatial: CUDA-accelerated GIS and spatiotemporal algorithms
22.06.00 (2022-06-07)
- GeoWave: Geospatial and temporal indexing on top of key/value stores
2.0.1 (2022-05-31)
- BEAST: Big Exploratory Analytics on Spatio-Temporal data
0.9.5 (2022-05-24)
- OpenEO R Client: R client package for working with openEO backends
1.2.0 (2022-05-12)
- Orfeo Toolbox: Open-source software for state-of-the-art remote sensing
8.0.1 (2022-04-27)
The "Big" Picture
Image credits: Xu et al., 2022
Cloud-based remote sensing platforms are increasingly popular for big geodata analysis. However, one drawback of such platforms is their limited support for user-defined algorithms. If the required functions are not pre-implemented by the platform provider, it can be hard to implement custom algorithms, especially if they require specific libraries. One solution to this problem is containerization. Xu et al. propose a method of deploying user-defined remote sensing algorithms for large-scale data analysis on the cloud. The EO datasets are first organized into homogeneous and analysis-ready Data Cube Resilient Distributed Datasets (DRDDs). Composite containers then combine Docker containers, which run the user-defined algorithm, with task runners, which transform the parameters and data cubes needed for its execution. Experiments on continental-scale land cover mapping with 10-m resolution Sentinel-2 data, using a Support Vector Machine and U-Net based deep learning on 3 different platforms, show that the proposed approach gave better results than both Microsoft Planetary Computer and Google Earth Engine in terms of the number of pixels processed and computational efficiency. The authors conclude that the proposed approach can help researchers quickly port legacy EO algorithms to the cloud without rewriting them.
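The general idea of running an unmodified user function over the pieces of a data cube can be sketched in a few lines. This is not the authors' implementation: a plain Python function stands in for the containerized algorithm, a thread pool stands in for the distributed task runners, and all names (`split_into_tiles`, `run_on_cube`, `user_algorithm`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def split_into_tiles(cube, tile_size):
    """Yield (row, col, tile) for spatial tiles of a (band, y, x) cube."""
    _, height, width = cube.shape
    for row in range(0, height, tile_size):
        for col in range(0, width, tile_size):
            yield row, col, cube[:, row:row + tile_size, col:col + tile_size]


def run_on_cube(cube, algorithm, tile_size=64, workers=4):
    """Apply a per-tile algorithm in parallel and mosaic the 2D outputs."""
    _, height, width = cube.shape
    result = np.empty((height, width), dtype="float64")
    tiles = list(split_into_tiles(cube, tile_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = pool.map(lambda item: algorithm(item[2]), tiles)
        for (row, col, _), out in zip(tiles, outputs):
            result[row:row + out.shape[0], col:col + out.shape[1]] = out
    return result


def user_algorithm(tile):
    """Stand-in for a user-defined, per-pixel algorithm: flag pixels
    where band 1 is brighter than band 0."""
    return (tile[1] > tile[0]).astype("float64")


# Hypothetical two-band scene with reproducible random values.
rng = np.random.default_rng(0)
scene = rng.random((2, 200, 150))

mosaic = run_on_cube(scene, user_algorithm, tile_size=64)
print(mosaic.shape, np.array_equal(mosaic, user_algorithm(scene)))
```

The user function knows nothing about tiling or parallelism, which mirrors the motivation described above: the algorithm is ported as it is, and the platform takes care of feeding it data. The sketch only works this simply because the stand-in algorithm is per-pixel, so tile borders need no special handling.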
Xu, C., Du, X., Jian, H., Dong, Y., Qin, W., Mu, H., Yan, Z., Zhu, J. and Fan, X. (2022) Analyzing large-scale Data Cubes with user-defined algorithms: a cloud-native approach, Int. Journal of Applied Earth Observation and Geoinformation, 109:102784, doi:10.1016/j.jag.2022.102784