Greetings from the Big Geodata Newsletter!
In this issue you will find information on Pyjion (a JIT compiler for Python), MAAP (the NASA/ESA Multi-Mission Algorithm and Analysis Platform), Radiant MLHub (an open library for EO machine learning), TorchGeo (deep learning datasets, transforms, samplers, and pre-trained models for geospatial data), and GISD30 (a global 30 m impervious-surface dynamic dataset). Our regular upcoming events and recent releases are here as well.
Happy reading!
You can access the previous issues of the newsletter on our web portal. If you find the newsletter useful, please share the subscription link below with your network.
Pyjion: JIT Compiler for Python
Image credits: Pyjion, 2022
Python plays an important role in big data analysis. However, native Python code is notoriously slow. Pyjion is a Just-In-Time (JIT) compiler for Python that compiles Python code to Common Intermediate Language (CIL) and executes it using the .NET Common Language Runtime. Its main advantage over other runtimes is that it is designed to run existing Python code faster without any code changes. Pyjion can be installed with the Python package manager (pip) and, once installed, imported into a Python 3.10 environment. Benchmarks show that Pyjion is about 2 to 3 times faster than regular Python in real-world usage. A detailed list of available optimizations can be found in the Pyjion documentation. You can also try out Pyjion in a live environment and check out the official website here.
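As a minimal sketch of the "no code changes" idea, the snippet below enables Pyjion around an ordinary function. It assumes the package was installed with `pip install pyjion` and that `pyjion.enable()` and `pyjion.disable()` behave as described in the Pyjion documentation; the `fib` function is a hypothetical workload chosen for illustration. If Pyjion (or its .NET runtime) is not available, the code simply runs on the regular interpreter.

```python
def fib(n: int) -> int:
    """Plain Python loop; nothing here is specific to Pyjion."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


# Pyjion is treated as optional: fall back to the regular interpreter
# when the package or its .NET runtime cannot be loaded.
jit_enabled = False
try:
    import pyjion

    pyjion.enable()  # assumption: switches the JIT on for code run afterwards
    jit_enabled = True
except (ImportError, OSError):
    pyjion = None

result = fib(30)
print(f"fib(30) = {result} (JIT enabled: {jit_enabled})")

if jit_enabled:
    pyjion.disable()  # switch back to the default interpreter loop
```

The function body is identical with and without the JIT, which is the point of the design: any speed-up comes from the runtime, not from rewriting the code.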
Multi-Mission Algorithm and Analysis Platform (MAAP)
Image credits: ESA, 2021
NASA and ESA have released a new open-science tool that provides seamless access to above-ground biomass information from both NASA and ESA Earth observation data. The tool, called the Multi-Mission Algorithm and Analysis Platform (MAAP), is the result of a 2-year cooperation effort and brings together relevant data, algorithms, and computing capabilities in a common cloud environment. This gives researchers greater opportunities to collaborate on developing algorithms and to analyze and visualize large datasets acquired from various sources. The tool currently includes data from NASA and ESA missions such as African Synthetic Aperture Radar (AfriSAR) and Global Ecosystem Dynamics Investigation (GEDI), and more missions, such as NASA/Indian Space Research Organization SAR (NISAR) and ESA BIOMASS, will be supported soon. Studying above-ground biomass is an important area of climate change research because it allows researchers to calculate how much carbon is stored and how loss of biomass can affect this. MAAP can also be adapted for collaborative exploration of science data in other disciplines. MAAP products can be explored on the MAAP Dashboard or the joint platform entrance. MAAP can also be accessed through the individual NASA and ESA landing pages.
Open Library for EO Machine Learning
Image credits: Radiant, 2022
Radiant MLHub is a cloud-based open library dedicated to Earth observation training data for use with machine learning algorithms. It hosts datasets and models generated by the Radiant Earth Foundation, its partners, and the community. Anyone can register to access, store, and share open training datasets or models for high-quality Earth observations. Datasets are available for a wide variety of applications such as building footprints, land cover, crops, wildfire, flood, and tropical storms. Examples of datasets accessible on the platform are the Open Cities AI Challenge Dataset, which includes drone imagery from 10 different cities and regions across Africa, and SEN12 FLOOD, a co-registered time series of optical and SAR images for the detection of flood events. All available geospatial training data collections are stored in SpatioTemporal Asset Catalog (STAC) compliant catalogs. A Python client is available that lets users interact with the datasets on the platform; a quick start guide can be found here.
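To give a feel for what "STAC compliant" means in practice, here is a small sketch of a STAC Item written as a Python dictionary, with a helper that checks for the top-level fields the STAC Item specification requires. It does not use the Radiant MLHub client. The item contents (identifier, coordinates, asset URLs) and the helper name `missing_item_fields` are hypothetical and only serve as illustration.

```python
# Top-level fields required by the STAC Item specification.
REQUIRED_ITEM_FIELDS = (
    "type", "stac_version", "id", "geometry", "properties", "links", "assets",
)


def missing_item_fields(item: dict) -> list[str]:
    """Return the required STAC Item fields that are absent from `item`."""
    missing = [name for name in REQUIRED_ITEM_FIELDS if name not in item]
    # The `datetime` key must be present inside `properties`.
    if "datetime" not in item.get("properties", {}):
        missing.append("properties.datetime")
    # A bounding box is required whenever the geometry is not null.
    if item.get("geometry") is not None and "bbox" not in item:
        missing.append("bbox")
    return missing


# Hypothetical training-data item: one image asset plus one label asset.
example_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-flood-scene-001",
    "geometry": {"type": "Point", "coordinates": [4.90, 52.37]},
    "bbox": [4.90, 52.37, 4.90, 52.37],
    "properties": {"datetime": "2022-01-01T00:00:00Z"},
    "links": [],
    "assets": {
        "image": {"href": "https://example.com/scene.tif"},
        "labels": {"href": "https://example.com/labels.geojson"},
    },
}

print(missing_item_fields(example_item))
```

Because every item exposes the same fields, generic tools can search and download training data without knowing anything dataset-specific.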
Upcoming Meetings
- Training: GPU Programming
11 May 2022, Netherlands eScience Center, Amsterdam
(Sign up)
- Training: Parallel Programming in Python
17-18 May 2022, Netherlands eScience Center, Amsterdam
(Sign up)
- World Data Summit
18-20 May 2022, Amsterdam, The Netherlands
- Spatial Data Science Conference 2022
19 May 2022, The Royal Geographical Society, London, UK
- FOSS4G 2022
22-28 August 2022, Firenze, Italy
(Call for papers)
- ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems 2022
1-4 November 2022, Seattle, USA
(Call for papers)
- AI for Good Summit
10-11 November 2022, Seattle, USA
(Registration)
- IEEE International Conference on Big Data 2022
17-20 December 2022, Osaka, Japan
(Call for papers)
TorchGeo: Deep learning with geospatial data
Image credits: Microsoft, 2022
TorchGeo is a Python package for integrating geospatial data into the PyTorch deep learning ecosystem, making it easy for machine learning and remote sensing experts to use geospatial data in their workflows. TorchGeo provides data loaders for a variety of benchmark datasets, composable datasets for generic geospatial data sources, samplers for geospatial data, and transforms that work with multispectral imagery. Examples include the Canadian Building Footprints dataset, which contains about 12M computer-generated building footprints, and ML models such as ChangeStar, Fully Convolutional Networks (FCN), and Residual Networks (ResNet). The library can also download datasets from Radiant MLHub (see above) and work with them. TorchGeo is easy to use if you are already familiar with PyTorch; a quick start guide demonstrating the various features of the library can be found here.
Recent Releases
- PROJ: Generic coordinate transformation software
9.0.0 (2022/03/01)
- Apache MXNet: Deep learning framework allowing mixing of symbolic and imperative programming
2.0.0b (2022/03/23)
- GeoPandas: Python tools for geographic data
0.10.2 (2021/10/16)
- PyTorch: Tensors and dynamic neural networks in Python
1.10 (2021/10/21)
- Dask: Library for parallel computing in Python
2021.10.0 (2021/10/22)
The "Big" Picture
Image credits: Zhang et al., 2022
A global 30 m impervious-surface dynamic dataset (GISD30) for 1985-2020 was produced by Zhang et al. using time-series Landsat imagery on the Google Earth Engine platform. First, multitemporal compositing methods and relative radiometric normalization were applied to previously available 30 m land-cover products, from which global training samples and corresponding reflectance spectra were automatically derived. Next, pretrained spatiotemporal adaptive classification models were applied to map the impervious surface in each period. The researchers report that their model achieved an overall accuracy of 91% and a kappa coefficient of 0.866 using 18,540 global time-series validation samples. Compared with similar 30 m impervious-surface models, this model was found to perform best with respect to spatial distributions and spatiotemporal dynamics. The latest model suggests that the global impervious surface has doubled in the last 35 years, with Asia seeing the largest increase. The open-access dataset is available at this link.
Zhang, X., Liu, L., Zhao, T., Gao, Y., Chen, X., and Mi, J. (2022) GISD30: global 30 m impervious-surface dynamic dataset from 1985 to 2020 using time-series Landsat imagery on the Google Earth Engine platform, Earth Syst. Sci. Data, 14, 1831–1856, doi:10.5194/essd-14-1831-2022
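For readers less familiar with the two accuracy figures quoted for GISD30, the sketch below shows how overall accuracy and Cohen's kappa coefficient are computed from a confusion matrix. The two-class matrix is hypothetical and is not taken from Zhang et al.; it only illustrates the arithmetic. Overall accuracy is the share of correctly classified samples, while kappa corrects that share for the agreement expected by chance.

```python
def overall_accuracy(matrix: list[list[int]]) -> float:
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix: list[list[int]]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    total = sum(sum(row) for row in matrix)
    p_observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the marginal totals.
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical validation result: rows = reference, columns = mapped class,
# class order = [impervious, pervious].
confusion = [
    [450, 50],
    [40, 460],
]

print(f"overall accuracy: {overall_accuracy(confusion):.3f}")
print(f"kappa coefficient: {cohen_kappa(confusion):.3f}")
```

Kappa is always lower than overall accuracy when some agreement is expected by chance, which is why the two numbers are usually reported together.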