Image credits: Waleed Alzuhair, Flickr

Big Geodata Newsletter, July 2024

Become a highly skilled geospatial professional

Greetings from the Big Geodata Newsletter!

In this issue you will find information on HyperCoast, a new Python package for hyperspectral data; Coiled’s benchmarking of DataFrame technologies; news on the retirement of Microsoft’s Planetary Computer Hub; and SIRCLE and SWAG, two new tools that tackle the challenges of processing petabyte-scale EO data.

Dr. Jon Wang from the Department of Urban and Regional Planning and Geo-Information Management (PGM) at ITC shares his experience using our Geospatial Computing Platform in the story "Understanding Cities with their Physical Forms at Large Scale". Don't miss this Big Geodata Story!

Happy reading! 

You can access the previous issues of the newsletter on our web portal. If you find the newsletter useful, please share the subscription link below with your network.

HyperCoast: Interactive Hyperspectral Data Visualization!

Image credits: HyperCoast, 2024

HyperCoast is a new Python package for interactively visualizing and analysing hyperspectral data. The package lets users search for and download data from well-known NASA hyperspectral datasets such as AVIRIS, DESIS, PACE, and ECOSTRESS. Because each hyperspectral sensor uses its own file format, the developers built dataset-specific tools to visualize each one. Users can switch band combinations and colormaps, and interactively extract and visualize spectral signatures in JupyterLab. The selected spectral signatures can then be saved as a CSV file. Built on the 3D visualization package PyVista, HyperCoast can also display hyperspectral data as a sliceable 3D cube with interactive analysis capabilities!

Although developed primarily for coastal areas, this package by Dr. Qiusheng Wu and his team can be extended to any domain that uses hyperspectral data. Check out the demo video of its 3D visualization capabilities, and follow the tutorials to get started with the package.
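A pixel's spectral signature is simply its value in every band, ordered by wavelength. The sketch below is a generic, plain-Python illustration of extracting signatures and saving them to CSV, assuming a cube indexed as cube[band][row][col]. It does not use or reproduce HyperCoast's own API; the function names, wavelengths, and sample values are hypothetical.

```python
import csv
import io
from typing import Dict, List, Sequence, TextIO

# Hypothetical layout: cube[band][row][col], one 2D grid per spectral band.
Cube = Sequence[Sequence[Sequence[float]]]


def spectral_signature(cube: Cube, row: int, col: int) -> List[float]:
    """Return the value of every band at a single pixel (its spectral signature)."""
    return [band[row][col] for band in cube]


def signatures_to_csv(
    wavelengths: Sequence[float],
    signatures: Dict[str, Sequence[float]],
    out: TextIO,
) -> None:
    """Write one row per wavelength and one column per selected pixel."""
    writer = csv.writer(out)
    writer.writerow(["wavelength"] + list(signatures))
    for i, wavelength in enumerate(wavelengths):
        writer.writerow([wavelength] + [sig[i] for sig in signatures.values()])


# Toy 3-band, 2x2 cube with made-up reflectance values (not real sensor data).
cube = [
    [[0.01, 0.02], [0.03, 0.04]],  # band at 450 nm
    [[0.05, 0.06], [0.07, 0.08]],  # band at 550 nm
    [[0.09, 0.10], [0.11, 0.12]],  # band at 650 nm
]
wavelengths = [450.0, 550.0, 650.0]

sig = spectral_signature(cube, row=0, col=1)  # [0.02, 0.06, 0.1]

buffer = io.StringIO()
signatures_to_csv(wavelengths, {"pixel_0_1": sig}, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of memory, pass a file opened with open("signatures.csv", "w", newline="") as the out argument. Keeping one column per pixel makes it easy to compare several signatures side by side in a spreadsheet or plotting tool.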

Microsoft Retires Planetary Computer Hub

Image credits: Planetary Computer, 2024

Microsoft announced the retirement of the Planetary Computer Hub on June 6, 2024. Microsoft attributed the decision to a strategic shift toward tighter security requirements across all its systems. The Hub had been instrumental in helping researchers and organizations access and analyze large-scale environmental data. The change affects only the Planetary Computer Hub; the Planetary Computer data and APIs remain unaffected.

For more insight into this decision and instructions on retrieving your Hub home directory, refer to the official GitHub discussion.

Benchmarking DataFrames at Scale: Coiled's TPC-H Analysis

Image credits: Coiled, 2024

Coiled recently conducted a comprehensive benchmark analysis using the TPC-H suite to evaluate the performance of various DataFrame technologies (Spark, Dask, DuckDB, and Polars) across different scales and hardware setups. The study revealed that while no single project consistently outperformed the others, DuckDB and Dask performed robustly across many scenarios. Spark, though widely used, lagged in efficiency and ease of use. Polars excelled at small-scale local tasks but struggled with larger, cloud-based workloads. The findings can help users choose the right tool for their specific use case and data scale.
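TPC-H is a suite of 22 analytical SQL queries run against a synthetic sales dataset whose size is set by a scale factor. To give a feel for what such a query computes, the sketch below runs a simplified version of TPC-H Query 1 (a pricing summary grouped by return flag and line status) on a tiny, made-up lineitem table. It uses Python's built-in sqlite3 purely as a stand-in engine so the example runs anywhere; Coiled's benchmark ran the queries on Spark, Dask, DuckDB, and Polars, and this code is not part of it. The query follows the Q1 definition only loosely (several aggregates are omitted).

```python
import sqlite3
import time

# Tiny, made-up subset of the TPC-H `lineitem` table; real runs generate
# millions to billions of rows with the official dbgen tool.
ROWS = [
    # (returnflag, linestatus, quantity, extendedprice, discount, tax, shipdate)
    ("N", "O", 10.0, 1000.0, 0.10, 0.05, "1998-01-01"),
    ("N", "O", 20.0, 2000.0, 0.00, 0.05, "1998-05-01"),
    ("R", "F", 5.0, 500.0, 0.20, 0.00, "1995-06-01"),
    ("A", "F", 8.0, 800.0, 0.00, 0.10, "1998-11-30"),  # after cutoff, filtered out
]

# Simplified TPC-H Query 1: pricing summary per (returnflag, linestatus).
QUERY_1 = """
SELECT l_returnflag, l_linestatus,
       SUM(l_quantity)                                       AS sum_qty,
       SUM(l_extendedprice)                                  AS sum_base_price,
       SUM(l_extendedprice * (1 - l_discount))               AS sum_disc_price,
       SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
       AVG(l_quantity)                                       AS avg_qty,
       COUNT(*)                                              AS count_order
FROM lineitem
WHERE l_shipdate <= '1998-09-02'  -- 1998-12-01 minus 90 days; ISO dates compare as text
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineitem (l_returnflag TEXT, l_linestatus TEXT, l_quantity REAL, "
    "l_extendedprice REAL, l_discount REAL, l_tax REAL, l_shipdate TEXT)"
)
conn.executemany("INSERT INTO lineitem VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

# A benchmark times each query end to end, typically over several repetitions.
start = time.perf_counter()
result = conn.execute(QUERY_1).fetchall()
elapsed = time.perf_counter() - start
conn.close()

for row in result:
    print(row)
print(f"query time: {elapsed:.6f} s")
```

A real benchmark repeats this for every query, engine, scale factor, and hardware configuration, which is what makes the comparison across Spark, Dask, DuckDB, and Polars meaningful.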

For a detailed breakdown of the benchmarks and specific performance metrics, read the full article here. 

The "Big" Picture

Image credits: Consoli et al., 2024  

Researchers have unveiled SIRCLE (Signal Imputation and Refinement with Convolution Leaded Engine) and SWAG (Seasonally Weighted Average Generalization), two new tools designed to tackle the challenges of processing petabyte-scale Earth Observation (EO) time-series data from missions such as NASA's Landsat and ESA's Sentinel. Current solutions offer limited flexibility in handling anomalies such as cloud cover. SIRCLE introduces this flexibility into time-series processing through adjustable convolution kernels. SWAG, integrated within SIRCLE, leverages the seasonality of EO data to reconstruct missing values, prioritizing recent images for better accuracy. In benchmark tests, SWAG reduced reconstruction errors by at least 15% compared with other methods.

In a large-scale application, SIRCLE and SWAG processed the entire Global Land Analysis and Discovery (GLAD) ARD-2 Landsat archive, producing a cloud-free bi-monthly product spanning 1997 to 2022 and involving over two trillion pixels. The processing took approximately 28 hours on 1,248 Intel Xeon CPUs. The resulting data, stored as Cloud-Optimized GeoTIFFs (COGs), are now openly accessible, enabling efficient and affordable environmental monitoring and analysis.

Davide Consoli, Leandro Parente, Rolf Simoes et al. (23 May 2024). A computational framework for processing time-series of Earth Observation data based on discrete convolution: global-scale historical Landsat cloud-free aggregates at 30 m spatial resolution. PREPRINT (Version 1), Research Square. https://doi.org/10.21203/rs.3.rs-4465582/v1
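To make the idea behind SWAG more concrete, the sketch below fills gaps (for example, cloud-masked observations) in a single pixel's time series with a weighted average of values from the same season in other years, giving more weight to years closer in time to the gap. It is a deliberately simplified, hypothetical illustration: the function name, the exponential decay weighting, and the period parameter (observations per year) are assumptions, not the convolution kernels Consoli et al. actually use in SIRCLE and SWAG.

```python
from typing import List, Optional, Sequence


def seasonal_weighted_fill(
    series: Sequence[Optional[float]],
    period: int,
    decay: float = 0.5,
) -> List[Optional[float]]:
    """Fill None gaps with a weighted mean of same-season values from other years.

    `period` is the number of observations per year, so index t - k * period
    is the same season k years earlier (and t + k * period, k years later).
    A value k years away gets weight decay ** (k - 1), so temporally closer
    years dominate the estimate. Gaps with no same-season data stay None.
    """
    n = len(series)
    filled = list(series)
    for t, value in enumerate(series):
        if value is not None:
            continue  # keep valid observations unchanged
        weighted_sum = 0.0
        weight_total = 0.0
        k = 1
        # Walk outwards one year at a time, in both directions, until out of range.
        while t - k * period >= 0 or t + k * period < n:
            weight = decay ** (k - 1)
            for idx in (t - k * period, t + k * period):
                neighbour = series[idx] if 0 <= idx < n else None
                if neighbour is not None:
                    weighted_sum += weight * neighbour
                    weight_total += weight
            k += 1
        filled[t] = weighted_sum / weight_total if weight_total > 0 else None
    return filled


# Toy series with two observations per year (period=2); None marks a cloudy date.
series = [4.0, 0.0, 2.0, 0.0, None, 0.0]
# The gap at index 4 becomes (1.0 * 2.0 + 0.5 * 4.0) / 1.5, roughly 2.67.
print(seasonal_weighted_fill(series, period=2))
```

Because each pixel's time series is filled independently, an operation like this parallelizes naturally across pixels and CPUs.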


CRIB News
Join us for the Geospatial R workshop!

In this training workshop, participants will embark on a journey through the fundamentals of geospatial data analysis using R. With its excellent statistical capabilities and huge package ecosystem, R supports transparent data analysis workflows with an emphasis on reproducible research. Geospatial R packages such as sf, raster, and leaflet enable complex geospatial studies and striking visualizations that help extract insights from geospatial datasets. Join us to learn how to use R to access, analyze, and visualize spatial data!

The workshop will take place in the ITC building on 21-22 August 2024. For more information and registration, please visit the event page.