Greetings from the Big Geodata Newsletter!
We are happy to restart the Big Geodata Newsletter as a precursor to an exciting new year! In this issue you will find information on NASA Earthdata Cloud; Coiled, a cloud platform for distributed computing; Lonboard, a Python library for rendering big vector data; and GraphCast, an AI model for global medium-range weather forecasting. Do not miss the promo code for free access to courses from the NVIDIA Deep Learning Institute!
We also introduce you to Jay Gohil, who recently joined CRIB as a researcher. Stay tuned for our updates and newsletters in 2024: we’ve got an eventful year lined up!
Happy reading!
You can access the previous issues of the newsletter on our web portal. If you find the newsletter useful, please share the subscription link below with your network.
Earthdata on the Move
Image credits: Catalina Oaida, PO.DAAC
Given the massive volume and velocity of today’s satellite data, moving NASA Earth science data to the commercial cloud has become a necessity. A new spotlight feature explores some of the many resources being developed to help users adopt a cloud-based workflow and enable Open Science. NASA, in collaboration with Openscapes, has developed a framework to engage mentors and share resources that help users make this transition. A key learning resource is the Earthdata Cloud Cookbook, which includes how-to guides on using NASA data in the cloud. The team also designed illustrative cheat sheets to help visualize what working with NASA Earthdata Cloud data looks like. By following the cookbook's guidelines, the Earthdata science community can contribute to developing this resource.
You can easily access the cloud datasets using the earthaccess and Harmony-py libraries. If interested, follow the discussions on the Earthdata Cloud Cookbook GitHub repository to attend ‘hackday’ events.
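As a rough illustration of what access through earthaccess can look like, here is a minimal sketch. The collection short name, dates, and bounding box are illustrative assumptions rather than values from the spotlight feature, and the helper names `build_query` and `open_granules` are hypothetical. Running `open_granules` requires the earthaccess package and an Earthdata Login account.

```python
from datetime import date


def build_query(short_name, start, end, bbox, count=5):
    """Assemble keyword arguments for earthaccess.search_data (hypothetical helper)."""
    west, south, east, north = bbox
    return {
        "short_name": short_name,
        "temporal": (start.isoformat(), end.isoformat()),
        "bounding_box": (west, south, east, north),
        "count": count,
    }


def open_granules(query):
    """Log in, search, and open the matching granules as file-like objects."""
    import earthaccess  # imported lazily; needs `pip install earthaccess`

    earthaccess.login()  # uses stored Earthdata Login credentials or prompts for them
    granules = earthaccess.search_data(**query)
    return earthaccess.open(granules)


# Illustrative values only; swap in the collection and region you need.
query = build_query(
    short_name="MUR-JPL-L4-GLOB-v4.1",  # assumed example collection
    start=date(2023, 1, 1),
    end=date(2023, 1, 7),
    bbox=(-10.0, 30.0, 5.0, 45.0),
    count=3,
)
# files = open_granules(query)  # uncomment once credentials are configured
```

The opened files can then be passed to a reader such as xarray; the cookbook's how-to guides cover the recommended patterns in more detail.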
How to process terabyte-scale EO data efficiently?
Image credits: Coiled, 2023
With the rapid migration of Earth Observation (EO) data and workflows to cloud services, optimizing time and cost becomes highly relevant. Coiled is a lightweight cloud platform designed to make efficient use of AWS or Google Cloud computing services through Dask. Most EO workflows on the cloud follow a “run the same function on many files” approach. Coiled can optimize this pattern with minimal code by letting the user run distributed processing close to where the data is stored. For long time-series datasets such as the NASA Earthdata on S3, the blog post demonstrates a drastic reduction in time (~42x speedup) and cost (~16x less) using Coiled. Coiled also allows users to optimize further by using spot instances and ARM-based processors instead of conventional options. This efficiency brought the cost of processing 500 GB of NASA Earthdata down to just 0.33 EUR!
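The “run the same function on many files” pattern can be sketched locally as follows. This is not Coiled's API: it uses Python's standard-library thread pool as a stand-in for a Dask cluster, and the tiny text files and the `file_mean` and `map_over_files` helpers are hypothetical placeholders for real granules and a real per-file reduction.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def file_mean(path):
    """Per-file task: read whitespace-separated numbers and return their mean."""
    values = [float(token) for token in Path(path).read_text().split()]
    return sum(values) / len(values)


def map_over_files(func, paths, max_workers=4):
    """Apply `func` to every path in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, paths))


# Create a few small stand-in "granules" so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(4):
        path = Path(tmp) / f"granule_{i}.txt"
        path.write_text(f"{i}\n{i + 2}\n")
        paths.append(path)
    means = map_over_files(file_mean, paths)

print(means)  # [1.0, 2.0, 3.0, 4.0]
```

With Dask the shape stays the same: the thread pool would be replaced by a `dask.distributed.Client`, and `client.map(file_mean, paths)` followed by `client.gather(...)` would spread the per-file tasks over cloud workers running near the data.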
Test out the free version of Coiled at coiled.io/start with interesting use cases. Follow NASA Openscapes for more information on cloud migration of NASA Earth Observation data.
Lonboard to render 3 million points in 2.5 seconds!
Image credits: Development Seed, 2023
Lonboard is a Python library for fast, interactive geospatial vector data visualization in Jupyter. It is built on four foundational technologies: deck.gl, GeoArrow, GeoParquet, and anywidget. deck.gl is a JavaScript geospatial data visualization library. Because deck.gl renders data on your computer's GPU, it can display very large spatial datasets with good performance. GeoArrow is a memory format for efficiently representing geospatial vector data in uncompressed form. GeoParquet is a file format for efficiently encoding and decoding geospatial vector data, with support for very efficient compression methods. anywidget is a framework that makes building custom Jupyter widgets much easier. By combining these technologies, Lonboard displays geospatial vector data swiftly! On a dataset with 3 million points, ipyleaflet crashed after 3.5 minutes and pydeck crashed after 2.5 minutes, while Lonboard rendered successfully in 2.5 seconds.
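To give a feel for how little code a Lonboard map needs, here is a minimal sketch. It assumes geopandas and lonboard are installed and that the final call runs in a Jupyter notebook; the random points and the `random_points` and `show_points` helpers are hypothetical and only stand in for a real dataset.

```python
import random


def random_points(n, seed=0):
    """Generate n reproducible random longitude/latitude pairs (illustrative data)."""
    rng = random.Random(seed)
    lons = [rng.uniform(-180, 180) for _ in range(n)]
    lats = [rng.uniform(-90, 90) for _ in range(n)]
    return lons, lats


def show_points(lons, lats):
    """Build a GeoDataFrame of points and pass it to Lonboard's quick viewer."""
    import geopandas as gpd  # lazy imports: optional third-party dependencies
    from lonboard import viz

    gdf = gpd.GeoDataFrame(
        geometry=gpd.points_from_xy(lons, lats), crs="EPSG:4326"
    )
    return viz(gdf)  # returns a map widget that Jupyter displays


lons, lats = random_points(1_000)
# In a notebook cell: show_points(lons, lats)
```

Increasing the point count is a simple way to try the scale described above on your own machine.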
Upcoming Meetings
- EGU 2024
14-19 April 2024, Vienna, Austria
- GEOINT 2024
5-8 May 2024, Florida, USA
- Geospatial World Forum (GWF)
13-16 May 2024, Rotterdam, Netherlands
- AGILE 2024, Geographic Information Science for a Sustainable Future
4-7 June 2024, Glasgow, UK
- IEEE International Conference on Big Data Computing Service and Machine Learning Applications (BigDataService 2024)
15-18 July 2024, Shanghai, China
- Big Data Expo
11-12 September 2024, Utrecht, Netherlands
Get your NVIDIA Deep Learning Teaching Kit for free!
Image credits: NVIDIA, 2023
The NVIDIA Deep Learning Institute (DLI) offers resources for diverse learning needs, from learning materials to self-paced and live training to educator programs. Individuals, teams, organizations, educators, and students can find everything they need to advance their knowledge in AI, accelerated computing, accelerated data science, graphics and simulation, and much more, including GeoAI. Whether you are a student or an instructor, the initiative invites you to explore AI advancements and interdisciplinary collaboration, positioning yourself at the forefront of innovation. Plus, you can earn industry-recognized certificates to boost your career.
We've scored a special deal for you! Use code (please contact us for the code) to get free access to DLI’s wide range of online, self-paced courses in deep learning, accelerated computing, data science, and graphics & simulation. Check this page to redeem the code worth 80 EUR and start learning today!
Recent Releases
- Keras: High-level neural networks API
3.0.0 (28/11/2023)
- PyLandStats: Open-source library to compute landscape metrics
3.0.0 (20/11/2023)
- TensorFlow: End-to-end machine learning platform
2.15.0 (14/11/2023)
- GeoPandas: Python tools for geographic data
0.14.1 (11/11/2023)
- Apache Sedona: Cluster computing for large scale spatial data
1.5.0 (12/10/2023)
The "Big" Picture
Image credits: Lam et al., 2023
There is a great need for quick and precise forecasts in a world where weather patterns are becoming more intense. GraphCast, a cutting-edge AI model, forecasts weather conditions up to 10 days ahead with far greater accuracy and speed than the industry-standard High Resolution Forecast (HRES) generated by the European Centre for Medium-Range Weather Forecasts (ECMWF). GraphCast uses Graph Neural Networks (GNNs), which are designed for processing spatially structured data. It operates at a high resolution of 0.25 degrees, covering over a million grid points globally, and predicts various Earth-surface variables and atmospheric conditions at different altitudes. Despite its computationally intensive training, GraphCast is highly efficient, generating 10-day forecasts in less than a minute on a single Google TPU v4 machine. This stands in stark contrast to conventional methods like HRES, which may require hours of computation on supercomputers. In a comprehensive evaluation, GraphCast outperformed HRES on over 90% of 1380 test variables and forecast lead times. In the troposphere, the lowest layer of the atmosphere (extending roughly 6-20 kilometers above the surface) and the region most critical for accurate forecasting, GraphCast outperformed HRES on 99.7% of the test variables.
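The “over a million grid points” figure follows directly from the 0.25-degree resolution. A minimal sketch of the arithmetic, assuming a regular latitude-longitude grid that includes both poles (the function name `grid_shape` is hypothetical):

```python
def grid_shape(resolution_deg):
    """Return (n_lat, n_lon) for a regular global latitude-longitude grid."""
    n_lat = round(180 / resolution_deg) + 1  # -90 to +90 inclusive, both poles counted
    n_lon = round(360 / resolution_deg)      # 0 up to 360 - resolution; 360 wraps to 0
    return n_lat, n_lon


n_lat, n_lon = grid_shape(0.25)
total_points = n_lat * n_lon
print(n_lat, n_lon, total_points)  # 721 1440 1038240
```

Every one of these roughly 1.04 million locations carries the full set of surface and atmospheric variables at each forecast step.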
Lam, R., Sanchez-Gonzalez, A., Willson, M., Wirnsberger, P., Fortunato, M., Alet, F., Ravuri, S., Ewalds, T., Eaton-Rosen, Z., Hu, W., Merose, A., Hoyer, S., Holland, G., Vinyals, O., Stott, J., Pritzel, A., Mohamed, S., & Battaglia, P. (2023). Learning skillful medium-range global weather forecasting, Science, doi:10.1126/science.adi2336