Photo credit: Waleed Alzuhair, flickr

Efficient Training of Slum Mapping Models using Data Shapley Approximations

About 1.1 billion people live in slums, and this could balloon to 3 billion in just the next 30 years. Slums are overcrowded neighborhoods with poor housing conditions and a lack of basic services like clean water. Most of these areas are excluded from development and infrastructure plans simply because there is no data about them. While a lot of research has already gone into mapping slums, only very recently have we started to investigate the quality of the datasets used to create slum mapping models. We aim to develop models that can map slums at large scales by focusing on the quality of the input data and improving it where possible. Data-centric AI is a relatively new paradigm in AI that focuses on developing methods to systematically assess the quality of datasets.

One of many data-centric AI methods is data valuation: calculating the value of each data point (sample). To do this, we applied Gradient-Shapley (G-Shapley), an approximation of Data Shapley (DS) developed for deep-learning-based image classification models (Ghorbani et al., 2019), to our semantic segmentation task of slum mapping. Data points with high positive DS values were helpful in training the model, while data points with negative DS values were harmful. We did not find any badly labelled patches: for our dataset and study area in Nairobi, Kenya, the majority of the low-DS data points were patches composed mainly of non-built-up areas.


Figure 1. Five patches with high (top) and five patches with low (bottom) DS values.
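
To make the idea of data valuation more concrete, the sketch below illustrates the G-Shapley principle on a toy problem. It is not the code used in this work: it replaces the segmentation network with a small logistic regression, uses synthetic 2D points instead of image patches, and leaves out the truncation step described by Ghorbani et al. (2019). All function names and parameters (`g_shapley`, `make_toy_data`, `n_perms`, `lr`) are assumptions made for illustration. For each random ordering of the training set, the model takes one gradient step per data point, and the resulting change in validation accuracy is credited to that point.

```python
import math
import random


def predict_prob(w, x):
    """Probability of class 1 from a logistic regression without bias term."""
    z = sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def accuracy(w, X, y):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((predict_prob(w, x) >= 0.5) == (label == 1)
               for x, label in zip(X, y))
    return hits / len(y)


def g_shapley(X_tr, y_tr, X_val, y_val, n_perms=100, lr=0.1, seed=0):
    """Approximate the value of each training point (toy G-Shapley sketch)."""
    rng = random.Random(seed)
    n, d = len(X_tr), len(X_tr[0])
    values = [0.0] * n
    for _ in range(n_perms):
        order = list(range(n))
        rng.shuffle(order)                 # one random permutation of the data
        w = [0.0] * d                      # restart from an untrained model
        prev = accuracy(w, X_val, y_val)
        for i in order:
            # One gradient step on the log-loss using only data point i.
            err = predict_prob(w, X_tr[i]) - y_tr[i]
            w = [wj - lr * err * xj for wj, xj in zip(w, X_tr[i])]
            score = accuracy(w, X_val, y_val)
            # Marginal contribution of point i, averaged over permutations.
            values[i] += (score - prev) / n_perms
            prev = score
    return values


def make_toy_data(n, n_flipped, seed):
    """Two Gaussian blobs; the first n_flipped labels are deliberately wrong."""
    rng = random.Random(seed)
    X, y = [], []
    for k in range(n):
        label = k % 2
        centre = 2.0 if label == 1 else -2.0
        X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        y.append(label)
    for k in range(n_flipped):
        y[k] = 1 - y[k]
    return X, y


if __name__ == "__main__":
    X_tr, y_tr = make_toy_data(40, 4, seed=1)
    X_val, y_val = make_toy_data(100, 0, seed=2)
    vals = g_shapley(X_tr, y_tr, X_val, y_val)
    print("mean value, mislabelled points:", sum(vals[:4]) / 4)
    print("mean value, clean points:", sum(vals[4:]) / 36)
```

In this toy setting the deliberately mislabelled points should end up with lower values than the clean ones, which mirrors the interpretation of high and low DS values given above.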

We trained our models on several variations of the dataset, removing data points in increments of 10% of the dataset (up to 40%) starting from (1) the lowest DS values, (2) the highest DS values, and (3) at random. We found that removing data points with high DS values worsened performance, while removing data points with the lowest DS values maintained performance and made training faster compared with removing data points at random. This task was very compute-heavy, and the Geospatial Computing Platform allowed us (the Space4All project) to host our own machine, which we could easily access to conduct our experiments.

Figure 2. Comparison of slum F1-score (left) and training time (right) when removing patches at random, by highest DS value, and by lowest DS value.
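
The three removal strategies in this experiment can be sketched as follows. This is a minimal illustration, not the project's code: the function name `removal_subsets` and its parameters are hypothetical, and retraining and timing the model on each subset are left out. Given one DS value per training patch, it returns the indices of the patches to keep for every strategy and removal fraction.

```python
import random


def removal_subsets(values, fractions=(0.1, 0.2, 0.3, 0.4), seed=0):
    """Return {(strategy, fraction): kept indices} for each removal setting."""
    n = len(values)
    ascending = sorted(range(n), key=lambda i: values[i])
    shuffled = list(range(n))
    random.Random(seed).shuffle(shuffled)
    orders = {
        "lowest": ascending,         # drop the lowest-valued points first
        "highest": ascending[::-1],  # drop the highest-valued points first
        "random": shuffled,          # baseline: drop points at random
    }
    subsets = {}
    for name, order in orders.items():
        for frac in fractions:
            n_drop = round(frac * n)
            dropped = set(order[:n_drop])
            subsets[(name, frac)] = [i for i in range(n) if i not in dropped]
    return subsets


if __name__ == "__main__":
    # Hypothetical DS values for ten patches, for illustration only.
    ds_values = [0.5, -0.2, 0.1, 0.9, -0.7, 0.3, 0.0, 0.8, -0.1, 0.4]
    for key, kept in removal_subsets(ds_values).items():
        print(key, kept)
```

Because each strategy uses a single fixed ordering, the subsets are nested: the patches removed at 10% are also removed at 20%, 30%, and 40%, so the results across fractions remain comparable within a strategy.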

Using the Geospatial Computing Platform allowed us to focus on implementing our experiments rather than worrying about setting up our environment. The Centre of Expertise in Big Geodata Science (CRIB) team was also very responsive on the rare occasions when our server crashed unexpectedly. This work had to be done in a very short time because of conference submission deadlines, and without the Geospatial Computing Platform, achieving these results in such a short time span would have been impossible.

For more information:

The SPACE4ALL project aims to combine, for the first time, cutting-edge EO techniques and Citizen Science to deepen our scientific understanding of the vulnerability of slum areas to climate change and to provide local communities with insights into local risks, on the basis of which climate action can be taken.

Winning 3-minute thesis (3MT®) presentation at IGARSS 2024 on this topic: https://bit.ly/IGARSS24_3MT1stPlace


F. Campomanes MSc (Enzo)
PhD Candidate

I am a PhD candidate under the Space4All project working on data-centric AI for transferable slum mapping. I’ve worked on societally relevant problems related to natural resources and hazards back in the Philippines and now continue my work in the slum communities in Africa, all while leveraging AI and citizen science.

L. Trento Oliveira MSc (Lorraine)
PhD Candidate

I am a PhD Candidate under the Space4All project. My research focuses on leveraging citizen science and geospatial data and techniques to assess flood vulnerabilities adapted to deprived contexts. My expertise also lies in urban poverty and vulnerability dynamics, including nature-based solutions, slum mapping using machine learning and flood exposure assessment.

dr. M. Belgiu (Mariana)
Associate Professor

I am an Associate Professor in the Department of Earth Observation Science (EOS) at the University of Twente's Faculty ITC. My mission is to use AI and multi-temporal remote sensing imagery to tackle environmental challenges. I develop innovative AI methods to analyze Earth Observation (EO) data, train algorithms in scarce-label environments, and ensure transferability. My current research also explores using EO and spatial data to address hidden hunger challenges, intersecting Earth Observation, AI, and food security for significant scientific and societal impact.

prof.dr. M. Kuffer (Monika)
Associate Professor

Monika Kuffer is a Professor at the University of Twente's BMS and ITC faculties. Her research focuses on sustainable development, particularly poverty, living quality, and economic development in urban and rural areas using remote sensing, GIS, and AI. She co-chairs the IDEAMAPS network and leads projects like IDEAtlas, SPACE4ALL, and ONEKANA. She directs NUFFIC-funded training grants in Sudan and Nigeria and serves on various committees, including JURSE, EARSeL, and the EO for Sustainable Cities and Communities Toolkit.

dr. A.M. Dijkstra (Anne)
Associate Professor

Anne Dijkstra is an assistant professor in Science Communication in Twente. She studies the changing relationship between science and society and leads research in several (international) projects. Previously, she worked as a project manager and senior science communication advisor. As a volunteer, she organises meetings for the Science Café Deventer. She also organised the successful science-art festivals ‘KOP-festival’ and Science Café Noir.