Predicting hyper-local temperatures using crowdsourced data and machine learning

GIMA
M-GEO
M-SE
STAMP
M-SE Core knowledge areas
Spatial Information Science (SIS)
Technical Engineering (TE)
Topic description

In cities, temperatures are often higher than in the surrounding areas and can also vary locally. This variation can depend on the type and color of surfaces (Steeneveld et al., 2014), as well as on urban morphology (Theeuwes et al., 2013). This "city effect" is called the urban heat island (UHI). We want to understand climate variability at the local scale, as it can have a large impact on natural systems. A full theoretical understanding of these complex interactions is not yet available. We therefore propose to combine a denser, crowdsourced, low-cost sensor network with machine learning techniques, in order to collect more data and to find interactions between a large number of variables.

Topic objectives and methodology

Problem definition

The goal of this research is to predict temperatures at a fine spatial resolution (e.g. street level, neighborhood level), to obtain a more accurate representation of temperatures in urban areas. For this purpose, we will use observations collected by personal weather stations (PWS) and contributed to crowdsourced networks (i.e. WOW-NL, OpenSenseMap). These networks are composed of hundreds of PWS measuring the main weather variables (e.g. temperature, rainfall, wind speed), in contrast to the official KNMI network, which operates 48 automatic weather stations (land and sea). The temperature measured by a PWS is often influenced by its immediate surroundings. Hence, in this research, we would like to include (high-resolution) geospatial data collections that might help explain the temperature being predicted. In this way, based on the PWS location, we can associate each station with metrics of human demographics, terrain characteristics, or the proportion of visible sky, which could be relevant during modelling.
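As an illustration of how a station location could be linked to such geospatial metrics, the following minimal sketch samples a gridded covariate at each PWS position. It is only a sketch under stated assumptions: the `Grid` class, the `attach_covariates` helper, the station identifiers, the coordinates and the sky-view values are all hypothetical and invented for illustration, and it assumes that stations and grids share one projected coordinate system in metres. A real workflow would more likely rely on a dedicated geospatial library.

```python
import math
from dataclasses import dataclass


@dataclass
class Grid:
    """A tiny stand-in for a raster layer (hypothetical, for illustration)."""
    x_min: float      # x coordinate of the left edge of column 0
    y_max: float      # y coordinate of the top edge of row 0
    cell_size: float  # cell width and height, in the same units as x and y
    values: list      # rows ordered from north (top) to south (bottom)

    def sample(self, x, y):
        """Return the value of the cell containing (x, y), or None if outside."""
        col = math.floor((x - self.x_min) / self.cell_size)
        row = math.floor((self.y_max - y) / self.cell_size)
        if 0 <= row < len(self.values) and 0 <= col < len(self.values[row]):
            return self.values[row][col]
        return None


def attach_covariates(stations, grids):
    """Add one sampled value per grid to a copy of each station record."""
    enriched = []
    for station in stations:
        record = dict(station)
        for name, grid in grids.items():
            record[name] = grid.sample(station["x"], station["y"])
        enriched.append(record)
    return enriched


# Made-up 3 x 3 grid of sky-view fractions with 100 m cells.
sky_view = Grid(
    x_min=0.0,
    y_max=300.0,
    cell_size=100.0,
    values=[[0.9, 0.8, 0.7], [0.6, 0.5, 0.4], [0.3, 0.2, 0.1]],
)

# Made-up stations; the third one lies outside the grid.
stations = [
    {"id": "pws-a", "x": 150.0, "y": 250.0},
    {"id": "pws-b", "x": 250.0, "y": 50.0},
    {"id": "pws-c", "x": 350.0, "y": 50.0},
]

features = attach_covariates(stations, {"sky_view": sky_view})
for record in features:
    print(record["id"], record["sky_view"])
```

Stations that fall outside a layer receive `None`, so missing covariates stay visible and can be handled explicitly before modelling.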

Methodology

The research starts with a literature study to gain more insight into urban temperatures and how low-cost PWS monitor this variable. It is important to identify which variables could influence urban temperatures and to assess how reliable the PWS measurements are. Then, the focus will be on carrying out a data analysis using community Python or R libraries that allow testing different machine learning methods (e.g. decision trees, neural networks, support vector machines). The application of these methods often requires a pre-processing stage, in which relevant features are extracted from the selected (geo)data collections and transformed (e.g. scaling, centering, Box-Cox, PCA) to make the analysis possible. Ideally, this analysis would focus on comparing the ability of multiple machine learning methods to predict local urban temperatures, and on assessing which (geo)variables are most relevant for urban temperatures.
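The pre-processing and model comparison described above could be sketched as follows. This is a minimal example, assuming scikit-learn as the Python library (the topic description does not prescribe one) and using synthetic data in place of real PWS observations. The feature matrix, the coefficients of the synthetic temperature, and all hyperparameters are assumptions chosen only to make the example run.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for station features (assumption, not real data).
# Values are strictly positive because Box-Cox requires positive input.
rng = np.random.default_rng(0)
n_stations = 200
X = rng.uniform(0.05, 1.0, size=(n_stations, 4))
y = 15.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0.0, 0.3, n_stations)

# Candidate methods named in the text; hyperparameters are arbitrary.
models = {
    "decision_tree": DecisionTreeRegressor(max_depth=4, random_state=0),
    "support_vector_machine": SVR(C=10.0),
    "neural_network": MLPRegressor(
        hidden_layer_sizes=(16,), max_iter=2000, random_state=0
    ),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    pipeline = make_pipeline(
        # Box-Cox transform, followed by centering and scaling.
        PowerTransformer(method="box-cox", standardize=True),
        # Keep enough principal components to explain 95% of the variance.
        PCA(n_components=0.95),
        model,
    )
    scores = cross_val_score(
        pipeline, X, y, cv=cv, scoring="neg_root_mean_squared_error"
    )
    # scikit-learn returns negated errors; flip the sign to get RMSE.
    results[name] = float(-scores.mean())

for name, rmse in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: cross-validated RMSE = {rmse:.2f}")
```

Placing the transformations inside the pipeline means they are refitted on each training fold, so no information from the held-out stations leaks into the pre-processing. Note that a simple random split ignores spatial autocorrelation between nearby stations, so a real study may need a spatially aware validation scheme.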

Expected results

From the student we expect a scientific report that includes a literature background and the research results. Besides the report, we also aim for documented and reproducible code, shared through a repository such as GitHub.

Important organizational note

This research topic is offered in collaboration with the Royal Netherlands Meteorological Institute (KNMI). Formal supervision will be carried out by an ITC staff member, while daily supervision on content will be carried out jointly with Dr. Irene Garcia Marti from KNMI. Supervision details will be discussed and agreed on prior to starting the thesis research.

References for further reading