Spatio-temporal Data-driven Modeling and eXplainable Artificial Intelligence for Early-season Yield Prediction from Satellite Imagery Data

GIMA
M-GEO
M-SE
STAMP
M-SE Core knowledge areas
Spatial Information Science (SIS)
Additional Remarks

A good command of Python programming is essential for successfully implementing this research.

Staff Involved: Mahdi Farnaghi, Raul Zurita Milla

Topic description

The accurate prediction of crop yields is a critical element in ensuring global food security, managing agricultural markets and supporting the decision-making processes of farmers and policymakers. Traditional yield prediction methods, which usually rely on ground surveys and historical data, can be time-consuming and may not reflect current environmental conditions accurately. The emergence of satellite imagery and advanced computational techniques has opened new possibilities for early-season yield prediction, which is vital for proactive agricultural management.

This research aims to harness the potential of spatio-temporal data-driven modelling and eXplainable Artificial Intelligence (XAI) for predicting crop yields using satellite imagery. The goal is to develop models that not only provide accurate yield predictions but also shed light on the influencing factors and the internal decision-making processes of machine learning (ML) and deep learning (DL) algorithms. By doing so, we intend to make AI decisions more transparent and understandable.

Topic objectives and methodology

Objectives

This research aims to use XAI methods to enhance the transparency and trustworthiness of spatio-temporal DL models used for yield prediction. The goal is to develop a model that accurately predicts crop yields while providing intuitive explanations of its predictions that align with users' understanding and expectations. These explanations, and the understanding of the model they provide, will in turn be used to improve the modelling procedure and to facilitate early-season yield prediction.

Some of the objectives that can be considered for this study are as follows:

  1. To develop a spatio-temporal model based on ML (e.g., RF, SVM) or DL (e.g., RNN, CNN, Transformers) for crop yield prediction from satellite imagery.
  2. To evaluate the accuracy and reliability of the model in comparison to existing methods.
  3. To exploit XAI to provide insight into how the model predicts yield from the input time series, and to compare these insights with phenological knowledge about the crop life cycle.
  4. To identify the earliest point in the season at which an accurate and unbiased yield estimate can be obtained.
  5. To utilise XAI methods to improve the model’s predictions by identifying and correcting unexpected biases.
  6. To investigate how the model can be generalised to other regions and periods.
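
Objective 4 can be pictured as a truncation experiment: the model is retrained on only the first k acquisition dates, and the smallest k that still meets an accuracy requirement is reported. The following is a minimal sketch under stated assumptions, not the method of this project. The data are synthetic, closed-form ridge regression stands in for the ML/DL model, and the names `ridge_fit_predict` and `max_mae` as well as the 0.2 threshold are hypothetical.

```python
import numpy as np

def ridge_fit_predict(x_train, y_train, x_test, alpha=1.0):
    """Closed-form ridge regression with intercept (stand-in for the real model)."""
    mu_x, mu_y = x_train.mean(axis=0), y_train.mean()
    xc = x_train - mu_x
    gram = xc.T @ xc + alpha * np.eye(xc.shape[1])
    weights = np.linalg.solve(gram, xc.T @ (y_train - mu_y))
    return (x_test - mu_x) @ weights + mu_y

# Synthetic data (assumption): 400 fields, 10 acquisition dates of one index.
rng = np.random.default_rng(0)
n_fields, n_dates = 400, 10
x = rng.normal(size=(n_fields, n_dates))
# Synthetic yield depends only on dates 3, 4 and 5 (0-based).
y = 2.0 * x[:, 3] + 1.5 * x[:, 4] + 1.0 * x[:, 5]
y = y + rng.normal(scale=0.1, size=n_fields)

train, test = slice(0, 300), slice(300, None)

# Retrain with only the first k dates and record MAE and bias on held-out fields.
results = []
for k in range(1, n_dates + 1):
    pred = ridge_fit_predict(x[train, :k], y[train], x[test, :k])
    error = pred - y[test]
    results.append((k, float(np.mean(np.abs(error))), float(np.mean(error))))

max_mae = 0.2  # hypothetical accuracy requirement
earliest = next(k for k, mae, _ in results if mae <= max_mae)

for k, mae, bias in results:
    print(f"first {k:2d} dates: MAE={mae:.3f}  bias={bias:+.3f}")
print("earliest usable cut-off:", earliest, "dates")
```

Tracking the mean signed error alongside the MAE separates a cut-off that is merely noisy from one that systematically over- or under-estimates yield.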

Data

The study will utilise a range of Earth observation data. The data will encompass various spectral bands and temporal resolutions to capture the dynamic nature of crop growth and environmental conditions. Building on two prior MSc projects, this research will access and further develop the datasets produced in those studies.

In addition, the following dataset could be incorporated in the modelling process:

Methodology

Preprocessing: Data preprocessing will involve normalisation, augmentation, and handling of missing data. Temporal data alignment and noise reduction techniques will be applied to ensure data quality.
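
Two of the preprocessing steps named above, handling of missing data and normalisation, can be illustrated as follows. This is a sketch assuming cloud-masked observations are stored as NaN and that linear interpolation in time is acceptable; the toy NDVI values and the helper names `fill_gaps` and `standardise` are hypothetical.

```python
import numpy as np

def fill_gaps(series):
    """Linearly interpolate NaN values along a 1-D time series."""
    series = np.asarray(series, dtype=float)
    t = np.arange(series.size)
    valid = ~np.isnan(series)
    if not valid.any():
        return series  # nothing to interpolate from
    return np.interp(t, t[valid], series[valid])

def standardise(x):
    """Z-score each feature (last axis) over all fields and dates."""
    mean = x.mean(axis=(0, 1), keepdims=True)
    std = x.std(axis=(0, 1), keepdims=True)
    return (x - mean) / np.where(std == 0, 1.0, std)

# Toy input (assumption): 2 fields x 5 dates of one index; NaN marks a cloudy date.
ndvi = np.array([[0.2, np.nan, 0.6, np.nan, 0.4],
                 [0.1, 0.3, np.nan, 0.7, 0.5]])

filled = np.apply_along_axis(fill_gaps, 1, ndvi)   # shape (fields, dates)
scaled = standardise(filled[..., np.newaxis])      # shape (fields, dates, features)

print(filled)
```

In a real experiment the mean and standard deviation would be estimated on the training fields only and reused for validation and test data, so that no information leaks across the split.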

Model Development: The model will employ ML or DL algorithms, such as RF, CNNs, RNNs, Transformers, or hybrid methods, to capture spatial and temporal dependencies in the data.

Integration of XAI: XAI techniques will be used to make the model's decisions explainable. The choice of XAI method will depend on the selected ML/DL algorithm and the research needs, so candidate XAI methods will need to be compared to identify the most suitable one.
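
As one example of a model-agnostic XAI technique, permutation importance measures how much the prediction error grows when a single acquisition date is shuffled across fields. The sketch below is an illustration, not the XAI method selected for this project. It assumes synthetic data and uses ordinary least squares as a stand-in for the trained model; the function name `permutation_importance` here is a local helper, not a library call.

```python
import numpy as np

def permutation_importance(predict, x, y, rng, n_repeats=10):
    """Mean increase in MAE when each time step is shuffled across samples."""
    base = np.mean(np.abs(y - predict(x)))
    importance = np.zeros(x.shape[1])
    for t in range(x.shape[1]):
        for _ in range(n_repeats):
            x_perm = x.copy()
            x_perm[:, t] = rng.permutation(x_perm[:, t])
            importance[t] += np.mean(np.abs(y - predict(x_perm))) - base
    return importance / n_repeats

# Synthetic data (assumption): 500 fields, 8 acquisition dates of one index.
rng = np.random.default_rng(42)
x = rng.normal(size=(500, 8))
# Synthetic yield is driven mainly by date 5 and, more weakly, by date 2.
y = 3.0 * x[:, 5] + 1.0 * x[:, 2] + rng.normal(scale=0.1, size=500)

# Stand-in model: ordinary least squares with an intercept column.
design = np.column_stack([x, np.ones(len(x))])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

def predict(a):
    return np.column_stack([a, np.ones(len(a))]) @ coef

scores = permutation_importance(predict, x, y, rng)
ranking = np.argsort(scores)[::-1]  # most important date first
print("dates ranked by importance:", ranking)
```

The resulting ranking of dates is the kind of output that can be compared with phenological knowledge, for instance by checking whether the most influential dates coincide with known growth stages.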

Validation and Testing: The model will be validated on a held-out subset of the data, and its performance will be compared against the two yield prediction models developed in previous MSc theses at ITC. Metrics such as Mean Absolute Error (MAE) and R-squared will be used to assess accuracy.
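
The two metrics named above can be computed as in the following sketch. The observed and predicted yields are invented toy numbers, and `new_model` and `baseline` are hypothetical placeholders for the model developed here and one of the earlier models.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted yield."""
    return float(np.mean(np.abs(y_true - y_pred)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy held-out yields (assumption), e.g. in t/ha.
observed = np.array([3.0, 5.0, 7.0, 9.0])
new_model = np.array([2.5, 5.5, 7.0, 8.0])
baseline = np.array([4.0, 4.0, 8.0, 8.0])

for name, pred in [("new model", new_model), ("baseline", baseline)]:
    mae = mean_absolute_error(observed, pred)
    r2 = r_squared(observed, pred)
    print(f"{name:10s} MAE={mae:.3f}  R2={r2:.3f}")
# new model  MAE=0.500  R2=0.925
# baseline   MAE=1.000  R2=0.800
```

Both models should be scored on the same held-out samples; otherwise differences in the metrics may reflect the split rather than the models.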

References for further reading

[1] van Klompenburg, T., Kassahun, A., and Catal, C. (2020). Crop yield prediction using machine learning: A systematic literature review. Computers and Electronics in Agriculture, 177, 105709.

[2] Oikonomidis, A., Catal, C., and Kassahun, A. (2023). Deep learning for crop yield prediction: a systematic literature review. New Zealand Journal of Crop and Horticultural Science, 51(1), 1-26.

[3] Hall, O., Ohlsson, M., and Rögnvaldsson, T. (2022). A review of explainable AI in the satellite data, deep machine learning, and human poverty domain. Patterns, 3(10), 100600.

[4] Masrur, A., et al. (2021). Interpretable machine learning for analysing heterogeneous drivers of geographic events in space-time. International Journal of Geographical Information Science, 1.