Towards a GeoAI: Enriching Large Language Models with geospatial knowledge

GIMA
M-GEO
M-SE
STAMP
M-SE Core knowledge areas
Spatial Information Science (SIS)
Spatial Planning for Governance (SPG)

Image created on craiyon.com with prompt "geographic artificial intelligence". 

Topic description

Although large language models (LLMs) have been around for a while, the public release of ChatGPT demonstrated a quantum leap in the ability of these models to respond to natural language queries and engage in conversation. Because a chat with a new-generation LLM so closely resembles a chat with a human, it is easy to overlook that the processes under the hood are fundamentally different from human cognition and knowledge, and that substantial human training and supervision was (and is) needed to fine-tune the models.

Similarly, the way geographic knowledge and geographic information processing practices are embedded in the model differs fundamentally from established ways. Current LLMs like ChatGPT have little to no inherent geospatial knowledge or internalized GIScience practices. This makes them unreliable sources for answers to all but the most straightforward geospatial questions or analysis workflows, although initial explorations of using LLMs for geospatial analysis tasks sometimes show astonishing results. Several possible remedies exist, but few have been systematically tried yet.

Thus, the topic of representing geographic knowledge and geospatial practices in LLMs is about as cutting-edge as it can be. The challenge is further complicated by the fact that all LLMs are inherently black boxes, and are made even more opaque when their training and tuning processes are deliberately left unpublished.

This thesis aims to test simple geospatial analysis workflows with LLMs and explore approaches to enrich them with geospatial knowledge and practices to improve accuracy and consistency of responses. In contrast to other topics, this topic is much less pre-defined, and the exact definition of the research question is left to the student.
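As an illustration of what testing "accuracy and consistency of responses" could look like in practice, the following minimal Python sketch compares repeated LLM answers to a simple distance question against a ground truth computed with the haversine formula. Everything in it is an assumption made for illustration: the function names, the approximate city coordinates, and in particular `fake_llm`, a hypothetical stand-in with canned answers that would be replaced by a call to an actual LLM API. It is not a prescribed method for the thesis.

```python
import itertools
import math
import statistics
from typing import Callable


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km on a sphere with mean Earth radius."""
    radius_km = 6371.0088
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def evaluate(ask_llm: Callable[[str], float], question: str,
             truth_km: float, runs: int = 3) -> dict:
    """Ask the same question several times; report accuracy and consistency."""
    answers = [ask_llm(question) for _ in range(runs)]
    mean_answer = statistics.fmean(answers)
    return {
        # Accuracy: relative error of the mean answer against the ground truth.
        "relative_error": abs(mean_answer - truth_km) / truth_km,
        # Consistency: spread of the answers across repeated runs.
        "spread_km": statistics.pstdev(answers),
    }


# Hypothetical stand-in for an LLM: cycles through canned numeric answers.
_canned = itertools.cycle([120.0, 130.0, 110.0])


def fake_llm(prompt: str) -> float:
    return next(_canned)


# Approximate, illustrative coordinates for Enschede and Utrecht.
truth = haversine_km(52.2215, 6.8937, 52.0907, 5.1214)
result = evaluate(
    fake_llm,
    "What is the straight-line distance between Enschede and Utrecht in km?",
    truth,
)
print(f"ground truth: {truth:.1f} km, result: {result}")
```

Separating the ground-truth computation from the model call keeps the comparison independent of any particular LLM, so the same harness could be rerun before and after an enrichment step such as adding knowledge-graph facts to the prompt.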

In consequence, tackling this topic carries significantly higher demands and risks than a typical topic. A student choosing this topic should bring:

  • a high degree of independence and self-motivation
  • a creative and critical mindset
  • willingness and opportunity to invest more time than the ECTS require, if needed
  • a strong ability to conceptualize a problem and then quickly translate it into practice
  • experience with, or willingness to learn about, knowledge representations, e.g., knowledge graphs
  • solid knowledge of quantitative (geospatial) analysis methods
  • the ability to automate processes via scripts (Python, R) and work with APIs

References for further reading

Dzieza, Josh. “AI Is a Lot of Work.” The Verge, June 20, 2023. https://www.theverge.com/features/23764584/ai-artificial-intelligence-data-notation-labor-scale-surge-remotasks-openai-chatbots

Ding, Ruixue, Boli Chen, Pengjun Xie, Fei Huang, Xin Li, Qiang Zhang, and Yao Xu. “MGeo: Multi-Modal Geographic Pre-Training Method,” May 24, 2023. https://doi.org/10.1145/3539618.3591728

Li, Zhenlong, and Huan Ning. “Autonomous GIS: The next-Generation AI-Powered GIS.” arXiv, May 29, 2023. https://doi.org/10.48550/arXiv.2305.06453

Mai, Gengchen, Weiming Huang, Jin Sun, Suhang Song, Deepak Mishra, Ninghao Liu, Song Gao, et al. “On the Opportunities and Challenges of Foundation Models for Geospatial Artificial Intelligence.” arXiv, April 13, 2023. https://doi.org/10.48550/arXiv.2304.06798

Roberts, Jonathan, Timo Lüddecke, Sowmen Das, Kai Han, and Samuel Albanie. “GPT4GEO: How a Language Model Sees the World’s Geography.” arXiv, May 30, 2023. https://doi.org/10.48550/arXiv.2306.00020

Scheider, Simon, and Kai-Florian Richter. “Pragmatic GeoAI: Geographic Information as Externalized Practice.” KI - Künstliche Intelligenz 37, no. 1 (March 2023): 17–31. https://doi.org/10.1007/s13218-022-00794-2