Deep learning techniques to optimise and improve 3D scene graphs

M-GEO

Robotics

Staff Involved

Additional Remarks

Students should have strong programming skills in Python and C++. Experience with a deep learning framework (PyTorch or TensorFlow) is highly preferred.

Topic description

The problem of 3D spatial perception involves building and maintaining, in real time, a comprehensive and actionable representation of the environment from sensor data and prior knowledge. Although robot perception has advanced considerably, existing methods focus primarily on purely geometric maps (as in traditional Simultaneous Localization and Mapping, SLAM) or on "flat" metric-semantic maps, which struggle to scale to large environments or to a large number of semantic labels. Hierarchical representations address this by organising the environment into a layered graph, for example with buildings, rooms, places and objects at successive levels. Such representations are compact to store and enable inference procedures with provable computational efficiency.
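To make the idea of a layered graph more concrete, the following Python sketch builds a toy hierarchy and runs a layer query over it. The names `SceneNode` and `query_layer`, and the four layer labels, are illustrative assumptions that loosely follow the layers described by Rosinol et al. (2020). This is not the data structure used by Hydra or mono-hydra. The sketch shows one source of efficiency: a query scoped to a single room only has to visit that room's subtree, not the whole graph.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SceneNode:
    """One node of a hypothetical layered 3D scene graph."""
    node_id: str
    layer: str  # assumed layer names: "building", "room", "place", "object"
    children: list[SceneNode] = field(default_factory=list)

    def add(self, node_id: str, layer: str) -> SceneNode:
        """Create a child node one level down and return it."""
        child = SceneNode(node_id, layer)
        self.children.append(child)
        return child


def query_layer(root: SceneNode, layer: str) -> tuple[list[str], int]:
    """Collect the ids of nodes in `layer` below `root` (depth-first).

    Also returns how many nodes were visited, to show that the cost
    depends only on the size of the subtree being queried.
    """
    found: list[str] = []
    visited = 0
    stack = [root]
    while stack:
        node = stack.pop()
        visited += 1
        if node.layer == layer:
            found.append(node.node_id)
        stack.extend(node.children)
    return found, visited


# Toy hierarchy: 1 building -> 4 rooms -> 3 places per room -> 2 objects per place.
building = SceneNode("B0", "building")
for r in range(4):
    room = building.add(f"R{r}", "room")
    for p in range(3):
        place = room.add(f"R{r}P{p}", "place")
        for o in range(2):
            place.add(f"R{r}P{p}O{o}", "object")

# Query the whole building versus a single room's subtree.
all_objects, visited_all = query_layer(building, "object")
first_room = building.children[0]
room_objects, visited_room = query_layer(first_room, "object")
print(len(all_objects), visited_all)    # 24 41
print(len(room_objects), visited_room)  # 6 10
```

Searching the whole building touches all 41 nodes, whereas the room-scoped query touches only 10. In a real scene graph the layers also carry geometry, semantics and inter-layer edges, which this sketch deliberately omits.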

Topic objectives and methodology

We have designed mono-hydra, an algorithm based on the Hydra framework that builds a 3D hierarchical representation, known as a "3D scene graph", from monocular image input. Because the algorithm runs in real time, the quality of the generated mesh is compromised. This thesis aims to improve this representation using novel machine-learning approaches.

References for further reading

Rosinol, A. et al. (2020) 3D dynamic scene graphs: Actionable spatial perception with places, objects, and humans, arXiv.org. Available at: https://doi.org/10.48550/arXiv.2002.06289
Wu, C.-Y. et al. (2023) Multiview compressive coding for 3D reconstruction, arXiv.org. Available at: https://doi.org/10.48550/arXiv.2301.08247
Pavllo, D. et al. (2021) Learning generative models of textured 3D meshes from real-world images, arXiv.org. Available at: https://doi.org/10.48550/arXiv.2103.15627