Indoor scene segmentation and depth estimation using deep learning

M-GEO
ACQUAL
Staff Involved
Mr. Ning Zhang (Advisor)

Additional Remarks
Programming skills are mandatory (preferably Python); experience with a deep learning framework (PyTorch, Keras or TensorFlow) is highly preferred.

Topic description

Indoor scene understanding is important for tasks such as indoor navigation and semantic mapping. Given only RGB images as input, state-of-the-art methods based on Convolutional Neural Networks (CNNs) can parse a scene and estimate its depth map in a unified framework. The resulting semantic segmentation maps and depth maps are then used to solve many practical problems. Although existing deep learning methods can achieve high accuracy, they tend to be slow at inference time.
The aim of this MSc topic is to design and implement an efficient deep network to segment indoor scenes as well as estimate depth maps from RGB images.

Topic objectives and methodology

The student will first review the existing literature on scene segmentation and depth estimation with deep learning, focusing on techniques that can be used to train an efficient end-to-end network.
The network takes only RGB images as input and outputs pixel-wise segmentation maps and depth maps, which makes it a multi-task learning network. Any indoor scene dataset can be used to train the network, and the student may use supervised or semi-supervised methods. The main challenge of the project will be to extend an existing approach so that it efficiently fuses features from different levels of the network to make the final prediction. Tests will be conducted by acquiring videos of indoor environments and running the algorithm on them. An additional challenge will be to develop solutions that are efficient enough to run in (near) real time on an edge device.
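To make the multi-task idea concrete, the following is a minimal, framework-free sketch of how a joint training objective could be formed from the two outputs. It is an illustration under stated assumptions, not a prescribed design: the function name `joint_loss`, the `depth_weight` parameter, and the choice of cross-entropy for segmentation and an L1 error for depth are all hypothetical. Pixels whose label or reference depth is `None` are skipped, which is one simple way to cope with datasets in which not every pixel carries both kinds of annotation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_loss(seg_logits, seg_labels, depth_pred, depth_true, depth_weight=0.5):
    """Return (total, seg_loss, depth_loss) for one flattened image.

    seg_logits : per-pixel list of class scores
    seg_labels : per-pixel class index, or None where no label exists
    depth_pred : per-pixel predicted depth
    depth_true : per-pixel reference depth, or None where it is missing
    depth_weight is an assumed hyperparameter balancing the two tasks.
    """
    # Segmentation term: cross-entropy averaged over labelled pixels only.
    ce_terms = [
        -math.log(softmax(scores)[label])
        for scores, label in zip(seg_logits, seg_labels)
        if label is not None
    ]
    seg_loss = sum(ce_terms) / len(ce_terms) if ce_terms else 0.0

    # Depth term: mean absolute error over pixels with a reference depth.
    l1_terms = [
        abs(pred - true)
        for pred, true in zip(depth_pred, depth_true)
        if true is not None
    ]
    depth_loss = sum(l1_terms) / len(l1_terms) if l1_terms else 0.0

    return seg_loss + depth_weight * depth_loss, seg_loss, depth_loss


# Two pixels, two classes; the second pixel has neither label nor depth.
total, seg, dep = joint_loss(
    seg_logits=[[0.0, 0.0], [2.0, 1.0]],
    seg_labels=[0, None],
    depth_pred=[1.0, 2.0],
    depth_true=[1.5, None],
)
print(f"segmentation={seg:.4f} depth={dep:.4f} total={total:.4f}")
```

In an actual implementation the same structure would normally be expressed with the loss functions of the chosen deep learning framework, and the weighting between the two tasks would be tuned experimentally.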

References for further reading

[1] Eigen, David, and Rob Fergus. "Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture." Proceedings of the IEEE international conference on computer vision. 2015.
[2] Nekrasov, Vladimir, et al. "Real-time joint semantic segmentation and depth estimation using asymmetric annotations." 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019.