Temporally Consistent Monocular Depth Estimation for Video

Project ID: 10380-2-25
Year: 2026
Student/s: Noam Murciano and Yuri Minin
Supervisor/s: Dr. Meir Bar-Zohar

Monocular depth estimation (MDE) is a fundamental task in computer vision. When applied to video sequences, however, it suffers from temporal flickering: modern models such as Depth Anything V2 predict a relative depth map independently for each frame, so the scale of the prediction can vary from one frame to the next. In this project, we developed a temporally consistent pipeline that integrates three main stages: (a) depth estimation with Depth Anything V2 as the backbone for high-quality per-frame depth; (b) 3D point-cloud alignment between frames via the Kabsch–Umeyama algorithm, with static anchor points selected using optical flow (RAFT); (c) semantic filtering with YOLOv8 to separate dynamic objects from the static background. In addition, we explored an alternative temporal fusion approach to further smooth depth over time. The results demonstrate a significant improvement in temporal stability while preserving the sharpness and quality of the base model. The project shows how a modern MDE model can be turned into a stable, practical, and hardware-independent video depth estimation system without requiring dense ground-truth data.
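
The alignment in stage (b) can be illustrated with a minimal NumPy sketch of the Kabsch–Umeyama similarity fit. This is not the project's code: the function name `umeyama_alignment` and the assumption that anchors arrive as two matched (N, 3) arrays of 3D points (back-projected from consecutive depth maps) are ours, chosen only for illustration. Given matched static anchors, it estimates the scale, rotation, and translation that best map one frame's points onto the other in the least-squares sense.

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Least-squares similarity transform (s, R, t) with dst ~ s * R @ src + t.

    src, dst: matched (N, 3) arrays of 3D anchor points (hypothetical inputs,
    e.g. static points from two consecutive frames).
    """
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    n = src.shape[0]

    # Center both point sets on their centroids.
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # Cross-covariance between the centered sets and its SVD.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n          # variance of the source points
    s = np.trace(np.diag(D) @ S) / var_src    # optimal isotropic scale
    t = mu_dst - s * R @ mu_src               # optimal translation
    return s, R, t

# Synthetic check: recover a known scale, rotation about z, and translation.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.1, -0.2, 0.3])
dst = 2.5 * src @ R_true.T + t_true

s, R, t = umeyama_alignment(src, dst)
aligned = s * src @ R.T + t   # source points mapped into the target frame
```

In a video setting, one would presumably fit the transform on anchors that optical flow and semantic filtering mark as static, then apply it to the whole frame's point cloud so that consecutive depth maps share a common scale.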

Poster for Temporally Consistent Monocular Depth Estimation for Video