
The goal of this work is to analyze images extracted from flexible laryngoscopy videos using classical image processing methods, with the aim of improving diagnostic accuracy in vocal fold examinations.
Flexible laryngoscopy is a vital tool used by ENT (ear, nose, and throat) specialists to diagnose disorders and diseases of the vocal folds. However, interpreting the resulting videos is not a simple task. Unlike static medical images (such as X-rays), laryngoscopy videos present unstable visual conditions: light reflections from the moist tissue, noise from the camera sensor, and rapidly changing tissue appearance caused by motion and deformation of the vocal folds. The vocal folds themselves may be visible only intermittently, so any computational solution to this problem must consider both spatial and temporal information throughout the video.
As part of this work, we examine whether observed deformations of the vocal fold tissue result from the camera angle or indicate true anatomical anomalies. Our contribution to the project is the development of a system that accurately segments the vocal folds from the surrounding tissue in each frame, forming the foundation for subsequent stages of the project.
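To illustrate the kind of classical, per-frame approach involved, the sketch below shows a minimal intensity-based segmentation of a synthetic laryngoscopy-like frame. It is not the system described in this work, only an assumed toy example: the glottal gap between the vocal folds often appears as the darkest region of the frame, while specular reflections from moist tissue are near-saturated, so simple thresholding can separate the two. The function name and threshold values are illustrative assumptions.

```python
import numpy as np

def segment_glottis(frame, dark_thresh=60, glare_thresh=230):
    """Toy classical segmentation (illustrative, not the paper's method).

    `frame` is a 2-D uint8 grayscale array. Pixels darker than
    `dark_thresh` are candidate glottal-gap pixels; pixels brighter
    than `glare_thresh` are treated as specular reflections and
    excluded. Threshold values are assumptions for this sketch.
    """
    glare = frame >= glare_thresh       # near-saturated specular highlights
    candidate = frame <= dark_thresh    # dark pixels: candidate glottal gap
    return candidate & ~glare           # boolean segmentation mask

# Synthetic 100x100 frame: mid-gray "tissue", a dark vertical slit
# standing in for the glottal gap, and a bright glare blob.
frame = np.full((100, 100), 128, dtype=np.uint8)
frame[30:70, 48:52] = 20    # dark slit (40 x 4 = 160 pixels)
frame[10:15, 10:15] = 250   # glare blob
mask = segment_glottis(frame)
print(mask.sum())  # → 160: only the dark slit is labeled, glare is excluded
```

In real footage such fixed thresholds fail under the unstable lighting described above, which is precisely why a more robust, temporally aware segmentation system is needed.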