Project Details
Manual interpretation of fetal ultrasound images is time-consuming and highly dependent on the skill level of the physician or technician performing the examination. As part of this academic project, we built a deep learning architecture aimed at automatically identifying and classifying ultrasound images into six different anatomical scan planes (head, abdomen, thorax, femur, cervix, and other). The goal was to present a solution capable of improving the consistency and efficiency of the interpretation process.
We started the project with a basic CNN model that served as a baseline and achieved an accuracy of 72%. To address class imbalance in the data and improve recognition, we reworked the training code and, after experimenting with several different models, moved to a modern attention-based Swin Transformer architecture.
We incorporated several practical techniques into the code notebook: padding images with black borders (letterboxing) to preserve the fetus's proportions without distorting the image, training at progressively increasing resolutions (progressive resizing), and applying advanced augmentations (such as MixUp and CutMix) to diversify the data and reduce overfitting.
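As a rough illustration of two of these techniques, the sketch below shows letterboxing and MixUp in NumPy. It is not the notebook's code: the function names `letterbox` and `mixup`, the nearest-neighbour resize, and the `alpha` parameter are assumptions made for the example.

```python
import numpy as np

def letterbox(image, size):
    """Fit `image` inside a size x size square and pad the rest with black."""
    h, w = image.shape[:2]
    scale = min(size / h, size / w)  # one scale for both axes keeps proportions
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    # Nearest-neighbour resize by index lookup (assumption: a real pipeline
    # would more likely use an interpolating resize from an image library).
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    resized = image[rows][:, cols]
    # Black canvas; extra channel dimensions (if any) are preserved.
    canvas = np.zeros((size, size) + image.shape[2:], dtype=image.dtype)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas

def mixup(images, onehot_labels, alpha, rng):
    """Blend each sample with a randomly chosen partner from the same batch."""
    lam = rng.beta(alpha, alpha)          # mixing coefficient in [0, 1]
    perm = rng.permutation(len(images))   # partner index for every sample
    mixed_images = lam * images + (1 - lam) * images[perm]
    mixed_labels = lam * onehot_labels + (1 - lam) * onehot_labels[perm]
    return mixed_images, mixed_labels, lam
```

For a 100 x 200 image and a target size of 64, `letterbox` produces a 32 x 64 resized image centred between two 16-pixel black bands, so the aspect ratio is unchanged. CutMix follows the same label-mixing idea but pastes a rectangular patch from the partner image instead of blending whole images.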
To ensure the model learns to identify true anatomy rather than memorizing features of a specific patient, we structured the data split so that images of the same patient are never divided between training and validation (patient-level Group K-Fold cross-validation). Through this process, we created an ensemble of 5 models and used an exponential moving average (EMA) to smooth their weights.
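The patient-level split and the EMA update can be sketched as follows. This is a minimal NumPy version under stated assumptions: `patient_group_kfold` and `ema_update` are hypothetical helper names, patients are assigned to folds round-robin after shuffling, and the `decay` value is illustrative rather than taken from the project. In practice, scikit-learn's `GroupKFold` provides a ready-made group-aware splitter.

```python
import numpy as np

def patient_group_kfold(patient_ids, n_splits=5, seed=0):
    """Yield (train_idx, val_idx) pairs where no patient spans both sides."""
    patient_ids = np.asarray(patient_ids)
    rng = np.random.default_rng(seed)
    patients = rng.permutation(np.unique(patient_ids))
    for k in range(n_splits):
        # Every n_splits-th patient goes to the validation side of fold k.
        val_patients = patients[k::n_splits]
        val_mask = np.isin(patient_ids, val_patients)
        yield np.flatnonzero(~val_mask), np.flatnonzero(val_mask)

def ema_update(ema_weights, weights, decay=0.999):
    """One EMA step: keep `decay` of the running average, add the rest from `weights`."""
    return {
        name: decay * ema_weights[name] + (1.0 - decay) * weights[name]
        for name in weights
    }
```

Because the split is made over patient identifiers rather than over images, all images of a patient land on the same side of every fold. Calling `ema_update` after each optimizer step keeps a smoothed copy of the weights that changes more slowly than the raw weights.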
Our algorithm was evaluated on a separate test set of 5,271 images using test-time augmentation (TTA): each image is also flipped at inference time and the predictions are averaged. The solution achieved a final accuracy of 95.45%, an improvement of more than 23 percentage points over the 72% baseline. This suggests that the attention mechanism, combined with advanced computer vision techniques and prior domain knowledge, can deliver efficient and reliable classification of ultrasound images, with results that match or even exceed the international state of the art.
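The inference step can be sketched as below. It is an illustration only: the name `predict_tta_ensemble` is hypothetical, each model is assumed to be a callable that maps an image array to a vector of class probabilities, the flip is assumed to be horizontal, and combining the ensemble with TTA in a single average is an assumption about how the two were put together.

```python
import numpy as np

def predict_tta_ensemble(models, image):
    """Average class probabilities over all models and over original/flipped views."""
    views = [image, np.fliplr(image)]  # assumption: horizontal flip only
    probs = [model(view) for model in models for view in views]
    return np.mean(probs, axis=0)
```

The predicted class is then the index of the largest averaged probability, for example `int(np.argmax(predict_tta_ensemble(models, image)))`.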
