Automated OCT B-Scan Classification of Macular Scan Volumes with DINOv2 and Multi-Head Attention
| Type of publication: | Article |
| Journal: | Investigative Ophthalmology & Visual Science, June 2025, Vol. 66, 5440. |
| Volume: | vol. 66 |
| Year: | 2025 |
| Month: | June |
| Abstract: | Purpose: Automatic OCT image classification is an important aspect of early diagnosis of retinal diseases. Many previous methods rely on a single central B-scan for classification, which may miss relevant diagnostic information contained in the remaining B-scans of the OCT volume. This work investigates the application of the state-of-the-art DINOv2-base foundation model to classifying retinal diseases by aggregating information from multiple B-scans in each volume. Methods: We used a dataset of over 16,000 retinal OCT B-scans from 441 patients, acquired at Noor Eye Hospital, Tehran, Iran. The images were labeled by a retinal specialist and organized into three conditions: choroidal neovascularization (CNV), Drusen, and Normal. The dataset was divided into training, validation, and test sets with an approximate 70:15:15 ratio. The vision transformer DINOv2-base was used for model development with all layers unfrozen. A multi-head self-attention mechanism aggregated information across all B-scans for each patient. Early stopping, learning-rate scheduling, and dropout were employed to mitigate overfitting. This attention-based approach allowed the model to focus on informative regions across the multiple images, enhancing its ability to make accurate patient-level diagnostic classifications. A fully connected layer with dropout takes the aggregated features and produces the final prediction of the disease class. Performance was assessed with scikit-learn metrics. Results: On the test set, the model achieved an accuracy of 0.8939, with a precision of 0.9030, recall of 0.8939, and F1-score of 0.8943. Class-wise, the model achieved a precision of 1.00, recall of 0.88, and F1-score of 0.93 for the Normal class; a precision of 0.87, recall of 0.83, and F1-score of 0.85 for the Drusen class; and a precision of 0.82, recall of 1.00, and F1-score of 0.90 for the CNV class. Conclusions: Our results confirm the effectiveness of combining DINOv2 with multi-head attention for OCT image classification, enabling analysis of entire volumes by aggregating information from multiple B-scans. This work highlights the potential of advanced transformer-based models with attention mechanisms for ophthalmic image analysis. Further research will focus on validating the model's performance on larger and more diverse datasets. This abstract was presented at the 2025 ARVO Annual Meeting, held in Salt Lake City, Utah, May 4-8, 2025. |
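The patient-level aggregation described in the abstract — per-B-scan feature vectors combined by multi-head self-attention, pooled, and passed through a fully connected head — can be sketched as below. This is an illustrative NumPy sketch, not the authors' code: the random weight matrices stand in for trained parameters, the 768-dimensional vectors stand in for DINOv2-base embeddings, and the mean-pooling step and head sizes are assumptions for the sake of the example.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Let each B-scan embedding attend to all others in the volume.
    X: (n_scans, d) matrix, one row per B-scan feature vector."""
    n, d = X.shape
    dh = d // num_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # each (n_scans, d)
    out = np.zeros_like(X)
    for h in range(num_heads):                 # per-head scaled dot-product attention
        s = slice(h * dh, (h + 1) * dh)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(dh)   # (n_scans, n_scans)
        out[:, s] = softmax(scores, axis=-1) @ V[:, s]
    return out @ Wo                            # output projection, (n_scans, d)

def classify_volume(X, params, num_heads=8):
    """Patient-level prediction: attention over B-scans, pool, linear head."""
    Wq, Wk, Wv, Wo, Wc, bc = params
    attended = multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads)
    pooled = attended.mean(axis=0)             # aggregate across all B-scans (assumed pooling)
    logits = pooled @ Wc + bc                  # 3 classes: CNV, Drusen, Normal
    return softmax(logits)

# Toy usage: random weights and embeddings in place of a trained model.
rng = np.random.default_rng(0)
d, n_scans, n_classes = 768, 12, 3             # 768 = DINOv2-base embedding size
Wq, Wk, Wv, Wo = (0.02 * rng.standard_normal((d, d)) for _ in range(4))
Wc = 0.02 * rng.standard_normal((d, n_classes))
bc = np.zeros(n_classes)
X = rng.standard_normal((n_scans, d))          # stand-in for per-B-scan DINOv2 features
probs = classify_volume(X, (Wq, Wk, Wv, Wo, Wc, bc))
```

In the actual model the attention layer, head, and the unfrozen DINOv2-base backbone would be trained end-to-end with dropout, early stopping, and learning-rate scheduling; the sketch only shows the forward-pass shape of the aggregation.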