Optimizing Quantization for Medical Image Analysis with ESPNetv2 in Resource-Constrained Environments

Dissanayaka, Raveen

dc.contributor.author	Dissanayaka, Raveen
dc.date.accessioned	2026-03-11T05:53:51Z
dc.date.available	2026-03-11T05:53:51Z
dc.date.issued	2025
dc.identifier.citation	Dissanayaka, Raveen (2025) Optimizing Quantization for Medical Image Analysis with ESPNetv2 in Resource-Constrained Environments. MSc Dissertation, Informatics Institute of Technology	en_US
dc.identifier.issn	20230397
dc.identifier.uri	http://dlib.iit.ac.lk/xmlui/handle/123456789/2922
dc.description.abstract	Accurate segmentation of cardiac MRI images is essential for clinical diagnostics, but the limited resources of edge computing devices make it difficult to deploy deep learning U-Net models because of their large memory and computation requirements. Efficient models such as ESPNetv2 offer a way to deploy segmentation models to edge devices, but their segmentation quality suffers considerably, making them clinically impractical. This work focuses on making ESPNetv2 models more performant. I present a quantization-driven enhancement of ESPNetv2 with Quantization Aware Training (QAT) to improve model accuracy while optimizing resource efficiency. The pipeline was developed and evaluated on the MnMs-2 dataset using CPU-only inference to mimic low-resource deployment scenarios. Model efficacy was evaluated extensively on Dice coefficient, Intersection over Union (IoU), inference latency, parameter count, and RAM usage, alongside segmentation precision. The float32 baseline model, trained on 2D cardiac MRI slices, achieved a Dice score of 0.6862 and an IoU of 0.5725 with only 0.36 million parameters. The model was compact at 1.35 MB and required a peak RAM usage of 1.06 MB during inference, processing each image in approximately 0.0337 seconds. After applying QAT, the model maintained a strong Dice score of 0.6766 (a decrease of 1.4%) and improved the IoU to 0.5949 (an increase of 3.9%), showing that segmentation performance was robustly retained despite compression. These results were achieved alongside a 97.2% reduction in parameter count and a 65.3% decrease in RAM usage. Inference latency increased by 94.6% to 0.0656 seconds per 2D image, which remains within acceptable bounds for real-time performance in medical scenarios. 
Incorporating quantization compatibility directly into model design and training demonstrates the practical potential of deploying efficient cardiac segmentation AI models in constrained settings. The proposed architecture, ESPNetv2_QuantLite, illustrates that clinically relevant performance can be preserved under extreme memory and computational constraints.	en_US
dc.language.iso	en	en_US
dc.subject	Computing Methodologies	en_US
dc.subject	Machine Learning	en_US
dc.subject	Machine Learning Approaches	en_US
dc.title	Optimizing Quantization for Medical Image Analysis with ESPNetv2 in Resource-Constrained Environments	en_US
dc.type	Thesis	en_US
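
The abstract reports Dice and IoU scores and quotes relative changes between the float32 baseline and the QAT model. The following is a minimal sketch of how such figures are commonly computed for binary masks. It is not taken from the dissertation: the function names `dice_and_iou` and `relative_change_percent` and the toy masks are hypothetical, and the dissertation may aggregate scores differently (for example per class or per slice).

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks given as nested lists of 0/1."""
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must contain the same number of pixels")
    inter = sum(1 for a, b in zip(p, t) if a and b)  # pixels set in both masks
    p_sum = sum(1 for a in p if a)
    t_sum = sum(1 for b in t if b)
    union = p_sum + t_sum - inter
    if union == 0:
        # Both masks empty: treated as perfect agreement (a convention, assumed here).
        return 1.0, 1.0
    return 2 * inter / (p_sum + t_sum), inter / union


def relative_change_percent(before, after):
    """Signed percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


# Toy masks (hypothetical, for illustration only).
pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
dice, iou = dice_and_iou(pred, target)

# Relative changes recomputed from the rounded scores quoted in the abstract.
dice_change = relative_change_percent(0.6862, 0.6766)
iou_change = relative_change_percent(0.5725, 0.5949)
print(f"Dice={dice:.4f} IoU={iou:.4f}")
print(f"Dice change={dice_change:.1f}% IoU change={iou_change:.1f}%")
```

Because the inputs here are the rounded values from the abstract, recomputed percentages can differ in the last digit from figures derived from unrounded measurements.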