Abstract:
Infant pose estimation is useful in several medical applications and motion analysis tasks. Due to inherent limitations in this research domain, the research community is actively exploring different approaches that yield more accurate pose estimates while overcoming those limitations. This research aims to develop and enhance deep learning models through late feature fusion of privacy-preserving modalities for more robust two-dimensional (2D) pose estimation of infants. Various approaches for 2D pose estimation have been explored in the literature; feature fusion of multiple modalities is a novel method within this domain. While early fusion techniques are commonly used in infant pose estimation, this work focuses on late fusion techniques: addition, concatenation, and Iterative Attentional Feature Fusion (iAFF). Privacy-preserving modalities are increasingly utilized in the medical field. Additionally, to mitigate the limited size of infant pose estimation datasets, transfer learning with fine-tuning is performed from models trained on the similar task of adult in-bed pose estimation. Evaluation showed that multi-modality fusion models outperform models trained on single modalities. The PCKh@1 scores obtained for the depth and pressure single modalities are 97.9% and 98.1%, respectively, while the best-performing fusion model achieved a PCKh@1 of 99.7%, demonstrating the positive impact of feature fusion on the infant pose estimation task. A comparison of the fusion techniques shows that the fusion model integrated with iAFF is lightweight and performs comparably to the fusion model with addition at stage 2, which is the best-performing model among the fusion models using addition or concatenation.
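To make the three late-fusion operations named above concrete, the following is a minimal NumPy sketch on dummy feature maps. It is an illustrative assumption, not the paper's implementation: the tensor shapes are invented, and the attention-weighted blend is a greatly simplified stand-in for the actual iAFF module, which applies multi-scale channel attention iteratively.

```python
import numpy as np

# Dummy feature maps from two encoder branches (depth and pressure),
# shape (channels, height, width). Shapes are illustrative assumptions.
rng = np.random.default_rng(0)
depth_feat = rng.standard_normal((64, 16, 16))
pressure_feat = rng.standard_normal((64, 16, 16))

# Late fusion by addition: element-wise sum, channel count unchanged.
fused_add = depth_feat + pressure_feat  # shape (64, 16, 16)

# Late fusion by concatenation: branches stacked along the channel
# axis, doubling the channel count (a 1x1 conv typically follows).
fused_cat = np.concatenate([depth_feat, pressure_feat], axis=0)  # (128, 16, 16)

def channel_gate(x, y):
    """Simplified attention-weighted fusion in the spirit of iAFF:
    a per-channel sigmoid gate blends the two branches."""
    s = (x + y).mean(axis=(1, 2))     # global average pooling per channel
    w = 1.0 / (1.0 + np.exp(-s))      # sigmoid gate weights in (0, 1)
    w = w[:, None, None]              # broadcast over spatial dimensions
    return w * x + (1.0 - w) * y

fused_attn = channel_gate(depth_feat, pressure_feat)  # (64, 16, 16)

print(fused_add.shape, fused_cat.shape, fused_attn.shape)
```

Note the trade-off the abstract alludes to: addition and gated fusion keep the channel count fixed, while concatenation doubles it and so increases the parameter count of any subsequent layer.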