| dc.description.abstract |
3D object reconstruction from 2D images is a long-standing challenge in computer vision due to the loss of depth information, occlusions, and the geometric ambiguity present in single-view inputs. Traditional reconstruction techniques rely on multi-camera setups, controlled lighting, or depth sensors, making them impractical for everyday users or lightweight applications. Modern deep learning approaches have improved reconstruction accuracy but often demand substantial computational resources, lack intelligent multi-view reasoning, or produce low-resolution voxel outputs with visible artifacts. To address these limitations, this research presents an enhanced multi-view 3D reconstruction system based on an improved Pix2Vox architecture capable of producing accurate voxel models using only three orthographic 2D views: front, side, and top.
The proposed system integrates several key innovations to improve reconstruction quality and accessibility. Shared MobileNetV2 encoders are used to extract consistent features across views while maintaining computational efficiency. A novel attention-based fusion mechanism adaptively weights each view based on its geometric contribution, allowing the network to focus on the most informative perspectives. A progressive decoding pipeline then transforms fused features into a 32³ voxel representation through hierarchical upsampling. A specialized 3D refinement network further enhances surface continuity and reduces voxel-level artifacts. The model is trained on the ModelNet10 dataset using a multi-objective loss function that combines voxel accuracy, surface consistency, and volume preservation.
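The attention-based view fusion described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the thesis implementation: the pooled per-view feature shape `(V, C)` and the scoring vector `w_score` are hypothetical stand-ins for the learned attention parameters of the network.

```python
import numpy as np

def attention_fuse(view_features: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Weight each view's pooled features by a softmax attention score.

    view_features: array of shape (V, C) -- one C-dimensional feature
    vector per view (e.g. front, side, top). The scorer below is a
    fixed random stand-in for the network's learned attention weights.
    """
    V, C = view_features.shape
    rng = np.random.default_rng(0)
    w_score = rng.standard_normal(C) / np.sqrt(C)   # hypothetical learned scorer
    scores = view_features @ w_score                # (V,) one relevance score per view
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax: weights sum to 1
    fused = weights @ view_features                 # (C,) attention-weighted sum
    return fused, weights

# Example: three identical views receive equal weight (1/3 each),
# so the fused vector equals any single view's features.
feats = np.ones((3, 128))
fused, weights = attention_fuse(feats)
```

In the full pipeline the fused tensor would feed the progressive decoder, which hierarchically upsamples it to the 32³ voxel grid; views whose geometry contributes more information receive correspondingly larger softmax weights.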
The complete system is deployed through a user-friendly platform built with Flask and Next.js, enabling real-time inference and interactive 3D visualization via Three.js. Experimental results demonstrate that the enhanced architecture provides improved reconstruction accuracy, faster inference, and higher-quality voxel outputs compared to baseline approaches, making 3D reconstruction more accessible for educational, creative, and research applications. |
en_US |