Title: 3D image reconstruction using an improved BEV model and global convolutional attention fusion
Authors: HuaShun Yan; XiaoJie Li; ZeLin Mou
Addresses: Computer Science and Technology, Chengdu University of Information Technology, Chengdu, Sichuan, 610225, China; Computer Science and Technology, Chengdu University of Information Technology, Chengdu, Sichuan, 610225, China; Computer Science and Technology, Chengdu University of Information Technology, Chengdu, Sichuan, 610225, China
Abstract: In autonomous driving and computer vision, 3D object detection plays a critical role but faces challenges related to the effective extraction and integration of multi-view features. The existing BEVFormer model, which uses CNNs to convert images into a bird's-eye view (BEV), shows potential but struggles to capture fine-grained details and multi-scale information, especially in high-resolution, complex scenes. To address these limitations, we propose the MultiCAN-DEBEV model, which integrates the MSF-DySample, GCAF, and MSDE modules. These modules improve the handling of multi-scale features, enhance feature expressiveness, and strengthen detail representation. Experiments on the nuScenes dataset show significant performance improvements, and the modular design ensures broad adaptability to other 3D detection models.
Keywords: computer vision; 3D object detection; BEV object detection; autonomous driving.
DOI: 10.1504/IJICT.2025.145403
International Journal of Information and Communication Technology, 2025 Vol. 26 No. 6, pp. 98-116
Received: 03 Dec 2024
Accepted: 08 Jan 2025
Published online: 31 Mar 2025