Title: Improving accuracy of arbitrary-shaped text detection using ResNet-152 backbone-based pixel aggregation network
Authors: Suresh Shanmugasundaram; Natarajan Palaniappan
Addresses: School of Computer Science and Engineering, Vellore Institute of Technology, Vellore – 632014, India; School of Computer Science and Engineering, Vellore Institute of Technology, Vellore – 632014, India
Abstract: CNN-based scene text detection faces two major issues in real-world applications. The first is the speed-accuracy trade-off. The second is the modelling of arbitrary-shaped text instances. This work addresses both issues with a pixel aggregation network (PAN) built on a ResNet-152 backbone, which is chosen for its better accuracy and performance. The proposed network has a high-speed segmentation head and a learnable post-processing step. The segmentation head consists of a feature pyramid enhancement module (FPEM) and a feature fusion module (FFM). FPEM is a cascadable U-shaped module that introduces multi-level information for high-quality segmentation and produces features at different depths. FFM gathers these features into a final feature used to segment arbitrary-shaped text. The learnable post-processing is implemented by pixel aggregation (PA), which uses predicted similarity vectors to aggregate text pixels precisely. The proposed ResNet-152 backbone-based PAN attains an F-measure of 85.6% on the Total-Text dataset.
Keywords: arbitrary-shaped text detection; scene text detection; curve text detection; text segmentation; DNN.
DOI: 10.1504/IJCVR.2024.140820
International Journal of Computational Vision and Robotics, 2024 Vol.14 No.5, pp.510 - 528
Received: 11 Oct 2021
Accepted: 12 Aug 2022
Published online: 03 Sep 2024
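
The pixel aggregation (PA) post-processing mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the network has already produced a binary text mask, a labelled kernel map (one positive integer per text instance, 0 elsewhere), and a per-pixel similarity vector. The function name `aggregate_pixels`, the 4-neighbour growth rule, and the distance threshold `max_dist` are all hypothetical choices made for illustration.

```python
from collections import deque
import math


def aggregate_pixels(text_mask, kernel_labels, sim_vectors, max_dist=1.0):
    """Grow each kernel label over adjacent text pixels whose similarity
    vector lies within max_dist of that kernel's mean vector (assumed rule)."""
    h, w = len(text_mask), len(text_mask[0])

    # Mean similarity vector of every kernel.
    sums, counts = {}, {}
    for y in range(h):
        for x in range(w):
            k = kernel_labels[y][x]
            if k:
                vec = sim_vectors[y][x]
                acc = sums.setdefault(k, [0.0] * len(vec))
                for i, v in enumerate(vec):
                    acc[i] += v
                counts[k] = counts.get(k, 0) + 1
    means = {k: [s / counts[k] for s in sums[k]] for k in sums}

    # Breadth-first growth from kernel pixels into unlabelled text pixels.
    labels = [row[:] for row in kernel_labels]
    queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
    while queue:
        y, x = queue.popleft()
        k = labels[y][x]
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w):
                continue
            if not text_mask[ny][nx] or labels[ny][nx]:
                continue
            dist = math.sqrt(sum((a - b) ** 2
                                 for a, b in zip(sim_vectors[ny][nx], means[k])))
            if dist < max_dist:
                labels[ny][nx] = k
                queue.append((ny, nx))
    return labels


# Toy 1x5 image: two kernels at the ends, text pixels in between.
mask = [[1, 1, 1, 1, 1]]
kernels = [[1, 0, 0, 0, 2]]
vectors = [[(0.0,), (0.1,), (0.2,), (5.0,), (5.1,)]]
result = aggregate_pixels(mask, kernels, vectors)
```

In the toy input, each unlabelled text pixel joins the kernel whose mean similarity vector it is close to, so two adjacent text instances stay separated even though their text regions touch.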