Article: Improving accuracy of arbitrary-shaped text detection using ResNet-152 backbone-based pixel aggregation network Journal: International Journal of Computational Vision and Robotics (IJCVR) 2024 Vol.14 No.5 pp.510 - 528

Title: Improving accuracy of arbitrary-shaped text detection using ResNet-152 backbone-based pixel aggregation network

Authors: Suresh Shanmugasundaram; Natarajan Palaniappan

Addresses: School of Computer Science and Engineering, Vellore Institute of Technology, Vellore – 632014, India; School of Computer Science and Engineering, Vellore Institute of Technology, Vellore – 632014, India

Abstract: CNN-based scene text detection faces two major issues in real-world applications. The first is the speed-accuracy trade-off. The second is modelling arbitrary-shaped text instances. This work addresses both issues with a pixel aggregation network (PAN) built on a ResNet-152 backbone, which is chosen because it provides better accuracy and performance. The proposed network has a high-speed segmentation head and learnable post-processing. The segmentation head consists of a feature pyramid enhancement module (FPEM) and a feature fusion module (FFM). FPEM is a cascadable U-shaped module that introduces multi-level information for high-quality segmentation and produces features at different depths. FFM collects these features into a final feature used to segment arbitrary-shaped text. The learnable post-processing is implemented by pixel aggregation (PA), which uses predicted similarity vectors to aggregate text pixels precisely. The proposed ResNet-152 backbone-based PAN attains an F-measure of 85.6% on the Total-Text dataset.
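
The abstract does not spell out how pixel aggregation groups pixels, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each text instance has a seed region (called a "kernel" here) with a representative similarity vector, and that a text pixel joins the kernel whose vector is nearest in Euclidean distance. The function name `aggregate_pixels`, the dictionary layout, and the `max_distance` threshold are all hypothetical and chosen for illustration.

```python
from math import dist


def aggregate_pixels(kernel_vectors, pixel_vectors, max_distance=1.0):
    """Assign each text pixel to the kernel with the closest similarity vector.

    kernel_vectors: {kernel_id: similarity vector} (assumed representation)
    pixel_vectors:  {(row, col): similarity vector} for pixels in the text region
    max_distance:   hypothetical threshold; farther pixels stay unassigned
    """
    labels = {}
    for pixel, vector in pixel_vectors.items():
        # Find the kernel whose similarity vector is nearest in Euclidean distance.
        best_id, best_d = None, float("inf")
        for kernel_id, kernel_vector in kernel_vectors.items():
            d = dist(vector, kernel_vector)
            if d < best_d:
                best_id, best_d = kernel_id, d
        # Merge the pixel only when it is close enough to some kernel.
        if best_id is not None and best_d < max_distance:
            labels[pixel] = best_id
    return labels


# Toy data: two kernels and three candidate text pixels.
kernels = {1: (0.0, 0.0), 2: (5.0, 5.0)}
pixels = {(0, 0): (0.1, 0.2), (0, 1): (4.8, 5.1), (3, 3): (2.5, 2.5)}
labels = aggregate_pixels(kernels, pixels)
```

In this toy input the first two pixels are merged into kernels 1 and 2 respectively, while the pixel at (3, 3) is equally far from both kernels and beyond the threshold, so it is left unassigned.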

Keywords: arbitrary-shaped text detection; scene text detection; curve text detection; text segmentation; DNN.

DOI: 10.1504/IJCVR.2024.140820

International Journal of Computational Vision and Robotics, 2024 Vol.14 No.5, pp.510 - 528

Received: 11 Oct 2021
Accepted: 12 Aug 2022
Published online: 03 Sep 2024


