Open Access Article

Title: A light end-to-end comprehensive attention architecture for advanced face parsing

Authors: Cong Han; Peng Cheng; Zhisheng You

Addresses: College of Computer Science, Sichuan University, Chengdu, Sichuan Province, 610065, China; School of Aeronautics and Astronautics, Sichuan University, Chengdu, Sichuan Province, 610065, China; College of Computer Science, Sichuan University, Chengdu, Sichuan Province, 610065, China

Abstract: Face parsing involves segmenting a face into various semantic regions, but challenges such as complex structures, varying poses, and occlusions make achieving high performance difficult, even for state-of-the-art methods. To address these challenges, we propose FP-Transformer, a novel architecture that combines CNNs and Transformers to extract both long-range and short-range features. Our design includes: 1) a U-shaped encoder-decoder with hierarchical feature fusion and hybrid attention blocks for semantic learning; 2) convolution-based patch embedding and merging to retain edge information; 3) a novel Bunch-layer normalisation (BLN) to maintain consistent normalisation across patches. Experiments on the CelebAMask-HQ and LaPa datasets demonstrate the effectiveness of our approach, achieving mean F1 scores of 87.1% and 92.6%, respectively. Our model performs robustly even under occlusions, extreme poses, and complex backgrounds.
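The convolution-based patch embedding mentioned in point 2 can be illustrated with a minimal sketch. A convolution whose kernel size equals its stride is mathematically equivalent to cutting the image into non-overlapping patches and applying a shared linear projection; the shapes, patch size, and function name below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def conv_patch_embed(x, weight, bias, patch=4):
    """Non-overlapping convolutional patch embedding (sketch).

    Equivalent to a 2D convolution with kernel_size == stride == patch.
    x:      (H, W, C) input image
    weight: (patch*patch*C, D) projection matrix  -- hypothetical shapes
    bias:   (D,) projection bias
    Returns a (num_patches, D) token sequence.
    """
    H, W, C = x.shape
    h, w = H // patch, W // patch
    # Cut the image into an (h, w) grid of patch x patch blocks.
    blocks = x[:h * patch, :w * patch].reshape(h, patch, w, patch, C)
    # Flatten each block into one vector of length patch*patch*C.
    blocks = blocks.transpose(0, 2, 1, 3, 4).reshape(h * w, patch * patch * C)
    # Shared linear projection = the convolution's learned filters.
    return blocks @ weight + bias

# Example: a 32x32 RGB image with 4x4 patches yields 64 tokens.
x = np.random.rand(32, 32, 3)
W_proj = np.random.rand(4 * 4 * 3, 16)
b_proj = np.zeros(16)
tokens = conv_patch_embed(x, W_proj, b_proj, patch=4)
print(tokens.shape)  # (64, 16)
```

Because every pixel inside a patch contributes to its token via learned filter weights, such an embedding can preserve fine edge detail better than naive downsampling, which is the motivation the abstract gives.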

Keywords: face parsing; face analysis; face segmentation; self-attention mechanism.

DOI: 10.1504/IJICT.2025.144461

International Journal of Information and Communication Technology, 2025 Vol.26 No.3, pp.89 - 109

Received: 29 Oct 2024
Accepted: 19 Dec 2024

Published online: 13 Feb 2025