Title: Multi-task deep learning approach for sound event recognition and tracking

Authors: Tzung-Shi Chen; Ming-Ju Chen; Tzung-Cheng Chen

Addresses: Department of Computer Science and Information Engineering, National University of Tainan, Tainan 700301, Taiwan; Department of Computer Science and Information Engineering, National University of Tainan, Tainan 700301, Taiwan; Department of Aerospace and Systems Engineering, Feng Chia University, Taichung 407802, Taiwan

Abstract: In smart cities, it is important to detect abnormal activities through cameras. However, cameras have limitations such as blind spots and blocked areas that can result in detection failures. Sound, on the other hand, is less likely to be obstructed. This paper proposes using microphone arrays to identify sound events, predict their locations, and track their trajectories using multi-task deep learning approaches. Experimental results show high predictive accuracy. Finally, the proposed models are also converted to quantised versions and deployed on embedded devices in vehicles to analyse memory footprint and execution time.

Keywords: deep learning; microphone arrays; sound event classification; sound tracking; localisation.

DOI: 10.1504/IJAHUC.2024.138747

International Journal of Ad Hoc and Ubiquitous Computing, 2024 Vol.46 No.2, pp.104 - 121

Received: 05 Dec 2023
Accepted: 20 Feb 2024

Published online: 29 May 2024
