Title: Separating voice and background music based on 2DFT transform

Authors: Maoyuan Yin; Li Pan

Addresses: School of Music and Dance, Mudanjiang Normal University, Mudanjiang, 157011, China (both authors)

Abstract: A new method for separating the human voice from background music is designed to address the large localisation errors, large feature extraction errors, and low separation accuracy of existing methods. First, a microphone array is set up in a virtual space to denoise the signal, and a generalised cross correlation function is introduced to localise it. Then, a time-frequency spectrogram of the signal is constructed, the change in position of the signal energy along the frequency axis is calculated, and the components in each frequency band and time frame of the sound signal are extracted. Finally, a Hamming window function is introduced to improve the 2DFT algorithm, and a signal separation model is built. Test results show that, with the proposed method, the localisation error for the human voice is only 0.50% when the frame rate of the human voice in the audio is 1,000 kbps, and the feature extraction error for the background music is only 0.05% when the sample audio sampling rate is 60 kHz. The separation accuracy for human voice and background music remains above 95%, with a maximum of nearly 99%, indicating good practical performance.
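
The two signal-processing steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the weighting used in the generalised cross correlation or how the Hamming window enters the 2DFT, so the PHAT weighting, the separable 2D Hamming window, and the function names gcc_phat_delay and windowed_2dft are all assumptions made for illustration.

```python
import numpy as np


def gcc_phat_delay(x, y, fs):
    """Estimate the delay of y relative to x, in seconds.

    Uses generalised cross correlation with PHAT weighting
    (assumed here; the abstract does not name a weighting).
    """
    n = len(x) + len(y)                  # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)               # cross-power spectrum
    cross /= np.abs(cross) + 1e-12       # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    # Reorder so that index 0 corresponds to lag -max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs


def windowed_2dft(spectrogram):
    """Apply a separable 2D Hamming window to a magnitude spectrogram
    (frequency bins x time frames) and return its 2D Fourier transform.

    The separable window is an assumption, not the paper's stated design.
    """
    rows, cols = spectrogram.shape
    window = np.outer(np.hamming(rows), np.hamming(cols))
    return np.fft.fft2(spectrogram * window)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 1000.0                                  # hypothetical sampling rate, Hz
    x = rng.standard_normal(1024)                # signal at microphone 1
    y = np.concatenate((np.zeros(5), x[:-5]))    # microphone 2: same signal, 5 samples later
    print("estimated delay (s):", gcc_phat_delay(x, y, fs))

    patch = np.abs(rng.standard_normal((64, 128)))   # stand-in spectrogram patch
    print("2DFT shape:", windowed_2dft(patch).shape)
```

In an array setting, a delay estimated in this way for each microphone pair would be combined with the array geometry to localise the source. In 2DFT-based separation, repeating musical accompaniment tends to appear as concentrated peaks in the transform of the spectrogram, which is what a separation model can then exploit.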

Keywords: 2DFT; voice; background music; separation; generalised cross correlation function; microphone array; separation model.

DOI: 10.1504/IJRIS.2025.145050

International Journal of Reasoning-based Intelligent Systems, 2025 Vol.17 No.1, pp.50 - 57

Received: 10 Mar 2023
Accepted: 27 Apr 2023

Published online: 18 Mar 2025
