Separating voice and background music based on 2DFT transform
Online publication date: Tue, 18-Mar-2025
by Maoyuan Yin; Li Pan
International Journal of Reasoning-based Intelligent Systems (IJRIS), Vol. 17, No. 1, 2025
Abstract: A new method for separating human voice from background music is designed to address the large localisation errors, large feature extraction errors, and low separation accuracy of existing methods. First, a microphone array is set up in a virtual space to denoise the signal, and a generalised cross-correlation function is introduced to localise it. Next, a time-frequency spectrogram of the signal is constructed, the change in position of the signal energy along the frequency axis is calculated, and the components in each frequency band and time frame of the sound signal are extracted. Finally, a Hamming window function is introduced to improve the 2DFT algorithm and build a signal separation model. Test results show that, with the proposed method, the localisation error for human voice is only 0.50% when the frame rate of human voice in the audio is 1,000 kbps, and the feature extraction error for background music is only 0.05% when the audio sampling rate is 60 kHz. The separation accuracy for human voice and background music remains above 95%, with a maximum of nearly 99%, indicating that the method performs well in practice.
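The abstract names two building blocks, generalised cross-correlation for localisation and a Hamming-windowed 2DFT, without giving their exact formulation. As a rough illustration only, the sketch below assumes plain (unweighted) cross-correlation between two microphone signals and a separable 2D Hamming window applied to a magnitude spectrogram. The function names `gcc_delay` and `hamming_2dft` are hypothetical and do not come from the paper.

```python
import numpy as np

def gcc_delay(x, y, fs):
    """Estimate the delay of y relative to x, in seconds.

    Assumption: unweighted cross-correlation computed via the FFT;
    the paper's weighting function (if any) is not stated in the abstract.
    """
    n = len(x) + len(y)                      # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cc = np.fft.irfft(Y * np.conj(X), n=n)   # cross-correlation over circular lags
    max_shift = n // 2
    # Reorder so that index 0 corresponds to lag -max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = np.argmax(np.abs(cc)) - max_shift
    return lag / fs

def hamming_2dft(spec):
    """Magnitude of the 2D FFT of a (frequency x time) spectrogram.

    Assumption: the Hamming window is applied separably along both axes
    before the transform; the result is shifted so DC sits at the centre.
    """
    window = np.outer(np.hamming(spec.shape[0]), np.hamming(spec.shape[1]))
    return np.abs(np.fft.fftshift(np.fft.fft2(spec * window)))
```

With two microphone signals, `gcc_delay(mic1, mic2, fs)` returns the estimated time difference of arrival, which can then be combined with the array geometry to localise the source. `hamming_2dft` returns an array of the same shape as its input; windowing before the transform reduces spectral leakage at the spectrogram edges.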