https://doi.org/10.1051/epjconf/202532801031
Improved Audio Separation Using U-Net and ICA
Indira Gandhi Delhi Technical University for Women, India
* Corresponding author: vagisha021mtcse23@igdtuw.ac.in
Published online: 18 June 2025
This paper introduces UNetICA, a hybrid model for audio source separation that combines U-Net with Independent Component Analysis (ICA). The model is designed to isolate individual sources, such as vocals, drums, bass, and other instruments, from mixed music tracks. In the first stage, a U-Net processes spectrograms, extracts multi-scale features, and produces a coarse estimate of each source. In the second stage, ICA refines these estimates by exploiting the statistical independence of the audio components. This two-stage design lets UNetICA address both the spectral structure and the statistical properties of the sources, resulting in more accurate separation. The model was trained and evaluated on the MUSDB18 dataset, which provides 100 tracks for training and 50 for testing. Performance was measured using the Signal-to-Distortion Ratio (SDR). UNetICA achieved an SDR of 19.05 dB for bass, significantly outperforming existing models, while vocals and other sources reached competitive SDRs of 8.792 dB and 8.868 dB, respectively. Compared with state-of-the-art models such as Open-Unmix, Demucs, and Conv-TasNet, UNetICA consistently achieved better separation performance, validating the effectiveness of the proposed hybrid framework.
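For readers unfamiliar with the metric, the following is a minimal sketch of how an SDR value in dB can be computed from a reference source and its estimate. It uses the plain energy-ratio definition, 10·log10(‖s‖² / ‖s − ŝ‖²), which is an assumption: the abstract does not state which SDR variant or evaluation toolkit was used, and benchmark evaluations on MUSDB18 often rely on BSS Eval style definitions that differ in detail. The function name `sdr_db` and the toy signals are illustrative only.

```python
import math


def sdr_db(reference, estimate):
    """Plain signal-to-distortion ratio in dB.

    Assumed definition (not confirmed by the abstract):
        SDR = 10 * log10(||s||^2 / ||s - s_hat||^2)
    where s is the reference source and s_hat the separated estimate.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")

    # Energy of the true source signal.
    signal_energy = sum(s * s for s in reference)
    # Energy of the residual between the true source and the estimate.
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))

    if signal_energy == 0.0:
        raise ValueError("reference is silent; SDR is undefined")
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


# Toy example: an estimate that is the reference scaled by 0.9.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -0.9, 0.9, -0.9]
print(f"SDR: {sdr_db(reference, estimate):.2f} dB")
```

A higher value means less residual distortion relative to the source energy, so each additional 10 dB corresponds to a tenfold reduction in error energy under this definition.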
© The Authors, published by EDP Sciences, 2025
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.