Date Added: Sep 2011
Information from video has recently been used to address the scaling ambiguity in frequency-domain convolutive Blind Source Separation (BSS), based on statistical modeling of the audio-visual coherence with Gaussian Mixture Models (GMMs) in the feature space. However, outliers in the feature space may greatly degrade system performance in both the training and separation stages. In this paper, a new feature selection scheme is proposed to discard non-stationary features, which improves the robustness of the coherence model and reduces its computational complexity. The scaling parameters, obtained from the selected features by coherence maximization and non-linear interpolation, are applied to the separated frequency components to mitigate the scaling ambiguity.
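The two mechanical steps around the coherence model, discarding outlier feature frames and applying interpolated per-frequency scaling to the separated components, can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's method. The stationarity criterion (frame-to-frame change above median + k·MAD), the log-domain interpolation standing in for the unspecified non-linear interpolation, the positive real gains, the array shapes, and all function names are hypothetical. The GMM coherence model and the coherence maximization that would produce the gains are not reproduced. The gains are taken as already estimated at a subset of frequency bins.

```python
import numpy as np


def select_stationary_frames(features, k=3.0):
    """Return a boolean mask of frames to keep.

    features: (T, D) array of per-frame audio-visual feature vectors.
    Hypothetical criterion: a frame is flagged as non-stationary when the
    Euclidean norm of its change from the previous frame exceeds
    median + k * MAD of all such changes. The first frame is always kept.
    """
    diffs = np.linalg.norm(np.diff(features, axis=0), axis=1)  # shape (T-1,)
    med = np.median(diffs)
    mad = np.median(np.abs(diffs - med))
    keep = np.ones(features.shape[0], dtype=bool)
    keep[1:] = diffs <= med + k * mad
    return keep


def interpolate_scaling(known_bins, known_gains, n_bins):
    """Fill in a scaling factor for every frequency bin.

    known_bins: increasing bin indices where a gain was estimated.
    known_gains: positive real gains at those bins (assumption).
    Log-gains are interpolated linearly, i.e. geometric interpolation,
    which is non-linear in the gain itself. Bins outside the known range
    take the nearest endpoint value (np.interp's default behavior).
    """
    log_g = np.interp(np.arange(n_bins), known_bins, np.log(known_gains))
    return np.exp(log_g)


def apply_scaling(Y, gains):
    """Rescale separated STFT components.

    Y: (n_sources, n_bins, n_frames) complex separated components.
    gains: (n_sources, n_bins) real scaling factors, broadcast over frames.
    """
    return Y * gains[:, :, None]


if __name__ == "__main__":
    feats = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
    print(select_stationary_frames(feats))  # the frame after the jump is dropped
    g = interpolate_scaling(np.array([0, 4]), np.array([1.0, 16.0]), 5)
    print(np.round(g, 6))
    Y = np.ones((1, 5, 3), dtype=complex)
    print(apply_scaling(Y, g[None, :]).shape)
```

Interpolating in the log domain keeps every interpolated gain positive and treats a doubling and a halving of amplitude symmetrically, which is one reasonable choice for amplitude scaling factors; a spline or another non-linear scheme could be substituted without changing the rest of the sketch.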