Sonic Dereverberation through Coherent-to-Diffuse Power Ratio (CDR) Estimators
A novel method for speech dereverberation, using Coherent-to-Diffuse Ratio (CDR) estimators, has been presented. This approach is particularly effective in two-microphone arrays, where it quantifies the ratio between direct (coherent) sound energy and diffuse (reverberant/noise) sound energy arriving at the microphones.
How CDR Estimators Work in Two-Microphone Arrays
The method relies on the fact that direct speech arrives with a stable phase relationship between the microphones, while reverberation behaves as a spatially diffuse sound field with largely random phase differences. The signals at the two microphones are modelled as a coherent direct speech component plus diffuse reverberation and noise. The spatial coherence of the microphone signals is measured, and the CDR is estimated by comparing this measured coherence with two reference values: the theoretical coherence of a purely diffuse field and that of a perfectly coherent source.
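The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the estimator used in the study: it assumes omnidirectional microphones in an ideal spherically diffuse field (so the diffuse coherence is a sinc function of frequency and microphone spacing), a known time difference of arrival for the direct sound, and the mixture model Γx = (CDR·Γs + Γn)/(CDR + 1), which is simply solved for the CDR. All function names, the smoothing constant, and the sign convention of the cross-spectrum are hypothetical choices made for the example.

```python
import numpy as np

def diffuse_coherence(freqs_hz, mic_distance_m, c=343.0):
    """Coherence of an ideal spherically diffuse field between two omni mics (assumed model)."""
    # np.sinc(x) is sin(pi*x)/(pi*x), so this is sin(2*pi*f*d/c) / (2*pi*f*d/c).
    return np.sinc(2.0 * freqs_hz * mic_distance_m / c)

def direct_coherence(freqs_hz, tdoa_s):
    """Coherence of a single plane wave with an assumed known time difference of arrival."""
    return np.exp(1j * 2.0 * np.pi * freqs_hz * tdoa_s)

def estimate_coherence(X1, X2, alpha=0.68):
    """Short-time complex coherence from two STFTs of shape (frames, bins).

    Auto- and cross-power spectra are smoothed recursively over frames;
    alpha is an illustrative smoothing constant, not a value from the source.
    """
    def smooth(P):
        out = np.empty_like(P)
        acc = P[0]
        for l in range(P.shape[0]):
            acc = alpha * acc + (1.0 - alpha) * P[l]
            out[l] = acc
        return out

    phi_11 = smooth(np.abs(X1) ** 2)
    phi_22 = smooth(np.abs(X2) ** 2)
    phi_12 = smooth(X1 * np.conj(X2))
    return phi_12 / np.sqrt(phi_11 * phi_22 + 1e-12)

def estimate_cdr(gamma_x, gamma_s, gamma_n, max_cdr=1e6):
    """Solve gamma_x = (CDR*gamma_s + gamma_n) / (CDR + 1) for the CDR.

    With noisy coherence estimates the ratio is complex, so only the real
    part is kept and negative values are clipped to zero.
    """
    ratio = (gamma_n - gamma_x) / (gamma_x - gamma_s + 1e-12)
    return np.clip(np.real(ratio), 0.0, max_cdr)
```

Given two STFT matrices X1 and X2 and the bin frequencies, a call such as estimate_cdr(estimate_coherence(X1, X2), direct_coherence(freqs, tdoa), diffuse_coherence(freqs, d)) yields one CDR value per time-frequency bin. Estimators that avoid the known-direction assumption exist, but they are beyond this sketch.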
CDR-based Postfiltering for Dereverberation
The estimated CDR is used to design a postfilter that attenuates reverberant (diffuse) components while preserving coherent speech components. The postfilter gain typically depends inversely on the diffuse energy estimate and proportionally on the coherent energy estimate. This gain is applied to the microphone signals or beamformed outputs to suppress reverberation while maintaining intelligibility.
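One way to read the dependence on coherent and diffuse energy is as a Wiener-type gain. The relation below is a sketch under that assumption, not necessarily the rule used in the study; Φs and Φn are symbols introduced here for the coherent and diffuse power spectral densities at time frame l and frequency f.

```latex
G_{\mathrm{W}}(l,f)
  = \frac{\Phi_s(l,f)}{\Phi_s(l,f) + \Phi_n(l,f)}
  = \frac{\mathrm{CDR}(l,f)}{\mathrm{CDR}(l,f) + 1},
\qquad
\mathrm{CDR}(l,f) = \frac{\Phi_s(l,f)}{\Phi_n(l,f)}
```

Written this way, the gain approaches 1 when the coherent component dominates and approaches 0 when the field is purely diffuse, so it needs only the CDR and not the two power spectra separately.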
Preprocessing with CDR Estimation
In some systems, CDR estimates guide preprocessing steps such as adaptive beamforming or spectral weighting to improve signal quality before feature extraction. Because the CDR is estimated per time-frequency bin, it can indicate when and how strongly to suppress reverberation, which improves robustness under varying acoustic conditions.
Application in Automatic Speech Recognition (ASR)
Signals dereverberated by CDR postfiltering serve as improved inputs for ASR frontends because they have a higher signal-to-reverberation ratio. The enhanced signals match the acoustic model better, which improves recognition accuracy. The CDR-based approach is computationally efficient and therefore suitable for real-time ASR in reverberant environments, especially with small arrays such as two microphones.
Summary
In essence, CDR estimators exploit the difference in inter-microphone spatial coherence between direct sound and reverberation. This enables postfiltering and preprocessing that directly benefit reverberation-robust ASR with two-microphone arrays. The CDR metric is the ratio between the coherent (desired) and diffuse (undesired) signal components. It takes values from zero to infinity, with zero indicating a purely diffuse, fully reverberant field and infinity indicating that only the direct speech is present. The technique can achieve high-quality dereverberation without a trained model.
In the study, the original recording and its dereverberated version from figure 2 can be heard in the repository. Using CDR estimators in ASR systems has been shown to significantly reduce the Word Error Rate (WER). The approach extends to arrays with more than two microphones by estimating the CDR for microphone pairs and averaging the results. The postfilter gain G(l,f), defined per time frame l and frequency f, behaves as follows at the extremes: when the CDR(l,f) estimate is infinite, G(l,f) equals 1, and when CDR(l,f) is zero, G(l,f) equals the larger of G_min and one minus the square root of μ.
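The two limiting cases stated for G(l,f) are consistent with a spectral-subtraction style gain of the form max(G_min, 1 − sqrt(μ / (CDR + 1))). The sketch below assumes that form; the source states only the limits, so the exact expression, the function name, and the default values of μ and G_min are illustrative assumptions.

```python
import numpy as np

def cdr_postfilter_gain(cdr, mu=1.3, g_min=0.1):
    """Gain in [g_min, 1] from a CDR estimate (assumed spectral-subtraction form).

    mu is an oversubtraction factor and g_min a gain floor; both defaults
    are illustrative, not values confirmed by the source.
    """
    cdr = np.asarray(cdr, dtype=float)
    # CDR -> infinity gives mu / (CDR + 1) -> 0, so the gain tends to 1.
    # CDR = 0 gives 1 - sqrt(mu), which is then floored at g_min.
    gain = 1.0 - np.sqrt(mu / (cdr + 1.0))
    return np.maximum(gain, g_min)
```

The gain is real-valued and would be multiplied bin by bin with the STFT of one microphone signal or of a beamformer output, for example Y = cdr_postfilter_gain(cdr) * X1, before the inverse STFT.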
The method built on CDR estimators is particularly effective for speech dereverberation with two-microphone arrays. It works by estimating the Coherent-to-Diffuse Ratio (CDR), which distinguishes direct (coherent) sound energy from diffuse (reverberant/noise) sound energy.
The estimated CDR is used not only for postfiltering to attenuate reverberant components but also to guide preprocessing steps such as adaptive beamforming or spectral weighting, which improve signal quality before feature extraction. This makes the CDR-based approach advantageous in automatic speech recognition (ASR) systems: it is computationally efficient, suitable for real-time applications, and improves recognition accuracy by increasing the signal-to-reverberation ratio.