General discussion - Effects of low-frequency acoustic information on cochlear implant users in noisy environments

4. Discussion

4.4 General discussion

Overall, the present study points to a glimpsing mechanism that contributes to the Bimodal benefit in noisy environments. The benefit introduced by glimpsing raises the following question:

Which underlying cues or components of the low frequency acoustic information contribute to or promote glimpsing? In other words, what is so special about the acoustic information in the low frequency region that it enables simulated Bimodal users to improve speech intelligibility in noisy environments?

The answer to both formulations of this question, and the major factor we suggest, is the low frequency SNR advantage. When speech is mixed with noise, there is, at least during voiced speech segments such as vowels and semivowels, a lesser degree of masking in the low frequency region than in the high frequency region. The low frequency region contains salient speech energy with a dominant peak near 500 Hz, and it is shielded to a certain extent from distortion and masker interference because of the low frequency dominance of the long-term speech spectrum [143]; this property of the long-term speech spectrum is common across 12 different languages [144]. Li et al. found that, when the target and masker signals are compared prior to mixing at 0 dB SNR, their magnitude spectra differ [81]. Although the long-term RMS SNR measured across the whole sentence is 0 dB, the target's magnitude spectrum is stronger than the masker's in the low frequencies around 500 Hz but not in the high frequencies, at least during voiced segments.

Therefore, it is reasonable to hypothesize that the SNR in the low frequency region will be, on average, larger than the SNR in the high frequencies, so that a low frequency SNR advantage would be observed.
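The hypothesis above can be illustrated with a minimal numerical sketch. The band edges and band powers below are hypothetical values chosen only to mimic a voiced segment whose energy is concentrated in the low frequencies, mixed with a spectrally flatter masker; they are not measurements from this study or from the cited work. The masker is scaled so that the broadband SNR is 0 dB, and the SNR is then computed band by band.

```python
import math

def to_db(power_ratio):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(power_ratio)

def scale_masker_to_overall_snr(target_bands, masker_bands, snr_db):
    """Scale masker band powers so the broadband target/masker ratio equals snr_db."""
    gain = sum(target_bands) / (sum(masker_bands) * 10.0 ** (snr_db / 10.0))
    return [power * gain for power in masker_bands]

def band_snrs_db(target_bands, masker_bands):
    """SNR in dB for each frequency band."""
    return [to_db(t / m) for t, m in zip(target_bands, masker_bands)]

# Hypothetical band powers in arbitrary linear units, ordered low to high frequency.
# Assumption: the target is a voiced segment with low frequency dominance,
# and the masker has a flat spectrum across these bands.
BAND_EDGES_HZ = [(0, 500), (500, 1000), (1000, 2000), (2000, 4000), (4000, 8000)]
target_bands = [8.0, 6.0, 3.0, 2.0, 1.0]
masker_bands = [1.0, 1.0, 1.0, 1.0, 1.0]

scaled_masker = scale_masker_to_overall_snr(target_bands, masker_bands, 0.0)
overall_snr_db = to_db(sum(target_bands) / sum(scaled_masker))
per_band_snr_db = band_snrs_db(target_bands, scaled_masker)

for (low_hz, high_hz), snr in zip(BAND_EDGES_HZ, per_band_snr_db):
    print(f"{low_hz:>4}-{high_hz:<4} Hz: {snr:+.1f} dB")
print(f"broadband:    {overall_snr_db:+.1f} dB")
```

With these assumed values the broadband SNR is 0 dB, yet the lowest band has a positive SNR and the highest band a negative one, which is the pattern the hypothesis describes.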

There are several reasons why the low frequency SNR advantage is critical. First, it is a useful cue that provides access to a better F0 and F1 representation; second, it provides a better ability to glimpse, as the target will be stronger than the masker in the low frequency region. The harmonics of the speech signal that fall in the low frequency region will be affected less than the high frequency harmonics, so listeners will have access to resilient F0 cues. Listeners will also have access to resilient F1 information, which is important for vowel and stop-consonant identification. Using acoustic analysis, Parikh et al. found that F1 is preserved to a certain degree in noise even at extremely low SNR levels (-5 dB) [145]. In their acoustic analysis of a large vowel database, F1 was identified reliably 60% of the time, whereas F2 was identified only 30% of the time, when the vowels were embedded in multi-talker babble at -5 dB SNR. F1 information is critical not only for vowel perception but also for stop-consonant perception, as it conveys voicing information. For example, the F1 onset time following the release of prevocalic stops is known to be one of the major cues to the voiced/unvoiced distinction in stops [146].

In summary, the Bimodal stimuli contain several phonetic cues that help listeners to better glimpse the target, and we suggest that the underlying mechanism responsible for the Bimodal benefit is "glimpsing". Glimpsing has two processing stages: first, detection of the target during dips in the fluctuating masker, and second, synthesis of the cues contained in the glimpsed information. The detection process is promoted by a favorable SNR: regions in which the target is stronger than the masker are easier to detect than regions with negative SNR. As stated above, because the low frequency region has a more favorable SNR, it is more advantageous to glimpse the target in the low frequencies than in the high frequencies. It is reasonable to hypothesize that the effective SNR in the low frequencies improves speech intelligibility, at least for the complete, non-vocoded low frequency acoustic information in the Bimodal stimuli. The detection stage yields the glimpsed information, which is patched together in the second stage, namely the integration stage. The integration stage involves higher-level central auditory processing. Multiple cues are likely involved in the integration stage, such as F0, F1, onset cues, voicing cues, and other auditory grouping cues [147]. We argue that the CI users in the present study may have used F0 cues to segregate the target from the masker. However, Li et al. found that listeners were able to glimpse the target among 20 competing talkers, suggesting that cues other than F0 may be used in the integration process, and concluded that F0 cues are not always necessary, depending on the task [148].

The favorable low frequency SNR was present in both the vocoder and the Bimodal stimuli; however, the Bimodal stimuli were perceived more accurately than the vocoder stimuli. We believe this is because the low-pass speech promoted better, and perhaps more salient, integration of the glimpses detected in the low frequency acoustic region with the information contained in the high frequency vocoded region. Furthermore, we suggest that the glimpses in the low frequency acoustic information provided cues about the target that were unavailable from the vocoder stimulus. The information about the target extracted from the low frequency acoustic region was subsequently synthesized with the vocoder stimuli to yield higher speech intelligibility than that obtained with the vocoder stimuli alone.
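The detection stage described above can be sketched as a simple thresholding of local SNR over a time-frequency grid. This is only an illustrative sketch: the 0 dB criterion follows the description of glimpses as regions where the target is stronger than the masker, while the grid values, the band grouping, and the function names are hypothetical and do not come from the experiments reported here.

```python
def glimpse_mask(local_snr_db, threshold_db=0.0):
    """Mark each time-frequency cell as a glimpse when its local SNR exceeds the threshold."""
    return [[snr > threshold_db for snr in frame] for frame in local_snr_db]

def glimpse_proportion(mask, band_indices):
    """Fraction of cells in the selected bands that count as glimpses."""
    cells = [frame[band] for frame in mask for band in band_indices]
    return sum(cells) / len(cells)

# Hypothetical local SNRs in dB: rows are time frames, columns are bands
# ordered from low to high frequency.
local_snr_db = [
    [6.0, 3.0, -2.0, -8.0],
    [4.0, -1.0, -5.0, -9.0],
    [-3.0, 2.0, 1.0, -4.0],
    [5.0, 1.0, -6.0, -7.0],
]
LOW_BANDS = [0, 1]   # assumed to cover the low frequency acoustic region
HIGH_BANDS = [2, 3]  # assumed to cover the high frequency vocoded region

mask = glimpse_mask(local_snr_db)
low_proportion = glimpse_proportion(mask, LOW_BANDS)
high_proportion = glimpse_proportion(mask, HIGH_BANDS)
print(f"glimpses in low bands:  {low_proportion:.3f}")
print(f"glimpses in high bands: {high_proportion:.3f}")
```

In this toy grid most low frequency cells qualify as glimpses while few high frequency cells do. The sketch covers only detection; the integration stage, which involves central auditory processing, is not modeled.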

Despite the advantages of Bimodal stimulation mentioned above, combining a CI and an HA presents some challenges in practice. Crucial differences between these two types of auditory stimulation lead to discrepancies between the two ears. One critical difference between electric and acoustic stimulation concerns the frequency range covered by the CI and the HA. CIs usually do not cover frequencies lower than 100-200 Hz, but can convey signals up to about 8000 Hz. HAs, on the other hand, often amplify only low frequencies below 1 kHz, since this is where residual hearing generally remains in HI listeners. Even where the frequency ranges of the electric and acoustic signals overlap, the two signals result in different percepts. McDermott et al. reported that acoustic pure tones are perceived as very different from electric pulse trains delivered to a single electrode position at a constant rate [149]. This may in part be due to a mismatch in frequency-to-place mapping in CIs.

Another important issue is the synchronization of the signals when a CI and an HA are combined. Both devices present the signal to the listener with a small delay, but this delay differs between acoustic and electric stimulation. The CI introduces a device-dependent delay of approximately 1-20 ms, whereas the HA introduces a device-dependent delay of about 1-12 ms. In addition, the acoustic signal undergoes a frequency-dependent delay of 1-4 ms introduced by the middle and inner ear [9]. This difference in delay may impair speech perception, since the signals cannot be processed simultaneously. The current study provides evidence for this: the squelch effect was not observed with Bimodal stimulation, which indicates that the ITD and ILD cues were disrupted. In addition, the synchronization mismatch may be detrimental to the localization and segregation of sound sources, which depend in part on time differences between the ears. Finally, Bimodal stimulation raises issues related to the perception of loudness.
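Before turning to loudness, the delay ranges quoted above can be combined into a rough bound on the timing mismatch between the two ears. The sketch below uses simple interval arithmetic and assumes that the device delay and the middle and inner ear delay on the acoustic side simply add, and that each delay can take any value in its quoted range independently of the others; both are simplifying assumptions made for illustration only.

```python
def interval_add(a, b):
    """Sum of two (min, max) ranges."""
    return (a[0] + b[0], a[1] + b[1])

def interval_sub(a, b):
    """Range of a - b when a and b vary independently within their (min, max) ranges."""
    return (a[0] - b[1], a[1] - b[0])

# Delay ranges in milliseconds, as quoted in the text.
CI_DELAY_MS = (1.0, 20.0)   # CI device delay
HA_DELAY_MS = (1.0, 12.0)   # HA device delay
EAR_DELAY_MS = (1.0, 4.0)   # middle and inner ear delay on the acoustic side

# Assumption: total acoustic-side delay is device delay plus ear delay.
acoustic_delay_ms = interval_add(HA_DELAY_MS, EAR_DELAY_MS)

# Positive values mean the electric signal lags the acoustic signal.
mismatch_ms = interval_sub(CI_DELAY_MS, acoustic_delay_ms)

print(f"acoustic path delay:  {acoustic_delay_ms[0]:.0f} to {acoustic_delay_ms[1]:.0f} ms")
print(f"CI minus acoustic:    {mismatch_ms[0]:+.0f} to {mismatch_ms[1]:+.0f} ms")
```

Under these assumptions the mismatch can reach many milliseconds in either direction, whereas natural interaural time differences are well under 1 ms, which is consistent with the disruption of ITD cues discussed above.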

Current HAs and CIs both contain automatic gain control (AGC), which reduces the dynamic range of the signal. The two systems have different parameter settings and are fitted independently. Since the AGCs rely on different degrees of compression, the dynamic ranges of the HA and the CI also differ. This often leads to unbalanced loudness across the ears [150]. Furthermore, loudness recruitment associated with hearing impairment may also affect the balance of loudness. As stated before, the dynamic range of people with loudness recruitment is reduced: when the level of a sound is increased above the absolute threshold, loudness grows with sound level more rapidly than in normal hearing. If there is a mismatch in loudness growth across the different frequencies in residual hearing and the different electrodes in electric hearing, the balance of loudness across the ears may be further impaired. This mismatch in loudness may also impair the localization and segregation of sound sources, since these depend in part on loudness differences across the ears.
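The effect of independently fitted AGC systems can be illustrated with a minimal static compression sketch. The thresholds and compression ratios below are hypothetical and are not the settings of any actual HA or CI; real AGC systems also have attack and release times that are ignored here, and output level in dB is used only as a crude stand-in for loudness.

```python
def compress_db(input_db, threshold_db, ratio):
    """Static compression: unchanged below threshold, slope 1/ratio above it."""
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio

# Hypothetical AGC settings for the two devices (illustrative only).
HA_AGC = {"threshold_db": 50.0, "ratio": 2.0}
CI_AGC = {"threshold_db": 50.0, "ratio": 3.0}

input_levels_db = [40.0, 50.0, 65.0, 80.0]
ha_output_db = [compress_db(level, **HA_AGC) for level in input_levels_db]
ci_output_db = [compress_db(level, **CI_AGC) for level in input_levels_db]

# Difference between the ears for the same input level.
imbalance_db = [ha - ci for ha, ci in zip(ha_output_db, ci_output_db)]

for level, ha, ci, diff in zip(input_levels_db, ha_output_db, ci_output_db, imbalance_db):
    print(f"input {level:.0f} dB -> HA {ha:.1f} dB, CI {ci:.1f} dB, difference {diff:+.1f} dB")
```

With these assumed settings the two outputs agree at and below the compression threshold and diverge increasingly above it, so a level difference between the ears appears that depends on input level rather than on source position.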

While the current study found that F0 can largely account for the benefit of preserving residual hearing, this finding addresses a theoretical issue in Bimodal stimulation. It in no way suggests that F0 extraction should be the signal processing approach for improving CI speech intelligibility in noise. It is worth noting that all previously mentioned F0 studies used offline signal processing under laboratory conditions. In real-world listening situations, accurate real-time F0 extraction in the presence of background noise is technically challenging and may not be feasible. To our knowledge, at least, no such method has yet been found.
