HQ and JUD for Distributed Speech Recognition

4.5.2 HQ and JUD for Distributed Speech Recognition

Here we consider a complete DSR system based on the proposed HQ approaches. HQ was first applied at the client end to quantize and compress the input speech features. The quantized codewords were then transmitted to the server. JUD was then applied at the server to improve accuracies.
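The client/server split described above can be illustrated with a toy sketch. This is not the two-dimensional HQ used in the experiments: it assumes a one-dimensional quantizer whose partition cells are the equal-probability cells of the histogram of the current segment, and fixed representative values taken from a standard normal distribution. The function names `client_quantize` and `server_decode`, the segment values, and the choice of four levels are all hypothetical.

```python
from statistics import NormalDist


def client_quantize(segment, n_levels=4):
    """Client side (assumed 1-D simplification): replace each feature value
    by the index of the equal-probability histogram cell it falls into,
    where the histogram is that of the segment itself."""
    n = len(segment)
    # Indices of the segment sorted by feature value (order statistics).
    order = sorted(range(n), key=lambda i: segment[i])
    codewords = [0] * n
    for rank, i in enumerate(order):
        cdf = (rank + 0.5) / n  # empirical CDF value in (0, 1)
        codewords[i] = min(int(cdf * n_levels), n_levels - 1)
    return codewords


def server_decode(codewords, n_levels=4):
    """Server side: map each received codeword to a fixed representative
    value (assumed here: the median of each equal-probability cell of a
    standard normal distribution)."""
    reps = [NormalDist().inv_cdf((k + 0.5) / n_levels) for k in range(n_levels)]
    return [reps[c] for c in codewords]


# Hypothetical feature segment and a distorted copy (gain and offset).
segment = [0.1, 5.0, -2.0, 3.0, 1.2, -0.7, 2.2, 0.4]
distorted = [2.0 * x + 10.0 for x in segment]

codewords = client_quantize(segment)
codewords_distorted = client_quantize(distorted)
decoded = server_decode(codewords)

print(codewords)            # [1, 3, 0, 3, 2, 0, 2, 1]
print(codewords_distorted)  # identical: the cells follow the histogram
```

Because the cells are derived from the histogram of the segment, any order-preserving distortion of the features leaves the transmitted codewords unchanged in this sketch, and only the codeword indices need to be sent to the server.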

Figure 4.1: Performance improvements obtained by the various JUD approaches as compared to HQ alone: (a) averaged over all SNR values but separated for different noise types in sets A, B, and C; (b) averaged over all noise types but separated for each SNR value; and (c) averaged over all SNR values and noise types but separated into sets A, B, and C.

Conventionally, the quantization in DSR is done using SVQ [17]. If noise can be handled well by cascading an HEQ process at the front end, we can also compensate for the quantization errors caused by SVQ using conventional approaches associated with SVQ, for example the well-known Extended Cluster Information Vector Quantization (ECIVQ) [16]. Therefore we first need to compare the proposed HQ followed by JUD with such conventional approaches. The results are in Fig. 4.2(a), (b), and (c). The six bars in each set in Fig. 4.2 are respectively for SVQ alone, ECIVQ alone, the cascade of an HEQ front-end and SVQ (HEQ-SVQ), the cascade of an HEQ front-end and ECIVQ (HEQ-ECIVQ), HQ (two-dimensional), and the same HQ with complete JUD including histogram shift (HQ-s,n,q), all at a bit rate of 4.4 kbps. The 1st, 3rd, and 5th bars in Fig. 4.2 are the same as the 2nd, 4th, and 5th bars of the first 4.4 kbps group in Fig. 3.6.

We can see from Fig. 4.2 that ECIVQ (2nd bar) performed better than SVQ (1st bar) for sets A and B, but slightly worse for set C, and the same trend can be observed when HEQ is performed as a front-end of SVQ (HEQ-SVQ, 3rd bar, vs. HEQ-ECIVQ, 4th bar). This is probably because ECIVQ considers quantization errors only, whereas the channel mismatch in set C might move the feature vectors to different partition cells, for which the cluster variance used in ECIVQ was not able to help. HEQ offered very significant improvements when cascaded with SVQ or ECIVQ (HEQ-SVQ or HEQ-ECIVQ, 3rd or 4th bar), but the HQ proposed here (5th bar) provided better performance in almost all cases, and the complete JUD proposed here including histogram shift (HQ-s,n,q, 6th bar) offered additional improvements in almost all cases. The accuracies for HEQ cascaded with ECIVQ (HEQ-ECIVQ, 4th bar) and HQ with JUD (HQ-s,n,q, the last bar) are further compared in Table 4.3. The relative error rate reductions shown in the last row are significant and consistent for all SNR values, including the clean and 20 dB cases.

Figure 4.2: Comparison of different approaches discussed in this paper for DSR: (a) averaged over all SNR values but separated for different noise types in sets A, B, and C; (b) averaged over all noise types but separated for different SNR values; and (c) averaged over all SNR values and noise types but separated for sets A, B, and C.

The above experimental results are for a 4.4 kbps bit rate. Further analysis was then performed for the better approaches found above at different bit rates (4.4, 3.9, 3.3, and 2.7 kbps) and at all SNR values. The results are shown in Fig. 4.3(a)–(f) for SNR values from clean to 0 dB, each with different bit rates. The four bars in each set in Fig. 4.3 are respectively for ECIVQ (considering quantization error uncertainty for SVQ), the cascade of transform coding (TC) and ECIVQ (TC-ECIVQ), the cascade of HEQ and ECIVQ (HEQ-ECIVQ), and HQ with complete JUD including histogram shift (HQ-s,n,q). Here, except for the clean speech case at higher bit rates, HQ-s,n,q consistently performed better for all SNR values and all bit rates than the other combinations of front-end feature transformation (TC or HEQ) and back-end compensation considering quantization uncertainty (ECIVQ). Also, the performance of ECIVQ, TC-ECIVQ, and HEQ-ECIVQ is more sensitive to lower bit rates, while HQ-s,n,q is relatively insensitive to the bit rate at all SNR conditions.

Figure 4.3: Comparison of different approaches discussed in this paper for DSR (but without transmission errors) under different bit rates and SNR values: (a) clean, (b) 20 dB, (c) 15 dB, (d) 10 dB, (e) 5 dB, and (f) 0 dB.

Table 4.3: Accuracies (%) and relative error rate reductions (%) for HEQ-ECIVQ and HQ-s,n,q (with complete JUD) at 4.4 kbps for different SNR values in Fig. 4.2(b).

| SNR | Clean | 20 dB | 15 dB | 10 dB | 5 dB | 0 dB |
|---|---|---|---|---|---|---|
| HEQ-ECIVQ | 98.19 | 95.25 | 92.65 | 86.01 | 75.96 | 53.28 |
| HQ-s,n,q (complete JUD) | 98.50 | 96.38 | 93.99 | 89.04 | 78.34 | 57.01 |
| Relative error reduction (%) | 17.13 | 23.79 | 18.23 | 21.66 | 9.90 | 7.98 |
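The relative error reductions in the last row of Table 4.3 can be reproduced from the two accuracy rows, assuming the usual definition: the drop in error rate (100 minus accuracy) divided by the baseline error rate. The short sketch below applies that definition to the tabulated accuracies; the function and variable names are chosen here for illustration only.

```python
def relative_error_reduction(baseline_acc, improved_acc):
    """Relative error rate reduction in percent, with accuracies in percent."""
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return 100.0 * (baseline_err - improved_err) / baseline_err


# Accuracies (%) copied from Table 4.3, at 4.4 kbps.
snr_labels = ["Clean", "20 dB", "15 dB", "10 dB", "5 dB", "0 dB"]
heq_ecivq = [98.19, 95.25, 92.65, 86.01, 75.96, 53.28]
hq_jud = [98.50, 96.38, 93.99, 89.04, 78.34, 57.01]

reductions = [
    relative_error_reduction(base, new) for base, new in zip(heq_ecivq, hq_jud)
]

for label, red in zip(snr_labels, reductions):
    print(f"{label:>5}: {red:.2f}%")
```

For the clean case, for example, the error rate falls from 1.81% to 1.50%, and 0.31 / 1.81 gives the 17.13% listed in the table.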

4.6 Summary

In this chapter, Joint Uncertainty Decoding (JUD) under the framework of Histogram-based Quantization (HQ) was proposed for robust and/or distributed speech recognition. Improved recognition performance was obtained consistently under all types of noise at all SNR values.

Chapter 5 Three-Stage Error Concealment (EC) for HQ-Based DSR Systems

5.1 Introduction

Here we consider approaches to handling the transmission errors added to the received HQ codewords under the DSR framework [51]. In this chapter, a three-stage EC approach is developed, as presented below. Section 5.2 introduces frame and subvector error detection by the HQ-consistency check. The estimation of the detected erroneous subvectors is presented in Section 5.3, considering the prior speech source statistics, the channel transition probability, and the reliability of the received subvectors. Section 5.4 introduces the reliability estimation and uncertainty decoding. Section 5.5 gives an overview of the three-stage error concealment (EC) framework. Experimental results are offered in Section 5.6, with the summary finally given in Section 5.7.

In document: Dynamic Quantization Techniques in Robust and Distributed Speech Recognition Systems (pages 61-68)