

3.2 Experimental Results of Geometric Decoder

The experiments for the proposed geometric decoder are conducted under the same setting as those for the off-the-shelf algebraic decoder in Section 2.4. Here, we focus on comparing the new decoder with the algebraic one on the HAMR and BCH codes, since the previous experiments have already shown that these codes are the better choices for multi-label classification.

We first demonstrate the advantage of the proposed geometric decoder over the algebraic one using the same codeword predictions as in Section 2.4. The results are shown in Figure 3.1. Here the base learner is Binary Relevance with Random Forests. In the figures, alg stands for the algebraic decoder, and geo stands for the proposed geometric decoder. The soft decoding output of the geometric decoder is rounded back to {0, 1} for evaluation and comparison.

Figure 3.1(a) shows the results on 0/1 loss. For the BCH code, the proposed geometric decoder outperforms the algebraic one significantly on almost all datasets, with especially large improvements on the yeast and medical datasets. For the HAMR code, the geometric decoder is better than the algebraic one except on the genbase and enron datasets, where both decoders achieve similar 0/1 loss.

Next we look at the Hamming loss in Figure 3.1(b).

Table 3.1: 0/1 loss changes when applying the proposed soft-output decoder

ECC   base learner                    scene (M = 127)   emotions (M = 127)  yeast (M = 255)   tmc2007 (M = 511)
HAMR  BR, Random Forest               −.0101 ± .0010    −.0094 ± .0022      −.0077 ± .0010    −.0012 ± .0006
HAMR  BR, Gaussian SVM                −.0047 ± .0007    −.0145 ± .0030      −.0031 ± .0008    −.0009 ± .0005
HAMR  BR, Logistic Regression         −.0081 ± .0010    −.0078 ± .0028      −.0012 ± .0008    −.0006 ± .0005
HAMR  3-powerset, Random Forest       −.0099 ± .0008    −.0071 ± .0023      −.0101 ± .0011    −.0014 ± .0006
HAMR  3-powerset, Gaussian SVM        −.0042 ± .0006    −.0239 ± .0029      −.0082 ± .0006    −.0010 ± .0007
HAMR  3-powerset, Logistic Regression −.0064 ± .0009    −.0041 ± .0029      −.0051 ± .0009    −.0004 ± .0007
BCH   BR, Random Forest               −.0100 ± .0007    −.0101 ± .0030      −.0575 ± .0015    −.0231 ± .0014
BCH   BR, Gaussian SVM                −.0048 ± .0005    −.0437 ± .0039      −.0312 ± .0012    −.0078 ± .0014
BCH   BR, Logistic Regression         −.0127 ± .0007    −.0096 ± .0034      −.0396 ± .0017    −.0103 ± .0018
BCH   3-powerset, Random Forest       −.0114 ± .0007    −.0087 ± .0032      −.0529 ± .0016    −.0210 ± .0013
BCH   3-powerset, Gaussian SVM        −.0048 ± .0004    −.0343 ± .0044      −.0250 ± .0011    −.0090 ± .0013
BCH   3-powerset, Logistic Regression −.0070 ± .0006    −.0114 ± .0030      −.0256 ± .0014    −.0122 ± .0013

ECC   base learner                    genbase (M = 511)  medical (M = 1023)  enron (M = 1023)
HAMR  BR, Random Forest               −.0008 ± .0005     −.0010 ± .0010      −.0006 ± .0006
HAMR  BR, Gaussian SVM                −.0012 ± .0008     −.0004 ± .0007       .0002 ± .0005
HAMR  BR, Logistic Regression          .0032 ± .0057      .0015 ± .0010       .0004 ± .0004
HAMR  3-powerset, Random Forest        .0010 ± .0005     −.0006 ± .0008      −.0001 ± .0004
HAMR  3-powerset, Gaussian SVM        −.0005 ± .0004     −.0004 ± .0007       .0002 ± .0005
HAMR  3-powerset, Logistic Regression −.0030 ± .0061      .0004 ± .0010      −.0004 ± .0005
BCH   BR, Random Forest                .0005 ± .0003     −.0438 ± .0022      −.0308 ± .0016
BCH   BR, Gaussian SVM                −.0000 ± .0003     −.0068 ± .0025      −.0133 ± .0016
BCH   BR, Logistic Regression          .0127 ± .0018     −.0318 ± .0045      −.0217 ± .0013
BCH   3-powerset, Random Forest        .0003 ± .0004     −.0280 ± .0018      −.0238 ± .0018
BCH   3-powerset, Gaussian SVM        −.0007 ± .0003     −.0150 ± .0018      −.0055 ± .0016
BCH   3-powerset, Logistic Regression  .0003 ± .0006     −.0216 ± .0022      −.0139 ± .0013

For the HAMR code, the proposed method shows a small improvement on the scene, emotions, and yeast datasets, and has similar Hamming loss to the algebraic decoding method on the other datasets. However, for the BCH code, the proposed method has worse Hamming loss on the yeast, emotions, and enron datasets. The reason may be that the geometric decoder minimizes the distance between the approximated enc(˜y) and ˜b in the codeword space. However, the BCH code does not preserve the Hamming distance between {0, 1}^K and {0, 1}^M during encoding and decoding, so the geometric decoder, which minimizes the distance in [0, 1]^M (and approximately in {0, 1}^M), may not be suitable for the Hamming loss (the Hamming distance in {0, 1}^K).
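For concreteness, a minimal sketch of this evaluation step is given below, assuming the decoded soft labels and the true labels are stored as NumPy arrays; the array names and the 0.5 rounding threshold are illustrative assumptions, not the exact implementation used in the experiments.

```python
import numpy as np

def evaluate(y_true, y_soft, threshold=0.5):
    """Round soft label predictions and compute 0/1 loss and Hamming loss.

    y_true: (N, K) array of true labels in {0, 1}
    y_soft: (N, K) array of soft decoder outputs in [0, 1]
    """
    y_pred = (y_soft >= threshold).astype(int)                  # round back to {0, 1}
    zero_one_loss = np.mean(np.any(y_pred != y_true, axis=1))   # exact-match failure rate
    hamming_loss = np.mean(y_pred != y_true)                    # fraction of wrong labels
    return zero_one_loss, hamming_loss
```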

Similar results show up when using other base learners, as shown in Tables 3.1 and 3.2.

In the tables, each entry reports the difference between the results of the geometric decoder and those of the algebraic decoder. The bold entries indicate that the geometric decoder is significantly better than the algebraic one. The results validate that the proposed geometric decoder decodes more accurately (lower 0/1 loss) while achieving similar Hamming loss compared to the algebraic decoder.

Table 3.2: Hamming loss changes when applying the proposed soft-output decoder

ECC   base learner                    scene (M = 127)   emotions (M = 127)  yeast (M = 255)   tmc2007 (M = 511)
HAMR  BR, Random Forest               −.0008 ± .0002    −.0012 ± .0007      −.0006 ± .0002    −.0000 ± .0001
HAMR  BR, Gaussian SVM                −.0001 ± .0001     .0035 ± .0013      −.0003 ± .0001    −.0001 ± .0000
HAMR  BR, Logistic Regression         −.0003 ± .0002     .0018 ± .0009      −.0001 ± .0002    −.0001 ± .0001
HAMR  3-powerset, Random Forest       −.0002 ± .0002     .0005 ± .0006      −.0004 ± .0002    −.0001 ± .0000
HAMR  3-powerset, Gaussian SVM        −.0001 ± .0001     .0046 ± .0009       .0001 ± .0002     .0000 ± .0001
HAMR  3-powerset, Logistic Regression  .0001 ± .0002     .0029 ± .0009       .0005 ± .0002    −.0002 ± .0001
BCH   BR, Random Forest                .0011 ± .0002     .0068 ± .0009       .0070 ± .0005     .0005 ± .0002
BCH   BR, Gaussian SVM                −.0005 ± .0002     .0192 ± .0022       .0073 ± .0004     .0030 ± .0001
BCH   BR, Logistic Regression          .0002 ± .0002     .0141 ± .0012       .0079 ± .0006     .0036 ± .0002
BCH   3-powerset, Random Forest        .0009 ± .0002     .0035 ± .0013       .0062 ± .0005     .0006 ± .0002
BCH   3-powerset, Gaussian SVM         .0004 ± .0001     .0122 ± .0023       .0055 ± .0004     .0025 ± .0002
BCH   3-powerset, Logistic Regression  .0008 ± .0002     .0089 ± .0016       .0072 ± .0005     .0022 ± .0002

ECC   base learner                    genbase (M = 511)  medical (M = 1023)  enron (M = 1023)
HAMR  BR, Random Forest               −.0000 ± .0000     −.0000 ± .0000      −.0000 ± .0000
HAMR  BR, Gaussian SVM                −.0001 ± .0000     −.0000 ± .0000      −.0000 ± .0000
HAMR  BR, Logistic Regression          .0001 ± .0004      .0001 ± .0000       .0000 ± .0000
HAMR  3-powerset, Random Forest        .0000 ± .0000     −.0000 ± .0000      −.0001 ± .0000
HAMR  3-powerset, Gaussian SVM        −.0000 ± .0000     −.0000 ± .0000       .0000 ± .0000
HAMR  3-powerset, Logistic Regression −.0002 ± .0002      .0000 ± .0000      −.0001 ± .0000
BCH   BR, Random Forest                .0000 ± .0000      .0002 ± .0001       .0073 ± .0003
BCH   BR, Gaussian SVM                 .0000 ± .0000      .0009 ± .0001       .0090 ± .0002
BCH   BR, Logistic Regression          .0030 ± .0004     −.0008 ± .0003       .0086 ± .0002
BCH   3-powerset, Random Forest        .0000 ± .0000      .0005 ± .0001       .0083 ± .0003
BCH   3-powerset, Gaussian SVM         .0000 ± .0000      .0007 ± .0001       .0082 ± .0004
BCH   3-powerset, Logistic Regression  .0001 ± .0001     −.0001 ± .0001       .0077 ± .0003


3.2.1 Bit Error Analysis

Next, we look more deeply into the scene dataset and fix the base learner to BR with Random Forests. The test instances are grouped by their number of bit errors. First, we plot the ratio of each group size to the total number of instances in Figure 3.2 for the HAMR and BCH codes. Besides the highest peak at 0 bit errors, the BCH code has another peak at 63 bit errors, which is higher than the corresponding peak for HAMR at 38 bit errors. This suggests that the BCH code is harder to learn, which is consistent with our finding in Section 2.4.3.
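The grouping itself is straightforward; below is a minimal sketch, assuming the true codewords enc(y) and the hard codeword predictions are available as NumPy arrays (the array and function names are illustrative).

```python
import numpy as np

def bit_error_distribution(b_true, b_pred):
    """Group test instances by their number of codeword bit errors.

    b_true: (N, M) true codewords enc(y) in {0, 1}
    b_pred: (N, M) hard codeword predictions in {0, 1}
    Returns a dict mapping number of bit errors -> fraction of instances.
    """
    errors = np.sum(b_true != b_pred, axis=1)                    # bit errors per instance
    counts = np.bincount(errors, minlength=b_true.shape[1] + 1)  # one bin per possible error count
    return {e: c / len(errors) for e, c in enumerate(counts) if c > 0}
```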

Then, we plot the 0/1 loss and Hamming loss of each group for HAMR, as shown in Figure 3.3. From Figure 3.3(a), we can see that the geometric decoder corrects errors more accurately than the algebraic decoding when there are 16 to 24 bit errors.

Figure 3.2: Bit error distribution of BR with Random Forests on the scene dataset. (a) HAMR. (b) BCH.

Figure 3.3: Strength of HAMR on the scene dataset and BR with Random Forests. (a) 0/1 loss vs. number of bit errors. (b) Hamming loss vs. number of bit errors.

The ordinary decoding method of HAMR has two stages, one for the HAM(7, 4) code and one for the repetition code, and each HAM(7, 4) block is decoded independently. In the proposed geometric decoding method, the two stages are combined into one, which enables joint decoding of the HAM(7, 4) blocks and thus ensures that the decoding of each HAM(7, 4) block is consistent with the others. This leads to the superior performance of the proposed decoding method on 0/1 loss. For Hamming loss, as shown in Figure 3.3(b), the improvement of the geometric decoder in that bit error range is small, which explains the small overall improvement on Hamming loss.
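To make the contrast concrete, the first stage of the algebraic decoding, the independent correction of each HAM(7, 4) block, can be sketched as standard Hamming(7, 4) syndrome decoding. The parity-check matrix below assumes the textbook bit ordering in which column j is the binary representation of j; the actual HAMR construction may order its bits differently.

```python
import numpy as np

# Parity-check matrix of Hamming(7, 4): column j is the binary representation of j,
# so a nonzero syndrome directly names the (1-indexed) position of a single bit error.
H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

def decode_ham74_block(r):
    """Correct at most one bit error in a 7-bit received block r, independently of other blocks."""
    r = r.copy()
    syndrome = H.dot(r) % 2
    pos = syndrome[0] * 4 + syndrome[1] * 2 + syndrome[2]  # read the syndrome as a binary number
    if pos != 0:
        r[pos - 1] ^= 1                                    # flip the indicated bit
    return r
```

Each 7-bit block is corrected on its own here, which is exactly the independence that the joint geometric decoding removes.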

We also plot the 0/1 loss and Hamming loss of each group for the BCH code in Figure 3.4. The algebraic decoder can correctly recover the label vector with no 0/1 loss for instances with at most 31 bit errors.

Figure 3.4: Strength of BCH on the scene dataset and BR with Random Forests. (a) 0/1 loss vs. number of bit errors. (b) Hamming loss vs. number of bit errors.

However, for instances with 32 bit errors, the 0/1 loss sharply jumps to 0.97. The proposed geometric decoder does a better job on instances with 32–39 bit errors, so its 0/1 loss rises more smoothly. This is exactly the behavior we set out to improve at the beginning of this chapter. On the other hand, in terms of the Hamming loss shown in Figure 3.4(b), the proposed geometric decoder has 0.01–0.025 higher Hamming loss than the algebraic one for instances with 37–45 bit errors, which explains its slightly worse overall Hamming loss.

From this analysis, we may conclude that the geometric decoder improves the 0/1 loss because it genuinely does a better job on instances that are far from valid codewords. Regarding the Hamming loss, however, the geometric decoder obtains improvements for HAMR, but not for BCH.

3.3 Soft-input Decoding and Bitwise Confidence Estimation for k-powerset Learners

In Section 3.1, we proposed the geometric decoder based on approximating XOR by multiplication. Since the L2 distance in the [0, 1]^M space is used as the optimization criterion, the input codeword prediction ˜b need not be in {0, 1}^M but can also lie in [0, 1]^M. That is, this decoding method supports not only soft outputs but also soft inputs.

The soft inputs may come from the confidence of each bit, which the channel, i.e., the multi-label base learner, provides. By considering the confidence of the bits, the decoder may rely more on high-confidence bits and try to correct low-confidence bits. In this way, the performance of the decoder may be further improved.

Since our channel is the base learner, it is possible to gather meaningful soft signals from the channel, namely the confidence score or probability estimate that a predicted bit is 1. It is simple to ask a Binary Relevance learner to provide the confidence of each bit, since confidence or probability estimates are supported by many state-of-the-art binary classifiers, including Random Forests and SVM [Platt, 1999, Lin et al., 2007]. However, for a k-powerset learner, things are more complicated. The k-powerset learners take a combination of k bits as a class, so the base learners only output confidence information per combination of k bits but not per bit. To apply the proposed soft-input geometric decoder, we have to estimate the confidence of each bit from the confidence of the combinations of k bits.
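As an illustration of the simple Binary Relevance case, per-bit confidence can be read off the base classifiers directly; below is a minimal sketch with scikit-learn Random Forests (the function name and the one-classifier-per-bit training loop are assumptions about the surrounding implementation, not the exact code used in the experiments).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def br_bit_confidence(X_train, B_train, X_test):
    """Train one Random Forest per codeword bit and return P(bit = 1) for each test instance.

    B_train: (N, M) matrix of encoded codeword bits in {0, 1}.
    Returns conf: (N_test, M) matrix of per-bit confidence scores in [0, 1].
    """
    M = B_train.shape[1]
    conf = np.zeros((X_test.shape[0], M))
    for i in range(M):
        clf = RandomForestClassifier(n_estimators=100).fit(X_train, B_train[:, i])
        proba = clf.predict_proba(X_test)  # columns follow clf.classes_
        conf[:, i] = proba[:, list(clf.classes_).index(1)] if 1 in clf.classes_ else 0.0
    return conf
```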

A k-powerset learner outputs 2^k confidence scores, one for each combination of the k bits b_1 · · · b_k ∈ {0, 1}^k. To estimate the per-bit confidence conf(b_i = 1) for each b_i, we propose the following methods (a combined code sketch follows the list).

1. Maximum. Pick the combination b_1 · · · b_k with the highest confidence and then assign conf(b_i = 1) to be 1 if b_i = 1, or 0 otherwise. This results in a hard input, which is the same as what we used in Sections 2.4 and 3.2.

2. Marginal probability. The confidence scores of the combinations can be treated as a joint probability distribution over the 2^k combinations of bits. Then, we may calculate conf(b_i = 1) as the marginal probability by summing up the confidence scores of all combinations with b_i = 1.

\[
\mathrm{conf}_{\mathrm{margin}}(b_i) = \sum_{b_1 \cdots b_{i-1} b_{i+1} \cdots b_k \,\in\, \{0,1\}^{k-1}} \mathrm{conf}(b_1 \cdots b_{i-1}\, 1\, b_{i+1} \cdots b_k)
\]

In contrast to the “maximum” method, the marginal probability takes the whole distribution into account, so the most probable combination according to the marginal probability may be different from the one with the highest confidence.

3. Confidence difference. The confidence of the i-th bit to be b_i may be defined as the difference of confidence scores between the most confident combination b_1 · · · b_k and its neighbor b_1 · · · b_{i−1} ¬b_i b_{i+1} · · · b_k that differs only in this bit, where ¬b_i denotes the negation of b_i. Following this idea, we may define conf(b_i = 1) as

\[
\mathrm{conf}_{\mathrm{diff}}(b_i) = \frac{1}{2} + \frac{1}{2}\Big(\mathrm{conf}(b_1 \cdots b_{i-1}\, 1\, b_{i+1} \cdots b_k) - \mathrm{conf}(b_1 \cdots b_{i-1}\, 0\, b_{i+1} \cdots b_k)\Big)
\]

Compared to “marginal probability,” which is essentially the sum of differences of confidence scores between all pairs of neighboring combinations, this method only considers the highest-confidence combination and its neighbors. Therefore, the result of “confidence difference” is consistent with that of the “maximum” method.

4. Sigmoid functions. We may apply a sigmoid function to the “confidence difference” to enlarge small differences, since small confidence values make the geometric decoder unstable. We use tanh(αx) as the sigmoid function.

\[
\mathrm{conf}_{\mathrm{s\text{-}diff}}(b_i) = \frac{1}{2} + \frac{1}{2}\tanh\!\big(\alpha \cdot (2 \cdot \mathrm{conf}_{\mathrm{diff}}(b_i) - 1)\big)
\]

The sigmoid function may also be applied to the output of “marginal probability,” resulting in another confidence estimation method conf_{s-margin}(·).
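The four estimators above can be summarized in one sketch. It assumes the k-powerset learner returns its 2^k confidence scores as a dictionary keyed by the bit tuples, and it normalizes the scores before marginalizing; both are illustrative assumptions rather than the exact implementation.

```python
import math

def bit_confidences(scores, k, method="margin", alpha=4.0):
    """Estimate conf(b_i = 1) for the k bits from the 2^k combination scores.

    scores: dict mapping each bit tuple in {0, 1}^k to a non-negative confidence score.
    method: "max", "margin", "diff", or "s-diff" (tanh-scaled "diff").
    """
    total = sum(scores.values())
    best = max(scores, key=scores.get)             # most confident combination
    conf = []
    for i in range(k):
        if method == "max":                        # hard input (Method 1)
            c = float(best[i])
        elif method == "margin":                   # marginal probability of b_i = 1 (Method 2)
            c = sum(s for b, s in scores.items() if b[i] == 1) / total
        else:                                      # confidence difference (Methods 3 and 4)
            plus = best[:i] + (1,) + best[i + 1:]
            minus = best[:i] + (0,) + best[i + 1:]
            c = 0.5 + 0.5 * (scores[plus] - scores[minus])
            if method == "s-diff":                 # enlarge small differences with tanh
                c = 0.5 + 0.5 * math.tanh(alpha * (2 * c - 1))
        conf.append(c)
    return conf
```

For example, with k = 2 and scores {(0,0): .1, (0,1): .2, (1,0): .3, (1,1): .4}, the "margin" method gives [0.7, 0.6], while the "max" method gives [1.0, 1.0].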

Note that the Binary Relevance approach is a special case of k-powerset with k = 1. Therefore, we may also apply these methods to BR learners. When applied to BR learners, the “marginal probability” is the same as taking the confidence of the bit directly from the BR learner. Moreover, if the confidence is given in probability form, the “confidence difference” is also the same.
