Improvement by Unlabeled Data - Using Unlabeled Data to Enhance Models through a Bicameralism Voting Framework

Chapter 5 Evaluation

5.7 Improvement by Unlabeled Data

We re-partition the Food-101 dataset into four parts for these experiments because the amount of data is limited: training data, unlabeled data, students’ data, and test data, which make up 35%, 30%, 10%, and 10% of the dataset respectively. The training data is used to train the master model. The unlabeled data receives labels from the predictions of bicameralism and then becomes training data for the master model. The students’ data is used to train all the student models. The test data is used to evaluate the quality of the models.

Note that the training data and the test data here are different subsets of Food-101 from those used in the previous bicameralism voting part, so the model accuracies in this part differ from the earlier ones. In the bicameralism voting part, we partitioned the data into training and test data in the way recommended by the Food-101 authors, and that test data contains no noise. In this part, we sort the images of each Food-101 class by name and take the top N% whenever N% of the data is needed. The test data is always the last 10%, and it does contain noise.
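The per-class partition described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `partition_by_name` and the toy file names are hypothetical, and placing the unlabeled and students’ slices directly after the training slice is our assumption, since the text only fixes the training data as the top N% and the test data as the last 10%.

```python
def partition_by_name(names_by_class, train_pct=35, unlabeled_pct=30,
                      students_pct=10, test_pct=10):
    """Split every class, sorted by image name, into the four parts.

    Assumption: unlabeled and students' data follow the training slice.
    Only "top N% for training" and "last 10% for test" come from the text.
    """
    parts = {"train": [], "unlabeled": [], "students": [], "test": []}
    for cls in sorted(names_by_class):
        ordered = sorted(names_by_class[cls])
        n = len(ordered)
        # Integer arithmetic avoids floating-point rounding surprises.
        a = n * train_pct // 100
        b = a + n * unlabeled_pct // 100
        c = b + n * students_pct // 100
        n_test = n * test_pct // 100
        parts["train"] += [(cls, name) for name in ordered[:a]]
        parts["unlabeled"] += [(cls, name) for name in ordered[a:b]]
        parts["students"] += [(cls, name) for name in ordered[b:c]]
        parts["test"] += [(cls, name) for name in ordered[n - n_test:]]
    return parts


# Toy example: one class with 20 images, given in reverse order.
toy = {"pizza": [f"img_{i:02d}.jpg" for i in reversed(range(20))]}
parts = partition_by_name(toy)
print({k: len(v) for k, v in parts.items()})
```

With these percentages a part of each class is left unassigned, because 35%, 30%, 10%, and 10% do not add up to 100%.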

         Test data                      Unlabeled data
Round    Master model   Bicameralism    Master model   Bicameralism
1st      62.42%         66.30%          62.28%         66.30%
2nd      64.09%         68.19%          66.13%         67.81%
3rd      64.96%         68.68%          66.82%         68.60%
4th      65.73%         69.80%          67.74%         69.46%

Table 5.6: Result of 35% master model repeating same data

Table 5.6 shows the result of the first situation of unlabeled data. That is, bicameralism makes predictions on all the unlabeled data (30%) at once, and the predictions become part of the training data in the next round. The first row in Table 5.6 is the result of the first round, the second row is the result of the second round, and so on. In the first round, only the training data (35%) was used to train the master model. In the second, third, and fourth rounds, both the training data (35%) and the unlabeled data (30%) were used to train the master model. The accuracy of the generated labels used in a round is the bicameralism accuracy on the unlabeled data in the previous round, that is, the fourth column. The test data is used to test the robustness of the model, whereas the unlabeled data shows the quality of the generated labels. As we can see, the bicameralism accuracy on the unlabeled data increased every round, so the master model can receive more information from the unlabeled data.
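The round structure of this first situation can be sketched as a short loop. The sketch assumes two caller-supplied functions, `train_master` and `bicameral_predict`, which are hypothetical names; the toy nearest-neighbour stand-ins below only make the loop runnable and are not the models used in the experiments.

```python
def run_rounds(labeled, unlabeled_xs, train_master, bicameral_predict, rounds=4):
    """Retrain the master model each round on real plus generated labels."""
    pseudo_labeled = []      # no generated labels exist in the first round
    train_sizes = []
    master = None
    for _ in range(rounds):
        training_set = labeled + pseudo_labeled
        master = train_master(training_set)
        train_sizes.append(len(training_set))
        # Predict ALL unlabeled samples again; these labels feed the next round.
        labels = bicameral_predict(master, unlabeled_xs)
        pseudo_labeled = list(zip(unlabeled_xs, labels))
    return master, pseudo_labeled, train_sizes


# Toy stand-ins (assumptions): the "model" is its training set, and the
# prediction is the label of the nearest training point.
def train_nearest(pairs):
    return list(pairs)


def predict_nearest(model, xs):
    return [min(model, key=lambda p: abs(p[0] - x))[1] for x in xs]


labeled = [(0.0, "a"), (10.0, "b")]
unlabeled_xs = [1.0, 4.0, 6.0, 9.0]
master, pseudo, sizes = run_rounds(labeled, unlabeled_xs,
                                   train_nearest, predict_nearest)
print(sizes)
```

The training set grows once, after the first round, and then keeps the same size while its generated labels are refreshed every round.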

         Test data                      Unlabeled data
Round    Master model   Bicameralism    Master model   Bicameralism
1st      62.42%         66.30%          62.20%         65.64%
2nd      64.66%         68.67%          65.04%         68.21%
3rd      64.92%         69.19%          67.66%         69.84%
4th      65.74%         69.63%          69.14%         70.18%

Table 5.7: Result of 35% master model without filter

Table 5.7 shows the result of the second situation of unlabeled data. In the first round, the training data (35%) is again used to train the master model. In each of the next three rounds, one-third of the unlabeled data (10%) is added to the training data, so by the fourth round all the unlabeled data has been added. In the third and fourth rounds, the unlabeled data added earlier is re-predicted together with the newly added unlabeled data. The fourth element of the first row is the accuracy on the first one-third of the unlabeled data, and the fourth element of the second row is the accuracy on the first two-thirds. The fourth elements of the third and fourth rows are the accuracy on all of the unlabeled data. The results show that the two situations have almost the same final effect on the master model. However, the second one is more efficient, because less unlabeled data is added to the training data in the second and third rounds, so the amount of computation is smaller.
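The efficiency argument can be checked with a small calculation of how much generated-label data enters the training set in each round under the two situations. The function name `training_mix` and the situation labels are hypothetical; the percentages are the ones stated in this section.

```python
def training_mix(round_index, situation, labeled_pct=35,
                 unlabeled_pct=30, step_pct=10):
    """Return (labeled %, generated-label %) of Food-101 used in a round."""
    if round_index == 1:
        pseudo = 0                       # first round: real labels only
    elif situation == "all_at_once":
        pseudo = unlabeled_pct           # first situation: everything at once
    elif situation == "incremental":
        # second situation: one more third of the unlabeled data per round
        pseudo = min(step_pct * (round_index - 1), unlabeled_pct)
    else:
        raise ValueError(f"unknown situation: {situation}")
    return labeled_pct, pseudo


schedule = {
    s: [training_mix(r, s) for r in range(1, 5)]
    for s in ("all_at_once", "incremental")
}
pseudo_total = {s: sum(p for _, p in rows) for s, rows in schedule.items()}
for s, rows in schedule.items():
    print(s, [lab + p for lab, p in rows], "generated-label total:", pseudo_total[s])
```

Summed over rounds two to four, the incremental schedule trains on 60 percentage points of generated labels instead of 90, which is where the saving in computation comes from.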

         Test data                      Unlabeled data
Round    Master model   Bicameralism    Master model   Bicameralism
1st      62.42%         66.30%          62.20%         65.64%
2nd      64.29%         69.16%          66.17%         69.68%
3rd      65.42%         70.32%          68.83%         71.44%
4th      66.37%         70.70%          70.00%         71.96%

Table 5.8: Result of 35% master model with filter

Table 5.8 shows the result of the second situation of unlabeled data with a filter. The unlabeled data was used in the same way as in the previous part: one-third of it was added each round. We used the filter to reduce the noise in the unlabeled data, so that the master model would obtain more useful data. The filter cuts back the amount of added unlabeled data, but it benefits both the quality of the training data and the training efficiency. As we can see, the result is better than in all the previous parts. Our master model in the fourth round used 65% of the Food-101 data and reached an accuracy of 66.37%. A model trained on 65% of Food-101 with real labels reaches 68.55%, but our master model used real labels for only 35% of the data.
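To put the three settings side by side, the following short calculation takes the master-model test accuracies from Tables 5.6 to 5.8 and computes the gain from the first to the fourth round, together with the remaining gap to the 68.55% reference quoted above. The variable names are ours; the numbers are copied from the tables.

```python
# Master-model accuracy on the test data (%), rounds 1 to 4, from Tables 5.6-5.8.
master_test_accuracy = {
    "Table 5.6 (all at once)": [62.42, 64.09, 64.96, 65.73],
    "Table 5.7 (incremental, no filter)": [62.42, 64.66, 64.92, 65.74],
    "Table 5.8 (incremental, with filter)": [62.42, 64.29, 65.42, 66.37],
}
FULLY_LABELED_65 = 68.55  # model trained on 65% of Food-101 with real labels

summary = {}
for name, acc in master_test_accuracy.items():
    gain = round(acc[-1] - acc[0], 2)            # round 1 -> round 4
    gap = round(FULLY_LABELED_65 - acc[-1], 2)   # distance to the reference
    summary[name] = (gain, gap)
    print(f"{name}: gain {gain:.2f} points, gap {gap:.2f} points")
```

The filtered setting gains 3.95 points and ends 2.18 points below the reference, against gains of about 3.3 points and gaps of about 2.8 points for the two unfiltered settings.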

[Table 5.9: Round, Accuracy, Percentage of Retaining Input, Saving Time]

Table 5.9 shows the details of the filter. Unlabeled data is added only from the second round onward, so the table starts at the second round. After filtering, only the generated labels whose confidence was higher than the threshold were retained. The third column, Percentage of Retaining Input, is the percentage of unlabeled data whose confidence was higher than the threshold. The second column, Accuracy, is the accuracy of the retained unlabeled data. We can see that the accuracy increases significantly after the data passes the filter. Because the useless data was deleted, both the amount of computation and the computation time for training the master model were reduced. The fourth column, Saving Time, is the time saved compared with training without the filter.
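A minimal sketch of such a confidence filter is given below. The function name `filter_by_confidence`, the threshold of 0.5, and the toy predictions are all assumptions made for illustration; this section does not state the threshold or how the confidence is computed. The true labels are available here only because the "unlabeled" data is a held-back part of Food-101.

```python
def filter_by_confidence(predictions, threshold):
    """Keep generated labels whose confidence exceeds the threshold.

    predictions: list of (predicted_label, confidence, true_label).
    Returns (kept, retained %, accuracy % of the kept predictions).
    """
    kept = [p for p in predictions if p[1] > threshold]
    retained_pct = 100.0 * len(kept) / len(predictions)
    if kept:
        correct = sum(pred == true for pred, _, true in kept)
        accuracy_pct = 100.0 * correct / len(kept)
    else:
        accuracy_pct = 0.0
    return kept, retained_pct, accuracy_pct


# Toy predictions (made up): two confident hits, one confident miss,
# and one low-confidence miss that the filter removes.
toy_predictions = [
    ("pizza", 0.9, "pizza"),
    ("sushi", 0.8, "sushi"),
    ("ramen", 0.4, "pho"),
    ("steak", 0.7, "tacos"),
]
kept, retained_pct, accuracy_pct = filter_by_confidence(toy_predictions, 0.5)
print(f"retained {retained_pct:.1f}% of inputs, accuracy {accuracy_pct:.1f}%")
```

In this toy case the unfiltered accuracy is 50%, while the retained three quarters of the inputs are about 66.7% accurate, mirroring the two quantities reported in the Accuracy and Percentage of Retaining Input columns.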

         Test data                      Unlabeled data
Round    Master model   Bicameralism    Master model   Bicameralism
1st      56.26%         60.50%          56.17%         61.28%
2nd      57.94%         61.67%          59.84%         62.53%
3rd      57.36%         61.60%          62.14%         62.70%

Table 5.10: Result of 20% master model without filter

         Test data                      Unlabeled data
Round    Master model   Bicameralism    Master model   Bicameralism
1st      56.26%         60.50%          56.17%         61.28%
2nd      58.69%         62.85%          61.05%         64.00%
3rd      59.81%         63.50%          64.00%         65.06%

Table 5.11: Result of 20% master model with a filter

Besides running the experiment from the 35% master model, we also ran it from the 20% master model. The 35% master model means that the master model was first trained on 35% labeled data before unlabeled data was added. Here we try the experiment from a weaker master model that uses only 20% labeled data. Table 5.10 shows the result of the 20% master model without a filter, and Table 5.11 shows the result of the 20% master model with a filter. In both the second and the third round, 10% of the data was added to the master model. Without a filter, the accuracy of the predictions on the unlabeled data is simply the result of bicameralism, that is, the number in the fourth column, which is only about 62%. With a filter deleting the useless data, 73% of the inputs were retained and the accuracy of the predictions on the unlabeled data increased to 75.6% in the second round. In the third round, 73% of the inputs were again retained and the accuracy increased to 78.3%. Comparing the two tables, we can see that the result with a filter is clearly better than the result without one.

The difference is especially visible in the third round. Without a filter, the third-round result is almost the same as the second-round result: the bicameralism predictions on the unlabeled data improved very little after the first round, so in the third round the labels of the unlabeled data were almost the same as in the previous round. The unlabeled data can no longer improve the master model after the second round.

When the master model is weak, a filter becomes important. The results of the 35% master model are good whether or not a filter is used. However, when the predictions on the unlabeled data are not good enough, a filter is definitely needed. We recommend using a filter in either case, because it not only improves the result but also speeds up training.

N% Data    Accuracy of master model
10%        28.18%

Table 5.12: Different percentage of data and the corresponding accuracy

Table 5.12 shows the test accuracy of a model trained on N% of the data, where all of the training data is labeled. This table can be used as a reference for the unlabeled data experiments. For example, the accuracy in the fourth row, first column of Table 5.8 comes from a master model trained on 65% of the data. Of this 65%, however, 35% was labeled data and 30% was unlabeled data. The accuracy in Table 5.8 is a bit lower than the accuracy of the 65% model in Table 5.12. On the other hand, the accuracy in the third row, first column of Table 5.11 is much lower than the accuracy of the 45% model in Table 5.12. We think that the master model has to be robust enough initially; otherwise the unlabeled data cannot play a useful role.
