Performance - 抽取謂詞以改進性質導向可達性技術

In this section, we present the result for clause-based (CLS), obligation-based (OBL) and combined predicate in order. The first two kinds are conducted on ”pdr_vb” for a preliminary examination, while the last one is on ”full” to demonstrate the final result.

Note that either in the table or in the figure, all the variants are sorted by the solving effort on predicate in the ascending order. The first one is the basic PDR that is the special case with no effort spent. For both CLS and OBL, we use a small SAT query limit (L = 300) for the experiment to find a relatively proper value combination based on this L. Then we use this value combination back to test different L. Last, the best L found in the previous round is used to test the trend discovered at the first.

For CLS, each frame can be assigned by different Backtrack number (B) and Match Number (M ). For simplicity, we introduce only two schemes including INF and ALL.

INF is to observe only F_∞, that is, the Backtrack number of the other frames are all zeros. ALL is to treat all the frames equally with the same B and M . The value we choose to test different L is (B, M ) = (20, 1) and (10, 2) for INF and ALL respectively.

The result is shown by Figure 5.2 and 5.3. If fixing the other parameters, we assume the number of predicates are the same. Hence, the effort is proportional to the magnitude of L. The number of solved cases of all are larger than the one of baseline except for the case with no limit (L =∞). This case is intentionally added to show the extreme case of predicate solving. Apparently, it loses the direction to reach the final answer.

Furthermore, the trend is obvious that the best one locates in the middle and the result degrades to the both sides.

SAT Limit Total SAT UNSAT Unique Solve Average Time (s)

baseline 543 149 394 0 195.1

300 553 150 403 3 209.3

1000 554 149 405 3 213.3

5000 550 147 403 0 235.6

no limit 495 130 365 1 219.8

Table 5.2: Comparison among different limit of SAT query for CLS, INF, B = 20, M = 1 on pdr_vb.

Figure 5.2: The cumulative plot of different limit of SAT query for CLS, INF, B = 20, M

= 1 on pdr_vb.

SAT Limit Total SAT UNSAT Unique Solve Average Time (s)

baseline 543 149 394 2 195.1

300 554 149 405 1 195.2

1000 558 151 407 4 211.5

5000 551 149 402 1 240.4

no limit 464 124 340 1 191.7

Table 5.3: Comparison among different limit of SAT query for CLS, ALL, B = 10, M = 2 on pdr_vb.

Figure 5.3: The cumulative plot of different limit of SAT query for CLS, ALL, B = 10, M

= 2 on pdr_vb.

We then present the concept of sweet spot again by the experimental result in Figure 5.4. In addition, since the sample points are loose, we can further find a better point once the trend is confirmed. The concept is demonstrated in Figure 5.5.

Figure 5.4: The schematic diagram of the sweet spot prediction and illustration by the result of CLS, ALL, B = 10, M = 2 on pdr_vb. X-axis means the effort and Y-axis means the gain. Note that the gain can be negative, which means the overall performance is worse. The bars mean the number of solved cases in Figure 5.3 and they are not in exact scale. In this figure, we use difference among these numbers to represent the gain.

Figure 5.5: The depiction of sample points. X-axis means the effort and Y-axis means the gain. If we obtain the five points by experiment and basically confirm that the trend exists, we can exclude the outer range (red area) since the best point is not possible to be at there. Then we can apply denser sample points in the yellow area to find a better one.

Since L = 1000 is the best choice of the above two figures, we use it to test different B and M for both INF and ALL. The higher B is, the more clauses are checked so that the number of predicates increases. The lower M is, the looser constraint to pick up a common part as candidate, which leads to less predicate. Though not every predicate uses up all the SAT query limit, we assume they consumes the same effort. Hence, the effort is roughly proportional to the ratio B/M . The result is shown in Figure 5.6 and 5.7. The best choices are 20_1 and 10_2 for INF and ALL respectively. There are two possible reasons. The first one is that the number of frame under observation differs.

This reason corresponds to the hypothesis of sweet spot we made above. In INF, we only consider F_∞ so that the constraint should be looser to create more chance. However, in ALL, all the frames are under consideration so that we need to apply a tighter criterion.

The second is to state the different usefulness among the frames. All the clauses in F_∞ forms an inductive set so that they are more meaningful and should be applied by a looser constraint. On the other hand, we should not emphasize too much on the other frames since they seem to be no so useful. An interesting phenomenon is that the performance of 10_1 and 10_2 for ALL are poor. Nonetheless, this does not happen for the case with L = 300. This is probably an evidence that we put too much effort on predicate solving.

If the reason is actually the second one, there should be possible to combine the benefit from the two different sources. Hence, we introduce another scheme MIX that uses (B, M ) = (20, 1) and (10, 2) for F_∞ and the other frames respectively and simultaneously. In Figure 5.8, MIX does not perform well and it is even the worst one.

However, as the case with L = 300 depicted by Figure 5.9, MIX is the best one instead.

We then make a conclusion that the reason is closer to the first one. MIX has a higher possibility to find predicate than both INF and ALL if we simply consider the ratio B/M for every frame. For the case with L = 1000, INF_20_1 and ALL_10_2 are relatively better choices, so changing to MIX does not improve. But for L = 300, INF_20_1 and ALL_10_2 have not reached the sweet spot, so changing to MIX is better.

B_M Total SAT UNSAT Unique Solve Average Time (s)

baseline 543 149 394 3 195.1

10_2 546 148 398 3 199.2

10_1 548 147 401 3 196.4

20_1 554 149 405 3 213.3

Table 5.4: Comparison among different ratio of Backtrack number and Match number for CLS, INF, L = 1000 on pdr_vb.

Figure 5.6: The cumulative plot of different ratio of Backtrack number and Match number for CLS, INF, L = 1000 on pdr_vb.

B_M Total SAT UNSAT Unique Solve Average Time (s)

baseline 543 149 394 5 195.1

10_2 558 151 407 10 211.5

10_1 535 145 390 2 228.5

20_1 532 143 389 1 224.0

Table 5.5: Comparison among different ratio of Backtrack number and Match number for CLS, ALL, L = 1000 on pdr_vb.

Figure 5.7: The cumulative plot of different ratio of Backtrack number and Match number for CLS, ALL, L = 1000 on pdr_vb.

Type Total SAT UNSAT Unique Solve Average Time (s)

baseline 543 149 394 3 195.1

inf_20_1 554 149 405 2 213.3

all_10_2 558 151 407 3 211.5

mix 553 148 405 0 211.0

Table 5.6: Comparison among two best ratios and the mixed version for CLS, L = 1000 on pdr_vb.

Figure 5.8: The cumulative plot of two best ratios and the mixed version for CLS, L = 1000 on pdr_vb.

Type Total SAT UNSAT Unique Solve Average Time (s)

baseline 543 149 394 1 195.1

inf_20_1 553 150 403 0 209.3

all_10_2 554 149 405 1 195.2

mix 558 152 406 2 203.9

Table 5.7: Comparison among two best ratios and the mixed version for CLS, L = 300 on pdr_vb.

Figure 5.9: The cumulative plot of two best ratios and the mixed version for CLS, L = 300 on pdr_vb.

For OBL, we use T = 66 to test different L and the result is shown in Figure 5.10.

Again, the trend is obvious so that we choose the case with L = 300, which is the same as the preliminary test. We then use L = 300 to test different Obligation threshold (T ).

The higher T is, the less chance to check and the less possibility of common part exists, which leads to less predicate. Hence, the effort is inversely proportional to T . We observe the trend in Figure 5.11 once again and the best value is T = 66. It seems like that the hypothesis is also applicable to OBL.

Finally, we test all the cases in ”full” for both CLS and OBL to examine the ability of them to handle the hard problem for the original PDR. The values of parameter are (B, M ) = (10, 2) for CLS, ALL and T = 66 for OBL. In addition, we also try to activate the two kinds of predicate simultaneously. The result in shown in Figure 5.12. As indicated by the result of MIX, we know that we cannot ensure the improvement by simply solving more and more predicates. This is because we do not gain the information right when we find the predicate and we need to solve it practically. On the other hand, we know that both CLS and OBL works well. They outperform the basic PDR by 29 and 18 cases respectively.

SAT Limit Total SAT UNSAT Unique Solve Average Time (s)

baseline 543 149 394 5 195.1

300 551 148 403 4 193.8

1000 543 144 399 3 183.3

5000 540 144 396 2 188.0

Table 5.8: Comparison among different limit of SAT query for OBL, T = 66 on pdr_vb.

Figure 5.10: The cumulative plot of different limit of SAT query for OBL, T = 66 on pdr_vb.

Threshold Total SAT UNSAT Unique Solve Average Time (s)

baseline 543 149 394 3 195.1

100 543 147 396 1 191.4

66 551 148 403 5 193.8

36 540 145 395 1 170.1

Table 5.9: Comparison among different Obligation threshold for OBL, L = 300 on pdr_vb.

Figure 5.11: The cumulative plot of different Obligation threshold for OBL, L = 300 on pdr_vb.

Type Total SAT UNSAT Unique Solve Average Time (s)

baseline 543 149 394 2 195.1

cls 572 151 421 11 244.8

obl 561 150 411 2 232.7

cls + obl 567 151 416 9 234.5

Table 5.10: Comparison among the best parameters for CLS, OBL and the combination of both on full.

Figure 5.12: The cumulative plot of the best parameters for CLS, OBL and the combination of both on full.

在文檔中抽取謂詞以改進性質導向可達性技術 (頁 58-71)