Chapter 4 Experimental results
4.2 Experimental Results
4.2.2 Performance Analysis
For each case, 100 task graphs are randomly generated with TGFF, and each task graph contains 210 to 250 tasks. The maximum number of inputs/outputs per task is 7 to 10, which means that there are many communication paths between tasks. The communication amount of each path is adjusted by multiplying it by the communication factor; in other words, the communication factor is the ratio of the communication amount to the computation amount. The magnitude of the communication factor indicates how heavily a communication path is loaded. Viewed another way, the communication factor also reflects the available physical bandwidth: the larger the communication factor, the smaller the effective physical bandwidth.
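The scaling of the communication amount can be illustrated with a minimal sketch. The edge names, the base volumes, and the function name are hypothetical and are not taken from the TGFF output used in the experiment.

```python
def scale_communication(edge_volumes, communication_factor):
    """Return a new mapping in which every edge volume is multiplied
    by the communication factor. The input mapping is left unchanged."""
    if communication_factor <= 0:
        raise ValueError("communication_factor must be positive")
    return {
        edge: volume * communication_factor
        for edge, volume in edge_volumes.items()
    }


# Hypothetical base communication amounts between pairs of tasks.
base_volumes = {("t0", "t1"): 100, ("t0", "t2"): 40}

# A factor below one lightens every path; a factor above one loads it.
light = scale_communication(base_volumes, 0.25)
heavy = scale_communication(base_volumes, 4)
```

Only the communication amounts change; the computation amount of each task is assumed to stay fixed, so the factor directly sets the communication-to-computation ratio.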
In this experiment, the communication factor is set to 0.25, 0.5, 1, 2, and 4. A communication factor less than one implies that the application is computation intensive, whereas a communication factor larger than one means that the application is communication intensive. The fail rate is defined as the number of failed transactions divided by the total number of transactions. Since a failed transaction must transmit the same data again, it causes unnecessary power consumption: the higher the fail rate, the more power is spent on useless transactions, and thus the total power consumption increases. Figure 4.5 shows the relationship between the fail rate and the communication factor for the traditional architecture and the proposed hierarchical one. The fail rate increases with the communication factor and saturates at approximately 22% for the traditional architecture and 17% for the hierarchical one. The saturating trend of the curves means that the fail rate remains under control even under communication intensive conditions. The proposed hierarchical architecture reduces the fail rate by 28.5% at a communication factor of one and by 21.8% at a communication factor of four.
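The fail rate definition and the workload classification by communication factor can be written as a short sketch. The function names are hypothetical, and the label for a factor of exactly one is an assumption, since the text only classifies factors below and above one.

```python
def fail_rate(failed_transactions, total_transactions):
    """Fail rate = failed transactions / total transactions."""
    if total_transactions <= 0:
        raise ValueError("total_transactions must be positive")
    if not 0 <= failed_transactions <= total_transactions:
        raise ValueError("failed_transactions must lie in [0, total]")
    return failed_transactions / total_transactions


def classify_workload(communication_factor):
    """Classify an application by its communication factor."""
    if communication_factor < 1:
        return "computation intensive"
    if communication_factor > 1:
        return "communication intensive"
    # Assumption: the text does not name the case of a factor equal to one.
    return "balanced"


# The factors used in the experiment.
for factor in (0.25, 0.5, 1, 2, 4):
    print(factor, classify_workload(factor))
```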
Figure 4.5: Fail rate versus different communication factor.
[Figure 4.6 plot. x-axis: Buffer Size; y-axis: Fail Rate; series: Traditional Architecture, Hierarchical Architecture; annotation: 37.6% improvement]
Figure 4.6: Fail rate versus different buffer size.
A transaction fails when the buffer of the next switch is full. Hence, a larger buffer lowers the probability of failure, and the buffer size can be increased to reduce the fail rate. In this experiment, the buffer size is increased from 2 to 4, 8, 16, and 32. Figure 4.6 shows the relationship between the fail rate and the buffer size: the fail rate decreases as the buffer size increases. Compared with the traditional architecture, the hierarchical one provides about 37.6% improvement at a buffer size of four.
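The effect of the buffer size on the fail rate can be illustrated with a toy model of a single switch. This is only a sketch under simplifying assumptions that are not part of the experiment: the switch forwards one buffered transaction per cycle, failed transactions are counted but not retried, and the bursty arrival pattern is invented for illustration.

```python
def simulate_fail_rate(arrivals_per_cycle, buffer_size):
    """Return the fraction of transactions rejected by a full buffer.

    arrivals_per_cycle: number of transactions arriving in each cycle.
    buffer_size: capacity of the switch buffer, in transactions.
    """
    occupancy = 0
    failed = 0
    total = 0
    for arrivals in arrivals_per_cycle:
        for _ in range(arrivals):
            total += 1
            if occupancy >= buffer_size:
                failed += 1        # buffer of the next switch is full
            else:
                occupancy += 1     # transaction is accepted
        if occupancy > 0:
            occupancy -= 1         # assumed drain rate: one per cycle
    return failed / total if total else 0.0


# Hypothetical traffic: a burst of five transactions every six cycles.
pattern = [5, 0, 0, 0, 0, 0] * 20

# The buffer sizes used in the experiment.
for size in (2, 4, 8, 16, 32):
    print(size, simulate_fail_rate(pattern, size))
```

In this toy model the fail rate never rises when the buffer grows, which matches the direction of the trend reported for Figure 4.6; the numerical values are not comparable with the measured ones.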
Figure 4.7: Latency versus different communication factor.
Figure 4.7 shows the communication latency versus the communication factor. The latency is defined as the time elapsed while one datum is transmitted from the source PE to the destination PE. The latency rises linearly with the communication factor. Compared with the traditional architecture, the hierarchical one improves the latency by 14.4% at a communication factor of four.
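The latency definition can be sketched as the average difference between the cycle in which a datum leaves the source PE and the cycle in which it reaches the destination PE. The function name and the sample timestamps are hypothetical, and averaging over all transfers is an assumption about how a single latency figure is obtained.

```python
def average_latency(transfers):
    """Average latency in cycles.

    transfers: iterable of (send_cycle, receive_cycle) pairs, one per
    datum sent from a source PE to a destination PE.
    """
    latencies = []
    for send_cycle, receive_cycle in transfers:
        if receive_cycle < send_cycle:
            raise ValueError("a datum cannot arrive before it is sent")
        latencies.append(receive_cycle - send_cycle)
    if not latencies:
        raise ValueError("at least one transfer is required")
    return sum(latencies) / len(latencies)


# Two hypothetical transfers taking 300 and 350 cycles.
print(average_latency([(0, 300), (100, 450)]))  # 325.0
```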
Figure 4.8: Latency versus different buffer size.
As shown in Figure 4.8, the latency decreases as the buffer size increases. This also implies that a lower fail rate leads to a lower latency. Compared with the traditional architecture, the hierarchical architecture improves the latency by 15.5% at a buffer size of four.
Figure 4.9: Throughput versus different communication factor.
Figure 4.9 shows the system throughput versus the communication factor. The system throughput is defined as the number of times the application is executed during a fixed period of time; in this experiment, the fixed period is 50000 clock cycles. The hierarchical architecture improves the throughput for communication intensive applications; for computation intensive applications, however, the throughput decreases. This can be explained by the different shares of time spent on computation and on communication. For computation intensive applications, the total time spent on computation exceeds that spent on communication, while for communication intensive applications the opposite holds. The hierarchical architecture does improve the communication efficiency, but a computation intensive application still spends most of its time on computation. That is why, for computation intensive applications, the hierarchical architecture improves the fail rate and the latency but not the throughput. At a communication factor of four, a communication intensive case, a 27% throughput improvement is obtained.
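The throughput definition can be sketched as a count of application executions that complete within the fixed window of 50000 clock cycles. The function name and the completion timestamps are hypothetical; counting an execution that finishes exactly at the window boundary is an assumption.

```python
WINDOW_CYCLES = 50_000  # fixed period stated in the text


def throughput(completion_cycles, window=WINDOW_CYCLES):
    """Number of application executions completed within the window.

    completion_cycles: cycle at which each execution of the application
    finished, measured from the start of the run.
    """
    return sum(1 for cycle in completion_cycles if cycle <= window)


# Hypothetical run: the fourth execution finishes after the window closes.
print(throughput([10_000, 30_000, 49_999, 50_001]))  # 3
```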
For communication intensive applications, the hierarchical architecture clearly attains better system performance. For computation intensive applications, the hierarchical architecture is still the better choice if latency has higher priority than throughput.
Figure 4.10: Network usage versus different communication factor.
Figure 4.10 shows the network usage of L1 and L2 versus the communication factor. The usage of L2 is defined as the total communication amount on L2 divided by the total communication amount on the whole network. The usage of L2 increases with the communication factor.
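The usage of L2 is a simple ratio, sketched below. The function name and the sample amounts are hypothetical; the amounts are assumed to be expressed in the same unit.

```python
def l2_usage(l2_amount, network_amount):
    """Usage of L2 = communication amount on L2 / amount on whole network."""
    if network_amount <= 0:
        raise ValueError("network_amount must be positive")
    if not 0 <= l2_amount <= network_amount:
        raise ValueError("l2_amount must lie in [0, network_amount]")
    return l2_amount / network_amount


# Hypothetical amounts: 30 units on L2 out of 120 units in total.
print(l2_usage(30, 120))  # 0.25
```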
Table 4.1: Comparison between traditional and hierarchical architecture at communication factor = 4 and buffer size = 2.
              Fail Rate   Latency (cycles)   Throughput
Traditional   0.2285      380.9              23.29
Hierarchical  0.1788      326                29.57
Improvement   21.75%      14.41%             26.96%
Table 4.1 compares the hierarchical and the traditional architecture at a communication factor of four and a buffer size of two. The hierarchical architecture improves the system performance in fail rate, latency, and throughput for this communication intensive application.
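The improvement figures in Table 4.1 can be checked with a short sketch. The function names are hypothetical. The calculation assumes that each improvement is relative to the traditional architecture, with fail rate and latency treated as lower-is-better and throughput as higher-is-better; under this assumption the values in the table are reproduced.

```python
def reduction_percent(traditional, hierarchical):
    """Relative improvement for metrics where lower is better."""
    return 100.0 * (traditional - hierarchical) / traditional


def gain_percent(traditional, hierarchical):
    """Relative improvement for metrics where higher is better."""
    return 100.0 * (hierarchical - traditional) / traditional


# Values from Table 4.1 (communication factor 4, buffer size 2).
fail_rate_gain = reduction_percent(0.2285, 0.1788)
latency_gain = reduction_percent(380.9, 326)
throughput_gain = gain_percent(23.29, 29.57)

print(round(fail_rate_gain, 2))   # 21.75
print(round(latency_gain, 2))     # 14.41
print(round(throughput_gain, 2))  # 26.96
```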