Chapter 5 Simulation and Analysis
5.3 Experiments
5.3.2 Simulation of Video Phone Scenario
In our interconnection, the factors listed in Table 5-5 can be configured in advance to simulate the video phone scenario. Simulating every combination of these factors is not an efficient way to find the optimal configuration of the PMP platform for the video phone scenario, so we analyze the impact of each factor, or of selected combinations of factors, to obtain guidelines for configuring the PMP platform properly.
Table 5-5 Factor of configuration
Factor                          Description
Wrapper buffer size             Capability of out-of-order transaction
Arbitration policy of channels  When contention occurs, choose a device and grant it
Task access setting             Decide what kind of task using data lock mode
Data lock mode buffer size      Capability of interconnection processing data lock mode transaction
Weight tuning of devices        Priority tuning of arbitration policy
A. Wrapper buffer size and arbitration policy of channels
First, we take the wrapper buffer size and the arbitration policy of the channels as variables and analyze their impact. The detailed configuration is shown in Table 5-6.
We test wrapper buffer sizes of 1, 2, 4, 8 and 16, and treat the address channel and the data channel as separate variables when configuring the arbitration policy. Each policy setting in Table 5-6 is written as two letters: the first letter denotes the arbitration policy used in the address channel and the second letter denotes the policy used in the data channel (F for fixed priority, T for TDMA, R for Round-Robin and L for Lottery). We choose Round-Robin for the write response channel because a write response is transferred only once, at the completion of a write transaction, so a fair arbitration policy is sufficient for this channel. All task accesses use normal transactions, so there is no need to configure the data lock mode buffer size. The weights of the devices are set according to the bandwidth requirements of the video phone scenario.
Table 5-6 Setting of simulation A
Wrapper buffer size
Buffer size     1, 2, 4, 8, 16
Arbitration policy of channels
Policy setting  Address channel  Data channel    Write response channel
FF              Fixed priority   Fixed priority  Round-Robin
FT              Fixed priority   TDMA            Round-Robin
FR              Fixed priority   Round-Robin     Round-Robin
FL              Fixed priority   Lottery         Round-Robin
TF              TDMA             Fixed priority  Round-Robin
TT              TDMA             TDMA            Round-Robin
TR              TDMA             Round-Robin     Round-Robin
TL              TDMA             Lottery         Round-Robin
RF              Round-Robin      Fixed priority  Round-Robin
RT              Round-Robin      TDMA            Round-Robin
RR              Round-Robin      Round-Robin     Round-Robin
RL              Round-Robin      Lottery         Round-Robin
LF              Lottery          Fixed priority  Round-Robin
LT              Lottery          TDMA            Round-Robin
LR              Lottery          Round-Robin     Round-Robin
LL              Lottery          Lottery         Round-Robin
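The configuration space of simulation A can be enumerated mechanically. The following is a minimal sketch in Python, not the simulator used in this work; the names POLICIES, BUFFER_SIZES and simulation_a_configs are hypothetical and are introduced only for illustration.

```python
from itertools import product

# Policy letters as used in Table 5-6.
POLICIES = {
    "F": "Fixed priority",
    "T": "TDMA",
    "R": "Round-Robin",
    "L": "Lottery",
}
BUFFER_SIZES = [1, 2, 4, 8, 16]  # wrapper buffer sizes under test


def simulation_a_configs():
    """Return one dict per (policy setting, wrapper buffer size) pair."""
    configs = []
    # First letter: address channel policy; second letter: data channel policy.
    for addr, data in product(POLICIES, repeat=2):
        for size in BUFFER_SIZES:
            configs.append({
                "setting": addr + data,
                "address": POLICIES[addr],
                "data": POLICIES[data],
                "write_response": "Round-Robin",  # fixed in simulation A
                "buffer_size": size,
            })
    return configs


configs = simulation_a_configs()
print(len(configs))  # 16 policy settings x 5 buffer sizes = 80
```

Each entry corresponds to one cell of the timing constraint tables that follow.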
Table 5-7 shows whether each configuration meets the timing constraint. With all normal transactions in the video phone scenario, only configurations with a wrapper buffer size of 8 or more meet the timing constraint. At buffer size 8, every configuration that uses TDMA in the data channel meets the constraint, while most configurations that use fixed priority in the address channel violate it. Interestingly, most configurations that use Round-Robin in the address channel meet the timing constraint.
A possible explanation is that Round-Robin is more efficient than TDMA in the address channel, whereas TDMA is more efficient than Round-Robin in the data channel. This may be caused by the different ways in which TDMA and Round-Robin map the weight tuning. At buffer size 16, the configurations that use only TDMA, Round-Robin and Lottery all meet the timing constraint, while most of those that use fixed priority in either channel do not.
Fig. 5-2 shows the completion time of the video phone scenario. Configurations that use fixed priority in either the address channel or the data channel clearly have longer completion times. Fig. 5-3 shows the same trend: fixed priority yields poorer bandwidth utilization than the other policies. This is because fixed priority is more likely to cause starvation and to limit out-of-order completion.
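The starvation effect of fixed priority can be illustrated with a toy arbiter model. This is a minimal sketch under strong assumptions and is not the arbiter of our interconnection: every master requests in every cycle, weights are ignored, and the functions fixed_priority, round_robin and simulate are hypothetical names.

```python
from collections import Counter


def fixed_priority(requesting, last_granted, num_masters):
    """Always grant the requesting master with the lowest index."""
    return min(requesting)


def round_robin(requesting, last_granted, num_masters):
    """Grant the first requesting master after the one granted last."""
    for offset in range(1, num_masters + 1):
        candidate = (last_granted + offset) % num_masters
        if candidate in requesting:
            return candidate
    raise ValueError("no master is requesting")


def simulate(arbiter, num_masters=3, cycles=12):
    """Count grants per master when all masters request in every cycle."""
    grants = Counter()
    last = num_masters - 1  # so that round-robin starts at master 0
    requesting = set(range(num_masters))  # assumption: saturated contention
    for _ in range(cycles):
        last = arbiter(requesting, last, num_masters)
        grants[last] += 1
    return dict(grants)


print(simulate(fixed_priority))  # {0: 12}
print(simulate(round_robin))     # {0: 4, 1: 4, 2: 4}
```

Under saturated contention the fixed priority arbiter never grants masters 1 and 2, whereas Round-Robin shares the grants evenly.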
Table 5-7 Timing constraint status with all normal transaction of video phone scenario
Policy   Buffer size
setting  1         2         4         8         16
FF       Violated  Violated  Violated  Violated  Violated
FT       Violated  Violated  Violated  Met       Met
FR       Violated  Violated  Violated  Violated  Violated
FL       Violated  Violated  Violated  Violated  Violated
TF       Violated  Violated  Violated  Violated  Violated
TT       Violated  Violated  Violated  Met       Met
TR       Violated  Violated  Violated  Violated  Met
TL       Violated  Violated  Violated  Met       Met
RF       Violated  Violated  Violated  Violated  Violated
RT       Violated  Violated  Violated  Met       Met
RR       Violated  Violated  Violated  Met       Met
RL       Violated  Violated  Violated  Met       Met
LF       Violated  Violated  Violated  Violated  Violated
LT       Violated  Violated  Violated  Met       Met
LR       Violated  Violated  Violated  Violated  Met
LL       Violated  Violated  Violated  Met       Met
Fig. 5-2 Completion time of video phone with all normal transactions
Fig. 5-3 Bandwidth utilization of video phone with all normal transactions
B. Task access setting
The previous simulations used only the normal and interleave transfer modes of our interconnection. In this simulation we configure the task pattern to generate transactions that use data lock mode. Table 5-8 shows the configuration of the tasks. We divide the tasks into two categories: tasks that access the memory controller and tasks that access other devices. In setting 1, tasks that access the memory controller use data lock mode and tasks that access other devices use normal mode. Setting 2 is configured the opposite way. Table 5-9 gives the configuration of data lock mode in our interconnection. As in simulation A, we also vary the wrapper buffer size and the arbitration policy of the channels.
Table 5-8 Configuration of simulation B
Task setting  Tasks of accessing memory controller  Tasks of accessing other devices
Setting 1     Using data lock mode                  Using normal mode
Setting 2     Using normal mode                     Using data lock mode
Table 5-9 Configuration of data lock mode of simulation B
Data lock mode buffer size  1
Hybrid mode threshold       1
Table 5-10 shows the simulation results of setting 1. There are 24 configurations that meet the timing constraint, 6 more than in simulation A. In setting 1, all configurations that use Round-Robin in the data channel violate the timing constraint at buffer size 8.
This phenomenon may be related to the weight tuning of the arbitration policy, which we introduce in a later section.
Fig. 5-4 shows no obvious glitch in completion time at buffer size 8, so the Round-Robin configurations do not violate the timing constraint by a significant margin. Fig. 5-4 and Fig. 5-5 do show an obvious glitch at buffer size 16, which belongs to policy setting FF. This is because a buffer size of 16 equals the memory controller delay, so transactions accessing the memory controller block the other transactions. As a result, the other devices starve and bandwidth utilization collapses.
Table 5-10 Timing constraint status with setting 1 of video phone scenario
Policy   Buffer size
setting  1         2         4         8         16
FF       Violated  Violated  Violated  Violated  Violated
FT       Violated  Violated  Violated  Violated  Met
FR       Violated  Violated  Violated  Violated  Violated
FL       Violated  Violated  Violated  Met       Met
TF       Violated  Violated  Violated  Met       Met
TT       Violated  Violated  Violated  Met       Met
TR       Violated  Violated  Violated  Violated  Met
TL       Violated  Violated  Violated  Met       Met
RF       Violated  Violated  Violated  Met       Met
RT       Violated  Violated  Violated  Met       Met
RR       Violated  Violated  Violated  Violated  Met
RL       Violated  Violated  Violated  Met       Met
LF       Violated  Violated  Violated  Met       Met
LT       Violated  Violated  Violated  Met       Met
LR       Violated  Violated  Violated  Violated  Met
LL       Violated  Violated  Violated  Met       Met
Fig. 5-4 Completion time of video phone with setting 1
Fig. 5-5 Bandwidth utilization of video phone setting 1
Table 5-11 shows the simulation results of setting 2. There are 19 configurations that meet the timing constraint, and the results are very similar to those of simulation A.
In Fig. 5-6 and Fig. 5-7 there are obvious glitches at buffer size 16. The reason is the same as in setting 1, except that the blocking transactions are now the transactions accessing the memory controller. Transactions accessing the memory controller account for 76.91% of the video phone scenario, so the transactions using data lock mode still have an obvious impact on performance.
Table 5-11 Timing constraint status with setting 2 of video phone scenario
Policy   Buffer size
setting  1         2         4         8         16
FF       Violated  Violated  Violated  Met       Violated
FT       Violated  Violated  Violated  Violated  Violated
FR       Violated  Violated  Violated  Violated  Violated
FL       Violated  Violated  Violated  Violated  Violated
TF       Violated  Violated  Violated  Violated  Violated
TT       Violated  Violated  Violated  Met       Met
TR       Violated  Violated  Violated  Met       Met
TL       Violated  Violated  Violated  Met       Met
RF       Violated  Violated  Violated  Violated  Violated
RT       Violated  Violated  Violated  Met       Met
RR       Violated  Violated  Violated  Met       Met
RL       Violated  Violated  Violated  Met       Met
LF       Violated  Violated  Violated  Violated  Violated
LT       Violated  Violated  Violated  Met       Met
LR       Violated  Violated  Violated  Met       Met
LL       Violated  Violated  Violated  Met       Met
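The numbers of met configurations quoted for simulation A, setting 1 and setting 2 can be re-derived from Tables 5-7, 5-10 and 5-11. The sketch below is only a bookkeeping aid: the compact encoding (one letter per buffer size 1, 2, 4, 8, 16, with M for met and V for violated) and the names TABLES and met_count are assumptions made for this illustration.

```python
# Status per policy setting at buffer sizes 1, 2, 4, 8, 16
# (M = met, V = violated), transcribed from the tables.
TABLES = {
    "all normal (Table 5-7)": (
        "FF:VVVVV FT:VVVMM FR:VVVVV FL:VVVVV "
        "TF:VVVVV TT:VVVMM TR:VVVVM TL:VVVMM "
        "RF:VVVVV RT:VVVMM RR:VVVMM RL:VVVMM "
        "LF:VVVVV LT:VVVMM LR:VVVVM LL:VVVMM"
    ),
    "setting 1 (Table 5-10)": (
        "FF:VVVVV FT:VVVVM FR:VVVVV FL:VVVMM "
        "TF:VVVMM TT:VVVMM TR:VVVVM TL:VVVMM "
        "RF:VVVMM RT:VVVMM RR:VVVVM RL:VVVMM "
        "LF:VVVMM LT:VVVMM LR:VVVVM LL:VVVMM"
    ),
    "setting 2 (Table 5-11)": (
        "FF:VVVMV FT:VVVVV FR:VVVVV FL:VVVVV "
        "TF:VVVVV TT:VVVMM TR:VVVMM TL:VVVMM "
        "RF:VVVVV RT:VVVMM RR:VVVMM RL:VVVMM "
        "LF:VVVVV LT:VVVMM LR:VVVMM LL:VVVMM"
    ),
}


def met_count(table):
    """Count the configurations marked as met in one encoded table."""
    statuses = dict(entry.split(":") for entry in table.split())
    return sum(status.count("M") for status in statuses.values())


totals = {name: met_count(table) for name, table in TABLES.items()}
for name, total in totals.items():
    print(name, total)  # 18, 24 and 19 met configurations respectively
```

The difference between the first two totals is the increase of 6 met configurations reported for setting 1.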
Fig. 5-6 Completion time of video phone with setting 2
Fig. 5-7 Bandwidth utilization of video phone setting 2
To observe the influence of the task setting, we average the completion time and bandwidth utilization over the configurations of each task setting. As Fig. 5-8 and Fig. 5-9 show, setting 1 performs significantly better than the others. This can be explained as follows: data lock mode is useful for devices with high latency, and it handles the case in which transactions concentrate on one device, which makes interleave mode useless. Data lock mode is therefore well suited to the memory controller and to devices that require large bandwidth.
The advantage of setting 1 over the other settings becomes less pronounced as the buffer size increases, but data lock mode still performs better. Although setting 2 does not show an improvement as clear as that of setting 1, its performance is slightly better than that of all normal transactions when the buffer size is over 4. Therefore, using data lock mode is still more useful than using interleave mode alone.
Fig. 5-8 Average completion time of different task setting (x-axis: wrapper buffer (entry) at 1, 2, 4, 8, 16; y-axis: completion time (ms); series: all normal, setting 1, setting 2)
Fig. 5-9 Average bandwidth utilization of different task setting
Fig. 5-10 Average latency of different task setting
C. Data lock mode buffer size
From simulation B, data lock mode does improve the performance of the video phone scenario, but the improvement is limited, so we increase the data lock mode buffer size to observe its impact on performance. Table 5-12 shows the configuration of simulation C. We increase the data lock mode buffer size from 1 to 2 and 4; the task setting is the same as setting 1 of simulation B.
Table 5-12 Configurations of simulation for data lock mode buffer size
Data lock mode buffer size  2, 4
Task access setting         Accessing memory using data lock mode, others normal mode
Table 5-13 shows the timing constraint status with a data lock mode buffer size of 2. There are 33 configurations that meet the timing constraint; moreover, some configurations with wrapper buffer size 4 now meet it.
Fig. 5-11 and Fig. 5-12 show the results of the video phone scenario with a data lock mode buffer size of 2. Both completion time and bandwidth utilization improve compared with simulation B, which used a data lock mode buffer size of 1.
Table 5-13 Timing constraint status with data lock mode buffer 2
Policy   Buffer size
setting  1         2         4         8         16
FF       Violated  Violated  Violated  Violated  Violated
FT       Violated  Violated  Violated  Met       Met
FR       Violated  Violated  Violated  Met       Met
FL       Violated  Violated  Met       Met       Met
TF       Violated  Violated  Violated  Met       Met
TT       Violated  Violated  Violated  Met       Met
TR       Violated  Violated  Violated  Met       Met
TL       Violated  Violated  Met       Met       Met
RF       Violated  Violated  Violated  Met       Met
RT       Violated  Violated  Violated  Met       Met
RR       Violated  Violated  Violated  Met       Met
RL       Violated  Violated  Met       Met       Met
LF       Violated  Violated  Violated  Met       Met
LT       Violated  Violated  Violated  Met       Met
LR       Violated  Violated  Violated  Met       Met
LL       Violated  Violated  Violated  Met       Met
Fig. 5-11 Completion time of video phone with data lock buffer 2
Fig. 5-12 Bandwidth utilization of video phone with data lock buffer 2
As shown in Table 5-14, 39 configurations meet the timing constraint with a data lock mode buffer size of 4. At wrapper buffer size 4, all configurations meet the timing constraint except most of those that use fixed priority. Note that the wrapper buffer size and the data lock mode buffer size are both 4 here, which means that the buffers in the memory controller are capable of buffering all data lock mode transactions. Increasing the data lock mode buffer size therefore improves performance.
Table 5-14 Timing constraint status with data lock mode buffer 4
Policy   Buffer size
setting  1         2         4         8         16
FF       Violated  Violated  Violated  Violated  Violated
FT       Violated  Violated  Violated  Met       Met
FR       Violated  Violated  Violated  Met       Met
FL       Violated  Violated  Met       Met       Met
TF       Violated  Violated  Violated  Met       Met
TT       Violated  Violated  Met       Met       Met
TR       Violated  Violated  Met       Met       Met
TL       Violated  Violated  Met       Met       Met
RF       Violated  Violated  Violated  Met       Violated
RT       Violated  Violated  Met       Met       Met
RR       Violated  Violated  Met       Met       Met
RL       Violated  Violated  Met       Met       Met
LF       Violated  Violated  Violated  Met       Met
LT       Violated  Violated  Met       Met       Met
LR       Violated  Violated  Met       Met       Met
LL       Violated  Violated  Met       Met       Met
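The effect of the data lock mode buffer size can be broken down per wrapper buffer size from Tables 5-10, 5-13 and 5-14. As before, this is an illustrative sketch: the encoding (one letter per buffer size 1, 2, 4, 8, 16, with M for met and V for violated) and the names DATA_LOCK_TABLES and met_per_buffer are assumptions made for this example.

```python
BUFFER_SIZES = [1, 2, 4, 8, 16]  # wrapper buffer sizes

# Keyed by data lock mode buffer size; memory accesses use data lock mode.
DATA_LOCK_TABLES = {
    1: (  # Table 5-10 (setting 1 of simulation B)
        "FF:VVVVV FT:VVVVM FR:VVVVV FL:VVVMM "
        "TF:VVVMM TT:VVVMM TR:VVVVM TL:VVVMM "
        "RF:VVVMM RT:VVVMM RR:VVVVM RL:VVVMM "
        "LF:VVVMM LT:VVVMM LR:VVVVM LL:VVVMM"
    ),
    2: (  # Table 5-13
        "FF:VVVVV FT:VVVMM FR:VVVMM FL:VVMMM "
        "TF:VVVMM TT:VVVMM TR:VVVMM TL:VVMMM "
        "RF:VVVMM RT:VVVMM RR:VVVMM RL:VVMMM "
        "LF:VVVMM LT:VVVMM LR:VVVMM LL:VVVMM"
    ),
    4: (  # Table 5-14
        "FF:VVVVV FT:VVVMM FR:VVVMM FL:VVMMM "
        "TF:VVVMM TT:VVMMM TR:VVMMM TL:VVMMM "
        "RF:VVVMV RT:VVMMM RR:VVMMM RL:VVMMM "
        "LF:VVVMM LT:VVMMM LR:VVMMM LL:VVMMM"
    ),
}


def met_per_buffer(table):
    """Count met configurations for each wrapper buffer size."""
    counts = dict.fromkeys(BUFFER_SIZES, 0)
    for entry in table.split():
        _, statuses = entry.split(":")
        for size, status in zip(BUFFER_SIZES, statuses):
            if status == "M":
                counts[size] += 1
    return counts


breakdown = {dl: met_per_buffer(t) for dl, t in DATA_LOCK_TABLES.items()}
for dl, counts in breakdown.items():
    print(dl, counts, sum(counts.values()))
```

The totals are 24, 33 and 39, and the number of met configurations at wrapper buffer size 4 grows from 0 to 3 to 10 as the data lock mode buffer size goes from 1 to 2 to 4.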
Fig. 5-13 Completion time of video phone with data lock buffer 4
Fig. 5-14 Bandwidth utilization of video phone with data lock buffer 4
D. Weight tuning of arbitration policy
Although we have simulated the impact of the arbitration policy, the results do not identify a precise arbitration setting that performs better than the others. Therefore, we tune the weights of the arbitration policy to find a proper way to set them. Table 5-15 shows the configuration of this simulation. All 5 channels use the same arbitration policy. We use the burst length, 8, as the base unit for tuning the weights. Take the masters of the video phone scenario as an example: the ratio of the bandwidth requirements of MPU : DSP : VE : DMAC1 : DMAC2 is 1:9:37:17:17. The MPU has the smallest requirement, so we give it a constant weight of 4. The other devices are included in the weight tuning. The bandwidth requirements of DMAC1 and DMAC2 are almost the same, so we treat them as one variable. The masters are thus described by 3 variables, x, y and z, which satisfy
x + y + z = 7, x > 0, y > 0, z > 0.
Each integer solution of this equation, multiplied by 8, gives one weight configuration, so there are 15 weight configurations for the masters. The weights of the slaves are tuned in the same way. We take the three slaves with the largest bandwidth requirements as the variables and use the equation
x + y + z = 6, x > 0, y > 0, z > 0.
This gives 10 weight configurations for the slaves, so the total number of weight configurations to simulate is 15 × 10 = 150.
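The weight configurations described above can be enumerated directly. The following is a minimal sketch and not the script used for the simulations; the names BASE, MPU_WEIGHT and positive_solutions are hypothetical, and the variables x, y and z are kept abstract because their assignment to individual devices is not restated here.

```python
from itertools import product

BASE = 8        # burst length, used as the base unit of weight
MPU_WEIGHT = 4  # constant weight given to the MPU


def positive_solutions(total, parts=3):
    """All ordered tuples of positive integers that sum to `total`."""
    return [c for c in product(range(1, total), repeat=parts)
            if sum(c) == total]


# Masters: x + y + z = 7; slaves: x + y + z = 6; each solution is scaled by 8.
master_weights = [
    {"MPU": MPU_WEIGHT, "x": x * BASE, "y": y * BASE, "z": z * BASE}
    for x, y, z in positive_solutions(7)
]
slave_weights = [
    tuple(v * BASE for v in solution) for solution in positive_solutions(6)
]
all_weight_configs = list(product(master_weights, slave_weights))

print(len(master_weights), len(slave_weights), len(all_weight_configs))
# 15 10 150
```

The counts agree with the closed form for positive integer solutions of x + y + z = n, which is (n-1)(n-2)/2: 15 for n = 7 and 10 for n = 6.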
Table 5-15 Configuration of weight tuning
Wrapper buffer size         1, 2, 4, 8, 16
Arbitration policy          TDMA, Round-Robin, Lottery
Task accessing setting      Task accessing memory using data lock mode, others normal mode
Data lock mode buffer size  4
Hybrid threshold            1
Fig. 5-15 and Fig. 5-16 show the results of the simulations. According to Fig. 5-15, when the buffer size is 8 or more, all configurations meet the constraint. The distribution of the standard deviation in Fig. 5-16 reveals that the impact of the weights becomes smaller as the buffer size increases, but it does not show how to tune the weights. Table 5-16 shows the number of configurations with buffer size 4 that meet the timing constraint.
Analyzing the configurations that meet the timing constraint, we find that their weights are not consistent with the bandwidth requirements. Instead, these configurations are concentrated evenly around the average of the bandwidth. This means that as long as the weights of the arbitration policy are not extreme, the performance is stable and good.
Average bus utilization
           TDMA        Round-Robin  Lottery
Buffer 1   62.360045%  62.326499%   62.339372%
Buffer 2   71.108044%  70.530258%   71.516573%
Buffer 4   73.715505%  76.280004%   76.491407%
Buffer 8   79.061074%  79.061078%   79.061059%
Buffer 16  79.061076%  79.061089%   79.061059%
Fig. 5-15 Average bandwidth utilization of weight tuning
Standard deviation of bandwidth utilization
           TDMA        Round-Robin  Lottery
Buffer 1   0.255646%   0.185349%    0.208641%
Buffer 2   1.671525%   1.959662%    1.512652%
Buffer 4   1.741517%   1.759624%    1.812892%
Buffer 8   0.000047%   0.000045%    0.000055%
Buffer 16  0.000055%   0.000036%    0.000052%
Fig. 5-16 Standard deviation of bandwidth utilization of weight tuning
Table 5-16 Met configurations of buffer size 4 in weight tuning
Arbitration policy  Met configuration
TDMA                2
Round-Robin         48
Lottery             62
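To put the counts of Table 5-16 in proportion, they can be expressed as a share of the weight configurations. This is a small worked calculation that assumes each arbitration policy was simulated with all 15 × 10 = 150 weight configurations at buffer size 4; the names TOTAL_WEIGHT_CONFIGS, met and share are hypothetical.

```python
# Assumption: every policy is run with all 150 weight configurations.
TOTAL_WEIGHT_CONFIGS = 15 * 10

# Met configurations at buffer size 4, from Table 5-16.
met = {"TDMA": 2, "Round-Robin": 48, "Lottery": 62}

# Share of weight configurations that meet the timing constraint, in percent.
share = {
    policy: round(100 * count / TOTAL_WEIGHT_CONFIGS, 1)
    for policy, count in met.items()
}
print(share)  # {'TDMA': 1.3, 'Round-Robin': 32.0, 'Lottery': 41.3}
```

Under this assumption, TDMA meets the constraint for only about 1% of the weight configurations at buffer size 4, compared with roughly a third for Round-Robin and about 41% for Lottery.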