Chapter 4 Result and Analysis
4.1 Profiling Result
For each simulation case, we first modify the application SW C++ program and build an ARM executable image file with the CoWare ARM926T Bootcode and scatter loading.
Second, we modify the HW SystemC TLM models (if needed) and create a virtual platform using Platform Creator. Finally, we build and run the simulation with the ARM Symbolic Debugger (ASD), load the executable image file, enable the analysis functions of the CoWare profiling utilities, and wait for the results [15].
In addition to the analysis reported by the CoWare profiling utilities, we also gather some information ourselves, as introduced in Chapter 3. However, this information is printed directly on ASD, which perturbs the performance analysis of the CoWare profiling utilities. We therefore build two executable images for each simulation case: one contains our profiling code, which prints the simulation information, and is run without the analysis functions; the other omits our profiling code and is run with only the analysis functions of the CoWare profiling utilities enabled.
Our simulation flow for each case is as follows. First, we run the SW golden functions together with our modified functions to confirm that the modification is functionally correct. Second, we remove the SW golden functions, add our own profiling code, and run again. Third, we remove our own profiling code and run the simulation with the analysis functions of the CoWare profiling utilities enabled.
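The first step of this flow, checking the modified functions against the SW golden functions, amounts to comparing two output buffers. A minimal sketch of such a check is given below. The function name `first_mismatch` and the use of 32-bit integer samples are assumptions made for illustration; this is not the actual verification code used in this work.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: compares the output of a golden function with the
// output of its modified counterpart.
// Returns the index of the first differing sample, or -1 when both outputs
// are identical (same length and same values).
std::ptrdiff_t first_mismatch(const std::vector<std::int32_t>& golden,
                              const std::vector<std::int32_t>& modified) {
    const std::size_t common = std::min(golden.size(), modified.size());
    for (std::size_t i = 0; i < common; ++i) {
        if (golden[i] != modified[i]) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    // A length difference counts as a mismatch at the first missing sample.
    if (golden.size() != modified.size()) {
        return static_cast<std::ptrdiff_t>(common);
    }
    return -1;
}
```

A return value of -1 for every function block would correspond to the "functionally correct" condition of the first step; any other value points at the sample to inspect.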
We simulate the three supported modulation modes separately for each simulation case and combine the results. The profiling results include function execution time, memory accesses, and bus transactions.
Cache size is one of the factors we are interested in. Both the instruction and data cache sizes of the ARM926 PSP can be adjusted with the Parameter Editor in Platform Creator; the size is given in kilobytes. Since setting the cache size to zero is illegal, we build another executable image file that disables the cache module for the no-cache case. For each simulation case, we therefore have three results with different cache configurations: caches disabled (0k), 4k bytes for both instruction and data caches, and 32k bytes for both instruction and data caches.
Table 4.1 shows the function profiling result for simulation Case 1. The function name column lists five main functions, and the indentation of each name shows the call relationship between them: the three functions Modulation, STBC Encoder, and OFDM Modulator are called within the function TX, and the function IFFT is called within the function OFDM Modulator. The total execution time, in nanoseconds, is split into three columns corresponding to the three cache sizes, and the total instruction count of each function is also listed.
Table 4.1: Function profiling result in Case 1

Modulation  Function name       Total execution time (ns)        Instruction
mode                            0k         4k         32k        counts
QPSK        TX                  91013600   44740300   44067700   3328995
              Modulation        242736     60016      58552      5376
              STBC Encoder      3474940    1084860    985424     59417
              OFDM Modulator    79216600   40976100   40527900   3099137
                IFFT            74916100   39559600   39269800   3013449
16QAM       TX                  91459200   44870500   44198600   3364008
              Modulation        272984     85808      82688      8484
              STBC Encoder      3474940    1084860    987000     59417
              OFDM Modulator    79580700   41054300   40606800   3129506
                IFFT            75280300   39637100   39348700   3043818
64QAM       TX                  91501400   44996400   44293800   3376156
              Modulation        318400     110976     106120     6912
              STBC Encoder      3474940    1084730    988480     59417
              OFDM Modulator    79502700   41103800   40626300   3140154
                IFFT            75202300   39687300   39368100   3054466
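As a reading aid for Table 4.1, the effect of the cache size on a function can be expressed as a ratio of execution times. The sketch below computes this ratio for the TX row in QPSK mode; the names `ExecTimes`, `speedup`, and `tx_qpsk` are hypothetical and introduced only for this illustration, while the numbers are copied from the table.

```cpp
#include <cassert>

// Execution times of one function (in ns) for the three cache configurations.
// Hypothetical helper type, not part of the profiled application.
struct ExecTimes {
    double no_cache_ns;
    double cache_4k_ns;
    double cache_32k_ns;
};

// Ratio of a baseline time to an improved time.
// A value above 1 means the improved configuration is faster.
double speedup(double baseline_ns, double improved_ns) {
    assert(improved_ns > 0.0);
    return baseline_ns / improved_ns;
}

// TX row, QPSK mode, Table 4.1.
const ExecTimes tx_qpsk{91013600.0, 44740300.0, 44067700.0};
```

For this row, `speedup(tx_qpsk.no_cache_ns, tx_qpsk.cache_4k_ns)` is about 2.03, while `speedup(tx_qpsk.cache_4k_ns, tx_qpsk.cache_32k_ns)` is about 1.02.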
Table 4.2: Memory accesses of functions in Case 1 (cache disabled)

Modulation  Function name       ROM read   RAM read   RAM write
mode
QPSK        TX                  5926775    382603     343745
              Modulation        28056      3456       1920
              STBC Encoder      164303     18948      24579
              OFDM Modulator    5347218    332080     277648
                IFFT            5176510    315892     256836
16QAM       TX                  5928111    383371     343745
              Modulation        30514      4224       1920
              STBC Encoder      164303     18948      24579
              OFDM Modulator    5346928    332080     277648
                IFFT            5176220    315892     256836
64QAM       TX                  5949586    384139     343745
              Modulation        43256      4992       1920
              STBC Encoder      164303     18948      24579
              OFDM Modulator    5352628    332080     277648
Table 4.3: Memory accesses of functions in Case 1 (4k caches)

Modulation  Function name       ROM read   RAM read   RAM write
mode
QPSK        TX                  17125      96644      343745
              Modulation        0          317        1579
              STBC Encoder      237        14733      24578
              OFDM Modulator    15884      53276      277644
                IFFT            15260      34516      256836
16QAM       TX                  18291      97170      343745
              Modulation        0          0          0
              STBC Encoder      237        14733      24578
              OFDM Modulator    16900      52636      277644
                IFFT            16252      33876      256836
64QAM       TX                  18100      96571      343745
              Modulation        0          0          0
              STBC Encoder      237        14733      24578
              OFDM Modulator    16676      51484      277644
                IFFT            16028      32724      256836
Table 4.4: Memory accesses of functions in Case 1 (32k caches)

Modulation  Function name       ROM read   RAM read   RAM write
mode
QPSK        TX                  2303       28358      343745
              Modulation        0          0          0
              STBC Encoder      358        8812       33365
              OFDM Modulator    1844       12412      234647
                IFFT            117        2285       73270
16QAM       TX                  2338       29225      343745
              Modulation        0          0          0
              STBC Encoder      374        9589       33365
              OFDM Modulator    0          0          0
                IFFT            0          0          0
64QAM       TX                  2340       29915      343745
              Modulation        0          0          0
              STBC Encoder      149        4565       24578
              OFDM Modulator    1876       12756      298846
                IFFT            1876       12756      298846
Table 4.2 shows the memory access counts of each function with caches disabled in simulation Case 1; Table 4.3 and Table 4.4 show the same information for the same simulation case with 4k and 32k instruction and data caches, respectively. These tables are laid out like Table 4.1, but the listed values are ROM read, RAM read, and RAM write access counts. The counts were gathered with additional profiling variables: for each function call, the total access counters of the memory model are read at the begin position and at the end position of the function, and the difference is accumulated.
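The counter-difference technique described above can be sketched in C++ as a scope guard that snapshots the counters on entry and accumulates the difference on exit. The names `MemCounters` and `ScopedAccessProfile` are hypothetical, and the way the application actually reads the counters of the memory model is not shown here; the sketch only assumes that the current counter values are visible through a reference.

```cpp
#include <cstdint>

// Stand-in for the access counters kept by the memory model (assumption).
struct MemCounters {
    std::uint64_t rom_read = 0;
    std::uint64_t ram_read = 0;
    std::uint64_t ram_write = 0;
};

// Records the counters at the begin position (construction) and adds the
// difference to a per-function total at the end position (destruction).
class ScopedAccessProfile {
public:
    ScopedAccessProfile(const MemCounters& live, MemCounters& total)
        : live_(live), begin_(live), total_(total) {}

    ~ScopedAccessProfile() {
        total_.rom_read  += live_.rom_read  - begin_.rom_read;
        total_.ram_read  += live_.ram_read  - begin_.ram_read;
        total_.ram_write += live_.ram_write - begin_.ram_write;
    }

    ScopedAccessProfile(const ScopedAccessProfile&) = delete;
    ScopedAccessProfile& operator=(const ScopedAccessProfile&) = delete;

private:
    const MemCounters& live_;   // current counters of the memory model
    MemCounters begin_;         // snapshot taken at the begin position
    MemCounters& total_;        // accumulated counts for one function
};
```

Declaring one guard object at the top of a profiled function body makes the begin and end positions coincide with function entry and exit, and repeated calls keep adding to the same total.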
Table 4.5 shows the total memory read/write access counts in simulation Case 1 with different cache sizes. This information is gathered by running the other executable images, which do not contain the additional profiling variables and code used for the previous three tables.
Table 4.5: Total memory accesses in Case 1

Modulation mode                           QPSK      16QAM     64QAM
Cache disabled
  ROM access counts               Read    6607357   6897407   7179063
                                  Write   0         0         0
                                  Total   6607357   6897407   7179063
  RAM access counts
Table 4.6 shows the bus transaction information for simulation Case 1, gathered by the CoWare profiling utilities.
Table 4.6: Bus transaction information in Case 1

Modulation  Information type                Master  Cache size
mode                                                0k        4k          32k
QPSK        Transaction Counts              IAHB    6609320   36936       24912
                                            DAHB    912442    489451      421683
            Transaction Throughputs (kB/s)  IAHB    258975    2958.45     2024.27
                                            DAHB    35658.1   39191.8     34252.8
            Bus Utilization (%)             IAHB    53.038    0.605891    0.41457
                                            DAHB    7.3221    8.028886    7.01739
            Master Wait Total (%)           IAHB    1.58846   0.0116303   0.00737214
                                            DAHB    15.626    3.11136     3.08761
            AVG. Waiting Masters                    0.172145  0.0312299   0.0309498
16QAM       Transaction Counts              IAHB    6891390   38109       25053
                                            DAHB    946646    503391      434735
            Transaction Throughputs (kB/s)  IAHB    258158    2900.02     1932.16
                                            DAHB    35350.5   38296       33516.7
            Bus Utilization (%)             IAHB    52.8708   0.593925    0.395706
                                            DAHB    7.26267   7.84529     6.86653
            Master Wait Total (%)           IAHB    1.5213    0.0115796   0.0069181
                                            DAHB    15.4394   3.03037     3.01443
            AVG. Waiting Masters                    0.169607  0.0304195   0.0302134
64QAM       Transaction Counts              IAHB    7165550   37606       24982
                                            DAHB    980599    517299      447843
            Transaction Throughputs (kB/s)  IAHB    258087    2723.58     1834.01
                                            DAHB    35190.8   37454.4     32866.9
            Bus Utilization (%)             IAHB    52.8562   0.55779     0.375605
                                            DAHB    7.23332   7.67282     6.73333
            Master Wait Total (%)           IAHB    1.51992   0.00998225  0.00661541
                                            DAHB    15.3297   2.9779      2.95016
            AVG. Waiting Masters                    0.168497  0.0298788   0.0295677
The following tables show the simulation results for other simulation cases.
Table 4.7: Function profiling result in Case 2

Modulation  Function name       Total execution time (ns)        Instruction
mode                            0k         4k         32k        counts
QPSK        TX                  15316600   5629390    5351670    305841
              Modulation        202832     166072     166072     3840
              STBC Encoder      1126790    592280     582760     20490
              OFDM Modulator    5908480    2239900    2066220    116444
                IFFT            1608190    828416     808200     30760
16QAM       TX                  15365800   5655700    5377680    307377
              Modulation        202832     166024     166024     3840
              STBC Encoder      1126790    592328     582792     20490
              OFDM Modulator    5908480    2240100    2066730    116444
                IFFT            1608190    828224     808208     30760
64QAM       TX                  15425800   5682720    5406250    310449
              Modulation        202832     166024     166024     3840
              STBC Encoder      1126790    592328     582688     20490
              OFDM Modulator    5908480    2240090    2067290    116444
                IFFT            1608190    828224     808232     30760
Table 4.8: Memory accesses of functions in Case 2 (cache disabled)

Modulation  Function name       ROM read   RAM read   RAM write
mode
QPSK        TX                  739811     54439      72565
              Modulation        21889      1920       1920
              STBC Encoder      60930      3075       5123
              OFDM Modulator    271008     21324      25928
                IFFT            100364     5132       5132
16QAM       TX                  743392     55207      72565
              Modulation        23041      2688       1920
              STBC Encoder      60930      3075       5123
              OFDM Modulator    271008     21324      25928
                IFFT            100364     5132       5132
64QAM       TX                  747559     55975      72565
              Modulation        28033      3456       1920
              STBC Encoder      60930      3075       5123
              OFDM Modulator    271008     21324      25928
Table 4.9: Memory accesses of functions in Case 2 (4k caches)

Modulation  Function name       ROM read   RAM read   RAM write
mode
QPSK        TX                  1086       52524      72565
              Modulation        87         685        3883
              STBC Encoder      77         2029       5122
              OFDM Modulator    324        21916      25924
                IFFT            148        3412       5128
16QAM       TX                  966        53316      72565
              Modulation        87         701        1939
              STBC Encoder      77         2029       5122
              OFDM Modulator    260        21916      25924
                IFFT            84         3380       5128
64QAM       TX                  992        54118      72565
              Modulation        121        2279       4363
              STBC Encoder      77         2029       5122
              OFDM Modulator    260        21916      25924
                IFFT            84         3380       5128
Table 4.10: Memory accesses of functions in Case 2 (32k caches)

Modulation  Function name       ROM read   RAM read   RAM write
mode
QPSK        TX                  662        24908      72565
              Modulation        0          0          0
              STBC Encoder      0          0          0
              OFDM Modulator    0          0          0
                IFFT            0          0          0
16QAM       TX                  651        25713      72565
              Modulation        0          0          0
              STBC Encoder      0          0          0
              OFDM Modulator    545        14831      46525
                IFFT            0          0          0
64QAM       TX                  662        26436      72565
              Modulation        0          0          0
              STBC Encoder      0          0          0
              OFDM Modulator    0          0          0
                IFFT            0          0          0
Table 4.11: Total memory and HW models accesses in Case 2

Modulation mode                           QPSK      16QAM     64QAM
Cache disabled
  ROM access counts               Read    1416541   1691722   1968445
                                  Write   0         0         0
                                  Total   1416541   1691722   1968445
  RAM access counts
  Modulation HW access counts     Read    1536      1536      1536
                                  Write   768       768       768
                                  Total   2304      2304      2304
  STBC Encoder HW access counts   Read    4096      4096      4096
                                  Write   1024      1024      1024
                                  Total   5120      5120      5120
  IFFT HW access counts           Read    4096      4096      4096
                                  Write   2052      2052      2052
                                  Total   6148      6148      6148
Table 4.11 differs slightly from Table 4.5: simulation Case 2 uses three individual HW accelerators, so their access counts are listed as well.
Table 4.12: Bus transaction information in Case 2

Modulation  Information type                Master  Cache size
mode                                                0k        4k          32k
QPSK        Transaction Counts              IAHB    1498210   23802       23562
                                            DAHB    250097    192404      163484
            Transaction Throughputs (kB/s)  IAHB    243941    9638.86     9852.26
                                            DAHB    40584.1   77857.7     68299.3
            Bus Utilization (%)             IAHB    49.9592   1.97404     2.01774
                                            DAHB    8.33972   15.9572     14
            Master Wait Total (%)           IAHB    1.3294    0.0352477   0.0329697
                                            DAHB    17.447    5.73128     5.86466
            AVG. Waiting Masters                    0.187764  0.0576653   0.0589763
16QAM       Transaction Counts              IAHB    1765700   23691       23315
                                            DAHB    283917    205496      176568
            Transaction Throughputs (kB/s)  IAHB    244796    7643.2      7710.54
                                            DAHB    39165.5   66250.7     58345.5
            Bus Utilization (%)             IAHB    50.1342   1.56533     1.57912
                                            DAHB    8.06135   13.5777     11.9589
            Master Wait Total (%)           IAHB    1.19672   0.028213    0.0271596
                                            DAHB    16.6372   4.93952     5.0181
            AVG. Waiting Masters                    0.178339  0.0496774   0.0504526
64QAM       Transaction Counts              IAHB    2034920   24077       23677
                                            DAHB    317763    218604      189700
            Transaction Throughputs (kB/s)  IAHB    245490    6440.26     6467.13
                                            DAHB    38093.6   58435       51775.3
            Bus Utilization (%)             IAHB    50.2763   1.31897     1.32447
                                            DAHB    7.85089   11.9754     10.6116
            Master Wait Total (%)           IAHB    1.10768   0.0232272   0.0221519
                                            DAHB    16.0548   4.39389     4.44984
            AVG. Waiting Masters                    0.171625  0.0441711   0.0447199

Table 4.13: Function profiling result in Case 3

Modulation  Function name       Total execution time (ns)        Instruction
mode                            0k         4k         32k        counts
QPSK        TX                  5176       3992       3992       123
16QAM       TX                  10168      8144       7856       243
64QAM       TX                  15160      12008      11720      363
Table 4.14: Memory accesses of functions in Case 3

Modulation mode  Cache size  Function name  ROM read  RAM read  RAM write
Table 4.15: Total memory and HW model accesses in Case 3

Modulation mode                           QPSK      16QAM     64QAM
Cache disabled
  ROM access counts               Read    283810    559794    839382
                                  Write   0         0         0
                                  Total   283810    559794    839382
  RAM access counts
  TX HW access counts             Read    0         0         0
                                  Write   24        48        72
Table 4.16: Bus transaction information in Case 3

Modulation  Information type                Master  Cache size
mode                                                0k        4k          32k
QPSK        Transaction Counts              IAHB    300599    5771        5731
                                            DAHB    38973     15789       15755
            Transaction Throughputs (kB/s)  IAHB    248057    8296.62     8244.11
                                            DAHB    31485.1   22524.2     22491.1
            Bus Utilization (%)             IAHB    50.8023   1.69915     1.68839
                                            DAHB    6.58656   4.64873     4.64154
            Master Wait Total (%)           IAHB    0.491968  0.0209044   0.0203279
                                            DAHB    13.0944   2.50588     2.50387
            AVG. Waiting Masters                    0.135683  0.0252679   0.0252419
16QAM       Transaction Counts              IAHB    568312    5908        5828
                                            DAHB    72865     29201       28481
            Transaction Throughputs (kB/s)  IAHB    249642    4547.9      4493.76
                                            DAHB    31394.6   22385       21867
            Bus Utilization (%)             IAHB    51.1267   0.93141     0.920323
                                            DAHB    6.5551    4.60361     4.49755
            Master Wait Total (%)           IAHB    0.559385  0.0110357   0.0108961
                                            DAHB    12.6499   2.2445      2.23385
            AVG. Waiting Masters                    0.132093  0.0225553   0.0224475
64QAM       Transaction Counts              IAHB    840627    6105        6033
                                            DAHB    106757    41333       41253
            Transaction Throughputs (kB/s)  IAHB    249641    3109.74     3073.59
                                            DAHB    31118.3   20992.1     20954.9
            Bus Utilization (%)             IAHB    51.1265   0.636874    0.62947
                                            DAHB    6.4929    4.31186     4.30425
            Master Wait Total (%)           IAHB    0.539773  0.00865847  0.00834703
                                            DAHB    12.6321   2.05208     2.05107
            AVG. Waiting Masters                    0.131719  0.0206074   0.0205942
We combined all TX function blocks into one HW accelerator for simulation Case 3, so only the TX function remains in the application SW program.
Table 4.17: Function profiling result in Case 4

Modulation  Function name       Total execution time (ns)        Instruction
mode                            0k         4k         32k        counts
QPSK        TX                  392512     175496     175400     8501
              Modulation        242152     71400      71360      5376
16QAM       TX                  379472     176224     176120     8889
              Modulation        226616     70928      70888      5716
64QAM       TX                  477608     217360     216808     10157
              Modulation        323712     112368     111952     6912
Table 4.18: Memory accesses of functions in Case 4

Modulation mode  Cache size  Function name  ROM read  RAM read  RAM write
QPSK
0k TX 24560 1568 7
Modulation 24507 1562 3
4k TX 0 0 0
Modulation 22265 1586 3
4k TX 0 0 0
Modulation 34623 1610 3
4k TX 0 0 0
Modulation 0 0 0
32k TX 0 0 0
Modulation 0 0 0
Table 4.17 and Table 4.18 list one more function than Table 4.13 and Table 4.14, because simulation Case 4 retains the function Modulation in the application SW code.
Table 4.19: Total memory and HW model accesses in Case 4

Modulation mode                           QPSK      16QAM     64QAM
Cache disabled
  ROM access counts               Read    314888    589902    885465
                                  Write   0         0         0
                                  Total   314888    589902    885465
  RAM access counts
  TX HW access counts             Read    0         0         0
                                  Write   1536      1536      1536
                                  Total   1536      1536      1536
The twenty tables in this section show the profiling results for the four simulation cases defined in Chapter 3; each case is run in the three supported modulation modes with three cache-size configurations of the ISS. All results were simulated with a 125 MHz system clock, that is, a system clock period of 8 ns. All TLM models of the HW accelerators are treated as ideal HW without delay, although the model template defines a delay parameter and a ready signal, and the bus transaction duration is also 8 ns.
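Because every result uses the same 8 ns clock period, the execution times in the function profiling tables can be converted to clock cycles. The sketch below does this conversion and, as an illustrative derived figure only, divides the cycle count by the instruction count of the same table row. The names `ns_to_cycles` and `cycles_per_instruction` are hypothetical helpers, and the resulting average is not a quantity reported by the CoWare profiling utilities.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kClockHz = 125000000;       // 125 MHz system clock
constexpr std::uint64_t kNsPerSecond = 1000000000;
constexpr std::uint64_t kClockPeriodNs = kNsPerSecond / kClockHz;  // 8 ns

// Converts an execution time in ns into system clock cycles.
constexpr std::uint64_t ns_to_cycles(std::uint64_t time_ns) {
    return time_ns / kClockPeriodNs;
}

// Average clock cycles spent per executed instruction, memory stalls
// included (illustrative derived metric).
double cycles_per_instruction(std::uint64_t time_ns, std::uint64_t instructions) {
    assert(instructions > 0);
    return static_cast<double>(ns_to_cycles(time_ns)) /
           static_cast<double>(instructions);
}
```

For the TX row of Table 4.1 in QPSK mode with caches disabled, `ns_to_cycles(91013600)` gives 11376700 cycles, and `cycles_per_instruction(91013600, 3328995)` is roughly 3.42.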
Table 4.20: Bus transaction information in Case 4

Modulation  Information type                Master  Cache size
mode                                                0k        4k          32k
QPSK        Transaction Counts              IAHB    330283    15784       15536
                                            DAHB    43585     18152       18136
            Transaction Throughputs (kB/s)  IAHB    246996    20644.5     20331.4
                                            DAHB    31969.4   23559.2     23551.3
            Bus Utilization (%)             IAHB    50.585    4.22799     4.16386
                                            DAHB    6.6753    4.86229     4.8607
            Master Wait Total (%)           IAHB    0.609714  0.0195542   0.0179569
                                            DAHB    13.6969   2.84366     2.84363
            AVG. Waiting Masters                    0.143066  0.0286321   0.0286158
16QAM       Transaction Counts              IAHB    597722    15729       15656
                                            DAHB    77453     31532       30860
            Transaction Throughputs (kB/s)  IAHB    247778    11467.2     11430.9
                                            DAHB    31522.1   22886.6     22429.9
            Bus Utilization (%)             IAHB    50.7451   2.34848     2.3412
                                            DAHB    6.57555   4.70801     4.61451
            Master Wait Total (%)           IAHB    0.600564  0.0116461   0.0113643
                                            DAHB    13.1682   2.40387     2.39458
            AVG. Waiting Masters                    0.137687  0.0241552   0.0240595
64QAM       Transaction Counts              IAHB    885043    15990       15926
                                            DAHB    111323    43697       43633
            Transaction Throughputs (kB/s)  IAHB    250197    7815.68     7786.46
                                            DAHB    30908.6   21290.3     21264.6
            Bus Utilization (%)             IAHB    51.2404   1.60065     1.59467
                                            DAHB    6.44514   4.37421     4.36896
            Master Wait Total (%)           IAHB    0.551284  0.00830857  0.00821064
                                            DAHB    12.9417   2.17805     2.17792
            AVG. Waiting Masters                    0.13493   0.0218636   0.0218613