3.2 Proposed Traffic Generator Based Exploration Method
3.2.2 Off-line traffic generation
In this section, we will introduce our traffic generation method and the traffic file format we use.
Proposed traffic generation method
Figure 3-4 Off-line traffic generation of proposed TG-1
Here, we propose two ways to generate the traffic file off-line. The first, called “TG-1”, is shown in Figure 3-4. It separates the simulation framework into two steps. First, we simulate the cores’ behavior off-line with an ISS. Since we focus on the ARM11, we use the RVDS 3.0 ARM ISS [34] model as the core simulator. In this step, we run the target application source code on the ARM11 instruction-accurate simulator to extract its memory access patterns, which include instruction accesses, data reads, and data writes. These patterns do not change, because a core always behaves the same way for a given source code. Therefore, the off-line simulation needs to run only once and does not need to be repeated during exploration. The second step translates the memory access patterns into our proposed traffic file format. The traffic file contains the access type, access address, write data, access packet size, and timing information. Here, the timing information represents the execution pipeline latency. Because the target ARM11 model in the CoWare Model Library executes every instruction in one cycle, the default instruction latency is 1. We keep the traffic file generator configurable, so users can set different timing values for the ISS of other target processors or even for cycle-accurate core models. These values are recorded in the traffic file and control the TG’s behavior at runtime. The traffic file format is introduced in the next section.
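The translation step of TG-1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' generator: the record classes `TraceEntry` and `TrafficCommand`, the function `trace_to_traffic`, and the rule that the configurable instruction latency is charged in front of each instruction fetch (with the data accesses of that instruction following at zero delay) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TraceEntry:
    """One memory access extracted from the ISS run (hypothetical layout)."""
    kind: str      # "I" instruction fetch, "R" data read, "W" data write
    address: int
    data: int = 0  # write data; 0 for reads and fetches


@dataclass
class TrafficCommand:
    """One traffic file command (hypothetical layout)."""
    rel_time: int  # cycles to wait before issuing this access
    kind: str
    burst: int
    address: int
    data: int


def trace_to_traffic(trace: Iterable[TraceEntry],
                     instr_latency: int = 1) -> List[TrafficCommand]:
    """Translate an ISS memory access trace into TG-1 traffic commands.

    Assumption: every instruction fetch marks one executed instruction, so
    the configurable per-instruction latency is placed in front of the
    fetch, and data accesses of the same instruction follow with no gap.
    """
    commands = []
    for entry in trace:
        rel_time = instr_latency if entry.kind == "I" else 0
        commands.append(
            TrafficCommand(rel_time, entry.kind, 1, entry.address, entry.data))
    return commands


if __name__ == "__main__":
    # Made-up trace: fetch, load, fetch, store.
    trace = [TraceEntry("I", 0x8000), TraceEntry("R", 0x21000),
             TraceEntry("I", 0x8004), TraceEntry("W", 0x21004, 5)]
    print([c.rel_time for c in trace_to_traffic(trace)])     # [1, 0, 1, 0]
    print([c.rel_time for c in trace_to_traffic(trace, 2)])  # [2, 0, 2, 0]
```

Passing a different `instr_latency` mirrors the configurability described above: the same trace yields traffic for a core model with a different pipeline latency without re-running the ISS.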
Figure 3-5 Off-line traffic generation of proposed TG-2
The second off-line approach, shown in Figure 3-5, is called “TG-2”. We first run the target application source code on the instruction-accurate simulator to extract its memory access patterns, which again include instruction accesses, data reads, and data writes. We then use an off-line cache model to simulate the cache behavior for each cache configuration; the design space of this cache model is described in Section 3.1. The off-line traffic generator then produces a traffic file for the TG that contains the access type, access address, write data, access packet size, and timing information, in the same format as for “TG-1”. Here, however, the timing information represents the relative latency between two transactions. The instruction latency is one cycle in the ISS model, and this latency is recorded in the traffic file. The memory access pattern fed into the cache model is the same as in the previous approach, so the ISS still needs to be simulated only once. The overhead of this framework is that the cache model must be re-simulated for each cache configuration. However, the cache model is written in C/C++ and models no interconnect behavior, so the off-line cache simulation is fast: on a Pentium 4 dual-core 3.4 GHz PC it takes only a few seconds. This overhead is therefore negligible in our simulation framework. A further benefit of the off-line cache model is a smaller traffic file: the cache filters out accesses that hit, which reduces the transfers on the system interconnect and the system simulation load.
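The filtering effect of the TG-2 off-line cache model can be illustrated with a small Python sketch. It assumes split direct-mapped instruction and data caches, a write-through, no-write-allocate policy, 4-byte bus beats, and that one instruction latency elapses per instruction fetch. These choices, and the names `DirectMappedCache` and `filter_through_caches`, are hypothetical and cover only a tiny corner of the Section 3.1 design space. For brevity, each line fill is emitted as one tuple carrying its burst size rather than as separate per-beat commands.

```python
from typing import Iterable, List, Optional, Tuple

Access = Tuple[str, int, int]             # (kind, address, data); kind is "I", "R" or "W"
Command = Tuple[int, str, int, int, int]  # (rel_time, kind, burst, address, data)


class DirectMappedCache:
    """Tag store of a direct-mapped cache (no data array is needed off-line)."""

    def __init__(self, size_bytes: int, line_bytes: int) -> None:
        self.line_bytes = line_bytes
        self.num_lines = size_bytes // line_bytes
        self.tags: List[Optional[int]] = [None] * self.num_lines

    def lookup(self, address: int) -> bool:
        """Return True on a hit; on a miss, allocate the line and return False."""
        line = address // self.line_bytes
        index, tag = line % self.num_lines, line // self.num_lines
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag
        return False


def filter_through_caches(trace: Iterable[Access], icache: DirectMappedCache,
                          dcache: DirectMappedCache,
                          instr_latency: int = 1) -> List[Command]:
    """Keep only the accesses that reach the bus, with relative times between them."""
    commands: List[Command] = []
    elapsed = 0  # cycles since the previous bus transaction
    for kind, address, data in trace:
        if kind == "I":
            elapsed += instr_latency  # assumed: one instruction executed per fetch
        if kind == "W":
            # write-through, no write-allocate: every store goes to the bus
            commands.append((elapsed, "W", 1, address, data))
            elapsed = 0
            continue
        cache = icache if kind == "I" else dcache
        if not cache.lookup(address):
            # a miss becomes one line-fill burst of 4-byte beats
            line_address = address - address % cache.line_bytes
            commands.append((elapsed, kind, cache.line_bytes // 4, line_address, 0))
            elapsed = 0
    return commands


if __name__ == "__main__":
    # Made-up trace: fetches from one 16-byte line, two reads, one write.
    trace = [("I", 0x8000, 0), ("I", 0x8004, 0), ("R", 0x21000, 0), ("I", 0x8008, 0),
             ("R", 0x21004, 0), ("W", 0x21008, 7), ("I", 0x800C, 0)]
    cmds = filter_through_caches(trace, DirectMappedCache(64, 16), DirectMappedCache(64, 16))
    print(len(cmds), "bus transactions for", len(trace), "accesses")  # 3 bus transactions for 7 accesses
```

Re-running `filter_through_caches` with different cache sizes or line lengths corresponds to the per-configuration re-simulation described above, while the ISS trace itself is reused unchanged.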
Proposed traffic format
Figure 3-6 Proposed traffic timing diagram
Figure 3-6 shows the timing diagram of our proposed traffic file format. The first diagram represents the real transaction behavior, where $T_x$ is the number of cycles taken by transaction $x$. Here, a transaction is defined to start when the BIU triggers a request and to end when the TG receives the response signal. $\tau_x$ is the number of cycles during which no transaction takes place; the core may be either executing or idle during these cycles. If the BIU is modeled correctly, $T_x$ and $T'_x$ should be equal. However, the cores’ computation behavior is determined by the off-line simulator. If the latency of the cores’ computation is not modeled, the timing diagram looks like the middle one in Figure 3-6, traffic with no relative time. Because our off-line model records timing information in the traffic file, the timing diagram of our TGs looks like the third one in Figure 3-6, traffic with relative time. Ideally, traffic with relative time behaves the same as the real situation:
$$T_x = T''_x$$
$$\tau_x = \tau''_x$$
The first equation depends on the accuracy of the BIU model, and the second depends on our off-line simulator. Ideally, both equalities hold, because the timing information is recorded in the traffic file and its values equal those of the core’s ISS model.
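To make the role of the two equalities concrete, the following Python sketch replays a sequence of transactions from their durations $T_x$ and idle gaps $\tau_x$. It assumes, as a simplification of Figure 3-6, that each transaction $x$ is preceded by its gap $\tau_x$; the function name `replay` is hypothetical.

```python
from typing import List, Sequence, Tuple


def replay(gaps: Sequence[int], durations: Sequence[int]) -> List[Tuple[int, int]]:
    """Return the (start, end) cycle of every transaction.

    gaps[x] is the idle gap tau_x before transaction x, and
    durations[x] is its length T_x, both in cycles.
    """
    if len(gaps) != len(durations):
        raise ValueError("gaps and durations must have the same length")
    timeline = []
    now = 0
    for tau, t in zip(gaps, durations):
        start = now + tau  # wait out the computation gap
        now = start + t    # the transaction occupies the BIU for T_x cycles
        timeline.append((start, now))
    return timeline


if __name__ == "__main__":
    real = replay([3, 0, 5], [4, 4, 2])     # traffic with relative time
    no_gaps = replay([0, 0, 0], [4, 4, 2])  # traffic with no relative time
    print(real)     # [(3, 7), (7, 11), (16, 18)]
    print(no_gaps)  # [(0, 4), (4, 8), (8, 10)]
```

If a TG reproduces both the $T_x$ and the $\tau_x$ sequences exactly, every start and end cycle matches the real run. Dropping the gaps, as in traffic with no relative time, issues every transaction early, and in this example the last one finishes at cycle 10 instead of 18.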
Figure 3-7 Proposed traffic file format (a) Text format (b) Binary format
Figure 3-7 (a) shows the text format of the traffic file. Each access command includes a relative cycle count, access type, burst size, address, and data. Figure 3-7 (b) shows the binary traffic file format; the off-line traffic file generator produces traffic files in this binary format.
Table 3-2 Traffic format
Access information | Binary size (bytes) | Parameter
Relative time (τ) | 2 | 0–65535
Type | 1 | I: instruction access; R: read data access; W: write data access; Q: TG idle
Burst size | 1 | 1, 2, 4, 8, 16, 32
Address | 4 | 32-bit hexadecimal value
Data | 4 | 32-bit hexadecimal value
Table 3-2 lists each field with its parameters and encoded binary size. An access command therefore requires 2 + 1 + 1 + 4 + 4 = 12 bytes. The relative time occupies 2 bytes, so the largest cycle count it can hold is 65535; if the relative time exceeds this value, the traffic generator automatically inserts idle commands. “Type” identifies the access command, and “Q” is the idle command, which stalls the TG for τ cycles. “Burst size” indicates the size of the burst transaction.
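A sketch of the binary encoding using Python's standard `struct` module follows. It packs the fields in the order and with the widths of Table 3-2 (2 + 1 + 1 + 4 + 4 bytes, with no padding). The byte order (little-endian), the use of the ASCII letter as the type byte, the zeroed address and data fields of Q commands, and the names `encode` and `decode` are assumptions for illustration, not the actual file layout of our generator. The sketch also shows one way to split a relative time larger than 65535 into leading idle commands.

```python
import struct
from typing import List, Tuple

# Field order and widths follow Table 3-2; "<" selects little-endian with no padding (assumed).
RECORD = struct.Struct("<HBBII")  # relative time, type, burst size, address, data
MAX_REL_TIME = 0xFFFF             # 65535, the largest value of the 2-byte field
VALID_TYPES = {"I", "R", "W", "Q"}
VALID_BURSTS = {1, 2, 4, 8, 16, 32}


def encode(rel_time: int, kind: str, burst: int, address: int, data: int) -> bytes:
    """Encode one command, preceded by Q (idle) commands if rel_time is too large."""
    if kind not in VALID_TYPES or burst not in VALID_BURSTS:
        raise ValueError(f"invalid command: type={kind!r}, burst={burst}")
    out = bytearray()
    while rel_time > MAX_REL_TIME:
        # each Q command stalls the TG for MAX_REL_TIME cycles
        out += RECORD.pack(MAX_REL_TIME, ord("Q"), 1, 0, 0)
        rel_time -= MAX_REL_TIME
    out += RECORD.pack(rel_time, ord(kind), burst, address, data)
    return bytes(out)


def decode(blob: bytes) -> List[Tuple[int, str, int, int, int]]:
    """Decode a binary traffic file back into (rel_time, type, burst, address, data)."""
    return [(rel, chr(kind), burst, address, data)
            for rel, kind, burst, address, data in RECORD.iter_unpack(blob)]


if __name__ == "__main__":
    print(RECORD.size)  # 12
    blob = encode(70000, "W", 4, 0x21000, 0x0)
    print(decode(blob))  # [(65535, 'Q', 1, 0, 0), (4465, 'W', 4, 135168, 0)]
```

For instance, a write that follows 70000 idle cycles is encoded as a Q command of 65535 cycles followed by the write with a relative time of 4465, for 24 bytes in total.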
The burst size is counted in 4-byte transfer units, and each command’s data field is also 4 bytes long. Thus, when a command’s burst size n is greater than 1, the command and the next n − 1 commands are packed into one burst transaction. For example, consider the third command in Figure 3-7 (a). It is a data write access with a burst size of 4, address “0x21000”, and data “0x0”. The TG packs this command and the next three commands into one burst transfer, so the burst data is 4 × 32 = 128 bits long. The TG’s BIU then automatically converts the transfer into the burst transaction type required by the bus protocol, e.g., “WRAP” or “INCR”.
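The packing rule above can be sketched in Python as follows. The function `group_bursts` and the command tuples are hypothetical. The sketch assumes that the relative time, type, and address of a burst come from its first command and that the following commands contribute only their data. It does not model the BIU's choice between WRAP and INCR.

```python
from typing import Dict, List, Tuple

Command = Tuple[int, str, int, int, int]  # (rel_time, type, burst, address, data)


def group_bursts(commands: List[Command]) -> List[Dict]:
    """Pack each command with burst size n and the next n - 1 commands into one transaction."""
    transactions = []
    i = 0
    while i < len(commands):
        rel_time, kind, burst, address, _ = commands[i]
        beats = commands[i:i + burst]
        if burst < 1 or len(beats) < burst:
            raise ValueError(f"command {i} declares burst {burst}, "
                             f"only {len(beats)} commands remain")
        transactions.append({
            "rel_time": rel_time,
            "type": kind,
            "address": address,             # start address taken from the first command
            "data": [c[4] for c in beats],  # one 4-byte word per beat
            "bits": 32 * burst,
        })
        i += burst
    return transactions


if __name__ == "__main__":
    # Made-up commands: a single read, then a 4-beat write burst to 0x21000.
    cmds = [(0, "R", 1, 0x20000, 0x0),
            (2, "W", 4, 0x21000, 0x0), (0, "W", 1, 0x21004, 0x1),
            (0, "W", 1, 0x21008, 0x2), (0, "W", 1, 0x2100C, 0x3)]
    txs = group_bursts(cmds)
    print(len(txs), txs[1]["bits"])  # 2 128
```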