
Simulation Setup

In document: 系統資料頻寬之研究 (pp. 58-63)

Chapter 4 AXI Shared-link Bus

4.3. Simulation Setup

To properly evaluate the performance of the proposed transfer modes and arbitration framework on a shared-link AXI bus, we built a high-level model of a simplified multi-core platform system using SystemC [45]. The simulation accuracy of this model depends on the modeling methodology, the authenticity of the platform architecture, and the accuracy of the application traffic.

The bus and the components in the platform were modeled using transaction-level and behavior-level modeling methods, respectively. Transaction-level modeling uses a transaction instead of a cycle as the basic simulation unit. Since a transaction takes a fixed number of cycles to complete in each channel, transaction-level modeling preserves bus cycle accuracy in our simulations. More detail on transaction-level modeling can be found in [37]. To pursue platform architecture authenticity, the multi-core platform model was based on a real multi-core platform [46]. The real platform has been verified with portable media player and smartphone applications, which ensures that the simulation results from our platform model are practical. The application traffic was derived from the behavior and algorithms of the platform components to ensure traffic accuracy. The details of the platform architecture and bus traffic are provided in the following subsections.
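The idea of advancing simulated time one transaction at a time, rather than one cycle at a time, can be sketched as follows. This is a minimal illustration and not the SystemC model used in this work: the `Transaction` fields, the one-cycle address phase, and the one-cycle-per-beat data phase are all assumptions chosen for the example, and the separate AXI channels are collapsed into a single serialized link.

```python
from dataclasses import dataclass

# Assumed fixed costs, for illustration only (not taken from the thesis).
ADDR_CYCLES = 1       # cycles charged for the address phase
CYCLES_PER_BEAT = 1   # cycles charged for each data beat


@dataclass
class Transaction:
    master: str  # hypothetical label of the issuing master
    beats: int   # number of data beats in the burst


def transaction_cycles(t: Transaction) -> int:
    """Fixed cycle cost of one transaction under the assumptions above."""
    return ADDR_CYCLES + t.beats * CYCLES_PER_BEAT


def simulate(transactions):
    """Serve transactions in order on one shared link.

    The clock jumps by the whole cost of each transaction instead of
    ticking cycle by cycle; returns (master, finish_cycle) pairs.
    """
    now = 0
    finished = []
    for t in transactions:
        now += transaction_cycles(t)
        finished.append((t.master, now))
    return finished


result = simulate([Transaction("CPU", 4), Transaction("DSP", 8)])
```

Because every transaction has a known cycle cost, the finish times are the same as those a cycle-by-cycle loop would produce, while the simulator performs one step per transaction.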

4.3.1. Multimedia Platform Architecture

Fig. 16 illustrates the target platform from the system bus point of view. Note that when the platform is used for AHB simulation, the bus interconnect is replaced with a 5-layer AHB-lite interconnect in which each master port has one dedicated AHB-lite bus.

Since we focus only on the transaction behavior on the bus, the devices are modeled to exhibit only their transaction behavior and patterns. However, the CPU does generate transactions related to interrupt service routines (ISR) upon receiving an interrupt request (IRQ). In addition, the DMA controller is programmed to carry out different data-moving tasks to mimic the behavior of its real counterpart. Modeling this more detailed behavior lets us capture the inter-task dependencies between devices. Note that the memory controller has two slave ports so that its scheduler can see more transactions. Among all the devices, the memory controller is the only one whose access latency ranges from 0 to 16 cycles.

The AXI bus is clocked at 40 MHz, and both the address and data buses are 32 bits wide. This yields an ideal total bandwidth of 320 MB/sec, with the read and write bandwidths being 160 MB/sec each.
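The bandwidth figures follow directly from the clock rate and data width. The short sketch below reproduces the arithmetic, assuming that 1 MB means 10^6 bytes and that one data beat is transferred per cycle on each of the independent read and write data channels; the function name is illustrative only.

```python
def ideal_bandwidth_mb_per_s(clock_hz: int, data_width_bits: int) -> float:
    """Ideal bandwidth of one data channel, one beat per clock cycle."""
    bytes_per_cycle = data_width_bits // 8
    return clock_hz * bytes_per_cycle / 1e6  # assumes 1 MB = 10**6 bytes


# 40 MHz clock, 32-bit data bus
per_direction = ideal_bandwidth_mb_per_s(40_000_000, 32)  # read or write
total = 2 * per_direction                                 # read plus write

print(f"per direction: {per_direction} MB/sec, total: {total} MB/sec")
```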

4.3.2. Video Phone Scenario

We selected the video phone application for analysis because it covers a variety of devices and traffic types that are common in most multimedia consumer electronic products. The bandwidth requirement of the video phone application is heavier than that of other applications such as portable media player, video recording, MP3 player, and regular phone service. This heavy bandwidth requirement also makes the video phone application well suited to testing the performance limit of a bus.

Fig. 16. Block diagram of the target platform using (a) AXI, (b) AHB-lite

In the video phone application, the system must deliver both audio and video communication at the same time. The system supports 44.1 kHz stereo audio capture/output and audio compression/decompression. For video, the system provides VGA-sized video capture, compression, decompression, and display with a target frame rate of 30 FPS. Table 3 lists the task description, bandwidth requirement, and task completion time constraint of each master device in the video phone application. Although more devices may be included in a system, the bus traffic is usually dominated by the master devices listed in Table 3. The total bus bandwidth requirement is 247.8 MB/sec, which occupies 77.5% of the 320 MB/sec available bus bandwidth. If the bus can achieve a bandwidth utilization higher than 77.5%, all the system tasks are more likely to complete within the specified timing constraints.
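The required share of the bus can be recomputed from the two figures given above. The sketch below uses the system total from Table 3 (247.771 MB/sec) and the 320 MB/sec ideal bandwidth; direct division gives roughly 77.4%, close to the 77.5% quoted in the text. The helper name `meets_requirement` is hypothetical.

```python
REQUIRED_MB_PER_S = 247.771  # system total bandwidth requirement (Table 3)
IDEAL_MB_PER_S = 320.0       # ideal total bus bandwidth

# Fraction of the ideal bandwidth that the application needs.
required_fraction = REQUIRED_MB_PER_S / IDEAL_MB_PER_S


def meets_requirement(bwu_percent: float) -> bool:
    """True if an achieved bandwidth utilization covers the requirement."""
    return bwu_percent >= required_fraction * 100.0


print(f"required utilization: {required_fraction * 100.0:.1f}%")
```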

4.3.3. Evaluation Metrics

The definitions and physical meanings of the evaluation metrics are explained as follows.

Table 3. Port task description and bandwidth requirement (MB/sec)

Master          Task                           [...]      [...]      Total
[...]           [...] Requirement              1.640      1.961      3.601
DSP             Video decode                   14.836     42.473     57.309
Video Encoder   Video encode                   59.927     14.255     74.182
DMAC0           Video in to MEM                27.927     27.927     55.855
                Audio in to MEM                0.176      0.176      0.353
                3G communication               0.132      0.132      0.265
                Total Bandwidth Requirement    28.236     28.236     56.472
[...]           MEM to video [...]             [...]      [...]      [...]
                [...] Requirement              28.104     28.104     56.208
System Total Bandwidth Requirement             132.743    115.028    247.771
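As a sanity check on Table 3, the per-master totals can be summed and compared with the system total. The sketch below does this with a small tolerance for rounding in the printed values; the labels for the two rows whose master names are not legible in the table are placeholders, not names from the source.

```python
# (label, first value, second value, total) for each per-master total row.
# Two labels are placeholders because the master name is not legible.
per_master_totals = [
    ("first row (master not recovered)", 1.640, 1.961, 3.601),
    ("DSP", 14.836, 42.473, 57.309),
    ("Video Encoder", 59.927, 14.255, 74.182),
    ("DMAC0", 28.236, 28.236, 56.472),
    ("last master row (master not recovered)", 28.104, 28.104, 56.208),
]
system_total = (132.743, 115.028, 247.771)

TOLERANCE = 0.002  # allows for rounding to three decimals in the table


def column_sums(rows):
    """Sum each of the three numeric columns over all rows."""
    return tuple(sum(row[i] for row in rows) for i in (1, 2, 3))


sums = column_sums(per_master_totals)

# Column check: per-master totals add up to the system total.
columns_consistent = all(
    abs(s - t) <= TOLERANCE for s, t in zip(sums, system_total)
)

# Row check: the third value in each row is the sum of the first two.
rows_consistent = all(
    abs(a + b - total) <= TOLERANCE for _, a, b, total in per_master_totals
)
```

Both checks pass within the tolerance, which supports reading the third column as the sum of the first two and the last row as the sum over the five masters.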

A. Bandwidth Utilization (BWU)

The bandwidth utilization (BWU) is defined as the percentage of the available ideal bus bandwidth that is actually used to transfer data, i.e.,

$$\mathrm{BWU} = \frac{B_{used}}{B_{ideal}} \times 100\%$$

where $B_{used}$ and $B_{ideal}$ are the actually used bandwidth and the available ideal bandwidth, respectively. A higher BWU implies that more data can be transferred within a period of time. It also implies a shorter effective transaction latency from the system's point of view.
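A possible way to compute BWU from simulation output is sketched below. It assumes the simulator reports the number of bytes transferred and the elapsed simulated time, that 1 MB means 10^6 bytes, and that the ideal bandwidth is the 320 MB/sec of the platform described earlier; the function and parameter names are illustrative.

```python
def bandwidth_utilization(bytes_transferred: int,
                          elapsed_s: float,
                          ideal_mb_per_s: float = 320.0) -> float:
    """BWU in percent: used bandwidth divided by ideal bandwidth."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    used_mb_per_s = bytes_transferred / elapsed_s / 1e6
    return 100.0 * used_mb_per_s / ideal_mb_per_s


# Example: 128,000,000 bytes moved in 0.5 s is 256 MB/sec of used bandwidth.
bwu = bandwidth_utilization(128_000_000, 0.5)
print(f"BWU = {bwu:.1f}%")
```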

B. Transaction Latency

The transaction latency we use is defined as the average of the read and write transaction latencies. The latency of a read or write transaction is measured from the time a transaction request is sent from a master until the time the read data or write response is returned to the master. The average transaction latency, denoted as $TL$, can be defined as

$$TL = \frac{\sum TL_{read} + \sum TL_{write}}{N_{read} + N_{write}}$$

where $\sum TL_{read}$ and $\sum TL_{write}$ are the sums of all read and write transaction latencies, respectively, and $N_{read}$ and $N_{write}$ are the total numbers of read and write transactions, respectively.
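The average can be computed from per-transaction latency logs as sketched below. The sketch assumes the average is taken over all transactions together, that is, the sum of all read and write latencies divided by the total number of read and write transactions; the list-based input format is an assumption made for the example.

```python
def average_transaction_latency(read_latencies, write_latencies):
    """Average latency (in cycles) over all read and write transactions."""
    n_total = len(read_latencies) + len(write_latencies)
    if n_total == 0:
        raise ValueError("no transactions recorded")
    return (sum(read_latencies) + sum(write_latencies)) / n_total


# Example: three reads and one write, latencies in bus cycles.
tl = average_transaction_latency([10, 14, 12], [8])
print(f"TL = {tl} cycles")
```

With this definition, a traffic mix dominated by reads pulls the average toward the read latency, which differs from taking the mean of the separate read and write averages.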

In contrast to the bandwidth, which increases as more data is transferred, the transaction latency may remain the same even when the bandwidth utilization increases.

C. System Task Completion Time

The system task completion time is defined as the time at which all tasks in the video phone application have been completed. We believe it is crucial to minimize the system task completion time so that the task-level timing constraint can be met. In the video phone application, all tasks must finish within 33 ms; otherwise, the system violates the real-time constraint.
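Since the system task completion time is the moment the last task finishes, it can be taken as the maximum of the individual task finish times. The sketch below applies the 33 ms constraint to hypothetical finish times; the task names and values are made up for illustration. The 33 ms limit is presumably tied to the 30 FPS target, whose frame period is 1000/30, or about 33.3 ms.

```python
DEADLINE_MS = 33.0  # real-time constraint stated for the video phone scenario


def system_task_completion_time(task_finish_times_ms: dict) -> float:
    """Time at which the last task finishes."""
    return max(task_finish_times_ms.values())


def violates_real_time(task_finish_times_ms: dict) -> bool:
    """True if any task finishes after the deadline."""
    return system_task_completion_time(task_finish_times_ms) > DEADLINE_MS


# Hypothetical finish times (ms) measured from the start of a frame period.
finish_times = {"video decode": 29.5, "video encode": 31.2, "audio": 4.0}
completion = system_task_completion_time(finish_times)
late = violates_real_time(finish_times)
```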
