In this section we show an example that points out the problems of a multicore ESL simulation environment. The example platform was built with a modern ESL development tool and simulated in a high-level TLM modeling environment. We first introduce the ESL tool environment, then describe the example platform we built, and finally show the problems.
2.2.1 ESL development tools
Figure 2-7 CoWare Platform Architect-ConvergenSC [17]
CoWare Platform Architect [17], from CoWare Inc., is a SystemC-based graphical environment for capturing the entire product platform and a dashboard for launching platform analysis functions. Platform Architect speeds up the concurrent design of SoCs and their embedded software, enabling users to rapidly create and validate SoC designs at the transaction level in SystemC. Together with CoWare Model Designer and the CoWare Model Library, Platform Architect forms what CoWare presents as the most comprehensive system-level design solution available for SystemC. Figure 2-7 shows the graphical environment of CoWare Platform Architect.
Properties of CoWare Platform Architect are listed below.
(A) Rapid capture and configuration of hierarchical SoC platforms
(B) Superior architecture and performance analysis for SystemC
(C) Rapid exploration of complex interconnect and memory architectures
(D) Advanced simulation, debug, and analysis for software development
(E) Automated integration of RTL blocks into the TLM system
(F) Automated creation of highly reusable, user-defined SystemC peripheral components and unit tests
(G) Standards-based SystemC TLM modeling guidelines and examples using SCML
(H) Comprehensive SystemC IP model availability with the CoWare Model Library

Under property (H), the CoWare Model Library includes processor models from leading vendors such as ARM and MIPS, transaction-level bus models and RTL bus generators for common bus specifications such as AMBA, AXI, and OCP-IP, Denali MMAV memory models, and peripheral models such as the ARM PrimeCells. Under property (G), Platform Architect's native SystemC simulation environment is compatible with the IEEE 1666 SystemC Language Reference Manual (LRM), the Open SystemC Initiative (OSCI) transaction-level modeling (TLM) standard [22], and the Open Core Protocol International Partnership (OCP-IP) TLM standards [23], supporting all SystemC constructs for every member of a design team. Platform Architect also supports the OSCI SystemC Verification (SCV) 1.0 library extensions for transaction recording.
With properties (B) and (D), CoWare Platform Architect supports hardware and software profiling and analysis (see Figure 2-8). The analysis includes VCD trace dumps and bus statistics such as bus utilization and access latency. In addition, the processor models support debuggers such as GDB. Designers can build a complete SoC simulation environment composed of reused IP or user-defined components modeled in SystemC. Because SystemC supports several abstraction levels, the same environment can be simulated at different abstraction levels for different design stages. In conclusion, CoWare Platform Architect provides an ESL development environment for design exploration, verification, and performance analysis, which can help bring better SoC-based convergent products to market faster.
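As a rough illustration of the two bus statistics mentioned above, the Python sketch below computes bus utilization and average access latency from a transaction trace. The trace format (`BusTransaction` with request, grant, and completion cycles) and the function `bus_statistics` are hypothetical and are not the Platform Architect API; they only show what the two metrics measure for a single shared bus.

```python
from dataclasses import dataclass


@dataclass
class BusTransaction:
    """One transfer in a hypothetical bus trace (not a CoWare data format)."""
    request_cycle: int  # cycle in which the master issues the request
    grant_cycle: int    # cycle in which the bus starts the transfer
    done_cycle: int     # first cycle after the transfer completes


def bus_statistics(trace, total_cycles):
    """Return (utilization, average access latency) for one shared bus.

    utilization = cycles in which the bus carries a transfer / total cycles
    latency     = done_cycle - request_cycle, averaged over all transactions
    """
    busy = set()
    for t in trace:
        busy.update(range(t.grant_cycle, t.done_cycle))  # cycles the bus is occupied
    utilization = len(busy) / total_cycles
    avg_latency = sum(t.done_cycle - t.request_cycle for t in trace) / len(trace)
    return utilization, avg_latency
```

For instance, a second request issued while the bus is still busy waits for the grant, so its latency includes both the waiting time and the transfer time.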
Figure 2-8 CoWare Platform Architect analysis GUI [17]
2.2.2 ARM-based SoC platform on ESL
We built a four-core ARM11 SoC platform in the CoWare environment. The platform architecture follows MPARM, the framework introduced in the previous section. We use the processor model from the CoWare Model Library: the ARM1176-JZS AXI model. The computational abstraction of this ARM model supports Cycle Accurate (CA) or Instruction Accurate (IA) modeling, and its interconnection supports untimed, TLM cycle-accurate, and pin-accurate models. The system platform is shown in Figure 1-1. We configure the ARM model as an IA model and turn on the cache simulation model embedded in it. The ARM IA model uses a single-access topology with a one-cycle latency for every instruction. Because of its instruction-accurate nature, the cache model does not model any buffers, does not use a critical-word-first line-loading scheme, and executes all memory accesses, line fills, and line evictions in a blocking fashion. The Bus Interface Unit (BIU) of each ARM core is configured at the TLM Bus Cycle Accurate (TLM-BCA) level. Each ARM core has four 64-bit AXI ports, I-AXI, D-AXI, P-AXI, and DMA-AXI, for instruction, data, peripheral, and DMA accesses respectively.
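The timing behavior described above (one cycle per instruction, with every cache miss stalling the core until the line fill and any eviction complete) can be sketched as follows. This is a minimal Python model under stated assumptions, not the CoWare ARM1176 model: only data accesses are modeled (instruction fetches are ignored), the cache is direct-mapped, write-back and write-allocate, each instruction performs at most one data access, and the 8-cycle transfer cost and the names `BlockingCache` and `run_ia` are hypothetical.

```python
class BlockingCache:
    """Direct-mapped, write-back, write-allocate cache; every miss blocks the core.

    Illustrative sketch only; sizes and costs are assumptions, not CoWare's.
    """

    def __init__(self, size_bytes=16 * 1024, line_bytes=32, fill_cycles=8):
        self.line_bytes = line_bytes
        self.num_lines = size_bytes // line_bytes
        self.fill_cycles = fill_cycles         # assumed cost of one blocking line transfer
        self.tags = [None] * self.num_lines    # None marks an invalid line
        self.dirty = [False] * self.num_lines

    def access(self, addr, is_write=False):
        """Return the stall cycles caused by one data access (0 on a hit)."""
        line = addr // self.line_bytes
        index = line % self.num_lines
        tag = line // self.num_lines
        stall = 0
        if self.tags[index] != tag:            # miss
            if self.tags[index] is not None and self.dirty[index]:
                stall += self.fill_cycles      # blocking eviction of the dirty victim
            stall += self.fill_cycles          # blocking line fill
            self.tags[index] = tag
            self.dirty[index] = False
        if is_write:
            self.dirty[index] = True           # write-back: only mark the line dirty
        return stall


def run_ia(trace, cache):
    """IA timing: one cycle per instruction plus any blocking cache stalls.

    trace: list of (data_addr, is_write); data_addr is None when the
    instruction does not access memory.
    """
    cycles = 0
    for addr, is_write in trace:
        cycles += 1
        if addr is not None:
            cycles += cache.access(addr, is_write)
    return cycles
```

For a 64-byte cache with 32-byte lines, the trace `[(0, False), (4, False), (None, False), (64, True), (0, False)]` costs 37 cycles: 5 instruction cycles plus 32 stall cycles from two line fills and one conflict miss that also evicts a dirty line. Because nothing overlaps, every miss adds its full cost to the instruction stream, which is what "blocking" means here.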
Figure 2-9 4-ARM platform architecture (full AXI crossbar)
The interconnection is configured as a full crossbar using the AMBA AXI protocol. Each memory access takes one cycle, and every memory component has its own AXI port. As benchmarks we use four 512×512 integer JPEG encodings, one running independently on each ARM core.
Input and output file streams are all allocated in shared memory. Instruction (I) [...] policy.
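To see why a full crossbar keeps the execution cycle count nearly constant, the hypothetical helper `cycles_to_serve` below counts the cycles needed to serve one batch of one-cycle requests issued together, assuming the one-cycle memory latency stated above. On a full crossbar only requests to the same slave are serialized, whereas a shared bus serializes every request. This is an idealized sketch that ignores AXI channel details and the arbitration policy.

```python
from collections import defaultdict


def cycles_to_serve(requests, shared_bus=False):
    """Cycles needed to serve a batch of one-cycle requests issued in the same cycle.

    requests: list of (master_id, slave_id) pairs, at most one per master.
    Full crossbar: each slave has its own port, so requests to different
    slaves proceed in parallel and only requests to the same slave queue up.
    Shared bus: a single transfer per cycle in total.
    """
    if not requests:
        return 0
    if shared_bus:
        return len(requests)
    per_slave = defaultdict(int)
    for _master, slave in requests:
        per_slave[slave] += 1
    return max(per_slave.values())  # the busiest slave port sets the pace
```

For example, four cores that each access a private memory finish in one cycle on the crossbar but need four cycles on a shared bus; if all four target the shared memory, the crossbar also needs four cycles.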
Table 2-1 shows the simulation results. We set up platforms with one to four ARM cores, each core running the benchmark independently. The execution cycle count changes little, because in the full crossbar every memory has its own port. However, the simulation time changes greatly as the number of cores increases: the simulation speed reaches 380 k cycles/sec on the 1-core platform but drops to 96 k cycles/sec on the 4-core platform, roughly a four-fold slowdown. This result matches the MPARM behavior introduced in Section 2.1: the more components an ESL simulation environment contains, the faster its simulation speed drops. As SoCs integrate more and more processor cores, simulation speed becomes a serious problem.
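The two speeds quoted above can be compared with a little arithmetic. The sketch below (function names are illustrative) computes the slowdown and the number of simulated core-cycles per host second. With 4 × 96 k = 384 k core-cycles/sec against 380 k on one core, the host appears to process roughly the same amount of core activity per second regardless of core count. One plausible reading, which we have not verified for the CoWare simulation kernel, is that the cores are evaluated sequentially on a single host thread, so the host cost of each platform cycle grows roughly linearly with the number of cores.

```python
def slowdown(speed_1core_kcps, speed_ncore_kcps):
    """How many times slower the n-core platform simulates than the 1-core one."""
    return speed_1core_kcps / speed_ncore_kcps


def core_cycles_per_sec(platform_kcps, cores):
    """Simulated core-cycles per host second (in k), assuming every platform
    cycle advances each modeled core by one cycle."""
    return platform_kcps * cores


print(round(slowdown(380, 96), 2))   # 3.96
print(core_cycles_per_sec(96, 4))    # 384
```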
Table 2-1 Simulation result
Platform Execution Cycle count (k cycles)
Running the JPEG benchmark takes 11 minutes of simulation. This is acceptable for a single simulation run. However, designers use an ESL simulation environment for architecture exploration, and during design space exploration the simulation is repeated many times. Two run-time behaviors are very difficult to model at a high level: cache behavior and network contention. These two behaviors can only be simulated precisely with a low-level description of the components. As a result, a full search of the design space can take days (sometimes months) of simulation.
We take cache configuration as an example. Table 2-2 shows an example design space for the I- and D-caches. With 30 choices per cache, there are $30^2 = 900$ possible configurations of the level-1 instruction and data caches of one processor. If all processors in the 4-ARM platform use the same configuration, an exhaustive search over the cache design space would take about 155 hours.
If each core may choose its configuration independently, there are $900^4 = 656{,}100{,}000{,}000$ combinations to search exhaustively. This design space does not yet include the interconnect network. In conclusion, design space exploration would take thousands of hours of simulation, or even more.
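The search-cost estimate above can be reproduced with the short sketch below, which assumes every simulation run takes the same time. The function name is illustrative, and the per-cache figure of 30 comes from Table 2-2. With the rounded 11-minute run time, the same-configuration case comes to about 165 hours, the same order as the roughly 155 hours quoted above (which corresponds to about 10.3 minutes per run).

```python
PER_CACHE_CHOICES = 30  # total design space of one cache, from Table 2-2


def exhaustive_search_cost(minutes_per_run, cores=4, same_config=True):
    """Return (number of simulation runs, total hours) for an exhaustive cache search.

    One core has an I-cache and a D-cache, so it has PER_CACHE_CHOICES**2 options.
    If all cores share one configuration, that is also the platform's count;
    otherwise every core chooses independently.
    """
    per_core = PER_CACHE_CHOICES ** 2                 # 30^2 = 900
    runs = per_core if same_config else per_core ** cores
    hours = runs * minutes_per_run / 60
    return runs, hours


print(exhaustive_search_cost(11))                         # (900, 165.0)
print(exhaustive_search_cost(11, same_config=False)[0])   # 656100000000
```

In the independent case the same formula gives over 10^11 hours, which is why exhaustive search is impractical even before the interconnect is added to the design space.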
The main problem of a multicore SoC ESL simulation environment is therefore its slow simulation speed.
Table 2-2 Design space of cache
Design target | Design choices
Cache size | 4, 8, 16, 32, 64 KB
Cache write mode | Write-back, Write-through
Cache replacement policy | Pseudo-random, Round-robin, Least-recently-used
Total design space | 30 (5 × 2 × 3)
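The size of the design space in Table 2-2 follows from taking every combination of the three design targets. The sketch below enumerates them with `itertools.product`; the list and function names are illustrative, and LRU stands for least-recently-used.

```python
from itertools import product

CACHE_SIZES_KB = [4, 8, 16, 32, 64]
WRITE_MODES = ["write-back", "write-through"]
REPLACEMENT_POLICIES = ["pseudo-random", "round-robin", "LRU"]


def cache_configs():
    """Every (size, write mode, replacement policy) choice for one cache."""
    return list(product(CACHE_SIZES_KB, WRITE_MODES, REPLACEMENT_POLICIES))


def core_configs():
    """Every (I-cache, D-cache) pair for one processor."""
    configs = cache_configs()
    return list(product(configs, configs))


print(len(cache_configs()))  # 30
print(len(core_configs()))   # 900
```

Enumerating the configurations explicitly, rather than only counting them, gives a list that an exploration script could iterate over to launch one simulation per entry.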