4.3 Code Optimization on TI DSP Platform [15, 16]

In this section, we describe several methods that can accelerate our code and reduce the execution time on the C64x DSP. First, we use the following techniques to analyze the performance of specific code regions:

• A preliminary measure of code performance is the time the code takes to run. Use the clock() and printf() functions in C/C++ to time and display the performance of specific code regions.

Figure 4.3: Code development flow for C6000 (from [15]).

• Use the profile mode of the stand-alone simulator by executing load6x with the -g option. The profile results are stored in a file with the .vaa extension. See the TMS320C6000 Optimizing Compiler User's Guide for more information.

• Enable the clock and use profile points and the RUN command in the Code Composer debugger to track the number of CPU clock cycles consumed by a particular section of code. Use View Statistics to view the number of cycles consumed.

• The critical performance areas in code are most often loops. The easiest way to optimize a loop is to extract it into a separate file that can be rewritten, recompiled, and run with the stand-alone simulator (load6x).

We can also evaluate the performance results by running the code and looking at the instructions generated by the compiler.
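The first technique in the list above, timing a region with clock() and printf(), can be sketched as follows. This is a minimal illustration and not code from this work: the dot-product routine, the buffer length N, and the function name time_dot_product are hypothetical. What one clock tick means depends on the run-time library of the tool chain in use.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

#define N 256   /* hypothetical buffer length, chosen only for illustration */

static short x[N], y[N];

/* Hypothetical code region to be measured: a 16-bit dot product. */
static int dot_product(const short *a, const short *b, int n)
{
    int i, sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Times `repeats` calls of dot_product() and prints the elapsed ticks.
 * Returns the last result so that the measured work is actually used. */
int time_dot_product(int repeats)
{
    clock_t start, stop;
    volatile int result = 0;
    int i, r;

    assert(repeats > 0);
    for (i = 0; i < N; i++) {
        x[i] = (short)i;
        y[i] = 2;
    }

    start = clock();                     /* tick count before the region */
    for (r = 0; r < repeats; r++)
        result = dot_product(x, y, N);
    stop = clock();                      /* tick count after the region */

    printf("dot_product x %d: %ld clock ticks\n",
           repeats, (long)(stop - start));
    return result;
}
```

Calling time_dot_product() from main() with a repeat count large enough to dominate the overhead of the two clock() calls gives a rough per-call figure after dividing by the repeat count.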

4.3.1 Compiler Optimization Options

In this subsection, we introduce the compiler options that control the operation of the compiler. The C6000 compiler offers high-level language support by transforming C/C++ code into more efficient assembly language source code. The compiler tools include a shell program (cl6x), which can be used to compile, assembly optimize, assemble, and link programs in a single step. The compiler shell can be invoked by issuing the command

cl6x [options] [filenames] [-z [linker options] [object files]]

For a complete description of the C/C++ compiler and the options discussed in [15], see the TMS320C6000 Optimizing Compiler User's Guide [14]. The major compiler options we use are -o3, -k, -pm -op2, -mh<n>, -mw, and -mi.

• -o3: Highest level of optimization. Its main features are:

∗ Performs software pipelining.

∗ Performs loop optimizations and loop unrolling.

∗ Removes all functions that are never called.

∗ Reorders function declarations so that the attributes of called functions are known when the caller is optimized.

∗ Propagates arguments into function bodies when all calls pass the same value in the same argument position.

∗ Identifies file-level variable characteristics.

• -k: Keeps the assembly file so that the compiler feedback can be analyzed.

• -pm -op2: In the CCS compiler settings, -pm and -op2 are combined into a single option.

– -pm: Gives the compiler global access to the whole program or module and allows it to be more aggressive in ruling out dependencies.

– -op2: Specifies that the module contains no functions or variables that are called or modified from outside the source code provided to the compiler. This improves variable analysis and widens the assumptions the compiler is allowed to make.

• -mh<n>: Allows speculative execution. The appropriate amount of padding, n, must be available in data memory to ensure correct execution. This is normally not a problem but must be adhered to.

• -mw: Produces additional compiler feedback. This option has no performance or code size impact.

• -mi: Describes the interrupt threshold to the compiler. If the compiler knows that no interrupts will occur in the code, it can avoid enabling and disabling interrupts before and after software-pipelined loops for improvement in code size and performance. In addition, there is potential for performance improvement where interrupt registers may be utilized in high register pressure loops.

Figure 4.4: Software-pipelined loop (from [11]).

4.3.2 Software Pipelining

Software pipelining is a technique used to schedule instructions from a loop so that multiple iterations of the loop execute in parallel. When we use the -o2 or -o3 compiler option, the compiler attempts to software pipeline the code using information that it gathers from the program. Fig. 4.4 illustrates a software-pipelined loop. The stages of the loop are represented by A, B, C, D, and E. In this figure, a maximum of five iterations of the loop can execute at one time. The shaded area represents the loop kernel. In the loop kernel, all five stages execute in parallel. The area above the kernel is known as the pipelined loop prolog, and the area below the kernel is known as the pipelined loop epilog.
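The prolog, kernel, and epilog structure can be made concrete by writing the schedule out by hand in C. The sketch below is only an illustration under stated assumptions: it uses three stages (load, multiply, accumulate) instead of the five in the figure, and the function names are hypothetical. In C the statements still run one after another; on the DSP the compiler places the stages of different iterations in the same cycle on different functional units.

```c
#include <assert.h>

/* Plain loop: each iteration runs load, multiply, accumulate in order. */
int scale_sum_plain(const short *in, short gain, int n)
{
    int i, sum = 0;
    for (i = 0; i < n; i++) {
        short v = in[i];        /* stage A: load       */
        int   p = v * gain;     /* stage B: multiply   */
        sum += p;               /* stage C: accumulate */
    }
    return sum;
}

/* Same computation with the pipelined schedule written out by hand.
 * In the kernel, stage C of iteration i, stage B of iteration i+1 and
 * stage A of iteration i+2 share one loop body. */
int scale_sum_pipelined(const short *in, short gain, int n)
{
    int i, sum = 0, p;
    short v;

    assert(n >= 2);             /* this sketch needs at least two iterations */

    /* Prolog: fill the pipeline. */
    v = in[0];                  /* A(0) */
    p = v * gain;               /* B(0) */
    v = in[1];                  /* A(1) */

    /* Kernel: three iterations are in flight, one per stage. */
    for (i = 0; i < n - 2; i++) {
        sum += p;               /* C(i)   */
        p = v * gain;           /* B(i+1) */
        v = in[i + 2];          /* A(i+2) */
    }

    /* Epilog: drain the pipeline. */
    sum += p;                   /* C(n-2) */
    p = v * gain;               /* B(n-1) */
    sum += p;                   /* C(n-1) */
    return sum;
}
```

Both functions return the same value for any n of at least 2. The hand-written version only shows how the work of neighboring iterations overlaps; in practice the compiler generates this schedule itself.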

Because loops are usually the critical performance areas in code, the TI document advises considering the following techniques to improve the performance of C code:

• Loop unrolling.

• Speculative execution.

4.3.3 Loop Unrolling

Another technique that improves performance is unrolling the loop; that is, expanding small loops so that each iteration of the loop appears in the code. This optimization increases the number of instructions available to execute in parallel. We can use loop unrolling when the operations in a single iteration do not use all of the resources of the C6000 architecture.

There are three ways loop unrolling can be performed:

• The compiler can automatically unroll the loop.

• The programmer can suggest that the compiler unroll the loop using the UNROLL pragma.

• The programmer can unroll the C/C++ code by hand.

In our work, we let the compiler unroll the loops automatically.
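For comparison, the other two approaches can be sketched as follows. This is an illustrative example with hypothetical function names and an assumed unroll factor of 4, not code from this work. The UNROLL pragma is guarded by the _TMS320C6X macro so that other compilers skip it, and its exact spelling should be checked against the compiler guide.

```c
#include <assert.h>

/* Rolled loop with an unrolling hint for the TI compiler. The pragma is
 * only a suggestion and must appear directly before the loop. */
int vec_sum_hint(const short *a, int n)
{
    int i, sum = 0;
#ifdef _TMS320C6X
#pragma UNROLL(4)
#endif
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Hand-unrolled by 4: four independent partial sums give the scheduler
 * more instructions per iteration; a cleanup loop handles the remaining
 * elements when n is not a multiple of 4. */
int vec_sum_unrolled(const short *a, int n)
{
    int i, s0 = 0, s1 = 0, s2 = 0, s3 = 0;

    assert(n >= 0);
    for (i = 0; i + 3 < n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)          /* leftover elements */
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```

The hand-unrolled version increases code size and is harder to maintain, which is one reason to prefer leaving the transformation to the compiler when it produces a good schedule.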
