
Overview of the IEEE 802.16e Standard

4.2 TI’s Code Development Environment [17]

We now introduce the software environment used in our work and explain how to develop efficient DSP code quickly. We also introduce some important and useful techniques for improving program speed.

Code Composer Studio (CCS), TI’s GUI-based code development tool, is the software platform we use to develop and debug our projects. Its main features are listed below:

• Real-time analysis.

• Source-code debugger with a common interface for both simulator and emulator targets.

– C/C++ and assembly language support.

– Simple breakpoints.

– Advanced watch window.

– Symbol browser.

• DSP/BIOS support.

– Pre-emptive multi-threading.

– Interthread communication.

– Interrupt handling.

• Chip Support Libraries (CSL) to simplify device configuration. CSL provides C-program functions to configure and control on-chip peripherals.

• DSP libraries for optimum DSP functionality. The DSP library includes many C-callable, assembly-optimized, general-purpose signal-processing and image/video processing routines. These routines are typically used in computationally intensive real-time applications where optimal execution speed is critical. The TMS320C64x digital signal processor library (DSPLIB) provides routines for:

– Adaptive filtering.

– Correlation.

– FFT.

– Filtering and convolution.

– Math.

– Matrix functions.

– Miscellaneous.

Some of these routines are used in our implementation, such as FFT and filtering. We introduce them in a later chapter.

4.2.1 Code Development Flow [18]

The recommended code development flow involves utilizing the C6000 code generation tools to aid in optimization rather than forcing the programmer to code by hand in assembly.

The programmer can therefore let the compiler do the laborious work of instruction selection, parallelizing, pipelining, and register allocation. This simplifies maintenance, because everything resides in a C framework that is simple to support and upgrade.

Fig. 4.3 illustrates the three phases of the code development flow. Because phase 3 is detailed and time-consuming, we usually do not proceed to phase 3 (writing linear assembly code) unless the software-pipelining efficiency is poor or the resource allocation is badly unbalanced. The following techniques can be used to analyze the performance of specific code regions:

Figure 4.3: Code development flow of C6000 (from [18]).

• Use the clock() and printf() functions in C/C++ to time and display the performance of specific code regions. Use the stand-alone simulator (load6x) to run the code for this purpose.

• Use the profile mode of the stand-alone simulator. This can be done by compiling the code with the -mg option and executing load6x with the -g option. Then enable the clock and use profile points and the RUN command in the Code Composer debugger to track the number of CPU clock cycles consumed by a particular section of code. Use “View Statistics” to view the number of cycles consumed.
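The first technique above (timing with clock() and printf()) can be sketched as follows. This is a minimal illustration, not code from our implementation: the region under test (`energy()`), the buffer `samples`, its length `N_SAMPLES`, and the wrapper `time_energy()` are all hypothetical. The sketch measures the cost of a back-to-back clock() pair first and subtracts it, so that the reported count reflects only the code region itself.

```c
#include <stdio.h>
#include <time.h>

#define N_SAMPLES 256              /* hypothetical buffer length */

static short samples[N_SAMPLES];   /* hypothetical input buffer */

/* Hypothetical code region under test: signal energy. */
static long energy(const short *x, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += (long)x[i] * x[i];
    return sum;
}

/* Times energy() with clock(), subtracts the cost of the clock() calls
   themselves, prints the result, and returns the net tick count.
   On the TI stand-alone simulator clock() counts CPU cycles; on a host
   PC it returns processor time in units of 1/CLOCKS_PER_SEC seconds. */
long time_energy(long *result)
{
    clock_t start, stop, overhead;
    long ticks;

    for (int i = 0; i < N_SAMPLES; i++)
        samples[i] = (short)(i % 8);      /* simple test pattern */

    start = clock();
    stop = clock();
    overhead = stop - start;              /* cost of one clock() pair */

    start = clock();
    *result = energy(samples, N_SAMPLES);
    stop = clock();

    ticks = (long)(stop - start) - (long)overhead;
    if (ticks < 0)
        ticks = 0;                        /* coarse timers can undershoot */
    printf("energy(): %ld clock ticks\n", ticks);
    return ticks;
}
```

Calling `time_energy(&r)` from a test harness prints the net tick count and stores the energy in `r`.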

We usually use the second technique above to analyze the performance of our C code. Feedback on the optimization results can be obtained with the -mw option, which reports important results of the assembly optimizer for each code section. We use this feedback to improve the computational speed of selected loops in our program.

4.2.2 Compiler Optimization Options [18]

In this subsection, we introduce the compiler options that control the operation of the compiler. The CCS compiler provides high-level language support by translating C/C++ code into efficient assembly language source code. The compiler options can be used to optimize for either code size or execution performance.

The major compiler options we use are -o3, -k, -pm -op2, -mh<n>, -mw, and -mi.

• -on: The “n” denotes the level of optimization (0, 1, 2, and 3), which controls the type and degree of optimization.

– -o3: highest level optimization, main features are:

∗ Performs software pipelining.

∗ Performs loop optimizations and loop unrolling.

∗ Removes all functions that are never called.

∗ Reorders function declarations so that the attributes of called functions are known when the caller is optimized.

∗ Propagates arguments into function bodies when all calls pass the same value in the same argument position.

∗ Identifies file-level variable characteristics.

• -k: Keeps the generated assembly file so that the compiler feedback can be analyzed.

• -pm -op2: In the CCS build options, -pm and -op2 are combined into a single option.

– -pm: Gives the compiler global access to the whole program or module and allows it to be more aggressive in ruling out dependencies.

– -op2: Specifies that the module contains no functions or variables that are called or modified from outside the source code provided to the compiler. This improves variable analysis and widens the assumptions the compiler is allowed to make.

• -mh<n>: Allows speculative execution. The appropriate amount of padding, n, must be available in data memory to ensure correct execution. This is normally not a problem, but the requirement must be met.

• -mw: Produces additional compiler feedback. This option has no impact on performance or code size.

• -mi: Describes the interrupt threshold to the compiler. If the compiler knows that no interrupts will occur in the code, it can avoid enabling and disabling interrupts before and after software-pipelined loops, which improves code size and performance. In addition, there is potential for performance improvement where interrupt registers may be utilized in high-register-pressure loops.

Figure 4.4: Software-pipelined loop (from [16]).
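Among the -o3 features listed above is loop unrolling. The sketch below shows by hand what unrolling by four means for a simple multiply-accumulate loop; in practice we leave this transformation to the compiler. The function names are hypothetical, and the unrolled version assumes (our assumption, for brevity) that n is a multiple of 4.

```c
/* Straightforward loop: one multiply-accumulate per iteration,
   all feeding a single accumulator. */
int dot_rolled(const short *x, const short *y, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* The same computation unrolled by 4 (assumes n % 4 == 0).
   Four independent partial sums break the single accumulator chain,
   giving the scheduler more independent work per iteration. */
int dot_unrolled4(const short *x, const short *y, int n)
{
    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (int i = 0; i < n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    return s0 + s1 + s2 + s3;
}
```

Both functions return the same result as long as the integer sums do not overflow.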

4.2.3 Software Pipelining [19]

Software pipelining is a technique for scheduling instructions from a loop so that multiple iterations of the loop execute in parallel. It is the most important technique we use to speed up our system. The compiler always attempts to software-pipeline loops. Fig. 4.4 illustrates a software-pipelined loop. The stages of the loop are represented by A, B, C, D, and E; in this figure, at most five iterations of the loop execute at one time. The shaded area represents the loop kernel, in which all five stages execute in parallel. The area above the kernel is known as the pipelined loop prolog, and the area below the kernel as the pipelined loop epilog.
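The prolog/kernel/epilog structure of Fig. 4.4 can be reproduced with a small simulation. This is an illustrative sketch, not compiler output: it assumes every stage takes exactly one cycle and that a new iteration starts every cycle, and the function name `print_pipeline_schedule()` is hypothetical. In each cycle, stage s works on iteration (cycle − s); the kernel is every cycle in which all five stages are busy.

```c
#include <stdio.h>

#define STAGES 5   /* stages A..E, as in Fig. 4.4 */

/* Prints which iteration each stage works on in every cycle and returns
   the number of kernel cycles (all STAGES stages active). The prolog and
   epilog labels assume n_iter >= STAGES. */
int print_pipeline_schedule(int n_iter)
{
    int kernel_cycles = 0;
    int total_cycles = n_iter + STAGES - 1;

    for (int cycle = 0; cycle < total_cycles; cycle++) {
        int active = 0;
        printf("cycle %2d:", cycle);
        for (int s = 0; s < STAGES; s++) {
            int iter = cycle - s;          /* iteration in stage s now */
            if (iter >= 0 && iter < n_iter) {
                printf(" %c%d", 'A' + s, iter);
                active++;
            }
        }
        if (active == STAGES) {
            kernel_cycles++;
            printf("  (kernel)\n");
        } else if (cycle < STAGES - 1) {
            printf("  (prolog)\n");        /* pipeline still filling */
        } else {
            printf("  (epilog)\n");        /* pipeline draining */
        }
    }
    return kernel_cycles;
}
```

For n iterations of a five-stage loop, the sketch yields four prolog cycles, n − 4 kernel cycles, and four epilog cycles.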

However, the compiler does not software-pipeline a loop under the following conditions [18]:

• If a register value lives too long, the code is not software-pipelined.

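• If the loop body contains complex conditional code that requires more condition registers than the hardware provides (six on the C64x), the loop is not software-pipelined.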

• A software-pipelined loop cannot contain function calls, including code that calls the run-time support routines.

• In a sequence of nested loops, the innermost loop is the only one that can be software-pipelined.

• If a loop contains a conditional break, it is not software-pipelined.

In general, we try to write as many loops as possible so that they satisfy these requirements, because software pipelining is the most effective optimization technique available to us.
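As an illustration of restructuring a loop to meet these requirements, the sketch below removes a conditional break from a hypothetical search loop (find the first sample above a threshold). The rewritten version always runs n iterations, scanning backward so that the last index written is the smallest one that matches. Both function names are hypothetical, and whether the compiler actually pipelines the rewritten loop should be confirmed from the -mw feedback; the trade-off is extra iterations in exchange for a fixed trip count.

```c
/* Version 1: the conditional break gives the loop an early exit,
   which prevents software pipelining. */
int first_above_break(const short *x, int n, short thr)
{
    int i;
    for (i = 0; i < n; i++)
        if (x[i] > thr)
            break;
    return (i < n) ? i : -1;
}

/* Version 2: no break. The loop always runs n iterations, scanning
   backward; each match overwrites idx, so the final value is the
   smallest matching index (or -1 if none). */
int first_above_nobreak(const short *x, int n, short thr)
{
    int idx = -1;
    for (int i = n - 1; i >= 0; i--)
        if (x[i] > thr)
            idx = i;
    return idx;
}
```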

4.2.4 Intrinsics [18]

We did not use any intrinsics in our code, but we introduce the concept here. The C6000 compiler provides intrinsics, which are special functions that map directly to inlined C64x instructions, to optimize C/C++ code quickly. All assembly instructions that are not easily expressed in C/C++ code are supported as intrinsics. A table of the TMS320C6000 C/C++ compiler intrinsics can be found in [18].
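Since our implementation does not use intrinsics, the following is only a sketch of the idea. It assumes the C6000 compiler's `_sadd()` intrinsic (saturating 32-bit add, mapped to the SADD instruction) and the `_TMS320C6X` predefined macro; the wrapper name `sat_add32()` is hypothetical. On other compilers, a portable C fallback computes the same saturated result, which shows how much plain C an intrinsic can replace with a single instruction.

```c
#include <limits.h>

/* Saturating 32-bit addition. With the TI C6000 compiler, the _sadd()
   intrinsic maps to a single SADD instruction; elsewhere, the portable
   fallback clamps the exact sum to the int range. */
int sat_add32(int a, int b)
{
#if defined(_TMS320C6X)
    return _sadd(a, b);
#else
    long long sum = (long long)a + b;  /* exact, cannot overflow */
    if (sum > INT_MAX)
        return INT_MAX;
    if (sum < INT_MIN)
        return INT_MIN;
    return (int)sum;
#endif
}
```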

Chapter 5
