
In the following sections, we introduce the software environment used in our work and describe how to develop efficient DSP code as quickly as possible. We then present several important techniques for improving program speed. The optimization of each block in our work is discussed after the introduction of the software environment.

Code Composer Studio (CCS), TI’s GUI development tool, is the software platform we use to develop and debug our projects. Its main features include:

• Real-time analysis.

• A common source-code debugger interface for both simulator and emulator targets.

– C/C++ and assembly language support.

• Chip Support Libraries (CSL) to simplify device configuration. CSL provides C functions for configuring and controlling on-chip peripherals.

• DSP libraries for optimum DSP functionality. The DSP library includes many C-callable, assembly-optimized, general-purpose signal-processing and image/video-processing routines. These routines are typically used in computationally intensive real-time applications where optimal execution speed is critical. The TMS320C64x digital signal processor library (DSPLIB) provides routines such as:

– Adaptive filtering.

– FFT.

– Filtering.

In our project, we use several of these routines, such as the FFT and filtering routines, which we introduce in later sections.

4.3.1 Code Development Flow [19]

The recommended code development flow relies on the C6000 code generation tools to aid optimization rather than forcing the programmer to hand-code in assembly. This approach lets the compiler do all the laborious work of instruction selection, parallelizing, pipelining, and register allocation, and it simplifies maintenance, because everything resides in a C framework that is easy to maintain, support, and upgrade. Figure 4.7 illustrates the three phases of the code development flow. Because phase 3 (writing linear assembly) is usually very detailed and time-consuming, we do not enter it unless the software-pipelining efficiency is poor or the resource allocation is badly unbalanced. The following techniques can be used to analyze the performance of specific code regions:

• Use the clock() and printf() functions in C/C++ to time and display the performance of specific code regions. Use the stand-alone simulator (load6x) to run the code for this purpose.

• Use the profile mode of the stand-alone simulator. This is done by compiling the code with the -mg option and executing load6x with the -g option. Then enable the clock and use profile points and the RUN command in the Code Composer debugger to track the number of CPU clock cycles consumed by a particular section of code, and use “View Statistics” to view the number of cycles consumed.

Fig. 4.7: Code development flow of C6000 (from [19]).
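As a rough illustration of the first technique, the sketch below times a code region with the standard C clock() function and subtracts the overhead of the clock() calls themselves. The kernel dotp_kernel(), the helper measure_clocks(), and the array size N are hypothetical names chosen for this example. How a clock() tick maps to CPU cycles depends on the target and run-time library, so treat the reported numbers as an assumption to verify on the simulator.

```c
#include <stdio.h>
#include <time.h>

#define N 256   /* hypothetical problem size */

/* Hypothetical test data and kernel to be measured. */
short x[N], y[N];
volatile int dotp_result;   /* volatile so the work is not optimized away */

void dotp_kernel(void)
{
    int i, sum = 0;
    for (i = 0; i < N; i++)
        sum += x[i] * y[i];
    dotp_result = sum;
}

/* Time fn() with clock(), subtracting the cost of the clock() calls. */
clock_t measure_clocks(void (*fn)(void))
{
    clock_t start, stop, overhead, elapsed;

    start = clock();
    stop = clock();
    overhead = stop - start;        /* cost of back-to-back clock() calls */

    start = clock();
    fn();
    stop = clock();
    elapsed = stop - start;

    /* Clamp at zero: with a coarse timer the overhead estimate can
       exceed the measured interval. */
    return (elapsed > overhead) ? elapsed - overhead : 0;
}

/* Print the measured time with printf(), as described in the text. */
void report_timing(void)
{
    printf("dotp_kernel: %ld clock ticks\n", (long)measure_clocks(dotp_kernel));
}
```

On the target, report_timing() would be called from main() and the program run under load6x; measuring the clock() overhead first keeps the cost of the timing calls out of the result.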

We usually use the second technique to analyze the performance of our C code. Feedback on the optimization results can be obtained with the -mw option, which reports important results from the optimizer for each particular loop. We use this feedback to improve the speed of the critical loops in our program.

4.3.2 Compiler Optimization Options [19]

In this subsection, we introduce the compiler options that control the operation of the compiler. The CCS compiler provides high-level language support by translating C/C++ code into efficient assembly language source code. The compiler options can be used to optimize either code size or execution performance.

The major compiler options we use are -o3, -k, -pm -op2, -mh<n>, -mw, and -mi.

• -o<n>: n denotes the optimization level (0, 1, 2, or 3), which controls the type and degree of optimization.

– -o3: the highest optimization level. Its main features are:

∗ Performs software pipelining.

∗ Performs loop optimizations and loop unrolling.

∗ Removes all functions that are never called.

∗ Reorders function declarations so that the attributes of called functions are known when the caller is optimized.

∗ Propagates arguments into function bodies when all calls pass the same value in the same argument position.

∗ Identifies file-level variable characteristics.

• -k: Keeps the generated assembly file so that the compiler feedback can be analyzed.

• -pm -op2: In the CCS build options, -pm and -op2 are combined into a single option.

– -pm: Gives the compiler global access to the whole program or module and allows it to be more aggressive in ruling out dependencies.

– -op2: Specifies that the module contains no functions or variables that are called or modified from outside the source code provided to the compiler. This improves variable analysis and widens the assumptions the compiler is allowed to make.

• -mh<n>: Allows speculative execution. The appropriate amount of padding, n, must be available in data memory to ensure correct execution. This is normally not a problem, but the requirement must be met.

• -mw: Produces additional compiler feedback. This option has no performance or code size impact.

• -mi: Describes the interrupt threshold to the compiler. If we know that no interrupts will occur in our code, the compiler can avoid enabling and disabling interrupts before and after software-pipelined loops, which improves code size and performance. In addition, the interrupt registers may then be used in loops with high register pressure, which can further improve performance.
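To make the -o3 argument-propagation feature concrete, the sketch below writes by hand what the optimizer may effectively do when every call of a function passes the same constant. q15_mul(), mix_samples(), and their specialized forms are hypothetical examples, not actual compiler output. Whether the compiler performs this transformation depends on its version and options, and when call sites are spread over several files it generally needs the whole-program view provided by -pm to prove that all calls pass the same value.

```c
/* As written: a generic fixed-point multiply with a shift parameter.
   Assumes the product a * b fits in a 32-bit int. */
static int q15_mul(int a, int b, int shift)
{
    return (a * b) >> shift;
}

/* Every call site in the program passes shift = 15. */
int mix_samples(int s0, int s1, int g0, int g1)
{
    return q15_mul(s0, g0, 15) + q15_mul(s1, g1, 15);
}

/* Hand-written equivalent of propagating the constant into the body:
   the shift amount becomes an immediate and the parameter disappears. */
static int q15_mul_shift15(int a, int b)
{
    return (a * b) >> 15;
}

int mix_samples_specialized(int s0, int s1, int g0, int g1)
{
    return q15_mul_shift15(s0, g0) + q15_mul_shift15(s1, g1);
}
```

Both versions compute the same result; the specialized one saves passing the third argument and lets the shift be encoded as a constant, which is the kind of saving the optimizer can obtain automatically.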

4.3.3 Software Pipelining [22]

Software pipelining is a technique for scheduling instructions from a loop so that multiple iterations of the loop execute in parallel. The compiler always attempts to software-pipeline loops. Figure 4.8 illustrates a software-pipelined loop. The stages of the loop are represented by A, B, C, D, and E. In this figure, a maximum of five iterations of the loop can execute at one time. The shaded area represents the loop kernel, in which all five stages execute in parallel. The area above the kernel is known as the pipelined loop prolog, and the area below the kernel is known as the pipelined loop epilog.

Fig. 4.8: Software-pipelined loop (from [18]).
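The prolog/kernel/epilog structure can be sketched at the C level. The example below hand-pipelines a dot product with three stages (A = load, B = multiply, C = accumulate) instead of the five stages in the figure; the function names are hypothetical. This is only a conceptual sketch: the compiler performs the real transformation at the instruction level, where the stages in the kernel issue in the same cycle, whereas here they are sequential statements ordered so that each one consumes the value produced in the previous pass.

```c
/* Reference loop: each iteration loads (A), multiplies (B), accumulates (C). */
int dotp_plain(const short *x, const short *y, int n)
{
    int i, sum = 0;
    for (i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Hand-pipelined version. In each kernel pass, stage C works on
   iteration i, stage B on iteration i + 1, and stage A on iteration i + 2. */
int dotp_pipelined(const short *x, const short *y, int n)
{
    int i, sum = 0, prod;
    short xa, ya;

    if (n < 2)                          /* too short to fill the pipeline */
        return dotp_plain(x, y, n);

    /* Prolog: fill the pipeline. */
    xa = x[0]; ya = y[0];               /* A(0) */
    prod = xa * ya;                     /* B(0) */
    xa = x[1]; ya = y[1];               /* A(1) */

    /* Kernel: stages of three different iterations overlap. */
    for (i = 0; i < n - 2; i++) {
        sum += prod;                    /* C(i)     */
        prod = xa * ya;                 /* B(i + 1) */
        xa = x[i + 2]; ya = y[i + 2];   /* A(i + 2) */
    }

    /* Epilog: drain the pipeline. */
    sum += prod;                        /* C(n - 2) */
    prod = xa * ya;                     /* B(n - 1) */
    sum += prod;                        /* C(n - 1) */
    return sum;
}
```

The kernel loop runs n - 2 times; the prolog and epilog account for the partially filled and partially drained pipeline at the start and end, matching the areas above and below the shaded kernel in Fig. 4.8.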

However, the compiler does not software-pipeline a loop under the following conditions [19]:

• If a register value lives too long, the code is not software-pipelined.

• If a loop has complex condition code within the body that requires more than five condition registers, the loop is not software-pipelined.

• A software-pipelined loop cannot contain function calls, including code that calls the run-time support routines.

• In a sequence of nested loops, the innermost loop is the only one that can be software-pipelined.

• If a loop contains conditional break, it is not software-pipelined.
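As an illustration of working around the last restriction, the sketch below rewrites a search loop that exits with a conditional break into an equivalent loop with a fixed trip count. The function names are hypothetical, and whether a given compiler version then actually pipelines the loop should be confirmed with the -mw feedback. The trade-off is that the rewritten loop always visits all n elements, so it pays off only when an early exit would save little work.

```c
/* Returns the index of the first element greater than thr, or -1.
   The conditional break makes the trip count data-dependent. */
int find_first_break(const short *x, int n, short thr)
{
    int i;
    for (i = 0; i < n; i++)
        if (x[i] > thr)
            break;
    return (i < n) ? i : -1;
}

/* Same result without a break: scan the whole array backwards, so the
   last assignment made is the smallest matching index. The trip count
   is now fixed at n, and the body is a single conditional assignment. */
int find_first_nobreak(const short *x, int n, short thr)
{
    int i, idx = -1;
    for (i = n - 1; i >= 0; i--)
        if (x[i] > thr)
            idx = i;
    return idx;
}
```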

In our work, we must therefore maximize the number of loops that satisfy these software-pipelining requirements. Software pipelining is one of the most important optimization techniques, and its importance cannot be overemphasized.

4.3.4 Intrinsics [19]

The C6000 compiler provides intrinsics, special functions that map directly to inlined C64x instructions, for optimizing C/C++ code quickly. All assembly instructions that are not easily expressed in C/C++ code are supported as intrinsics. A table of TMS320C6000 C/C++ compiler intrinsics can be found in [19].
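A typical case is saturating addition, which needs compares and branches in plain C but is a single instruction on the C6000. The sketch below uses the _sadd() intrinsic when compiled with the TI compiler (assumed here to predefine _TMS320C6X and to provide _sadd() as a built-in) and falls back to a portable C version elsewhere, so the same code can also be checked on a host. sat_add32() and vec_sat_add() are hypothetical helper names, and 32-bit int is assumed, as on the C6000.

```c
#include <limits.h>

/* 32-bit saturating addition: results are clamped to [INT_MIN, INT_MAX]. */
static inline int sat_add32(int a, int b)
{
#ifdef _TMS320C6X
    return _sadd(a, b);                 /* maps to the SADD instruction */
#else
    long long s = (long long)a + (long long)b;  /* portable reference */
    if (s > INT_MAX) return INT_MAX;
    if (s < INT_MIN) return INT_MIN;
    return (int)s;
#endif
}

/* Example use: saturating element-wise add of two 32-bit vectors. */
void vec_sat_add(const int *a, const int *b, int *out, int n)
{
    int i;
    for (i = 0; i < n; i++)
        out[i] = sat_add32(a[i], b[i]);
}
```

Keeping the intrinsic behind a small inline wrapper leaves the rest of the code in portable C, in line with the development flow of Section 4.3.1, while the target build still gets the single-instruction version.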
