Write linear assembly code - Developing A DSP Program

Chapter 4 Developing A DSP Program

4.3 Write linear assembly code

If some function’s performance still does not achieve the requirement by refining C code using above methods, we can write assembly code by ourselves.TMS320C6x provides linear assembly language to user. It is no need to assign which register to use in one instruction comparing to original assembly language. Because it isn’t assigning register, parallel executing the instruction is also can not done by user appoint but by linear assembly optimizer.

A linear assembly file has a extended filename *.sa and it will be noticed that several points like:

1. Program label will start at the first character in one line.

2. Instruction can not start at the first character, it must follow the space.

3. There are some mnemonic which are machine – instruction or optimizer directive can help your program.

Table 07 some common linear assembler’s Directive

Directive Description Restrictions

.call Calls a function Valid only within procedures .cproc Start a C/C++ callable

procedure Must use with .endproc .endproc End a C/C++ callable

procedure Must use with .cproc

.mptr Avoid memory bank conflicts

Valid only within procedures;

can use variables in the register parameter

.reg Declare variables Valid only within procedures .reserve Reserve register use Valid only within procedures

.return Return value to procedure Valid only within .cproc procedures

.trip Specify trip count value Valid only within procedures

Here is a linear assembly code function which can be called by C function

Figure 13 A example to show a C callable function writing by linear assembly

Chapter 5 DSP Board Implementation Result and Discussion

5.1 Choose M = 4 in SLM structure

802.11n is set up for wireless communication, it adopted MIMO OFDM structure, and choose FFT length 64. Because 64 is not too long so as we choose M = 4 in SLM structure is enough to deal with its PAPR problem. We can see fig 12 to know PAPR in the original OFDM structure will excess 9dB at the probability about 0.05, but applying to SLM with M=4, the probability is decrease to about 0.0001.

Figure 14 CCDF of PAPR in 64-pt FFT length SLM with different M

5.2 The use of the conversion matrix

Because choosing M = 4, we need to select additional three independent phase rotate vector. See chapter 1.2, using the conversion matrix can greatly lower the complexity of the original SLM structure. We select three phase rotate vector of the form [1, j , 1, j ] , [1, j , 1, -j ] and [1, j , -1, j ]. Figure 13 shows this conversion has no performance decade comparing to the IFFT banks. Table 8 shows the comparison of computation complexity between IFFT bank and conversion matrix.

5 6 7 8 9 10 11 12 13

10^-5 10^-4 10^-3 10^-2 10^-1 10⁰

probability of the PAPR exceeds the threshold for SLM with 64-pt FFT , L=4,

threshold

P(PAPR > threshold) dB

original

SLM with M = 4(using IFFT bank) SLM with M = 4(using conversion matrix)

Figure 15 The CCDF of PAPR comparison between IFFT bank and conversion matrix

Table 08 computation complexity between IFFT band and conversion matrix

5.3 About insert the side information

We still don’t consider side information up to now, and side information insertion usually means that bit rate will decrease. It is the expense which we choose SLM to lower the PAPR.

We should reserve several tones to transmit side information to the receiver side.

It will need 2 (log₂M) bit to indicate which phase rotate vector is been used when M=4.

In order to prevent the side information being corrupt by noise easily so as to make whole symbol is wrong, we need to apply channel coding to the side information bit and use low order modulation on the side information tones.

Fig14 simulate the PAPR of the 64-pt FFT length OFDM structure, SLM structure that information tones are reserved but not transmitting side information, and SLM structure with transmitting side information at the different power level on the information tones. The modulation type is BPSK to resist the noise. The side information is pass channel coding with (5, 2) shorten hamming code which can detect

two errors and correct one error.

Figure 16 SLM structure with transmitting the side information

5.3 Fixed point format on the DSP board

There are usually several restrict on the modulation accuracy in most spec, we should choose the fixed point format to fit this requirement. Due to not find the modulation accuracy requirement in 802.11n spec, we reference to 802.16 spec’s error vector magnitude (EVM.) It is obvious that 8-bit is not achieving requirement, so we choose 16-bit fixed point format to represent fraction number.

Table 09 simulation EVM result At the output of QAM

mapping

At the input of IFFT At the output of IFFT EVM(﹪) when using

8 bit to quantize 6.6 6.6 6.98

EVM(﹪) when using

16 bit to quantize 1.462* 10^-3 1.462* 10^-3 5.854 * 10^-2

5.3 Code performance on the DSP board

Table8 shows the comparison of PC and DSP bit rate and execute time. Fig14 shows the percentage of the block execute time in the simulation.

Table 10 execute time and bit rate

Bit rate(bit/s) 89435.43 105346.27 310077.59

execution time

Figure 17 the percentage of the all block execution time

5.4 About using the digital IO

Quixote provides 40 bits of bidirectional digital I/O. The digital I/O port allows the baseboard to exchange digital handshaking and information signals with other hardware, control and signal other devices, and may be used for software troubleshooting tasks as well. The user DIO (UD) port that has separate control and data registers that allows byte-wide control of the direction. The UD digital IO port is on connector JP5 (MDR50 connector). See the appendix for the connector pinouts.

1. About setting: make sure to include right DSP/BIOS configuration and the DIO is contained. ( Choosing the Quixtoe in the board category page or copy the existed project setting directly ).

2. Make sure to include UserData.h header.

The digital IO class

Example:

5.5 The estimate of speeding up the program using FPGA

Speeding up the program, it is needed that analyzing your application to identify the operations that are high speed (above 1 MHz) and lower speed. Higher speed signal processing operations should be targeted at the FPGA provided that they are of manageable complexity. Typical FPGA operations include FIR filters, down conversion, specialized high speed triggering and data sampling, and FFTs. All of these functions are deterministic mathematical functions that are suitable for the FPGA. Data formatting, protocols and control functions are typically more easily implemented on the DSP.

The Quixote has two FPGAs: a Xilnx Spartan2 (200K gates) and a Xilinx Virtex2 (2M or 6M gates). The Virtex2 is used for the analog interfacing, and as the computational logic on the board. Major elements inside the Spartan2 logic are the PCI interface, interrupt control, timebase selection and message mailboxes.

Quixote also provide some available clock. When we like developing the custom logic, we will need using the refclk which has about 20MHz frequency.

Block FFT:

The number of the stage is log₂(N) in the N-point FFT structure. It can reach the maximum data rate is 20M * N / log2(N) if FFT stages can be done in the 20MHz clock period. Put the result in the 802.11n system (using 64-pt FFT), the maximum data rate is 20M * 64 / log2(64) = 213.33Mbit/s.

Block convolution encoder

There are six registers in the spec. 802.11n convolution encoder structure. Like the case in the FFT block, the maximum data rate depends on if we can put all logic into the 20MHz clock. We can reach the data rate about 20MHz when we really put into the 20MHz clock.

Table 11 speeding-up estimate using FPGA The estimated maximum

data rate by FPGA

data rate by DSP

IFFT 213.33Mbits/s 3.508Mbits/s

Convolution encoder 40Mbits/s 0.55026Mbits/s

the estimate data rate using FPGA

0 50 100 150 200 250

FFT encoder

function block

data rate (Mbits/s)

DSP FPGA

Figure 18 estimated data rate using FPGA

Chapter 6 Conclusion

SLM using conversion matrix is greatly lower the complexity comparing with the original IFFT bank in the original OFDM structure. Because MIMO-OFDM uses more transmit antenna, the complexity is needed lower in the every transmit antenna hardware dealing with the PAPR. In addition, 802.11n spec use 64 point FFT length, it is not too long as to very serious PAPR problem. So, we use SLM structure with conversion matrix but PTS structure in the 802.11n structure to reduce the PAPR.

SLM has not bad performance but greatly lower complexity comparing with other PAPR reduction method. And insert the side information by low complexity way, we also add a power level parameter to trade off between the PAPR reduction capability and power on side information. This thesis also conclude something about how to use DSP board to develop a efficient program.

Chapter 7 Reference

[1] H. Ochiai and H. Imai, “Performance Analysis of Deliberately Clipped OFDM Signals,” IEEE Trans. on Commun., vol. 50, pp. 89–101, Jan. 2002.

[2] H. Saeedi, M. Sharif, and F. Marvasti, “Clipping noise cancellation in OFDM systems using oversampled signal reconstruction,” IEEE Comm. Lett., vol. 6, pp.

73–75, Feb. 2002.

[3] A. E. Jones, T. A. Wilkinson, and S. K. Barton, “Block coding scheme for reduction of peak to mean envelope power ration of multicarrier transmission schemes, ” Electron. Lett., vol. 30, no. 25, pp. 2098 – 2099, Dec. 1994.

J. A. Davis and J. Jedwab, “Peak-to-mean power control in OFDM, Golay complementary sequences, and Reed- Muller codes,” IEEE Trans. Inform. Theory,

vol. 45, no. 7, pp. 2397–2417, Nov. 1999.

[4] K. Yang and S. Chang, “Peak-to-Average Power Control in OFDM Using Standard Arrays of Linear Block Codes,” IEEE Commun. Lett., vol. 7, no. 4, pp. 174–176, Apr. 2003.

K. Patterson, “Generalized Reed-Muller codes and power control in OFDM modulation,” IEEE Trans. Inform. Theory, vol. 46, no. 1, pp. 104–120, Jan. 2000.

[5] D. Wulich and L. Goldfeld, “Reduction of peak factor in orthogonal multicarrier modulation by amplitude limiting and coding,” IEEE Trans. on Commun., vol. 47,

no. 1, pp. 18–21, Jan. 1999.

[6] S. H. Müller and J. B. Huber, “OFDM with Reduced Peak–to–Average Power

Ratio by Optimum Combination of Partial Transmit Sequences,” Elect. Lett., vol.

33, no. 5, Feb. 1997, pp. 368–69.

[7] A. D. S. Jayalath and C. Tellambura, “Adaptive PTS Approach for Reduction of Peak-to-Average Power Ratio of OFDM Signal,” Elect. Lett., vol. 36, no. 14,

July 2000, pp. 1226–28.

[8] R. W. Ba¨uml, R. F. H. Fischer and J. B. H¨uber, “Reducing the peak-toaverage power ratio of multicarrier modulation by selective mapping, ” Electron. Lett., vol. 32, no. 22, pp. 2056 –2 057, Oct. 1996.

[9] M. Breiling, S. H. M¨ uller – Weinfurtner and Johannes B. Huber, “SLM peak-power reduction without explicit side information, ” IEEE Commun. Lett.,

vol. 5, no. 6, JUNE 2001.

[10] EunJung CHANG, HoYeol KWON, and John M. CIOFFI, “PAR Reduction of Multicarrier Signals Using Injected Tone Constellation”, IEICE TRANS.

COMMUN., VOL.E89–B, NO.10 OCTOBER 2006

[11] Krongold, B.S., Jones, D.L. An active-set approach for OFDM PAR reduction ,”

via tone reservation”, IEEE Commun. Mag., vol. 28, pp. 5-14, May 1990.

[12] Naoto Ohkubo † and Tomoaki Ohtsuki, “A Peak to Average Power Ratio Reduction of Multicarrier CDMA Using Selected Mapping”Vehicular

Technology Conference, 2002. Proceedings. VTC 2002-Fall. 2002 IEEE 56th

[13] Chin-Liang Wang, Ming-Yen Hsu, Yuan Ouyang, “A low-complexity peak-to-average power ratio reduction technique for OFDM systems”, Global Telecommunications Conference, 2003. GLOBECOM '03. IEEE

”[14]Innovative Quixote user manual

在文檔中多輸入輸出正交分頻多工系統中峰均功率比的減低 (頁 37-0)