Chapter 4 Probability-Based Static Scaling Optimization for Fixed Wordlength FFT
4.4 Scaling Optimization
4.5.3 SQNR for a real case study
In this experiment, we use a piece of 16-bit, 11 kHz audio in WAV format from Wikipedia [80]. The PMF of the audio data is given in Figure 27; it is very close to a normal distribution with a mean of zero and a standard deviation of 0.168. Table 13 presents the number of integer bits at each stage and the scaling optimization outcomes for a 256-point radix-2 FFT. Oppenheim’s method [49], which increases the integer part by one bit at every stage, gives a moderate result (SQNR = 46.07 dB). Ramakrishnan’s method [50], which increases the integer part by one bit for every two stages, suffers a serious overflow problem under this particular input, and the resultant output SQNR is thus extremely low (10.18 dB).
Figure 26 SQNR vs. standard deviation (radix-4, input in 12b1f). [Plot: SQNR (dB) versus standard deviation σ; curves for Oppenheim [49], Ramakrishnan [50], BFP [1], and the proposed method.]
Figure 27 The PMF of the piece of music.
Since the additional scale factor tables in BFP determine an appropriate number format by detecting the largest value in each block, the 16-bit wordlength is long enough to preserve high accuracy and significantly diminish the chance of overflow. Thus, the BFP method [1] achieves the best SQNR result (85.67 dB). Meanwhile, if a uniformly distributed input is assumed in our proposed method (which is not the correct distribution), the output SQNR reaches 51.75 dB, which is still much higher than that of Oppenheim’s method [49]. Moreover, if the correct PMF of that piece of music (a normal distribution with a mean of zero and a standard deviation of 0.168) is used in the analysis, the resultant output SQNR is even better (54.04 dB). These experimental results demonstrate that our proposed method offers excellent quality of result if the PMF of the input is known in advance. Even if the PMF of the input is unknown, the quality of result is still fairly good in this experiment when compared with the two previous static scaling works [49, 50].
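The block-wise format selection described above can be illustrated with a minimal sketch. It is a generic block floating-point quantizer written for illustration only: the function name `bfp_quantize`, the power-of-two shared exponent, and the round-to-nearest rule are assumptions, and the scale factor table hardware of [1] is not modeled.

```python
import numpy as np

def bfp_quantize(block, wordlength=16):
    """Quantize one block of real samples with a single shared exponent.

    Hypothetical helper: the exponent is chosen from the largest magnitude
    in the block, then every sample is rounded to a wordlength-bit mantissa.
    Returns the dequantized samples and the shared exponent.
    """
    block = np.asarray(block, dtype=float)
    peak = np.max(np.abs(block))
    if peak == 0.0:
        return np.zeros_like(block), 0
    # Smallest exponent e such that peak < 2**e.
    exponent = int(np.floor(np.log2(peak))) + 1
    # One sign bit, wordlength - 1 magnitude bits below 2**exponent.
    step = 2.0 ** (exponent - (wordlength - 1))
    mantissa = np.clip(np.round(block / step),
                       -(2 ** (wordlength - 1)), 2 ** (wordlength - 1) - 1)
    return mantissa * step, exponent

samples = [0.01, -0.02, 0.005]
restored, shared_exponent = bfp_quantize(samples)
print(shared_exponent, np.max(np.abs(restored - np.array(samples))))
```

Because the exponent follows the block peak, small-valued blocks keep almost the full 16-bit resolution, which is the intuition behind the high SQNR reported for BFP.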
Table 13 The scaling optimization outcomes for 256-point radix-2 FFT
Method                            Integer bits of each stage    SQNR
                                  1  2  3  4  5  6  7  8
Oppenheim [49]                    2  3  4  5  6  7  8  9        46.07 dB
Ramakrishnan [50]                 2  2  3  3  4  4  5  5        10.18 dB
BFP [1]                           N/A                           85.67 dB
Proposed (Uniform distribution)   2  3  4  5  6  6  7  7        51.75 dB
Proposed (Normal distribution)    2  3  4  5  5  6  6  7        54.04 dB
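To make the role of the per-stage integer bits concrete, the following sketch simulates a 16-bit radix-2 FFT that rounds and saturates after every stage and measures the output SQNR against a floating-point reference. It rests on several assumptions that are not taken from the experiment: the integer-bit count is taken to include the sign bit, twiddle factors stay in floating point, rounding is to nearest, and a synthetic N(0, 0.168) sequence stands in for the audio clip. It is therefore not expected to reproduce the SQNR values in Table 13.

```python
import numpy as np

def quantize(x, int_bits, wordlength=16):
    """Round a complex array to a signed fixed-point grid and saturate.

    Assumption: int_bits includes the sign bit, so each of re/im is
    limited to [-2**(int_bits-1), 2**(int_bits-1)).
    """
    step = 2.0 ** -(wordlength - int_bits)
    hi = 2.0 ** (int_bits - 1) - step
    lo = -(2.0 ** (int_bits - 1))

    def q(v):
        return np.clip(np.round(v / step) * step, lo, hi)

    return q(x.real) + 1j * q(x.imag)


def fixed_point_fft(x, stage_int_bits, wordlength=16):
    """Radix-2 DIF FFT that quantizes the data after every stage.

    Twiddle factors are kept in floating point for brevity.
    """
    n = len(x)
    if 2 ** len(stage_int_bits) != n:
        raise ValueError("need one integer-bit setting per stage")
    data = np.asarray(x, dtype=complex)
    half = n // 2
    for int_bits in stage_int_bits:
        blocks = data.reshape(-1, 2 * half)
        top, bottom = blocks[:, :half], blocks[:, half:]
        twiddle = np.exp(-2j * np.pi * np.arange(half) / (2 * half))
        data = np.concatenate([top + bottom, (top - bottom) * twiddle], axis=1)
        data = quantize(data.reshape(-1), int_bits, wordlength)
        half //= 2
    # The DIF flow graph leaves the result in bit-reversed order.
    bits = n.bit_length() - 1
    order = [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]
    return data[order]


def sqnr_db(reference, approx):
    """Signal-to-quantization-noise ratio in dB."""
    noise = np.sum(np.abs(reference - approx) ** 2)
    return 10.0 * np.log10(np.sum(np.abs(reference) ** 2) / noise)


rng = np.random.default_rng(0)
# Synthetic stand-in for the audio: N(0, 0.168) stored as 16-bit, 1 integer bit.
x = quantize(rng.normal(0.0, 0.168, 256).astype(complex), int_bits=1)
reference = np.fft.fft(x)
schedules = {
    "Oppenheim [49]": [2, 3, 4, 5, 6, 7, 8, 9],
    "Proposed (normal)": [2, 3, 4, 5, 5, 6, 6, 7],
}
for name, bits in schedules.items():
    print(name, round(sqnr_db(reference, fixed_point_fft(x, bits)), 2), "dB")
```

Fewer integer bits leave more fraction bits and hence less rounding noise, at the price of a higher saturation risk; the schedules in Table 13 are different answers to that trade-off.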
4.6 Summary
In this chapter, we present an efficient static scaling technique for output SQNR optimization targeting FFT processor designs with a fixed wordlength constraint. Unlike most conventional methods, it relies on fast probability-based analysis instead of time-consuming simulation to precisely model the noise induced by quantization and saturation. It works with various FFT sizes, radices, wordlengths, and the most commonly presumed input distributions. The experimental results clearly show that the proposed technique is superior to existing static scaling approaches [49, 50] in all circumstances. Specifically, our technique can generate an 8192-point radix-2 FFT implementation whose wordlength is three bits shorter than that of Oppenheim’s method [49] while still achieving the same output SQNR, which implies a significant improvement in area, latency, and power consumption. Besides, it generally performs nearly as well as the BFP-based dynamic scaling technique [1], which requires an extra hardware unit. Therefore, we believe our fast static probability-based scaling optimization technique is very practical and helpful for creating area-efficient fixed-wordlength (e.g., memory-based) FFT processor designs.
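The idea of replacing simulation with probability-based analysis can be illustrated with a deliberately simplified sketch: the PMF of a butterfly sum of two independent integer-valued operands is the convolution of their PMFs, and the saturation probability for a given number range is the probability mass that falls outside it. The function names, the independence assumption, and the toy 3-bit uniform input are illustrative assumptions; twiddle multiplications and quantization noise are left out.

```python
import numpy as np

def pmf_of_sum(pmf_a, pmf_b):
    """PMF of a + b for independent integer-valued a and b.

    If pmf_a starts at value a0 and pmf_b at b0, the result starts at a0 + b0.
    """
    return np.convolve(pmf_a, pmf_b)

def overflow_probability(pmf, min_value, lo, hi):
    """Mass outside [lo, hi]; pmf[i] is P(value == min_value + i)."""
    values = min_value + np.arange(len(pmf))
    return float(pmf[(values < lo) | (values > hi)].sum())

# Toy input: uniform over the 3-bit two's-complement range -4..3.
uniform = np.full(8, 1.0 / 8.0)
sum_pmf = pmf_of_sum(uniform, uniform)      # support -8..6

# Keeping the 3-bit range after the addition saturates a quarter of the time;
# one extra integer bit removes the overflow entirely.
print(overflow_probability(sum_pmf, -8, -4, 3))   # 0.25
print(overflow_probability(sum_pmf, -8, -8, 7))   # 0.0
```

Repeating such a step stage by stage yields an overflow estimate for each candidate scaling without running any input vectors through the design.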
Chapter 5
Conclusion and Future Works
5.1 Conclusion
Since DSP algorithms play an important role in communication systems, their hardware implementations must carefully consider many parameters, such as the bitwidth, the base architecture, and the number scaling. To deal with the design issues that arise in implementing DSP algorithms, we propose three techniques in this dissertation: 1) a bitwidth-aware synthesis algorithm for MCM designs, 2) an EMDC-based FFT architecture, and 3) a static scaling technique for pipelined FFT processor designs.
First, we present an ILP-based bitwidth-aware area minimization algorithm for MCM designs, which points out that the total adder bit count, rather than the total adder count, better estimates the hardware cost in a real implementation. For a given MCM design, the target constants are first represented in a specified number format. Next, a subexpression graph is created to record all feasible decompositions of every target constant. The graph also keeps track of the required adder bitwidth as well as the two subexpressions of every decomposition. Finally, the area minimization problem is formulated as a set of ILP constraints derived from the subexpression graph and solved optimally within an acceptable runtime.
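The difference between adder count and adder bit count can be seen in a small worked example. The cost model below is an assumption made for illustration (each adder is as wide as the signed product it produces), and the constant 45 with its two decompositions is a hypothetical case, not one taken from the dissertation's benchmarks.

```python
def adder_output_bits(constant, input_bits):
    """Bits needed to hold constant * x for a signed input_bits-bit x.

    Assumed cost model, valid for constant >= 1:
    input_bits + ceil(log2(constant)).
    """
    return input_bits + (constant - 1).bit_length()

def cost(adder_outputs, input_bits):
    """Return (adder count, adder bit count) for the value each adder produces."""
    bits = sum(adder_output_bits(c, input_bits) for c in adder_outputs)
    return len(adder_outputs), bits

# 45x = (5x << 3) + 5x with 5x = (x << 2) + x   -> adders produce 5x and 45x
# 45x = (3x << 4) - 3x with 3x = (x << 1) + x   -> adders produce 3x and 45x
print(cost([5, 45], input_bits=8))   # (2, 25)
print(cost([3, 45], input_bits=8))   # (2, 24)
```

Both realizations use two adders, yet the second needs one fewer adder bit, which is the kind of distinction an adder-count objective cannot see.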
Then, we propose an expandable multi-path delay commutator (EMDC) based FFT architecture. We show that the proposed architecture can be easily and flexibly expanded to satisfy throughput-hungry applications. In addition, a parameterizable hardware generator is also developed to automatically produce the specified HDL code so that the design cost and time can be drastically minimized. Finally, the theoretical analyses and/or experimental results demonstrate that the proposed architecture does consume less area and power than the existing foldable Pease architecture under the same throughput constraint.
Finally, we present an efficient static scaling technique for output SQNR optimization targeting FFT processor designs with a fixed wordlength constraint. Instead of relying on time-consuming simulation, we propose a probability-based static scaling analysis that models the probabilistic behavior of the output signal at each stage and thereby precisely models the induced quantization and saturation noises. It works with various FFT sizes, radices, wordlengths, and the most commonly presumed input distributions. The experimental results clearly show that the proposed technique is superior to existing static scaling approaches [49, 50] in all circumstances. Specifically, our technique can generate an 8K-point radix-2 memory-based FFT processor whose wordlength is three bits shorter than that of Oppenheim’s method [49] without compromising the output SQNR. Besides, it generally performs nearly as well as the BFP-based dynamic scaling technique [1], which requires an additional hardware unit.
5.2 Future Works
Although the problem of multiplier-less constant multiplication has been studied for many decades, it still attracts considerable attention. In general, the digit-based algorithms can provide a better solution than the graph-based algorithms, but their quality of result is highly dependent on the decomposition method. For example, two numbers, 3 and 29, can be decomposed as follows in MSD form:
Because the graph-based algorithm RAG-n [14] allows the use of right shifters and accepts a mapping result that induces extra adders for a coefficient in order to maximize global subexpression sharing, it is capable of finding a design solution that the digit-based algorithms do not produce. Figure 28 shows an alternative design in which only two adders are needed. Obviously, this solution lies outside the current digit-based decomposition. Thus, finding a valid yet limited set of decompositions to explore remains an open research topic.
Furthermore, many new algorithms have been proposed to achieve high-data-rate communication. For example, polar codes [81, 82], derived from the concept of channel polarization, have emerged as important channel codes because of their capacity-achieving property. The hardware architecture of an (8, 4) polar code is given in Figure 29. Similar to the FFT core design, the same techniques for hardware folding and scaling analysis can be applied to the polar decoder. However, the data propagation is always performed in the logarithmic domain, which is not a linear time-invariant (LTI) operation. This non-linear property needs further research before the proposed techniques can be applied.
Figure 28 An alternative implementation for the numbers 3 and 29.
Figure 29 The factor graph of the (8, 4) polar code (Stage 1 to Stage 3).
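The non-linearity of log-domain message passing can be seen with a two-line check. The sketch assumes the min-sum approximation that is frequently used for check-node updates in belief-propagation decoders; the decoders of [81, 82] may use a different update rule, and the function name is illustrative.

```python
import math

def min_sum(a, b):
    """Min-sum combination of two log-likelihood ratios (assumed update rule)."""
    return math.copysign(1.0, a) * math.copysign(1.0, b) * min(abs(a), abs(b))

# Superposition test: a linear operator would give equal results.
combined = min_sum(1.0 + 2.0, 2.0)                   # 2.0
separate = min_sum(1.0, 2.0) + min_sum(2.0, 2.0)     # 1.0 + 2.0 = 3.0
print(combined, separate)
```

Since the two values differ, noise contributions cannot simply be added stage by stage as in the FFT analysis, which is why the scaling analysis does not carry over directly.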
References
[1] Y.-W. Lin, H.-Y. Liu, and C.-Y. Lee, “A dynamic scaling FFT processor for DVB-T applications,” IEEE Journal of Solid-State Circuits, vol. 39, no. 11, pp. 2005–2013, Nov. 2004.
[2] C.-T. Lin, Y.-C. Yu, and L.-D. Van, “A low-power 64-point FFT/IFFT design for IEEE 802.11a WLAN application,” IEEE International Symposium on Circuits and Systems, 2006 (ISCAS 2006), pp. 4523–4526, May 2006.
[3] S. Li, H. Xu, W. Fan, Y. Chen, and X. Zeng, “A 128/256-point pipeline FFT/IFFT processor for MIMO OFDM system IEEE 802.16e,” IEEE International Symposium on Circuits and Systems, 2010 (ISCAS 2010), pp. 1488–1491, May 2010.
[4] Y. Chen, Y.-W. Lin, Y.-C. Tsao, and C.-Y. Lee, “A 2.4-Gsample/s DVFS FFT processor for MIMO OFDM communication systems,” IEEE Journal of Solid-State Circuits, vol. 43, no. 5, pp. 1260–1273, May 2008.
[5] C.-M. Chen, C.-C. Hung, and Y.-H. Huang, “An energy-efficient partial FFT processor for the OFDMA communication system,” IEEE Transactions on Circuits and Systems II, vol. 57, no. 2, pp. 136–140, Feb. 2010.
[6] S.-N. Tang, C.-H. Liao, and T.-Y. Chang, “An area- and energy-efficient multimode FFT processor for WPAN/WLAN/WMAN systems,” IEEE Journal of Solid-State Circuits, vol. 47, no. 6, pp. 1419–1435, Jun. 2012.
[7] Y.-W. Lin and C.-Y. Lee, “Design of an FFT/IFFT processor for MIMO OFDM systems,” IEEE Transactions on Circuits and Systems I, vol. 54, no. 4, pp. 807–815, Apr. 2007.
[8] M.-L. Ku and C.-C. Huang, “A complementary codes pilot-based transmitter diversity technique for OFDM systems,” IEEE Transactions on Wireless Communications, vol. 5, no. 3, pp. 504-504, Mar. 2006.
[9] E. Grass, K. Tittelbach-Helmrich, U. Jagdhold, A. Troya, G. Lippert, O. Kruger, J. Lehmann, K. Maharatna, K.F. Dombrowski, N. Fiebig, R. Kraemer, and P. Mahonen, “On the single-chip implementation of a Hiperlan/2 and IEEE 802.11a capable modem,” IEEE Personal Communications, vol. 8, no. 6, pp. 48–57, Dec. 2001.
[10] M. Krstić, K. Maharatna, A. Troya, E. Grass, and U. Jagdhold, “Baseband processor for IEEE 802.11a standard with embedded BIST,” Facta Universitatis, Series: Electronics and Energetics, pp. 231-239, Feb. 2007.
[11] P. Cappello and K. Steiglitz, “Some complexity issues in digital signal processing,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 5, pp. 1037-1041, Oct. 1984.
[12] N. Sidahao, G.A. Constantinides, and P.Y. Cheung, “A heuristic approach for multiple restricted multiplication,” IEEE International Symposium on Circuits and Systems, 2005 (ISCAS 2005), pp. 692-695, May 2005.
[13] D. Bull and D. Horrocks, “Primitive operator digital filters,” IEE Proceedings G on Circuits, Devices and Systems, vol. 138, no. 3, pp. 401-412, Jun. 1991.
[14] A.G. Dempster and M.D. Macleod, “Use of minimum-adder multiplier blocks in FIR digital filters,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 42, no. 9, pp. 569-577, Sep. 1995.
[15] M. Mehendale, S.D. Sherlekar, and G. Venkatesh, “Synthesis of multiplier-less FIR filters with minimum number of additions,” IEEE/ACM International Conference on Computer-Aided Design, 1995 (ICCAD 1995), pp. 668-671, Nov. 1995.
[16] H.-J. Kang, H. Kim, and I.-C. Park, “FIR filter synthesis algorithms for minimizing the delay and the number of adders,” IEEE/ACM International Conference on Computer Aided Design, 2000 (ICCAD 2000), pp. 51-54, Nov. 2000.
[17] I.-C. Park and H.-J. Kang, “Digital filter synthesis based on minimal signed digit representation,” Proceedings on Design Automation Conference, 2001 (DAC 2001), pp. 468-473, Jun. 2001.
[18] M. Kumm, P. Zipf, M. Faust, and C.-H. Chang, “Pipelined adder graph optimization for high speed multiple constant multiplication,” IEEE International Symposium on Circuits and Systems, 2012 (ISCAS 2012), pp. 49-52, May 2012.
[19] Y. Voronenko and M. Püschel, “Multiplierless multiple constant multiplication,” ACM Transactions on Algorithms (TALG), vol. 3, no. 2, pp. 1549-6325, May 2007.
[20] R.I. Hartley, “Subexpression sharing in filters using canonic signed digit multipliers,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 43, no. 10, pp. 677-688, Oct. 1996.
[21] R. Pasko, P. Schaumont, V. Derudder, S. Vernalde, and D. Durackova, “A new algorithm for elimination of common subexpressions,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 1, pp. 58-68, Jan. 1999.
[22] R. Hewlitt and E. Swartzlander, “Canonical signed digit representation for FIR digital filters,” IEEE Workshop on Signal Processing Systems, 2000 (SiPS 2000), pp. 416-426, 2000.
[23] O. Gustafsson and L. Wanhammar, “ILP modelling of the common subexpression sharing problem,” International Conference on Electronics, Circuits and Systems, 2002 (ICECS 2002), vol. 3, pp. 1171-1174, 2002.
[24] C.-Y. Yao, H.-H. Chen, T.-F. Lin, C.-J. Chien, and C.-T. Hsu, “A novel common-subexpression-elimination method for synthesizing fixed-point FIR filters,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 51, no. 11, pp. 2215-2221, Nov. 2004.
[25] Y. Wang and K. Roy, “CSDC: a new complexity reduction technique for multiplierless implementation of digital FIR filters,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 52, no. 9, pp. 1845-1853, Sep. 2005.
[26] O. Gustafsson, “Lower Bounds for Constant Multiplication Problems,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 54, no. 11, pp. 974-978, Nov. 2007.
[27] L. Aksoy, E. da Costa, P. Flores, and J. Monteiro, “Exact and approximate algorithms for the optimization of area and delay in multiple constant multiplications,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 27, no. 6, pp. 1013-1026, Jun. 2008.
[28] Y.A. Ho, C. Lei, H. Kwan, and N. Wong, “Global optimization of common subexpressions for multiplierless synthesis of multiple constant multiplications,” IEEE Asia and South Pacific Design Automation Conference, 2008 (ASPDAC 2008), pp. 119-124, Mar. 2008.
[29] Y.A. Ho, C. Lei, H. Kwan, and N. Wong, “Optimal common sub-expression elimination algorithm of multiple constant multiplications with a logic depth constraint,” IEICE Transactions on Fundamentals of Electronics, Communications, and Computer Science, vol. E91-A, no. 12, pp. 3568-3575, Dec. 2008.
[30] L. Aksoy, E.O. Gunes, and P. Flores, “An exact breadth-first search algorithm for the multiple constant multiplications problem,” NORCHIP conference, 2008, pp. 41-46, Nov. 2008.
[31] J. Thong and N. Nicolici, “Combined optimal and heuristic approaches for multiple constant multiplication,” IEEE International Conference on Computer Design, 2010 (ICCD 2010), pp. 266-273, Oct. 2010.
[32] R. Mahesh and A. Vinod, “New reconfigurable architectures for implementing FIR filters with low complexity,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 2, pp. 275-288, Feb. 2010.
[33] J. Thong and N. Nicolici, “An Optimal and Practical Approach to Single Constant Multiplication,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, no. 9, pp. 1373-1386, Sep. 2011.
[34] M. R. Garey and D. S. Johnson, Computers and intractability: a guide to the theory of NP-completeness, W. H. Freeman & Co., 1979.
[35] P. J. Downey, R. Sethi, and R. E. Tarjan, “Variations on the common subexpression problem,” Journal of the ACM (JACM), vol. 27, no. 4, pp. 758-771, Oct. 1980.
[36] J. W. Cooley and J. W. Tukey, “An algorithm for machine computation of complex Fourier series,” Mathematics of Computation, vol. 19, pp. 297–301, 1965.
[37] L. R. Rabiner and B. Gold, Theory and application of digital signal processing, Prentice-Hall, Inc., 1975.
[38] E. H. Wold and A. M. Despain, “Pipeline and parallel-pipeline FFT processors for VLSI implementations,” IEEE Transactions on Computers, vol. C-33, no.5, pp. 414–426, May 1984.
[39] A. M. Despain, “Fourier transform computers using CORDIC iterations,” IEEE Transactions on Computers, vol. C-23, no. 10, pp. 993–1001, Oct. 1974.
[40] S. He and M. Torkelson, “A new approach to pipeline FFT processor,” The 10th International Parallel Processing Symposium, 1996 (IPPS 1996), pp. 766–770, Apr. 1996.
[41] R. Storn, “Radix-2 FFT-pipeline architecture with reduced noise-to-signal ratio,” IEE Proceedings of Vision, Image, and Signal Processing, vol. 141, no. 2, pp. 81–86, Apr. 1994.
[42] S. He and M. Torkelson, “Designing pipeline FFT processor for OFDM (de)modulation,” International Symposium on Signals, Systems, and Electronics, 1998 (ISSSE 1998), pp. 257–262, Sep. 1998.
[43] C. Cheng and K. K. Parhi, “High-throughput VLSI architecture for FFT computation,” IEEE Transactions on Circuits and Systems II, vol. 54, no. 10, pp. 863–867, Oct. 2007.
[44] A. Cortes, I. Velez, and J. Sevillano, “Radix r^k FFTs: matricial representation and SDC/SDF pipeline implementation,” IEEE Transactions on Signal Processing, vol. 57, no. 7, pp. 2824–2839, Jul. 2009.
[45] S.-N. Tang, J.-W. Tsai, and T.-Y. Chang, “A 2.4-GS/s FFT processor for OFDM-based WPAN applications,” IEEE Transactions on Circuits and Systems II, vol. 57, no. 6, pp. 451–455, Jun. 2010.
[46] Y. Chen, Y.-C. Tsao, Y.-W. Lin, C.-H. Lin, and C.-Y. Lee, “An indexed-scaling pipelined FFT processor for OFDM-based WPAN applications,” IEEE Transactions on Circuits and Systems II, vol. 55, no. 2, pp. 146–150, Feb. 2008.
[47] W.-H. Chang and T.Q. Nguyen, “On the fixed-point accuracy analysis of FFT algorithms,” IEEE Transactions on Signal Processing, vol. 56, no. 10, pp. 4673–4682, Oct. 2008.
[48] C.-Y. Wang, C.-B. Kuo, and J.-Y. Jou, “Hybrid wordlength optimization methods of pipelined FFT processors,” IEEE Transactions on Computers, vol. 56, no. 8, pp. 1105–1118, Aug. 2007.
[49] A.V. Oppenheim and C.J. Weinstein, “Effects of finite register length in digital filtering and the fast Fourier transform,” Proceedings of the IEEE, vol. 60, no. 8, pp. 957–976, Aug. 1972.
[50] S. Ramakrishnan, J. Balakrishnan, and K. Ramasubramanian, “Exploiting signal and noise statistics for fixed point FFT design optimization in OFDM systems,” 2010 National Conference on Communications (NCC), pp. 1–5, Jan. 2010.
[51] E. Ya. Remez, General computational methods of Chebyshev approximation: The problems with linear real parameters, Washington, DC: Atomic Energy Commission Translation Series, 1957.
[52] T. Parks and J. McClellan, “Chebyshev Approximation for Nonrecursive Digital Filters with Linear Phase,” IEEE Transactions on Circuit Theory, vol. 19, no. 2, pp. 189-194, Mar. 1972.
[53] Gurobi Optimizer. [Online]. Available: http://www.gurobi.com
[54] G. Nordin, P. A. Milder, J. C. Hoe, and M. Püschel, “Automatic generation of customized discrete Fourier transform IPs,” ACM/IEEE Design Automation Conference, 2005 (DAC 2005), pp. 471–474, Jun. 2005.
[55] P. A. Milder, F. Franchetti, J. C. Hoe, and M. Püschel, “Formal datapath representation and manipulation for implementing DSP transforms,” ACM/IEEE Design Automation Conference, 2008 (DAC 2008), pp. 385–390, Jun. 2008.
[56] E. Bidet, D. Castelain, C. Joanblanq, and P. Senn, “A fast single-chip implementation of 8192 complex point FFT,” IEEE Journal of Solid-State Circuits, vol. 30, no. 3, pp. 300–305, Mar. 1995.
[57] T. Lenart and V. Owall, “A 2048 complex point FFT processor using a novel data scaling approach,” IEEE International Symposium on Circuits and Systems, 2003 (ISCAS 2003), pp. IV-45–IV-48, May 2003.
[58] A. Mallik, D. Sinha, P. Banerjee, and H. Zhou, “Low-power optimization by smart bit-width allocation in a SystemC-based ASIC design environment,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 3, pp. 447–455, Mar. 2007.
[59] M. Haldar, A. Nayak, N. Shenoy, A. Choudhary, and P. Banerjee, “FPGA hardware synthesis from MATLAB,” International Conference on VLSI Design, 2001, pp. 299–304, Jan. 2001.
[60] G. A. Constantinides, P. Y. K. Cheung, and W. Luk, “The multiple wordlength paradigm,” The 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2001, pp. 51–60, Mar. 2001.
[61] O. Sarbishei and K. Radecka, “Analysis of mean-square-error (MSE) for fixed-point FFT units,” IEEE International Symposium on Circuits and Systems, 2011 (ISCAS 2011), pp. 1732–1735, May 2011.
[62] W.-H. Chang and T. Q. Nguyen, “On the fixed-point accuracy analysis of FFT algorithms,” IEEE Transactions on Signal Processing, vol. 56, no. 10, pp. 4673–4682, Oct. 2008.
[63] G. Caffarena, C. Carreras, J. A. López, and Á. Fernández, “SQNR estimation of fixed-point DSP algorithms,” EURASIP Journal on Advances in Signal Processing, vol. 2010, no. 21, pp. 1–12, Feb. 2010.
[64] G. A. Constantinides, P. Y. K. Cheung, and W. Luk, “Wordlength optimization for linear digital signal processing,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 22, no. 10, pp. 1432–1442, Oct. 2003.
[65] R. Rocher, D. Menard, O. Sentieys, and P. Scalart, “Analytical accuracy evaluation of fixed-point systems,” 15th European Signal Processing Conference, 2007 (EUSIPCO07), pp. 999–1003, Sep. 2007.
[66] D. Menard, R. Serizel, R. Rocher, and O. Sentieys, “Accuracy constraint determination in fixed-point system design,” EURASIP Journal on Embedded Systems, vol. 2008, no. 1, pp. 1–12, Oct. 2008.
[67] D. Menard, R. Rocher, and O. Sentieys, “Analytical Fixed-Point Accuracy Evaluation in Linear Time-Invariant Systems,” IEEE Transactions on Circuits and Systems I, vol. 55, no. 10, pp. 3197–3208, Nov. 2008.
[68] O. Sarbishei and K. Radecka, “On the fixed-point accuracy analysis and optimization of FFT units with CORDIC multipliers,” IEEE Symposium on Computer Arithmetic, 2011 (ARITH), pp. 62–69, Jul. 2011.
[69] H. Keding, M. Willems, M. Coors, and H. Meyr, “FRIDGE: a fixed-point design and simulation environment,” Proceedings of Design, Automation and Test in Europe, 1998, pp. 429–435, Feb. 1998.
[70] D.-U. Lee, A. A. Gaffar, R. C. C. Cheung, O. Mencer, W. Luk, and G. A. Constantinides, “Accuracy-guaranteed bit-width optimization,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 25, no. 10, pp. 1990–2000, Oct. 2006.
[71] J.W. Cooley and J.W. Tukey, “An algorithm for machine computation of complex Fourier series,” Math. Computation, vol. 19, pp. 45–48, 1965.
[72] W.-C. Yeh and C.-W. Jen, “High-speed and low-power split-radix FFT,” IEEE Transactions on Signal Processing, vol. 51, no. 3, pp. 864–874, Mar. 2003.
[73] Y.-W. Lin, H.-Y. Liu, and C.-Y. Lee, “A 1-GS/s FFT/IFFT processor for UWB applications,” IEEE Journal of Solid-State Circuits, vol. 40, no. 8, pp. 1726–1735, Aug. 2005.
[74] R.C. Agarwal and J.W. Cooley, “Vectorized mixed radix discrete Fourier transform algorithms,” Proceedings of the IEEE, vol. 75, no. 9, pp. 1283–1292, Sep. 1987.
[75] A.V. Oppenheim and R.W. Schafer, Discrete-time signal processing, Prentice Hall, Englewood Cliffs, NJ, 2009.
[76] D.P. Bertsekas and J.N. Tsitsiklis, Introduction to probability, Athena Scientific, Nashua, NH, 2002.
[77] C. M. Grinstead and J. L. Snell, Introduction to probability, American Mathematical Society, Providence, RI, 1997.
[78] T.-C. Wei, W.-C. Liu, and S.-J. Jou, “A jointed mode detection and symbol detection scheme for DVB-T,” IEEE Transactions on Consumer Electronics, vol. 54, no. 2, pp. 336–341, May 2008.
[79] L. Yun, J. Zhou, and Y. Onozato, “Performance study for STC-OFDM systems based on IEEE802.11a standard,” International Conference on Wireless Communications, Networking, and Mobile Computing, 2009 (WiCom ‘09), pp. 1–4, Sep. 2009.
[80] Wikipedia: Waveform Audio File Format. [Online]. Available: http://en.wikipedia.org/wiki/WAV.
[81] B. Yuan and K. K. Parhi, “Architecture optimizations for BP polar decoders,” IEEE International Conference on Acoustics, Speech and Signal Processing, 2013 (ICASSP), pp. 2654–2658, May 2013.
[82] B. Yuan and K. K. Parhi, “Low-latency successive-cancellation polar decoder architectures using 2-bit decoding,” IEEE Transactions on Circuits and Systems I, vol. 61, no. 4, pp. 1241–1254, Apr. 2014.