
2.5. Summary

Figure 2-12 : The throughput of foreman.yuv with the proposed VLD

Figure 2-13 : The throughput of mobile.yuv with the proposed VLD

Figure 2-12 and Figure 2-13 show the throughput of the proposed VLD for two test sequences. The simulation environment is JM 9.2, the C reference software of H.264/AVC. We use nine different QP values to obtain the simulation results. In both figures, the blue line is the throughput requirement of baseline@3.1 specified in the H.264/AVC standard at a clock frequency of 100 MHz, and the black line is the requirement of baseline@3.2. In Figure 2-12, the throughput of foreman meets the requirement of baseline@3.2 when QP is 20, and the I-frame throughput of the same sequence also meets that requirement when QP is 28. In Figure 2-13, the throughput of mobile meets the requirement when QP is 28. Therefore, the proposed design can support the H.264/AVC baseline profile.

| | [3] | [4] | Proposed Design |
|---|---|---|---|
| Tech. | 0.25 um | 0.18 um | 0.18 um |
| Gate-count | 6100 | 4720 | CAVLC : 3267; MPEG2 : 945 |
| Target Spec. | Baseline Profile | Main Profile @4.1 | Main Profile @4.2 & MPEG-2 |
| Buffer | N.A. | 696 bits RAM | 3471 gate-count |
| Clock Constraint | 125 MHz | 125 MHz | 125 MHz |

Table 2-3 : Hardware cost evaluation of proposed low power design

Table 2-3 compares the hardware cost. Although Figure 2-12 and Figure 2-13 show the throughput at a clock frequency of 100 MHz, the maximum speed of the proposed design is 180 MHz in a 0.18 um CMOS technology. This is fast enough to meet the real-time processing requirement of CAVLC decoding for main profile @4.2. Compared with the design proposed in [4], the CAVLC part of the proposed design reduces the hardware cost by about 30%, and the total design still has a lower hardware cost. The proposed design does not use RAM for storage, in order to save power.
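The 30% figure can be checked against the gate counts in Table 2-3. The arithmetic below is a reader's check, not a result taken from the original evaluation, and it assumes that "total design" means the CAVLC and MPEG-2 logic together, excluding the buffer:

```latex
\[
\frac{4720 - 3267}{4720} \approx 0.308,
\qquad
3267 + 945 = 4212 < 4720 .
\]
```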

| Spec. | MPEG-2 I-frame | H.264 I-frame | H.264 P-frame |
|---|---|---|---|
| power (mW) | 1.719 | 1.302 | 1.376 |

Table 2-4 : The post layout power consumption under 0.18um CMOS Tech.

Table 2-4 shows the post-layout power consumption in 0.18 um CMOS technology. The proposed design consumes less than 2 mW in all three cases, and it is used in our dual-standard system [8], [9].

Chapter 3.

A VLC Codec System for dual standards


Figure 3-1 : The architecture of our proposed system

Figure 3-1 shows the architecture of our proposed system for the H.264/AVC main profile. The entropy decoder contains CABAD, UVLD, and CAVLD. UVLD and CAVLD belong to the same entropy coding option: UVLD decodes the syntax elements, and CAVLD decodes the residual data. Therefore, the output of UVLD controls the decoding mode of the H.264/AVC decoder, and the output of CAVLD is the DCT coefficients of the residual data. After the IDCT, the data are added to the predicted data to complete a block.

In Figure 3-1, CABAD uses the slice memory to store the context model and the row storage. Figure 3-2 shows the memory usage of CABAD in our proposed H.264/AVC decoder system. The context model of CABAD occupies 349.1 bytes of the slice memory.

Figure 3-2 : The usage of memory of CABAD in our proposed H.264/AVC decoder

Because the context model of CABAD occupies so much memory, it is attractive to integrate CABAD and CAVLD: the same memory can provide space to store the VLC tables of CAVLD. In addition, our proposed H.264/AVC decoder receives the bitstream as a parallel input, so we have to try another approach to implement CAVLD. Besides, as mentioned in the motivation, if we add a CAVLC encoder to the entropy decoder, it can be integrated with an H.264/AVC encoder to form an H.264/AVC codec system. Therefore, we look for a method to implement a memory-based VLC codec system, and we propose a new group-based VLC codec system with reference to [6] and [7].

3.1. The Architecture of the Proposed VLC Codec System

Here, we describe the architecture of the proposed VLC codec system. We focus on the design of the CAVLC encoder/decoder and do not describe the MPEG-2 VLC codec in detail, because the major differences of the proposed MPEG-2 VLC codec are the group-based algorithm and its hardware implementation; the other parts are basically similar to a conventional VLC codec design. Therefore, for the MPEG-2 VLC codec system we only discuss the proposed group-based alteration, and we concentrate on the CAVLC encoder/decoder.


Figure 3-3 : Block diagram of the proposed VLC codec design

The block diagram of the proposed VLC codec design is shown in Figure 3-3. To fit the specification of our proposed H.264/AVC decoder system, the input bitstream is a parallel input that is 8 bits wide. The decoder is controlled by the enable signal is_decoding, and the encoder is controlled in the same way. The signal maxNum selects the block type being decoded or encoded, and nC, introduced in Section 2.1, selects the correct VLC table for coeff_token. The serial input data, coefficients, carries the DCT coefficients for the encoder in reverse order. The codeword boundary detector has a FIFO to store the input bitstream, and its output signal, FIFO_full, indicates whether the bitstream FIFO is full. The symbols constructor sends out the arranged DCT coefficients, and the bitstream concatenater links the encoded codewords. The components are described as follows.

• The codeword boundary detector counts the leading ones and zeros and fetches the required suffix for each decoding function unit according to the recorded bitstream boundary. It also acts as a controller that decides the activity of each decoding component, and it calculates the number of skipped run_before symbols and sends this information to the symbols constructor. For MPEG-2 VLC, it detects special cases such as escape mode and end of block.

• After the coefficients scanner receives the serial input data (the DCT coefficients), it calculates and sends the data needed by each encoding component. For MPEG-2 VLC encoding, it only counts the levels and runs; after sending an MPEG-2 level and run, it can receive the following coefficients. CAVLC encoding needs more information: the unit has to calculate TotalCoeff, TrailingOnes, the T1s flags, the levels, and the run_befores. Unlike in the MPEG-2 process, the coefficients scanner has to receive all coefficients of one block before it can request the coefficients of the next block.

• The group-based VLC codec system uses the proposed group-based VLC codec algorithm to implement the MPEG-2 and CAVLC coeff_token encoder/decoder. It also handles the NUM_FLC case of CAVLC coeff_token and the MPEG-2 escape case. The detailed design will be described in the following section.

• Trailing_ones_sign_flag encodes and decodes the signs of all trailing ones.

• The level codec with efficient coding handles the levels. The details of efficient coding will be described in the next chapter.

• The total_zeros codec with efficient coding handles the coding of total_zeros.

• The run_before codec with efficient coding encodes and decodes the run_befores.

• The symbols constructor is used in the decoding process. It arranges the decoded levels according to the decoded runs. In the CAVLC decoding process, it works concurrently with run_before decoding to increase the decoding throughput.

• The bitstream concatenater collects the encoded bitstreams and links them. First, it receives the codeword value and length from each encoding process and assembles the corresponding bitstream. Then, it concatenates the separate bitstreams into the transmitted bitstream.
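As an illustration of what the coefficients scanner has to compute for CAVLC, the following Python sketch derives the symbols from one block. It is a software model under stated assumptions, not the hardware design: the function name `scan_block` and the dictionary keys are hypothetical, and the block is passed in forward scan order for readability, whereas the proposed encoder receives the coefficients in reverse order.

```python
def scan_block(coeffs):
    """Derive CAVLC symbols from one block given in forward scan order.

    Hypothetical helper for illustration; names are not from the design.
    """
    nonzero = [i for i, c in enumerate(coeffs) if c != 0]
    total_coeff = len(nonzero)
    if total_coeff == 0:
        return {"TotalCoeff": 0, "TrailingOnes": 0, "T1s_signs": [],
                "levels": [], "total_zeros": 0, "run_before": []}

    # Zeros located before the highest-frequency nonzero coefficient.
    total_zeros = nonzero[-1] + 1 - total_coeff

    # CAVLC processes the block from the highest frequency downwards.
    rev = nonzero[::-1]

    # Up to three consecutive +/-1 values at the high-frequency end.
    trailing_ones = 0
    for i in rev:
        if abs(coeffs[i]) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break

    # Sign flag is 1 for a negative trailing one, 0 for a positive one.
    t1_signs = [1 if coeffs[i] < 0 else 0 for i in rev[:trailing_ones]]
    levels = [coeffs[i] for i in rev[trailing_ones:]]

    # Zeros immediately preceding each nonzero coefficient, in reverse order.
    run_before = [rev[k] - rev[k + 1] - 1 for k in range(total_coeff - 1)]
    run_before.append(rev[-1])  # zeros before the lowest-frequency coefficient

    return {"TotalCoeff": total_coeff, "TrailingOnes": trailing_ones,
            "T1s_signs": t1_signs, "levels": levels,
            "total_zeros": total_zeros, "run_before": run_before}


symbols = scan_block([0, 3, 0, 1, -1, -1, 0, 1] + [0] * 8)
print(symbols)
```

The last entry of `run_before` is listed only for completeness: in CAVLC the run of the lowest-frequency coefficient is implied by the remaining zeros and is not coded.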

The CAVLC decoder has to decode the bitstream step by step, because the bitstream has data dependencies: a step cannot start until the information it depends on has been decoded. Therefore, the key to increasing the decoding throughput is to reduce the decoding cycles of each component. The CAVLC decoding steps are as follows:

• Count the leading zeros until the first one in the input bitstream is detected, and then send the leading-zero count and the suffix to the group-based VLC codec system. If nC selects the NUM_FLC case, only the suffix is sent.

• Decode coeff_token according to the group-based VLC algorithm. The component outputs the suffix length, which is used to calculate the consumed bitstream boundary.

• After decoding coeff_token, we obtain TrailingOnes, which determines the suffix length sent to Trailing_ones_sign_flag. While decoding Trailing_ones_sign_flag, we also count the leading zeros that belong to the level decoding process.

• While decoding levels, we count the leading zeros for the next level or for total_zeros. When the number of decoded levels equals TotalCoeff, level decoding stops.

• While decoding total_zeros, we count the leading zeros used for a run_before symbol and the leading ones used for zero skipping.

• While decoding run_before, we still count the leading zeros used for the next run_before symbol and the leading ones used for zero skipping. Then, according to the previously decoded run_before, we can begin placing the DCT coefficients at the correct positions in the decoded block. When zerosLeft equals 0 or the last run_before has been decoded, the run_before process ends.
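The last step, placing coefficients while run_before is still being decoded, can be modelled in software as follows. This is a minimal sketch under assumptions: `place_coefficients` and its parameters are hypothetical names, the decoded values (trailing ones followed by levels) arrive in reverse scan order, and a 16-coefficient block is assumed by default.

```python
def place_coefficients(values_rev, total_zeros, decoded_runs, block_size=16):
    """Rebuild a block from decoded values, total_zeros and run_before values.

    values_rev:   nonzero coefficients, highest frequency first.
    decoded_runs: run_before values in decoding order; none is consumed once
                  zeros_left reaches 0 or for the last coefficient.
    """
    block = [0] * block_size
    # Position of the highest-frequency nonzero coefficient.
    pos = len(values_rev) + total_zeros - 1
    zeros_left = total_zeros
    runs = iter(decoded_runs)

    for k, value in enumerate(values_rev):
        block[pos] = value
        if k == len(values_rev) - 1:
            break  # the last coefficient takes whatever zeros remain
        # A run_before symbol is only present while zeros are left.
        run = next(runs) if zeros_left > 0 else 0
        zeros_left -= run
        pos -= run + 1
    return block


block = place_coefficients([1, -1, -1, 1, 3], total_zeros=3,
                           decoded_runs=[1, 0, 0, 1])
print(block[:8])
```

Because each coefficient is written as soon as its run is known, the placement can overlap with run_before decoding, which is the behaviour the symbols constructor is described as exploiting.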

The encoding process of the CAVLC encoder could be designed as a sequence of steps like the decoding procedure, but the throughput would suffer if the encoding steps were executed serially. We observe that there is no data dependency between the symbols encoded by different encoding components, so the encoding steps can run in parallel. For example, even if the coeff_token step has not finished, we can still execute the level encoding step, because its data do not depend on the result of the coeff_token step. Therefore, during encoding, all components of the proposed design work concurrently. This increases the encoding throughput, because the throughput of the proposed CAVLC encoder depends on the encoding step with the most cycles rather than on the sum of the cycles of all encoding components.
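The difference between serial and parallel encoding can be made concrete with a small calculation. The cycle counts below are hypothetical values chosen only for illustration; they are not measurements of the proposed design.

```python
# Hypothetical per-component cycle counts for one block (illustrative only).
cycles = {
    "coeff_token": 1,
    "trailing_ones_sign_flag": 1,
    "level": 4,
    "total_zeros": 1,
    "run_before": 3,
}

# Serial encoding: each component waits for the previous one.
serial_cycles = sum(cycles.values())

# Parallel encoding: all components start together, so the slowest one
# determines the block latency.
parallel_cycles = max(cycles.values())

print(serial_cycles, parallel_cycles)
```

With these assumed numbers, the block takes 10 cycles when encoded serially and 4 cycles when the components work concurrently.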

To support the proposed encoder design, the design of the bitstream concatenater is important. It has to link the encoded codewords as fast as possible; otherwise, the cycles saved in the encoding process would be spent on concatenating the encoded codewords. This design will be described in Chapter 4; here we first introduce the proposed group-based VLC codec system.


3.2. Conventional Group-based VLC Codec System

This work was previously developed and verified by Bai-Jue Hsieh in [6], [7]. This section gives a brief overview of what a conventional group-based VLC codec system is and how it works.


3.2.1. Definition of Codeword Groups

An example of Huffman code and codeword grouping is illustrated in Figure 3-4.

Based on this result, a conventional codeword group is a set of codewords whose source symbols are combined in the Huffman procedure and receive the same codeword length. According to this definition, the codeword groups have the following properties:

• Within a group, each codeword can be treated as a binary number whose width is the codeword length, called VLC_codenum, since all codewords in the group have the same length.

• The codeword that has the smallest VLC_codenum in a group is denoted VLC_mincode.

• A VLC_codeoffset is the offset between the VLC_mincode and the VLC_codenum.

Figure 3-4 : Example of VLC table and codeword groups

In Figure 3-4, the symbols C4, C5, and C6 belong to the codeword group G3. In this group, the codewords have the same codeword length, 4 bits, and the same prefix, 11₂. The suffixes are 2 bits long. Therefore, the 4-bit VLC_codenums are 13, 14, and 15; the VLC_mincode is 4’b1101; and the 2-bit VLC_codeoffsets are 0, 1, and 2. Source symbols that are not combined belong to different groups even when their codeword lengths are identical, such as C7, C8, and C9 in G0 and C4, C5, and C6 in G3. Moreover, group G1 contains only one symbol, since C1 is the only VLC with a length of 2 bits.
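The three group properties can be reproduced for G3 with a few lines of Python. This is a minimal sketch; the function name `group_properties` is hypothetical, and the three codewords are the binary forms of the VLC_codenums 13, 14, and 15 given above.

```python
def group_properties(codewords):
    """Return (VLC_codenums, VLC_mincode, VLC_codeoffsets) for one group."""
    if len({len(c) for c in codewords}) != 1:
        raise ValueError("all codewords in a group must have the same length")
    # Each codeword is read as a binary number of codeword-length bits.
    codenums = [int(c, 2) for c in codewords]
    mincode = min(codenums)
    # Offset of every codeword from the smallest one in the group.
    offsets = [n - mincode for n in codenums]
    return codenums, mincode, offsets


codenums, mincode, offsets = group_properties(["1101", "1110", "1111"])
print(codenums, format(mincode, "04b"), offsets)
```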


3.2.2. Intra-Group Decoding Procedure

Besides grouping codewords, VLC decoding requires mapping symbols onto memories and extracting codeword group information. The memory address of a symbol in a group is calculated from the VLC_codeoffset of the symbol and the base address of the VLC_mincode in that group; that is, the symbol address is the sum of the VLC_codeoffset and the base address of the group. With this arithmetic relationship, the decoded symbol address is found by numerical calculation rather than by pattern matching. Thus, the group information to be stored consists of the codeword length, the VLC_mincode, and the base address. Based on the group information in Figure 3-5, the intra-group decoding/encoding procedure is performed as follows.

Assume that we are decoding the codeword 10011₂.

• VLC_codeoffset = VLC_codenum (10011₂) - VLC_mincode (10000₂) = 00011₂ = 3;

• symbol_address = VLC_codeoffset (3) + base_address (50) = 53;

• the decoded symbol C4 is retrieved from memory address 53.

Assume that the symbol address to be encoded is 103.

• VLC_codeoffset = symbol_address (103) - base_address (100) = 3;

• VLC_codenum = VLC_codeoffset (3) + VLC_mincode (32) = 35;

• the encoded 8-bit codeword is 00100011₂ = 35.

Group information: codeword length = 5, VLC_mincode = 10000₂, base address = 50

Figure 3-5 : Example of intra-group symbol memory mapping and group information
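The two intra-group procedures above reduce to one subtraction and one addition each. The Python sketch below replays both worked examples; the function names are hypothetical, and the encoder takes the codeword length as an explicit argument only so that the result can be formatted as a bit string.

```python
def decode_in_group(codeword, vlc_mincode, base_address):
    """Map a codeword (bit string) to its symbol address within one group."""
    vlc_codeoffset = int(codeword, 2) - vlc_mincode
    return base_address + vlc_codeoffset


def encode_in_group(symbol_address, vlc_mincode, base_address, codeword_length):
    """Map a symbol address back to its codeword (bit string)."""
    vlc_codeoffset = symbol_address - base_address
    vlc_codenum = vlc_codeoffset + vlc_mincode
    return format(vlc_codenum, "0{}b".format(codeword_length))


# Decoding example: codeword 10011, VLC_mincode 10000, base address 50.
print(decode_in_group("10011", 0b10000, 50))

# Encoding example: symbol address 103, base address 100, VLC_mincode 32.
print(encode_in_group(103, 32, 100, 8))
```

The calls return address 53 and the 8-bit codeword 00100011, matching the two examples.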


3.2.3. Group-searching Scheme

The performance of a group-based VLC decoder depends on an economical group-searching scheme with a high operation rate and low complexity, because the decoding procedure can only be performed after the group information is obtained. We use inter-group symbol memory mapping and Pseudo-Constant-Length-Code (PCLC) to achieve such a scheme. If all codeword lengths are the same, the numerical properties of the codewords in a group can be applied to the whole coding table. We therefore apply a procedure, the PCLC procedure, that equalizes the codeword lengths by appending redundant binary digits, 00…0, to the VLC codewords. The resulting PCLC codewords, which have the same length as the longest VLC codeword, can be treated as binary numbers, called PCLC_codenums.
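The PCLC procedure amounts to right-padding each codeword with zeros. The sketch below uses a hypothetical function name, `pclc_codenum`, and example codewords that are inferred by truncating the PCLC_mincodes of Table 3-2 to their stated code lengths; that inference is an assumption of this example.

```python
def pclc_codenum(codeword, max_length):
    """Pad a VLC codeword with trailing zeros and read it as a number."""
    if len(codeword) > max_length:
        raise ValueError("codeword is longer than the longest VLC codeword")
    return int(codeword.ljust(max_length, "0"), 2)


# Longest codeword in the example is 8 bits.
for codeword in ("00100100", "001100", "0110"):
    num = pclc_codenum(codeword, 8)
    print(codeword, format(num, "08b"), num)
```

The padded values come out in ascending order (36, 48, 96), which is the property the group search relies on.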

Table 3-1 : Example of inter-group symbol memory mapping

| Group | Valid | codelength | PCLC_mincode | base address |
|---|---|---|---|---|
| G0 | 1 | 8 | 00100100 | 0 |
| G1 | 1 | 6 | 00110000 | 4 |
| G2 | 1 | 4 | 01100000 | 5 |

Table 3-2 : Group information for Table 3-1

PCLC codewords, and hence PCLC_codenums, are easy to distinguish from one another because the VLC code is a prefix code. As a result, a PCLC table is established with the PCLC_codenums placed in ascending order, i.e. codenum_0 < codenum_1 < … < codenum_n. This results in ascending PCLC_mincodes as well, i.e. mincode_0 < mincode_1 < … < mincode_n. Based on the PCLC table, the base addresses have to be assigned in PCLC_mincode order, i.e. base_addr_0 < base_addr_1 < … < base_addr_n, for inter-group symbol memory mapping. An example of the PCLC table and its intra/inter-group symbol memory mapping is shown in Table 3-1, and the group information of this PCLC table is given in Table 3-2, where the valid bit indicates whether the group information is used. In Table 3-1, G2 is inserted in the middle of G1. This placement is specific to decoding and saves symbol memory space.

According to the PCLC tables and symbol memory maps, the conventional decoding group-searching scheme is realized by applying numerical properties to the bitstream and the symbol addresses. Like a PCLC codeword, a bitstream segment that has the same length as the PCLC codewords is treated as a binary number, bitstream_num. Because the bitstream is a sequence of concatenated codewords, such as codeword_i, codeword_j, and so on, the relation between the bitstream and the PCLC table can be expressed as PCLC_codenum_i ≦ bitstream_num < PCLC_codenum_(i+1), so the group can be found by numerical comparisons. The decoded codeword belongs to group Gx when the hit condition, PCLC_mincode_x ≦ bitstream_num < PCLC_mincode_(x+1), is encountered. Consider the process of decoding one symbol from the bitstream “001111010110…”.

Figure 3-6 : Process of decoding a symbol
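The hit condition can be sketched in Python using the group information of Table 3-2. This is an illustrative model only: the names `GROUPS`, `MAX_LENGTH`, and `find_group` are hypothetical, the offset is derived by shifting out the padding bits (an assumption consistent with the intra-group procedure), and the final symbol address is deliberately not computed because the symbol memory map of Table 3-1 is not reproduced here. The result has not been checked against Figure 3-6.

```python
# (group, codelength, PCLC_mincode) in ascending PCLC_mincode order,
# taken from Table 3-2.
GROUPS = [
    ("G0", 8, 0b00100100),
    ("G1", 6, 0b00110000),
    ("G2", 4, 0b01100000),
]
MAX_LENGTH = 8  # length of the longest VLC codeword


def find_group(bitstream):
    """Return (group, codeword, codeoffset) for the first codeword."""
    window = bitstream[:MAX_LENGTH].ljust(MAX_LENGTH, "0")
    bitstream_num = int(window, 2)

    # Hit condition: mincode_x <= bitstream_num < mincode_(x+1).
    hit = None
    for name, length, mincode in GROUPS:
        if mincode <= bitstream_num:
            hit = (name, length, mincode)
        else:
            break
    if hit is None:
        raise ValueError("bitstream does not match any group")

    name, length, mincode = hit
    # Drop the padding bits to obtain the offset inside the group.
    codeoffset = (bitstream_num - mincode) >> (MAX_LENGTH - length)
    return name, bitstream[:length], codeoffset


print(find_group("001111010110"))
```

For the example bitstream, the first 8 bits are 00111101 (61), which lies between the PCLC_mincodes of G1 (48) and G2 (96), so the sketch reports a hit in G1 with the 6-bit codeword 001111.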

Figure 3-7 : Process of encoding a symbol address

According to the relation between the PCLC tables and the symbol addresses, the conventional encoding group-searching scheme is realized by applying numerical properties to the codewords and symbol addresses. Based on the symbol to be encoded, the corresponding symbol address can be fetched. The relation between the symbol address and the PCLC table can again be expressed by numerical comparisons, this time against the base addresses. The encoded symbol belongs to group Gy when the hit condition, base_addr_y ≦ symbol_address < base_addr_(y+1), is encountered. Consider the process of encoding one symbol from the symbol address “19 (5’b10011)”.

Figure 3-8 : Block diagram of conventional group-based VLC decoder architecture

The conventional VLC codec system is designed for MPEG applications with coding tables of up to 256 entries, 12-bit symbols, and 16-bit codewords. The system performs concurrent encoding and decoding by accessing the same group information, and it achieves table programmability by loading data into on-chip memories. To complete the VLC codec processes of MPEG videos, the design includes the handling of sign bits and escaped run-levels (escRL) that follow VLC codewords. With efficient symbol conversion, the memory requirement is reduced to (2^5×8 + 2^8×8 + 2^8×12 + 32×29) bits for a CBS-LUT, a symbol address memory, a symbol memory, and the 32-entry group information. The block diagram of the conventional VLC codec system is shown in Figure 3-8. It mainly consists of the following components.

• The group-based VLC encoder/decoder is composed of group detectors and combinational logic circuits that realize the VLC codec processes.

• The input FIFO stores the input bitstream. According to the previously decoded results, the Dec_bitstream selector transmits the codeword bitstream to the VLC decoder. This selector also detects sign bits and escRLs when VLC codewords are decoded.

• The Enc_bitstream concatenater appends sign bits or escRLs to VLC codewords and concatenates the encoded results into a single bitstream. Then, every 32 bits of the encoded bitstream in the concatenater are shifted into the output FIFO.

• The special code detector recognizes special codes, such as escape and EOB, by checking decoded symbol addresses instead of decoded symbols. Without waiting for symbol fetching, this detector can determine the length of the additional bits following a VLC codeword. Hence, the Dec_bitstream selector can find the next codeword bitstream immediately, and the decoding throughput is increased.

• The Enc_en and Dec_en Ctrls determine the operations of the VLC encoder and decoder according to the condition of the input data and the FIFOs.

• The symbol address memory and the symbol memory are on-chip memory modules that store symbol information.

• The symbol converter performs symbol conversion and detects escaped RLPs and EOB symbols. The symbol recoverer, in turn, finds the correct runs and signed levels based on the decoded results.


3.3. The Proposed Group-Based VLC Encoding and Decoding


3.3.1. The Definition of Decoding Codeword Groups

[Table columns: group, symbol, prefix, suffix, suffix_num, suffix_offset, attribute]

Table 3-3 : An example of CAVLC code and codeword grouping

An example of CAVLC code and codeword grouping is illustrated in Table 3-3. CAVLC code is also constructed based on Huffman code and, as introduced in Chapter 1, Huffman code is a prefix code; that is, no codeword is the prefix of any other codeword. For example, the symbol, S3, listed in Table 3-3 is relative to the
