
… the estimated P̃T of each of the s candidate subcarrier assignment patterns, and the l (=3) patterns with the smallest estimated P̃T among the s will be the estimated good enough subcarrier assignment patterns determined in this stage.
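To make this selection step concrete, the following Python sketch (our own illustration, not code from the thesis; the surrogate estimates of P̃T are placeholder numbers, and only s = 50 and l = 3 are taken from the text) picks the l patterns with the smallest estimated P̃T out of the s candidates:

```python
# A minimal sketch of the Stage-3 selection: keep the l patterns whose
# surrogate-estimated total power P~T is smallest among the s candidates.
# The estimates below are placeholders, not values from the thesis.
import numpy as np

s, l = 50, 3
rng = np.random.default_rng(42)
estimated_PT = rng.uniform(30.0, 60.0, size=s)   # surrogate estimates for s patterns

# indices of the l candidate patterns with the smallest estimated P~T
top_l = np.argsort(estimated_PT)[:l]
print(top_l, estimated_PT[top_l])
```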

Remark 3.7: In [24], we have shown by simulation that when a more accurate surrogate model is used to evaluate the s (=50) candidate solutions, the top l (=3) solutions contain the actual best among the s with probability 0.99.

3.2.4 Stage 4: Determine the Good Enough Subcarrier Assignment and Bit Allocation

Since there are only l (=3) subcarrier assignment patterns left, we can use the exact model, i.e., the greedy algorithm, to calculate the optimal consumed power PT for each pattern within very limited computation time. The subcarrier assignment associated with the optimal bit allocation corresponding to the smallest PT among the three will be the good enough solution of (2.3) that we look for.
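As a rough illustration of this last stage, the sketch below implements a standard incremental-power (Hughes-Hartogs-style) greedy bit-loading pass for one user under a fixed subcarrier assignment pattern. The function names, the placeholder power curve `f`, the channel-weighting by 1/α², and the data layout are our assumptions; the exact greedy algorithm referred to above is the one defined earlier in the thesis and may differ in detail.

```python
# A sketch of greedy bit loading for one subcarrier assignment pattern:
# user k loads its requested R_k bits, 2 bits at a time, always onto the
# assigned subcarrier with the smallest incremental power f(c+2) - f(c).
# f, alpha_k, and the assignment encoding are illustrative assumptions.
import heapq

def greedy_bits(assigned_subcarriers, R_k, alpha_k, f, max_bits=6, step=2):
    """Return {subcarrier: bits} for one user and the resulting power."""
    bits = {n: 0 for n in assigned_subcarriers}
    # incremental power of adding `step` bits on subcarrier n (channel-weighted)
    heap = [((f(step) - f(0)) / alpha_k[n] ** 2, n) for n in assigned_subcarriers]
    heapq.heapify(heap)
    loaded = 0
    while loaded < R_k and heap:
        _, n = heapq.heappop(heap)
        bits[n] += step
        loaded += step
        if bits[n] < max_bits:
            inc = (f(bits[n] + step) - f(bits[n])) / alpha_k[n] ** 2
            heapq.heappush(heap, (inc, n))
    power = sum(f(c) / alpha_k[n] ** 2 for n, c in bits.items())
    return bits, power

# toy usage: a user requesting R_k = 8 bits on assigned subcarriers {0, 3, 7}
f = lambda c: (2 ** c - 1)                 # placeholder required-power curve
alpha_k = {0: 1.2, 3: 0.8, 7: 1.0}
print(greedy_bits({0, 3, 7}, 8, alpha_k, f))
```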

3.3 Test Results and Comparisons

In this section, we will test the performance of the proposed approach in terms of solution quality and computational efficiency. We will also compare it with existing subcarrier assignment and bit allocation algorithms, such as the algorithm proposed by Wong et al. in [3], the linear programming approach proposed by Kim et al. in [5], the iterative algorithm proposed by Ergen et al. in [6], and Zhang's approach in [8].

As depicted in Figure 2.1, we assume the OFDM system has 128 subcarriers (i.e., N = 128) over a 5 MHz band. The system uses M-ary quadrature amplitude modulation (MQAM) such that the square signal constellations 4-QAM, 16-QAM, and 64-QAM carry two, four, and six bits/symbol, respectively; therefore in this system D = {0, 2, 4, 6} and M = 6. We adopt the approximate formula in (2.1) for the fk(c) in the transmission power term fk(ck,n)/α²k,n shown in the objective function of (2.3).
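Formula (2.1) is not reproduced in this section; a widely used approximation of the required power to carry c bits on one subcarrier of a square-MQAM system at bit error rate Pe, which (2.1) may correspond to, is f(c) = (N0/3)·[Q⁻¹(Pe/4)]²·(2^c − 1). The sketch below evaluates it, under that assumption, for the bit loads in D:

```python
# A sketch of a common MQAM required-power approximation (our assumption for
# the form of (2.1)): f(c) = (N0/3) * [Qinv(Pe/4)]^2 * (2^c - 1), evaluated
# for the allowed bit loads D = {0, 2, 4, 6} at Pe = 1e-4 and N0 = 1.
from statistics import NormalDist

def q_inv(x):
    """Inverse of the Gaussian Q-function: Q(q_inv(x)) = x."""
    return NormalDist().inv_cdf(1.0 - x)

def f(c, Pe=1e-4, N0=1.0):
    """Approximate power needed to carry c bits on one subcarrier at BER Pe."""
    if c == 0:
        return 0.0
    return (N0 / 3.0) * q_inv(Pe / 4.0) ** 2 * (2 ** c - 1)

for c in (0, 2, 4, 6):
    print(c, round(f(c), 2))    # required power grows sharply with c
```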

Remark 3.8: The fk(c) in our hardware implementation is not limited to the formula given in (2.1), which is simply an example formula for the purpose of comparisons. However, we have to admit that once a function or a form of fk(c), which may correspond to certain coding and modulation schemes, is assumed, changing the hardware is not as easy as changing software.

In all simulations presented in this section, we set Pe = 10^-4 for each user, and the wireless channel is modeled as a frequency-selective channel consisting of six independent Rayleigh multipaths. Each multipath is modeled by Clarke's flat fading model [25]. We assume the delays and the corresponding gains of the six paths are 100·p nanoseconds and e^(-2p) (exponentially decaying), respectively, where p = 0, 1, 2, 3, 4, and 5 denotes the multipath index. Hence, the relative powers of the six multipath components are 0 dB, -8.69 dB, -17.37 dB, -26.06 dB, -34.74 dB, and -43.43 dB. We also assume the average subcarrier channel gain E[α²k,n] is unity for all k and n. Based on the above assumptions, we generate the power consumption coefficients αk,n, k = 1, ..., K, n = 1, ..., N, using MATLAB for our simulations.
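As an illustration of this setup, the following Python sketch (our own, not the MATLAB code used in the simulations) generates one static realization of αk,n for K users over N = 128 subcarriers from six exponentially decaying Rayleigh paths; the function name and the single-snapshot treatment are assumptions, and the time-varying (Doppler) aspect of Clarke's model is omitted.

```python
# A minimal sketch of the channel-generation process described above: six
# independent Rayleigh paths with delays 100*p ns and relative powers e^(-2p),
# converted to per-subcarrier amplitude gains alpha[k, n] over a 5 MHz band.
import numpy as np

def generate_alpha(K=2, N=128, bandwidth=5e6, rng=np.random.default_rng(0)):
    """Return subcarrier gains alpha[k, n] normalized so that E[alpha^2] ~ 1."""
    P = 6                                    # number of multipaths
    p = np.arange(P)
    delays = 100e-9 * p                      # 100*p nanoseconds
    powers = np.exp(-2.0 * p)                # relative path powers e^(-2p)
    powers /= powers.sum()                   # normalize so E[|H|^2] = 1
    subcarrier_freqs = np.arange(N) * bandwidth / N
    alpha = np.empty((K, N))
    for k in range(K):
        # independent complex Gaussian (Rayleigh) tap per path and per user
        taps = (rng.standard_normal(P) + 1j * rng.standard_normal(P)) \
               * np.sqrt(powers / 2.0)
        # frequency response of the six-path channel on each subcarrier
        H = np.array([np.sum(taps * np.exp(-2j * np.pi * f * delays))
                      for f in subcarrier_freqs])
        alpha[k] = np.abs(H)                 # channel amplitude gain alpha_{k,n}
    return alpha

alpha = generate_alpha(K=4)
print(alpha.shape, np.mean(alpha ** 2))      # E[alpha^2] is close to 1 on average
```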

We consider various numbers of users by setting K = 2, 4, 8, 16, and 32. For each K, we randomly generate 500 sets of αk,n, k = 1, ..., K, n = 1, ..., N, based on the above-mentioned power consumption coefficient generation process and denote by αi the ith set of the 500.

We assume a fixed total requested data rate RT (=512 bits/symbol) and randomly generate Rk, k = 1, ..., K, based on the constraint Σ_{k=1}^{K} Rk = RT. By the above test setup, we have run our approach for each K, each set of αk,n, k = 1, ..., K, n = 1, ..., N, and each set of Rk, k = 1, ..., K. We also apply the four methods mentioned at the beginning of this section to the same tests.
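The text only states that the per-user rate requests are drawn at random subject to Σ Rk = RT = 512; one simple way to realize such a split is sketched below (our own assumption, including the restriction to even values so each Rk can be composed from the bit loads in D):

```python
# A minimal sketch of one way to draw the per-user rate requests R_k subject
# to R_1 + ... + R_K = R_T = 512; the uniform "random cut points" scheme and
# the rounding to even values are our assumptions (some R_k may come out 0).
import numpy as np

def random_rate_split(K, R_T=512, rng=np.random.default_rng(1)):
    cuts = np.sort(rng.integers(0, R_T // 2 + 1, size=K - 1))
    parts = np.diff(np.concatenate(([0], cuts, [R_T // 2])))
    return 2 * parts            # even rates summing exactly to R_T

R = random_rate_split(K=8)
print(R, R.sum())               # e.g. 8 even integers summing to 512
```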

For each K associated with the data rate requests Rk, k = 1, ..., K, we denote by abSNR(αi) the average bit SNR⁴ when αi is used, and we calculate the average abSNR, (1/500)·Σ_{i=1}^{500} abSNR(αi), resulting from the 500 αi's using our approach and the other four methods and report them in Figure 3.4. We can see that the average abSNR obtained by our approach, marked by "△" in Figure 3.4, is the smallest among all methods. Moreover, the advantage of our approach becomes even larger as the number of users increases, as can be observed from Figure 3.4.

⁴ It is noted that the average required transmit power (in energy per bit) is defined as the ratio of the overall transmit energy per OFDM symbol, the PT in (2.1), to the total number of bits transmitted per OFDM symbol, which is 512 bits in our test case. Moreover, we define the average bit Signal-to-Noise Ratio (SNR) as the ratio of the average transmit power, PT/512, to the noise PSD level N0. As we have assumed that all data rates per symbol are fixed at 512 bits and N0 is just a constant, PT is proportional to the average bit SNR. Therefore, for the purpose of comparison, we can use the average bit SNR in place of PT.

[Figure 3.4 plots the average abSNR (vertical axis, roughly 17 to 19.5) against the number of users in the system (horizontal axis, 0 to 35), with one curve per method: our approach, Wong et al., Ergen et al., Kim et al., and Zhang.]

Figure 3.4. The average abSNR for K = 2, 4, 8, 16, and 32 obtained by the five methods.

In the previous comparisons, we have set Pe = 10^-4. It would be interesting to know how the QoS requirement affects the performance of our approach. Therefore, we have tested the performance of the five methods for various K and various Pe ranging from 10^-1 to 10^-6, using 500 randomly generated sets of αk,n, k = 1, ..., K, n = 1, ..., 128, for each K and each Pe. The conclusions on the performance for the various K are similar. A typical one is shown in Figure 3.5, which corresponds to K = 7. The average of the 500 average bit SNRs for various Pe obtained by our approach is marked by "△" in Figure 3.5. We can see that the performance of our approach is the best among the five, and when the required QoS level is higher (i.e., the value of Pe is smaller), the advantage of our approach is even larger (i.e., a smaller average of the average bit SNR compared with the other four methods).

Figure 3.5. Comparison of the performance of the five methods with respect to various Pe for the case of K = 7.

To investigate the computational efficiency of our approach and the other four methods, we need to report the average computation time for obtaining an abSNR(αi). However, as we have previously indicated, the DPG method will be implemented in integrated circuits, so the computation time of our approach is partly real and partly estimated; the details are stated in the following.

All the computation times of our OO theory based four-stage approach, except for the DPG algorithm, are recorded on the employed PC with a Pentium 2.4 GHz processor and 512 Mbytes of RAM, and we denote this time by TR. For K = 2, 4, 8, 16, and 32 in the test results shown in Figure 3.4, the corresponding average TR for obtaining an abSNR(αi) is 1.214 ms, 1.686 ms, 3.386 ms, 5.366 ms, and 11.136 ms, respectively. To estimate the computation time of the DPG method, we base it on 90-nm CMOS integrated circuit technology and denote this estimated computation time by TE. We let MUL, ADD, and ROM denote the operations of a multiplication, an addition, and accessing the data of a ROM, respectively, and define T(⋅) as the computation time for performing the operation (⋅). Referring to the work of Hsu et al. [26] and Kanan et al. [27], T(MUL) = 1.0 ns for a 16×16-bit multiplication⁵, and T(ROM) = 1.2 ns for accessing the data of a ROM. In practical designs, the circuit complexity of a 16×16-bit multiplication is five times greater than that of a 16+16-bit addition [28, Ch. 5, p. 113, and Ch. 13, p. 433], thus we can set T(ADD) ≈ 0.2 ns. Then, based on the last column of Table 3.1, we have TPE1n = 6.6 ns, TPE2n = 2.2 ns, TPE4 = 1.6 ns, TPE5 = 1.2 ns, and TPE7 = 1.0 ns. In our simulations, the values of tmax × jmax are set to 8000, 10000, 12000, 15000, and 18000 for the cases of K = 2, 4, 8, 16, and 32, respectively. Thus, based on (3.15) and (3.16), the estimated computation time TE of the DPG method is 0.208 ms, 0.52 ms, 1.248 ms, 3.12 ms, and 7.488 ms for the cases of K = 2, 4, 8, 16, and 32, respectively. Summing up TR and TE, the estimated average computation times of our approach for obtaining an abSNR(αi) for the various K are reported in the second row of Table 3.2. We also report the average computation times of the other four methods on the same test cases in rows 3-6 of Table 3.2. The method proposed by Wong et al. is the most computation-time consuming, as has been indicated in [5]-[9]. Considering that the frame length of a wideband OFDM is 20 ms [29], the proposed approach can meet the real-time application requirement for high-mobility circumstances.
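As a consistency check on these numbers (our own back-of-the-envelope calculation: the figure of roughly 13 ns per user per iteration is inferred from the reported TE values, since (3.15) and (3.16) are not reproduced in this section), the following sketch reproduces TE and the second row of Table 3.2 from TR, K, and tmax × jmax:

```python
# Back-of-the-envelope check (our own): the reported T_E values are consistent
# with a DPG latency of roughly 13 ns per user per iteration -- a figure we
# inferred from the numbers above, not taken from (3.15)-(3.16) -- and
# T_R + T_E reproduces the second row of Table 3.2.
K_values   = [2, 4, 8, 16, 32]
T_R_ms     = [1.214, 1.686, 3.386, 5.366, 11.136]   # recorded software times
iterations = [8000, 10000, 12000, 15000, 18000]     # t_max * j_max per case
NS_PER_USER_ITERATION = 13                          # inferred, in nanoseconds

for K, t_r, it in zip(K_values, T_R_ms, iterations):
    t_e = NS_PER_USER_ITERATION * K * it * 1e-6     # ns -> ms
    print(f"K={K:2d}  T_E={t_e:.3f} ms  T_R+T_E={t_r + t_e:.2f} ms")
# prints T_E = 0.208, 0.520, 1.248, 3.120, 7.488 ms and totals within
# 0.01 ms of the "Our approach" row of Table 3.2.
```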

Table 3.2

Average computation time (ms) for obtaining an abSNR(αi) for various numbers of users

Method          K = 2     K = 4     K = 8     K = 16    K = 32
Our approach    1.42      2.21      4.63      8.49      18.63
Wong et al.     103.32    185.3     371.3     701.2     1507.1
Ergen et al.    10.18     14.9      18.8      31.1      53.2
Kim et al.      24.95     30.5      40.6      96.6      225.9
Zhang           26.81     42.5      45.3      60.3      88.1

⁵ A question may be raised as to whether a 16-bit data type has enough precision for the implementation. The answer is yes, because what we need from the hardware computation is whether the resulting ρk*,n is zero or nonzero, not how accurate the nonzero value is.

As demonstrated above, our approach outperforms the other four methods in terms of power consumption, and we can obtain the results in real time.

Remark 3.9: It may seem unfair that the computation time of our algorithm is partly estimated from hardware performance, while the computation times of the other algorithms come entirely from computer simulation. In fact, what we want to assert is that we can achieve the best performance among all methods (the comparisons shown in Figure 3.4) in real time (the data shown in the second row of Table 3.2).

Remark 3.10: As we have indicated in Remark 3.2, the DPG method is simple, and hence it takes more iterations to converge. However, the key to how it speeds up our approach is its hardware implementability, and this is the reason why we estimate its computation time based on the hardware architecture rather than by the commonly adopted software simulation.

Remark 3.11: In deep submicron technology, the effect of wire delay is prominent, especially in designs with large area or complicated routing. There are two types of wire delay in our hardware architecture: intra wire delay and inter wire delay. The intra wire delay is the wire delay inside a hardware component of a PE, such as the multiplier. The inter wire delay is the wire delay between the PEs and registers of the hardware architecture shown in Figure 3.3. The intra wire delay plays the dominant role in the overall wire delay; however, it has already been taken into account in our estimation of the computation time. Since our hardware architecture is very regular and modular, the inter wire delay can at most be a small fraction of TE, the estimated computation time of the DPG algorithm, and will not affect the real-time applicability.

To evaluate the actual goodness of the obtained good enough solutions, we compare them with the optimal solution of (2.3) through extensive simulations. To cover more system conditions in our simulations, we consider the cases of N = 32, 64, and 128 and take four different values of K for each N. We define the Average Bit Per Subcarrier (ABPS) as (Σ_{k=1}^{K} Rk)/N to denote the congestion condition of the system. We set M = 6 and consider three cases of ABPS, namely ABPS = 3, 4, and 5, for each N and each K. For each (N, K, ABPS), we randomly generate 250 sets of Rk, k = 1, ..., K, based on the constraint on ABPS, and randomly generate a set of αk,n, k = 1, ..., K, n = 1, ..., N, for each set of Rk, k = 1, ..., K. We employ (2.1) for the f(c) but set Pe = 10^-4 and N0 = 1. Table 3.3 shows the average over the 250 sets of (d − D)/D × 100% for each (N, K, ABPS), where D and d denote the actual optimal power consumption of (2.3) and the power consumption of the good enough solution obtained by our approach, respectively. For each ABPS, the average of the averaged (d − D)/D × 100% over the various N and K is shown in the last row of Table 3.3, which indicates that the average deviation of d from D is around 1.0% under various congestion conditions of the system. This shows that the good enough solutions we obtained are really good enough.

Table 3.3

The average (d − D)/D × 100% for each set of N, K, and ABPS

N     K     ABPS = 3    ABPS = 4    ABPS = 5
32    4     0.246       0.300       0.283
32    6     0.646       1.260       0.874
32    8     1.606       1.634       1.115
32    10    1.837       1.778       1.568
64    4     0.198       0.114       0.131
64    8     0.557       0.422       0.241
64    12    1.872       1.663       1.295
64    16    2.227       2.328       2.090
128   4     0.056       0.081       0.080
128   8     0.131       0.110       0.144
128   16    0.501       0.863       0.852
128   32    2.572       2.842       2.793
Average     1.037       1.116       0.956
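For completeness, the two quantities used in this evaluation, ABPS and the relative deviation reported in Table 3.3, are restated in the small sketch below (our own illustration; the Rk, d, and D values are placeholders, not data from the table):

```python
# Illustration of the two evaluation metrics defined above; all numbers here
# are placeholders, not results from Table 3.3.
def abps(R, N):
    """Average Bit Per Subcarrier: (R_1 + ... + R_K) / N."""
    return sum(R) / N

def deviation_percent(d, D):
    """Relative deviation of the good enough solution's power d from the optimum D."""
    return (d - D) / D * 100.0

R = [96, 64, 64, 32]                      # hypothetical per-user rate requests
print(abps(R, N=64))                      # 4.0  -> an ABPS = 4 case
print(deviation_percent(d=40.4, D=40.0))  # 1.0  -> a 1.0% deviation
```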
