Experiment Result - Experiment Results - An On-line Real-Time Scheduling Method for Dual-Core Embedded Systems

Section 4 Experiment Results

4.2 Experiment Result

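This section presents the experimental data. We first compare the mean and standard deviation of the RDC values with preemption points (WP) and without preemption points (NP). In each experiment we compare three ways of assigning CUS server sizes. In the first, all Tasks have similar CUS server sizes (differing by no more than 0.05). In the second, the CUS server size is inversely proportional to the Task period: the longer the period, the smaller the CUS server size. In the third, the CUS server size is proportional to the Task period: the longer the period, the larger the CUS server size. To make the trends easier to see, the period increases with the Task number, that is, Period of Task1 <= Period of Task2 <= Period of Task3 <= Period of Task4.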
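The three configurations can be made concrete with a minimal sketch. The function name `cus_sizes`, the total budget of 0.8 and the example periods are assumptions chosen for illustration, not values from the experiments. The "same" case uses exactly equal sizes, which is one special case of the "within 0.05" rule.

```python
def cus_sizes(periods, total=0.8, mode="same"):
    """Split a total CUS budget among tasks (hypothetical helper).

    mode="same":     every task gets an equal server size
    mode="decrease": size inversely proportional to the period
    mode="increase": size proportional to the period
    """
    if mode == "same":
        weights = [1.0 for _ in periods]
    elif mode == "decrease":
        weights = [1.0 / p for p in periods]
    elif mode == "increase":
        weights = [float(p) for p in periods]
    else:
        raise ValueError("unknown mode: " + mode)
    scale = total / sum(weights)
    return [w * scale for w in weights]


# Assumed periods with Period of Task1 <= ... <= Period of Task4.
periods = [10, 20, 40, 80]
for mode in ("same", "decrease", "increase"):
    print(mode, [round(s, 4) for s in cus_sizes(periods, mode=mode)])
```

Normalising by the sum of the weights keeps the total server size identical across the three configurations, so only the distribution among Tasks changes.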

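In the charts below, the X axis lists the Tasks; for each Task we compare the case with preemption points (WP) against the case without preemption points (NP). The Y axis shows the mean or the standard deviation of the RDC measured for each Task.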
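A minimal sketch of how the plotted quantities could be computed. It assumes that RDC is the ratio of a subtask's response time to its computation time (the definition is given outside this section), and it uses the population standard deviation; whether the experiments used the population or the sample form is not stated here. The sample values are made up.

```python
from statistics import mean, pstdev


def rdc_stats(samples):
    """Return (mean, standard deviation) of RDC over job instances.

    samples: list of (response_time, computation_time) pairs.
    ASSUMPTION: RDC = response_time / computation_time.
    """
    rdc = [response / computation for response, computation in samples]
    return mean(rdc), pstdev(rdc)


# Made-up instances of one subtask: (response time, computation time).
avg, std = rdc_stats([(2, 2), (4, 2), (6, 2)])
print(round(avg, 4), round(std, 4))
```

Under this assumption an RDC of 1 means the subtask ran without any waiting, and larger values mean proportionally more delay.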

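The experimental data show the following:

1. With WP, the mean RDC of the DSP subtask increases with the period: a short-period Task (e.g. Task1) has a lower mean DSP subtask RDC than a long-period Task (e.g. Task4).

2. With NP, the mean RDC of the DSP subtask decreases with the period: a short-period Task (e.g. Task1) has a higher mean DSP subtask RDC than a long-period Task (e.g. Task4).

3. The standard deviation of the DSP subtask RDC is smaller with WP than with NP.

4. When the CUS server size of a Task increases, the mean RDC of that Task's DSP subtask decreases.

5. WP and NP show essentially no difference in the mean RDC of the MPU subtask.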

Fig. 15 DSP subtask RDC AVG - same CUS server size
[Chart: X axis Task1 to Task4; Y axis DSP rdc avg; series WP and NP.]

Fig. 16 DSP subtask RDC AVG - decreasing CUS server size
[Chart: X axis Task1 to Task4; Y axis DSP rdc avg; series WP and NP. Recovered value: 1.0707.]

Fig. 17 DSP subtask RDC AVG - increasing CUS server size
[Chart: X axis Task1 to Task4; Y axis DSP rdc avg; series WP and NP.]

Fig. 18 DSP subtask RDC STD - same CUS server size
[Chart: X axis Task1 to Task4; Y axis DSP rdc std; series WP and NP. Recovered values: 0.1158, 0.2444, 0.3407, 0.4819 (Task1 to Task4); 2.3730.]

Fig. 19 DSP subtask RDC STD - decreasing CUS server size
[Chart: X axis Task1 to Task4; Y axis DSP rdc std; series WP and NP. Recovered value: 0.0805.]

Fig. 20 DSP subtask RDC STD - increasing CUS server size
[Chart: X axis Task1 to Task4; Y axis DSP rdc std; series WP and NP. Recovered values: 0.2239, 0.2005, 0.2925, 0.3132 (Task1 to Task4); 5.2495.]

Fig. 21 MPU subtask RDC AVG - same CUS server size
[Chart: X axis Task1 to Task4; Y axis MPU rdc avg; series WP and NP. Recovered rows (Task1 to Task4): 1.0086, 1.0449, 1.0691, 1.0802 and 1.0072, 1.0262, 1.0570, 1.0689.]

Fig. 22 MPU subtask RDC AVG - decreasing CUS server size
[Chart: X axis Task1 to Task4; Y axis MPU rdc avg; series WP and NP. Recovered rows (Task1 to Task4): 1.0213, 1.0779, 1.0909, 1.1107 and 1.0218, 1.0754, 1.0906, 1.1040.]

Fig. 23 MPU subtask RDC AVG - increasing CUS server size
[Chart: X axis Task1 to Task4; Y axis MPU rdc avg; series WP and NP. Recovered rows (Task1 to Task4): 1.0057, 1.0269, 1.0497, 1.0633 and 1.0010, 1.0127, 1.0447, 1.0739.]

Fig. 24 MPU subtask RDC STD - same CUS server size
[Chart: X axis Task1 to Task4; Y axis MPU rdc std; series WP and NP. Recovered value: 0.0460.]

Fig. 25 MPU subtask RDC STD - decreasing CUS server size
[Chart: X axis Task1 to Task4; Y axis MPU rdc std; series WP and NP. Recovered value: 0.1033.]

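Figs. 15, 16 and 17 show that with WP the mean DSP subtask RDC increases with the Task number, whereas with NP it decreases with the Task number. The reason is that a Task with a longer period usually has a longer computation time. With WP, a Task with a small Task number usually receives a smaller local deadline when it enters the DSP, and because preemption points are present, a preemption readily occurs when execution reaches a preemption point. Tasks with small Task numbers therefore tend to finish sooner and have smaller RDC values, so the mean RDC increases with the Task period.

With NP there are no preemption points, so when a Task τ1 enters the DSP while another Task is using it, τ1 must wait until that Task completes before it can start executing on the DSP. For the short-period Task 1, being blocked on entry to the DSP is costly: the Task occupying the DSP has a long execution time, so whenever Task 1 is blocked the increase in its response time is large relative to its own computation time, and its RDC rises sharply. For the long-period Task 4, being blocked on entry to the DSP costs less in RDC, because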

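(1.135->1.0707), but Task 4 pays a price (2.5093->3.2522). When the CUS server size of Task 1 becomes smaller, Fig. 17 shows that the response of Task 1 worsens (1.135->1.2050) while the response of Task 4 improves (2.5093->2.0410). These results show that when a short-period Task has a larger CUS server size, it is better able to win the DSP and its response improves, but the long-period Tasks pay a price: they either fail to obtain the DSP or are preempted by a short-period Task partway through their execution on the DSP, so their response is worse. The NP results in Figs. 15, 16 and 17 show the same trend.

Fig. 26 MPU subtask RDC STD - increasing CUS server size
[Chart: X axis Task1 to Task4; Y axis MPU rdc std; series WP and NP. Recovered value: 0.0333.]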
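The contrast between NP and WP on the DSP can be illustrated with a small simulation. This is a sketch under stated assumptions, not the scheduler used in the experiments: RDC is taken to be response time divided by computation time, jobs carry plain absolute deadlines instead of CUS local deadlines, preemption points are evenly spaced every `chunk` time units, and the two jobs and all numbers are made up.

```python
def simulate(jobs, chunk=None):
    """Earliest-deadline-first on one DSP; returns the RDC of each job.

    jobs:  list of (name, arrival, computation, absolute_deadline).
    chunk: spacing of preemption points; None means no preemption
           points (NP), so a started job runs to completion.
    ASSUMPTION: RDC = (finish - arrival) / computation.
    """
    remaining = {name: comp for name, _, comp, _ in jobs}
    finish = {}
    t = 0
    while len(finish) < len(jobs):
        ready = [j for j in jobs if j[1] <= t and j[0] not in finish]
        if not ready:
            # DSP is idle: jump to the next arrival.
            t = min(j[1] for j in jobs if j[0] not in finish)
            continue
        # The scheduler only decides here, i.e. at a job boundary
        # or at a preemption point.
        name = min(ready, key=lambda j: j[3])[0]
        run = remaining[name] if chunk is None else min(chunk, remaining[name])
        t += run
        remaining[name] -= run
        if remaining[name] == 0:
            finish[name] = t
    return {name: (finish[name] - arr) / comp for name, arr, comp, _ in jobs}


jobs = [("long", 0, 10, 40), ("short", 1, 2, 5)]
print("NP:", simulate(jobs))           # short waits behind long
print("WP:", simulate(jobs, chunk=2))  # short runs at the first preemption point
```

With these made-up numbers the short job's RDC falls from 5.5 under NP to 1.5 under WP, while the long job's RDC rises only from 1.0 to 1.2, which mirrors the asymmetry described above: blocking is expensive relative to a short computation time and cheap relative to a long one.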

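Figs. 18, 19 and 20 show that the standard deviation of the DSP subtask RDC is much smaller with preemption points than without. With preemption points, the response time of each DSP subtask varies little from one instance to the next instead of swinging between fast and slow. The response is therefore more predictable, which is very important for scheduling.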

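Figs. 21, 22 and 23 show that the mean RDC of the MPU subtask is close to 1 whether or not preemption points are used. MPU subtasks have short computation times, so resource contention on the MPU is rare and each MPU subtask can start executing almost immediately on entering the MPU. Figs. 24, 25 and 26 likewise show that the standard deviation of the MPU subtask RDC is small, so the RDC of each Task varies little.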

Section 5 Conclusion

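The main goal of this study is to reduce the response delay of each Task on the DSP while satisfying the precedence constraints. Adding preemption points to DSP scheduling lets the most urgent Task (the Task with the smallest deadline) be scheduled first, which reduces the time spent pending on the DSP, avoids missed deadlines, and improves the schedulability of the system.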

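Our data analysis shows that, with preemption points, the mean and standard deviation of the RDC of short-period tasks (which always come with small deadlines) are much smaller than without preemption points, meaning that the response delay is reduced and varies little. The scheduling methods we designed for the MPU and the DSP not only satisfy the precedence constraints but also make it possible to predict the worst-case response time of each subtask. In addition, the admission control methods we propose for the MPU and the DSP can check whether a feasible schedule still exists when an on-line Task is added to the system.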
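The admission tests themselves are defined outside this section. As a rough stand-in, the sketch below uses the standard utilization bound for deadline-driven servers on a single processor: a new Task is admitted only if the total server size stays at or below 1. The function name `admit` and the numbers are assumptions, and the check ignores any blocking incurred between preemption points, which a DSP-side test would have to account for.

```python
def admit(current_sizes, new_size, capacity=1.0):
    """Simplified admission check (hypothetical helper).

    Accept the new on-line task only if the sum of all server sizes,
    including the new one, does not exceed the processor capacity.
    """
    return sum(current_sizes) + new_size <= capacity


print(admit([0.3, 0.3], 0.3))  # total 0.9, fits
print(admit([0.5, 0.4], 0.2))  # total 1.1, rejected
```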


Appendix

1. JPEG encoding comprises four steps: FDCT, Quantization, DPCM, and Encode. On an x86 system we used the RDTSC instruction to measure the relative execution times of these four steps:

                FDCT    Quantization    DPCM    Encode
Clock cycles    7752    7514            153     12818

Table 3: The clock cycles of each step in JPEG encoding
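The relative cost of the four steps follows directly from Table 3. The short script below only restates the measured cycle counts as fractions of the total; the variable names are illustrative.

```python
# Clock cycles per JPEG encoding step, copied from Table 3.
cycles = {"FDCT": 7752, "Quantization": 7514, "DPCM": 153, "Encode": 12818}

total = sum(cycles.values())
share = {step: count / total for step, count in cycles.items()}

for step, fraction in share.items():
    print(f"{step:<13}{fraction:7.2%}")
```

Encode accounts for roughly 45% of the cycles, FDCT and Quantization for about 27% each, and DPCM for well under 1%.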

2. The D-cache on our platform is 4-way set associative, 16 KB in size, write-back, with a Round Robin replacement policy. We compare the number of cache hits and misses and the number of CPU cycles when the cache has been polluted and when it has not.

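In the code of Fig. 27, array a2 reads array a1 and then a3 reads a1 again. The contents of a1 are still in the cache at that point, so a3 can read a1 directly from the cache. In the code of Fig. 28, after a2 reads a1, a5 reads the contents of array a4. Because a4 is large, the lines of a1 held in the cache are replaced, so when a3 then reads a1 a cache miss occurs and the data must be reloaded from memory.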

                         Cache is not polluted    Cache is polluted
Data cache read hit      1704                     732
Data cache read miss     2043                     2164
Number of core clocks    738964                   739688

Table 4: Comparison between a polluted and an unpolluted cache
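The pollution effect can be reproduced with a toy model of the D-cache described above (16 KB, 4-way, Round Robin replacement). The 32-byte line size, the array sizes, the base addresses and the 4-byte word reads are assumptions for illustration; the model counts read hits and misses only and is not meant to reproduce the figures in Table 4.

```python
class Cache:
    """Toy set-associative cache with round-robin replacement."""

    def __init__(self, size=16 * 1024, ways=4, line=32):  # line size assumed
        self.line = line
        self.ways = ways
        self.n_sets = size // (ways * line)
        self.tags = [[None] * ways for _ in range(self.n_sets)]
        self.victim = [0] * self.n_sets  # next way to replace, per set
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        block = addr // self.line
        index = block % self.n_sets
        tag = block // self.n_sets
        if tag in self.tags[index]:
            self.hits += 1
            return
        self.misses += 1
        self.tags[index][self.victim[index]] = tag
        self.victim[index] = (self.victim[index] + 1) % self.ways


def read_array(cache, base, nbytes, word=4):
    """Read an array sequentially, one word at a time."""
    for addr in range(base, base + nbytes, word):
        cache.read(addr)


def second_pass_hits(pollute):
    """Hits seen when a1 is read a second time."""
    cache = Cache()
    a1_base, a4_base = 0x000000, 0x100000  # hypothetical addresses
    read_array(cache, a1_base, 4 * 1024)   # first read of a1 (4 KB)
    if pollute:
        read_array(cache, a4_base, 64 * 1024)  # large a4 evicts a1
    before = cache.hits
    read_array(cache, a1_base, 4 * 1024)   # second read of a1
    return cache.hits - before


print("not polluted:", second_pass_hits(False))
print("polluted:    ", second_pass_hits(True))
```

In this model the second pass over a1 hits on every word when the cache is not polluted. After the large a4 array has been read, every line of a1 has been replaced, so the first word of each line misses and only the remaining words of the reloaded line hit. This is the same direction of change as the drop in read hits from 1704 to 732 in Table 4.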
