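Modifying the PPE-Side Program - Custom Data Structures and Program Writing

Chapter 4 Implementation of MPEG-4 Decoding

4.2 Custom Data Structures and Program Writing

4.2.2 Modifying the PPE-Side Program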

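To call SPE subroutines from the original program and obtain parallel execution, threads must be used. In short, while the PPE main program runs it can create several PPE threads, and each PPE thread corresponds to one SPE thread, as shown in Figure 59. Because each SPE thread runs on its own independent processor, and the processors do not affect one another's performance, this is called parallel processing.

Figure 59. Cell BE thread diagram (PPE main program, PPE threads, SPE threads)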

parm_addr ctx[4] __attribute__((aligned(128)));

void *spu_pthread(void *arg) {
    spu_data_t *input = (spu_data_t *)arg;
    unsigned int entry = SPE_DEFAULT_ENTRY;

    /* input->speid: variable that identifies the SPE */
    /* input->argp:  stores the address of a parm_addr structure variable */
    spe_context_run(input->speid, &entry, 0, input->argp, NULL, NULL);
    pthread_exit(NULL);
}

int decoder_create(xvid_dec_create_t *create) {
    . . .
    spu[i].argp = (void *)&ctx[i];
    /* &spu_pthread: function pointer; &spu[i]: argument passed to the thread */
    pthread_create(&spu[i].pthread, NULL, &spu_pthread, &spu[i]);
    . . .
}

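... the speid and argp variables of the thread. Using the absolute address stored in argp, the address information held in ctx[0] on the PPE side is copied to SPE1 by a DMA transfer (ctx[1] to ctx[3] store the address information for SPE2 to SPE4). With this address information available, data transfer and communication among the cores are resolved.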
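The one-thread-per-SPE pattern described above can be sketched on an ordinary machine without the Cell SDK. In this sketch `slot_t`, `worker_t`, `fake_spe_run` and `run_workers` are hypothetical names, and `fake_spe_run` merely stands in for `spe_context_run`; it is an illustration of how each thread receives its own identifier and its own argument pointer, not the decoder's actual code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4

/* Hypothetical stand-in for one parm_addr entry. */
typedef struct {
    int owner;     /* worker id, written by the worker itself */
    int visited;   /* set to 1 once the worker has seen this slot */
} slot_t;

/* Hypothetical stand-in for spu_data_t. */
typedef struct {
    pthread_t pthread;
    int       speid;
    void     *argp;
} worker_t;

static slot_t   slots[NUM_WORKERS];
static worker_t workers[NUM_WORKERS];

/* Assumption: stands in for spe_context_run(), which would run the SPE program. */
static void fake_spe_run(int speid, void *argp)
{
    slot_t *s = (slot_t *)argp;
    s->owner = speid;
    s->visited = 1;
}

/* Thread body: unpack the per-worker record and "run" the worker. */
static void *worker_thread(void *arg)
{
    worker_t *w = (worker_t *)arg;
    assert(w != NULL);
    fake_spe_run(w->speid, w->argp);
    return NULL;
}

/* Starts one thread per worker, waits for all of them, and returns how many
   workers received their own slot through argp. */
int run_workers(void)
{
    int started = 0, ok = 0, i;

    for (i = 0; i < NUM_WORKERS; i++) {
        slots[i].owner = -1;
        slots[i].visited = 0;
        workers[i].speid = i;
        workers[i].argp = &slots[i];
        if (pthread_create(&workers[i].pthread, NULL,
                           worker_thread, &workers[i]) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++) {
        pthread_join(workers[i].pthread, NULL);
        if (slots[i].visited && slots[i].owner == i)
            ok++;
    }
    return ok;
}
```

Calling `run_workers()` starts four threads and joins them; each thread touches only the slot whose address it was given, which mirrors how each SPE thread is handed the address of its own ctx entry.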

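Figure 61. Obtaining the parm_addr structure data through the passed-in argument

spe1.c

parm_addr ctx __attribute__((aligned(128)));

int main(unsigned long long speid, unsigned long long argp, unsigned long long envp) {
    . . .
    mfc_get(&ctx, argp, sizeof(ctx), tag_id[0], 0, 0);
    . . .
}

2. Initial address settings (setting the members of the parm_addr data structure)

The decoder initialization function decoder_create() also contains the code that initializes the addresses. The initialization falls into two cases, SPE1 and SPE2 to SPE4, and Table 12 lists the detailed settings of the custom parm_addr structure variable. As the table shows, only SPE1 knows the full PPE-side addresses of the coefficient array and of the custom structure variable. SPE2 to SPE4 must instead take the LS start address of the preceding SPE and add the relative data offset described in Section 2.3.2 to obtain the absolute addresses of that SPE's coefficient array and custom structure variable, so SPE2 to SPE4 store only the LS start address of the preceding SPE. Figure 62 shows the code for the case in which SPE2 obtains the absolute addresses of SPE1's data; the next paragraph explains these steps in detail.

Table 12. Address settings

SPE ID         Member    Description
SPE1           ea_parm   Address of the parm_context structure variable on the PPE side
               ea_base   Address of the data_B array on the PPE side
               ea_sig1   Address of SNR1 of SPE2
               ea_sig2   Unused
SPE2 to SPE4   ea_parm   Unused
               ea_base   LS start address of the preceding SPE
               ea_sig1   Address of SNR1 of the next SPE
               ea_sig2   Address of SNR2 of the preceding SPE (except for SPE4)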
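The address rule stated for SPE2 to SPE4 (LS start address of the preceding SPE plus a relative offset) amounts to a single addition. The following is a minimal sketch of it; the function name `ls_effective_address` and the constant `LS_SIZE` are hypothetical, and the bound check relies on the 256 KB local store size of a Cell BE SPE.

```c
#include <assert.h>
#include <stdint.h>

/* Local store size of one Cell BE SPE: 256 KB. */
#define LS_SIZE (256u * 1024u)

/* Hypothetical helper: absolute (effective) address of an object that lives
   at ls_offset inside the local store mapped at ea_ls_base. */
uint64_t ls_effective_address(uint64_t ea_ls_base, uint32_t ls_offset)
{
    /* An offset outside the local store cannot refer to SPE data. */
    assert(ls_offset < LS_SIZE);
    return ea_ls_base + (uint64_t)ls_offset;
}
```

For example, with an assumed LS start address of 0x10000000 and an offset of 0x3f80 received from the preceding SPE, the helper returns 0x10003f80, which is the address a later SPE would use as the source or target of its DMA transfer.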

Figure 62. Part of the code with which SPE2 to SPE4 obtain addresses (SPE1 and SPE2 as the example)

parm_addr ctx __attribute__((aligned(128)));
uint32_t ea_offset;
uint64_t ea_data, ea_parm;
. . .

spe1.c

short int data_A[6][64], data_B[6][64];
parm_context parm_A, parm_B;

spu_write_out_mbox((uint32_t)data_B);     // offset of SPE1's coefficient array
spu_write_out_mbox((uint32_t)&parm_B);    // offset of SPE1's custom structure variable

decoder.c

parm_addr ctx[4] __attribute__((aligned(128)));   // stores the address information for SPE1 to SPE4
spe_context_ptr_t speid[2];                       // [0] is SPE1, [1] is SPE2
uint64_t ea_ls_base;
uint32_t ls_offset;

// LS start address of SPE1; cast through uintptr_t so that the pointer is not truncated on a 64-bit build
ea_ls_base = (uint64_t)(uintptr_t)spe_ls_area_get(speid[0]);
ctx[1].ea_base = ea_ls_base;
. . .
while (spe_out_mbox_status(speid[0]) == 0);       // wait until SPE1 has written a message
spe_out_mbox_read(speid[0], &ls_offset, 1);       // offset of SPE1's coefficient array
spe_in_mbox_write(speid[1], &ls_offset, 1, 1);    // forward it to SPE2

while (spe_out_mbox_status(speid[0]) == 0);
spe_out_mbox_read(speid[0], &ls_offset, 1);       // offset of SPE1's custom structure variable
spe_in_mbox_write(speid[1], &ls_offset, 1, 1);    // forward it to SPE2

(3) SPE1 sends the relative address offset to the PPE through its Mailbox.

static void decoder_mbintra(dec, pMB, acpred_flag, cbp, bs, quant) {
    for (i = 0; i < 6; i++) {
        uint32_t iDcScaler = get_dc_scaler(quant, i < 4);
        predict_acdc(.., predictors, ..);
        if (cbp & (1 << (5 - i)))
            get_intra_block(bs, &data_A[i*64], direction, ..);
        add_acdc(.., &data_A[i*64], iDcScaler, predictors, ..);
    }

    // A non-empty SPE1 outbound mailbox means SPE1 has finished its previous work, so the macroblock is assigned to SPE1
    if (spe_out_mbox_status(spu_data[0].ctx) != 0) {
        spe_out_mbox_read(spu_data[0].ctx, &status, 1);
        parm.xxx = xxx;                    // store the required parameters; the structure is shown in Table 8
        memcpy(data_B, data_A, 6*128);
        status = 63;                       // all six blocks are to be processed
        spe_in_mbox_write(spu_data[0].ctx, &status, 1, 1);
    } else {
        for (i = 0; i < 6; i++) {
            iDcScaler = get_dc_scaler(quant, i < 4);
            dequant_h263_intra(., &data_A[i*64], quant, iDcScaler, .);
            idct((short * const)&data_A[i*64]);
        }
        transfer_16to8copy(pY_Cur, &data_A[0*64], stride);   // called six times
        . . .

... outbound mailbox holds a message, the SPE is idle; if it holds no message, the SPE is still working.

static void decoder_mb_decode(dec, cbp, bs, pY_Cur, pU_Cur, pV_Cur, pMB) {
    . . .
        spe_in_mbox_write(spu_data[0].ctx, &cbp, 1, 1);
    } else {
        for (i = 0; i < 6; i++) {
            if (cbp & (1 << (5 - i))) {
                get_inter_block(bs, &data_A[i][0], direction, ...);
                idct((short * const)&data_A[i][0]);
                transfer_16to8add(dst[i], &data_A[i][0], strides[i]);
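Both listings test `cbp & (1 << (5 - i))` to decide whether block i of a macroblock carries coefficients, and the intra path sends 63 to request all six blocks. The sketch below spells out that bit layout; `block_is_coded`, `count_coded_blocks` and the two macros are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCKS_PER_MB 6
/* All six bits set: (1 << 6) - 1 = 63. */
#define ALL_BLOCKS ((1u << BLOCKS_PER_MB) - 1u)

/* Returns 1 if block i (0..5) is marked in the mask.
   Block 0 maps to bit 5 and block 5 maps to bit 0, as in the listings. */
int block_is_coded(uint32_t cbp, int i)
{
    assert(i >= 0 && i < BLOCKS_PER_MB);
    return (cbp & (1u << (5 - i))) != 0;
}

/* Number of blocks marked in the mask. */
int count_coded_blocks(uint32_t cbp)
{
    int i, n = 0;
    for (i = 0; i < BLOCKS_PER_MB; i++)
        n += block_is_coded(cbp, i);
    return n;
}
```

A mask of 0x20 marks only block 0, a mask of 0x01 marks only block 5, and `ALL_BLOCKS` marks every block, which is why the value 63 serves as the "process all six blocks" command for an intra macroblock.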
