sented by a region bit-vector with M bits. Each thread block then obtains an M-bit region vector by mapping its memory access range onto the data array. As shown in the figure, the data array is partitioned into regions; if the access range of a block falls into the green-boxed region, the bit corresponding to that region is set in the block's region vector.
Figure: Partitioning of the data array (axes: 2D cache line coordinate, row width); each region is further divided into N subregions, and each subregion uses 1 bit to indicate whether it is accessed by a warp.
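A minimal sketch of the region-vector mapping. It assumes, hypothetically, that the data array is a flat address range split into M equal contiguous regions and that bit i (written leftmost-first) marks region i; the 2D layout in the figure is not modeled, and the name `region_vector` is illustrative only.

```python
def region_vector(start: int, end: int, array_size: int, m: int) -> str:
    """Return the M-bit region vector of the half-open access range [start, end).

    Hypothetical model: the data array covers addresses [0, array_size) and is
    split into m equal contiguous regions; bit i (leftmost = region 0) is set
    when the access range overlaps region i.
    """
    if m <= 0:
        raise ValueError("m must be positive")
    if not (0 <= start < end <= array_size):
        raise ValueError("access range must lie inside the data array")
    region_size = -(-array_size // m)  # ceiling division so m regions cover the array
    first = start // region_size       # first region touched by the block
    last = (end - 1) // region_size    # last region touched by the block
    return "".join("1" if first <= i <= last else "0" for i in range(m))


# A block touching addresses [300, 400) of a 1024-entry array with M = 4 regions
print(region_vector(300, 400, 1024, 4))  # 0100
```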
In the next step, each region is further partitioned into N subregions, and the subregions of a region are represented by a subregion bit-vector with N bits. A warp then obtains an N-bit subregion vector by mapping its memory access range onto the subregions. As shown in the figure, each region is partitioned into subregions, and each subregion uses 1 bit to indicate whether it is accessed by the warp. If a warp accesses the subregions in the blue boxes (the upper-left and lower-right subregions), the bits of those two subregions are set in its subregion vector.
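The subregion step can be sketched in the same spirit. The sketch assumes, hypothetically, that a region is a rectangular tile divided into an equal grid of subregions numbered row-major, and that a warp's accesses are given as (row, col) positions inside the region; `subregion_vector` and its parameters are illustrative names.

```python
from typing import Iterable, Tuple


def subregion_vector(accesses: Iterable[Tuple[int, int]],
                     region_rows: int, region_cols: int,
                     grid_rows: int, grid_cols: int) -> str:
    """Return the N-bit subregion vector (N = grid_rows * grid_cols) of one warp.

    Hypothetical model: a region is a region_rows x region_cols tile split into a
    grid_rows x grid_cols grid of equal subregions, numbered row-major.
    """
    if region_rows % grid_rows or region_cols % grid_cols:
        raise ValueError("region must divide evenly into subregions")
    sub_h = region_rows // grid_rows
    sub_w = region_cols // grid_cols
    bits = ["0"] * (grid_rows * grid_cols)
    for row, col in accesses:
        if not (0 <= row < region_rows and 0 <= col < region_cols):
            raise ValueError(f"access {(row, col)} lies outside the region")
        # One bit per subregion: set it if the warp touches any cell inside it.
        bits[(row // sub_h) * grid_cols + col // sub_w] = "1"
    return "".join(bits)


# Upper-left and lower-right accesses in an 8x8 region split into 2x2 subregions
print(subregion_vector([(0, 0), (7, 7)], 8, 8, 2, 2))  # 1001
```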
Combining the region vector of the block with the subregion vector of the warp, the access range of a warp can be represented as a bit-vector of M + N bits, where M is the length of the region bit-vector and N is the length of the subregion bit-vector.
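As a small illustration, the two vectors can be packed into one integer of M + N bits, which corresponds to the `region-subregion` notation used in the examples (for instance 1001-001001). Placing the region bits in the high positions is an assumption, and `combine_access_range` is a hypothetical helper.

```python
from typing import Tuple


def combine_access_range(region_vec: str, subregion_vec: str) -> Tuple[int, int]:
    """Pack a block's M-bit region vector and a warp's N-bit subregion vector
    into one (M + N)-bit integer, region bits first; return (value, width)."""
    for vec in (region_vec, subregion_vec):
        if not vec or set(vec) - {"0", "1"}:
            raise ValueError(f"not a bit-vector: {vec!r}")
    m, n = len(region_vec), len(subregion_vec)
    value = (int(region_vec, 2) << n) | int(subregion_vec, 2)
    return value, m + n


value, width = combine_access_range("1001", "001001")
print(width, format(value, f"0{width}b"))  # 10 1001001001
```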
Two-level Warp Scheduler
To capture inter-block locality at the warp level, warps with inter-warp locality should be grouped together and executed at roughly the same time, so that shared cache lines are reused as many times as possible before they are evicted.
Based on this idea, we build our locality-aware warp scheduler on the widely used two-level warp scheduler. The figure below shows our proposed locality-aware warp scheduler.
doi:10.6342/NTU201602569
Figure: Locality-aware warp scheduler. Warps (W) are split into a pending group and an active group; within the active group a warp is executed greedily until a short stall, then the next issued warp is selected by locality; a long-latency operation (off-chip memory access) sends a warp to the pending group; the pending warp with the most inter-warp locality with the warps in the active group is promoted.
In the SM, an additional warp queue is introduced to store the access ranges of the running warps, as well as the inter-warp locality between them.
The access range information of each warp is updated by the block scheduler, and the inter-warp locality between warps is computed by the warp scheduler during execution. The proposed two-level warp scheduler divides all running warps in the SM into two groups: an active group and a pending group. In each cycle, a warp from the active group is issued to the SIMD lanes for execution in greedy-then-locality priority order. Once a warp suffers a long-latency stall (i.e., an off-chip memory access), it is demoted to the pending group. At the same time, the ready warp in the pending group with the highest sharing degree with the warps in the active group is promoted to the active group.
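A minimal sketch of the demote/promote bookkeeping described above. The class and method names are hypothetical. Because the text does not define "sharing degree" precisely, the sketch assumes it is the sum of a candidate's inter-warp locality with every warp currently in the active group, and it models readiness as a set of ready warp IDs.

```python
from typing import Callable, Hashable, Iterable, List, Optional, Set

Warp = Hashable


class TwoLevelGroups:
    """Active/pending warp groups with locality-based promotion (sketch)."""

    def __init__(self, active: Iterable[Warp], pending: Iterable[Warp],
                 locality: Callable[[Warp, Warp], int]) -> None:
        self.active: List[Warp] = list(active)
        self.pending: List[Warp] = list(pending)
        self.locality = locality

    def sharing_degree(self, warp: Warp) -> int:
        # Assumption: sharing degree = summed locality with every active warp.
        return sum(self.locality(warp, other) for other in self.active)

    def on_long_latency_stall(self, warp: Warp, ready: Set[Warp]) -> Optional[Warp]:
        """Demote `warp`, then promote the ready pending warp with the highest
        sharing degree; return the promoted warp (None if no warp is ready)."""
        self.active.remove(warp)
        self.pending.append(warp)
        candidates = [w for w in self.pending if w in ready and w != warp]
        if not candidates:
            return None
        best = max(candidates, key=self.sharing_degree)  # ties: earliest pending warp
        self.pending.remove(best)
        self.active.append(best)
        return best


pairs = {frozenset((1, 3)): 5, frozenset((1, 2)): 1, frozenset((0, 4)): 2}
groups = TwoLevelGroups([0, 1], [2, 3, 4], lambda a, b: pairs.get(frozenset((a, b)), 0))
print(groups.on_long_latency_stall(0, ready={2, 3, 4}))  # 3
print(groups.active, groups.pending)                   # [1, 3] [2, 4, 0]
```

The sharing degree is computed after the stalled warp has left the active group, so its own locality does not influence which warp replaces it.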
Since we always promote the warps with locality from the pending group to the active group, starvation may occur when a warp naturally has no locality with other warps. Once a warp starves, the other warps in the same thread block cannot leave the SM until the starved warp finishes, which degrades performance. To tackle this starvation issue, a simple timeout solution is adopted. Each thread block is given an age when it is assigned to the SM, and starvation is detected when $Age_{\text{new-dispatched-threadblock}} - Age_{\text{current-threadblock}} > N$, where N is the maximum number of thread blocks in the SM. Once starvation is detected, the starved warps are served with the highest priority.
Figure: Flowchart of the two-level warp dispatching decision
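The timeout check can be sketched as follows, assuming (as an illustration, not from the text) that ages are dispatch counters that grow by one for each thread block dispatched to the SM. Under that assumption, a resident block is flagged once more than N blocks have been dispatched after it. The names `starved_blocks` and `issue_priority` are hypothetical.

```python
from typing import Dict, List


def starved_blocks(block_ages: Dict[int, int], newest_age: int, max_blocks: int) -> List[int]:
    """Return IDs of resident thread blocks with Age_new - Age_current > N."""
    return [block for block, age in block_ages.items() if newest_age - age > max_blocks]


def issue_priority(warp_blocks: Dict[int, int], starved: List[int]) -> List[int]:
    """Order warp IDs so warps of starved blocks come first (stable otherwise)."""
    starved_set = set(starved)
    # False sorts before True, so warps of starved blocks move to the front.
    return sorted(warp_blocks, key=lambda w: warp_blocks[w] not in starved_set)


ages = {7: 3, 8: 6, 9: 9}  # block ID -> age assigned at dispatch
starved = starved_blocks(ages, newest_age=9, max_blocks=4)
print(starved)                                            # [7]
print(issue_priority({0: 8, 1: 7, 2: 9, 3: 7}, starved))  # [1, 3, 0, 2]
```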
The flowchart of the two-level warp dispatching decision is shown in the figure. First, the warp scheduler greedily selects the same warp in the active group for execution until it suffers a stall. If the stall is a short stall, such as a pipeline stall, the warp scheduler selects from the active group the warp with the highest inter-warp locality with the last issued warp. Otherwise, the warp is stalled by a long-latency operation and is demoted to the pending group; at the same time, the warp with the highest inter-warp locality with all warps in the active group is promoted from the pending group to the active group.
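The issue choice inside the active group (greedy, then locality) can be sketched as a small function. The `Stall` categories, the function name, and the example table are hypothetical; the long-stall branch only returns `None`, because that case is handled by the demote/promote step rather than by picking another active warp.

```python
from enum import Enum
from typing import Callable, List, Optional


class Stall(Enum):
    NONE = "none"    # last issued warp can keep issuing
    SHORT = "short"  # e.g. a pipeline stall
    LONG = "long"    # e.g. an off-chip memory access


def select_next_warp(last: int, active: List[int], stall: Stall,
                     locality: Callable[[int, int], int]) -> Optional[int]:
    """Greedy-then-locality choice inside the active group (sketch)."""
    if stall is Stall.NONE:
        return last  # greedy: keep issuing the same warp
    if stall is Stall.SHORT:
        others = [w for w in active if w != last]
        if not others:
            return None
        # locality: the active warp sharing the most with the last issued warp
        return max(others, key=lambda w: locality(w, last))
    return None  # long stall: demotion/promotion is not modeled here


TABLE = {(0, 1): 2, (0, 2): 0, (1, 2): 0}


def loc(a: int, b: int) -> int:
    # Symmetric lookup in an upper-triangular locality table
    return TABLE.get((min(a, b), max(a, b)), 0)


print(select_next_warp(0, [0, 1, 2], Stall.SHORT, loc))  # 1
```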
The inter-warp locality described above is kept in a locality degree table, as shown in the figure. Each entry in the locality degree table holds the inter-warp locality of the corresponding pair of warps; for example, the inter-warp locality between warp 0 and warp 1 is stored in the entry at row warp 0, column warp 1. The inter-warp locality between two warps is computed by comparing their access range information as follows:
1. Check whether the two warps have the same region vector. If their region vectors differ, there is no inter-warp locality between them. As shown in the examples, warp 2 has a different region vector from warps 0 and 1, so there is no inter-warp locality between warp 2 and either of them.
2. Otherwise, the inter-warp locality is the number of bits set in both subregion vectors. As shown in the examples, warp 0 and warp 1 have the same region vector, so their inter-warp locality is 2, because their subregion vectors share 2 set bits.

Figure: Example of the locality degree table
Warp access range information (region bit-vector - subregion bit-vector): Warp 0 (001-1011), Warp 1 (001-0011), Warp 2 (010-1111), …
          Warp 0   Warp 1   Warp 2   …
Warp 0             2        0
Warp 1                      0
Warp 2

Figure: Example of inter-warp locality computation (access range = region vector - subregion vector)
Warp 0 access range: 1001-001001
Warp 1 access range: 1001-001001
Warp 2 access range: 1011-001001
Same region vector (warp 0, warp 1): inter-warp locality = 2
Different region vector (warp 2): inter-warp locality = 0
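The two-step comparison and the table can be sketched together. Here "number of same bit" is read as the number of positions set to 1 in both subregion vectors (the popcount of their bitwise AND). That is the reading consistent with the table example, where 1011 and 0011 agree in three positions but give a locality of 2. Function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def inter_warp_locality(range_a: str, range_b: str) -> int:
    """Inter-warp locality of two access ranges written as 'region-subregion'."""
    region_a, sub_a = range_a.split("-")
    region_b, sub_b = range_b.split("-")
    if region_a != region_b:
        return 0  # step 1: different region vectors, no locality
    # step 2: count subregions accessed by both warps
    return bin(int(sub_a, 2) & int(sub_b, 2)).count("1")


def locality_degree_table(ranges: List[str]) -> Dict[Tuple[int, int], int]:
    """Upper-triangular table: entry (i, j), i < j, holds locality of warps i and j."""
    return {(i, j): inter_warp_locality(ranges[i], ranges[j])
            for i, j in combinations(range(len(ranges)), 2)}


print(inter_warp_locality("1001-001001", "1001-001001"))  # 2
print(inter_warp_locality("1001-001001", "1011-001001"))  # 0
print(locality_degree_table(["001-1011", "001-0011", "010-1111"]))
# {(0, 1): 2, (0, 2): 0, (1, 2): 0}
```

Storing only pairs with i < j mirrors the upper-triangular layout of the example table, since inter-warp locality is symmetric.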