Chapter 5: Summary, Discussion, and Conclusion
5.6 Reducing the effect of coding delay on the pipeline stage:
DAT:
The table lookup can be done before the discontinuous address occurs.
- IA(t): instruction address at time t
- Time = t
  Look up IA(t) in the source address entries of the DAT.
  - Case 1: IA(t) is in the source address entries of the DAT.
    Extract the destination address of IA(t) from the DAT.
  - Case 2: IA(t) is not in the source address entries of the DAT.
    IA(t+1) cannot use the DAT to reduce Bus transactions.
- Time = t + 1 (after Case 1)
  Compare the extracted destination address with IA(t+1).
  - Case 1: IA(t+1) is the same as the destination address.
    Send a specific signal instead of sending IA(t+1).
  - Case 2: IA(t+1) is not the same as the destination address.
    IA(t+1) cannot use the DAT to reduce Bus transactions.
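The two time steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: the DAT is modeled as a plain dictionary from source address to destination address, and the names `encode_with_dat` and `HIT_SIGNAL` are hypothetical, not taken from the thesis.

```python
# Hypothetical sketch: the DAT is modeled as {source address: destination address}.
HIT_SIGNAL = "DAT_HIT"  # assumed name for the "specific signal"

def encode_with_dat(addresses, dat):
    """Return what the sender would put on the Bus for each instruction address."""
    sent = []
    predicted = None  # destination extracted at time t, compared at time t + 1
    for ia in addresses:
        if predicted is not None and ia == predicted:
            sent.append(HIT_SIGNAL)  # time t + 1, Case 1: send the signal only
        else:
            sent.append(ia)          # the DAT cannot help: send the address itself
        # Time t: look up IA(t) now, so the result is ready before IA(t+1) arrives.
        predicted = dat.get(ia)      # None corresponds to Case 2 at time t
    return sent

trace = [0x0FC, 0x100, 0x200, 0x204]
print(encode_with_dat(trace, {0x100: 0x200}))
```

Because the lookup for IA(t) is issued one step ahead, only the comparison with IA(t+1) remains on the critical path in this sketch.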
Q&A
Q1:
The proposed methods are many and diverse. What is the main idea of the narrow Bus encoding? (chapter 3.1)
A:
Reduce the number of transmitted bytes, as long as the receiver can still recover the exact data.
VL-Encoder :
- The sender can skip the leading regular bytes of the data without sending them, and inform the receiver what the regularity is.
  - Use additional information to inform the receiver what the regularity is.
  - Use additional information to inform the receiver of the number of Bus transactions.
VL-Encoder :
- Increase the number of leading regular bytes in each 32-bit data word.
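As a rough illustration of skipping leading regular bytes, the sketch below treats a run of leading 0x00 or 0xFF bytes as the regularity and passes the fill byte as the additional information. The function names and the choice of regularity are assumptions made for this example; the actual codeword format of the VL-Encoder is not reproduced here.

```python
def skip_leading_regular_bytes(word):
    """Split an unsigned 32-bit word into (fill hint, bytes actually sent)."""
    data = word.to_bytes(4, "big")
    fill = data[0]
    if fill not in (0x00, 0xFF):
        return None, data            # no regularity: all four bytes are sent
    i = 0
    while i < 3 and data[i] == fill:  # always keep at least one payload byte
        i += 1
    return fill, data[i:]

def restore(fill, payload):
    """Receiver side: rebuild the 32-bit word from the hint and the payload."""
    padding = b"" if fill is None else bytes([fill]) * (4 - len(payload))
    return int.from_bytes(padding + payload, "big")

fill, payload = skip_leading_regular_bytes(0x00000012)
print(len(payload), hex(restore(fill, payload)))
```

On an 8-bit Bus, each skipped byte in this sketch is one Bus transaction saved, which is why increasing the number of leading regular bytes pays off.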
Q2:
Bit toggles play an important role in energy consumption. Though low energy is not the purpose of this thesis, what effect does the narrow Bus encoding have on bit toggles?
A:
The proposed methods are designed without considering bit toggles.
System static energy consumption:
The narrow Bus encoding can reduce the number of Bus transactions.
Fewer Bus transactions => Shorter program execution time
Shorter program execution time => Lower system static energy consumption
The bit toggles of transmitted bytes:
Coding methods which have a good effect:
- Some of the coding methods use a specific signal to tell the receiver what the transmitted data is, and treat the Bus value as “don’t care”.
  => The Bus value can be chosen to reduce bit toggles on every line except the control line.
Coding methods which have an unpredictable effect:
- Some of the coding methods don’t consider bit toggles, so their effect on bit toggles is unpredictable.
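The “don’t care” argument can be made concrete with a small toggle counter. This is a sketch under assumptions: a “don’t care” transaction is represented by `None`, the sender is assumed to hold the previous Bus value in that case, and toggles on the control line are not counted.

```python
def count_toggles(transactions, width=8):
    """Count data-line bit toggles; None marks a "don't care" Bus value."""
    mask = (1 << width) - 1
    prev, toggles = 0, 0
    for value in transactions:
        if value is None:
            value = prev  # assumed policy: hold the previous Bus value
        toggles += bin((prev ^ value) & mask).count("1")
        prev = value
    return toggles

held = count_toggles([0x0F, None, 0x0F])    # don't care handled by holding
driven = count_toggles([0x0F, 0x00, 0x0F])  # same slot driven to zero instead
print(held, driven)
```

In this example, holding the Bus value costs 4 toggles in total, while driving the unused slot to zero costs 12.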
Q3:
The simulations are based on traces extracted from programs compiled at the O1 optimization level. What is the effect at O3 or other optimization levels?
A:
My research focuses on the characteristics of the streams. I have not considered the difference between O1 and O3 compilation.
Q4:
You propose several coding methods for every stream. Which method do you suggest for each stream? (chapter 1.3)
A:
The product of Bus width and Bus transaction ratio
When the Bus width is smaller than the data width and is a power of 2, the Bus width and the Bus transaction ratio are inversely proportional. I therefore use the product of Bus width and Bus transaction ratio as the evaluation metric.
The recommended method for each stream, evaluated by this metric
Instruction Address
- T0-C + EB
- 1 extra control line
- Even without counting the area overhead of the NBDAT table, the “2 control lines” method already has a worse product of Bus width and Bus transaction ratio. If the area overhead of the NBDAT is counted, its product will be even worse.
Control lines | Coding method | Bus transaction ratio
1 | T0-C + EB * | 26.53%
2 | T0-C + NBDAT (unlimited) + EB | 25.13%
(* = recommended method)
Data Address
- No control line variable stride algorithm + historical addresses algorithm (with 6 historical address entries) + EB
- 3 extra control lines
Control lines | Coding method | Bus transaction ratio
1 | No control line variable stride algorithm + EB | 46.42%
2 | No control line variable stride algorithm + historical addresses algorithm (with 2 historical address entries) + EB | 29.93%
3 | No control line variable stride algorithm + historical addresses algorithm (with 6 historical address entries) + EB * | 26.56%
… | … | …
8 | No control line variable stride algorithm + historical addresses algorithm (with 254 historical address entries) + EB | 25.01%
(* = recommended method)
Instruction
- Using “MARK” to indicate whether this instruction is in the IDT or not
- 0 extra control lines
Control lines | Coding method | Bus transaction ratio
0 | Using “MARK” to indicate whether this instruction is in the IDT or not * | 26.65%
1 | Using “EB” to indicate whether this instruction is in the IDT or not | 26.23%
(* = recommended method)
Data
- Utilize both insignificant bits and repeated bits + EB
- 2 extra control lines
Control lines | Coding method | Bus transaction ratio
1 | Utilize repeated bits + EB | 55.74%
2 | Utilize both insignificant bits and repeated bits + EB * | 45.49%
(* = recommended method)
These methods need different numbers of extra control lines and achieve different Bus transaction reductions. Some systems may not be able to tolerate the area overhead of the suggested method; their designers can choose a suitable coding method from the tables.
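The evaluation by the product of Bus width and Bus transaction ratio can be reproduced from the table values. The sketch below assumes that the Bus width is 8 data lines plus the extra control lines; that reading, and the names `DATA_LINES`, `ratios`, and `best_control_lines`, are assumptions made for this example. The rows elided by "…" in the data address table are left out.

```python
DATA_LINES = 8  # assumption: 8-bit narrow Bus; control lines are added on top

# Bus transaction ratios (%) copied from the tables, keyed by extra control lines.
ratios = {
    "Instruction Address": {1: 26.53, 2: 25.13},
    "Data Address": {1: 46.42, 2: 29.93, 3: 26.56, 8: 25.01},
    "Instruction": {0: 26.65, 1: 26.23},
    "Data": {1: 55.74, 2: 45.49},
}

def best_control_lines(stream):
    """Return the control-line count that minimizes width x ratio for a stream."""
    products = {c: (DATA_LINES + c) * r for c, r in ratios[stream].items()}
    return min(products, key=products.get)

for stream in ratios:
    print(stream, best_control_lines(stream))
```

Under these assumptions the minimum falls on 1, 3, 0, and 2 extra control lines for the four streams, which agrees with the methods recommended in the tables.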
Q5:
You assume that the area overhead of the extra coding logic gates is much smaller than that of the extra control lines. Is this assumption reasonable? (chapter 1.3)
A:
Coding logic gates area depends on:
Processing technology
Extra control lines area depends on:
Processing technology
Routing length
The required coding logic gates and extra control lines are listed, so the system designer can evaluate the overhead easily.
My thesis focuses on the external Bus between processor and memory. In that environment, the assumption that the area overhead of the extra coding logic gates is much smaller than that of the extra control lines is reasonable.
Q6:
Many coding methods can inform the receiver of the number of Bus transactions. Why do you use the method that adds an extra control line to indicate the number of Bus transactions? (chapter 3.2.1, 3.2.2, 3.2.3)
A:
To inform the receiver of the number of Bus transactions, I have to transmit extra information. There are two main ways to do this.
Policy 1 :
Use an extra control line (EB) to indicate, at every Bus transaction, whether the codeword has been completely transmitted.
Policy 2 :
Add several extra bits to indicate the number of Bus transactions.
The reason for using policy 1:
To compare the two policies on the same basis, I use the same Bus width (including control lines) for both and compare their Bus transaction ratios.
- Policy 1 achieves the lower Bus transaction ratio.
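One way to picture the comparison is to count transactions per codeword for both policies at the same total Bus width. The numbers below are purely illustrative assumptions (9 lines in total, a 2-bit length field for policy 2); they are not the configuration simulated in the thesis.

```python
TOTAL_LINES = 9  # assumed: same total Bus width (including control lines) for both

def policy1_transactions(codeword_bytes):
    # 8 data lines + 1 EB line; EB flags the last transaction of the codeword.
    return codeword_bytes

def policy2_transactions(codeword_bytes, length_bits=2):
    # All 9 lines carry data; an assumed length field is prepended to the codeword.
    total_bits = 8 * codeword_bytes + length_bits
    return -(-total_bits // TOTAL_LINES)  # ceiling division

for n in range(1, 5):
    print(n, policy1_transactions(n), policy2_transactions(n))
```

With these assumed parameters, a 1-byte codeword needs 1 transaction under policy 1 but 2 under policy 2, and the two policies tie for 2 to 4 bytes, so policy 1 is never worse in this toy setting.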
Q7:
You seem to have reduced the number of Bus transactions to a very low level. However, why do you think these ratios are low enough? (chapter 5.8)
A:
Control lines | Instruction Address | Data Address | Instruction | Data
0 | - | - | 26.65% * | -
1 | 26.53% * | 46.42% | 26.23% * | 55.74%
2 | 25.13% * | 29.93% | - | 45.49%
3 | - | 26.56% * | - | -
8 | - | 25.01% * | - | -
(* = emphasized (underlined) value)
Instruction Address Stream:
Sequential addresses (T0, T0-C)
The number of Bus transactions can be reduced to 1.
Non-sequential address pairs (NBDAT)
- When a branch instruction is executed, control branches from a source address to a target address. The source address and the target address form a non-sequential address pair.
- First occurrence of a non-sequential address pair:
  The number of Bus transactions is not guaranteed to be 1.
- Second and later occurrences of a non-sequential address pair:
  The number of Bus transactions can be reduced to 1.
The Bus transactions of first-occurrence non-sequential address pairs can be reduced by EB, but they are not guaranteed to be 1. Only these addresses may need more than 1 Bus transaction.
Data Address Stream:
Sequential addresses (Variable stride algorithm)
- The number of Bus transactions can be reduced to 1.
Locality properties (Historical addresses, described in section 3.4.2, encoder 2 ~ encoder 4)
- The historical addresses algorithm records the 1-byte-addresses, 2-byte-addresses, and 3-byte-addresses that have occurred.
- After all 3-byte-addresses in the program have been recorded, every data address needs only 1 Bus transaction.
  - With the test patterns and program sizes used, a 254-entry historical addresses table can record all 3-byte-addresses in my simulation programs (MiBench).
Initialization
- Variable stride algorithm:
  - Before the right stride value is obtained, some data addresses need more than 1 Bus transaction.
  - While the stride value is being updated, EB can help reduce the Bus transactions, but they are not guaranteed to be 1.
- Historical addresses algorithm:
  - Before all 3-byte-addresses in the program are recorded, some data addresses need more than 1 Bus transaction.
  - While the historical addresses table is being initialized, EB can help reduce the Bus transactions, but they are not guaranteed to be 1.
  - While the historical addresses table is being initialized, the 2-byte-address and 1-byte-address recorders can help reduce the Bus transactions, but they are not guaranteed to be 1.
The Bus transactions of addresses that are neither covered by the variable stride algorithm nor recorded in the 3-byte-address table can be reduced by EB, but they are not guaranteed to be 1. Only these addresses may need more than 1 Bus transaction. In addition, the table size and the number of control lines cannot be unlimited, which further limits the effect of the proposed method.
The proposed methods can reduce the Bus transactions of almost all data addresses to 1. An address needs more than 1 Bus transaction only while the table or the stride value is being initialized. I therefore consider that the proposed methods reduce the Bus transactions enough.
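The initialization effect of a stride-based scheme can be seen in a generic stride predictor. The sketch below is not the thesis's "no control line variable stride algorithm"; it is a minimal stand-in in which an address counts as a hit (assumed to cost 1 Bus transaction) when it equals the previous address plus the current stride, and the function name `stride_hits` is hypothetical.

```python
def stride_hits(addresses):
    """Return one flag per address: True if it equals last address + stride."""
    hits = []
    last, stride = None, 0
    for addr in addresses:
        hit = last is not None and addr == last + stride
        hits.append(hit)
        if last is not None and not hit:
            stride = addr - last  # learn the new stride (initialization phase)
        last = addr
    return hits

print(stride_hits([100, 104, 108, 112]))
```

The first two accesses miss while the stride is being learned, and the following ones hit, which mirrors the point that only addresses seen during initialization may need more than 1 Bus transaction.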
Instruction Stream:
Lower Bus transaction ratio than the other streams when using the same number of control lines:
- Even with no extra control line, the Bus transaction ratio of the instruction stream is about the same as that of the instruction address stream using 1 extra control line, and about the same as that of the data address stream using 3 extra control lines.
Some surveyed but unsuitable compression methods
- VLC coding is not suitable in my environment
  VLC coding can be used to compress a whole program, but it is not suitable for compressing instructions one by one. By contrast, “Selective Instruction Compression for Memory Energy Reduction in Embedded Systems” is well suited to compressing instructions one by one.
- Compressing operands is not suitable in my environment
  Though some operands can be compressed to a very small size, others may remain very large. The number of Bus transactions is the ceiling of the compressed size divided by 8. Operand compression does not take this into account and achieves a lower Bus transaction reduction than “Selective Instruction Compression for Memory Energy Reduction in Embedded Systems” in my environment.
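The ceiling rule explains why shrinking an operand does not always save a Bus transaction. A minimal sketch, assuming the compressed size is measured in bits and the Bus is 8 bits wide (the function name is hypothetical):

```python
def bus_transactions(compressed_bits, bus_width=8):
    """Ceiling of compressed size divided by the Bus width."""
    return -(-compressed_bits // bus_width)

# Compressing from 8 bits down to 3 bits saves nothing: both need 1 transaction.
# A 9-bit result, one bit over the boundary, already needs 2 transactions.
for bits in (3, 8, 9, 16, 17):
    print(bits, bus_transactions(bits))
```

Only compression that crosses a multiple of the Bus width reduces the transaction count, so a method tuned for total bit count can still do poorly on this metric.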
“Selective Instruction Compression for Memory Energy Reduction in Embedded Systems” is suitable in my environment
- Not all programs execute fewer than 256 kinds of instructions. That is, not all instructions can be transmitted in 1 Bus transaction.
  - We can increase the table size to store more instructions. However, this further increases the table access delay and needs more bits for indexing.
- On average, only 1.65% of instructions need more than 1 Bus transaction. I think it is not worthwhile to double the table size and increase the Bus width by 1.
For the reasons above, I think that “Selective Instruction Compression for Memory Energy Reduction in Embedded Systems” is the most suitable compression method that I have surveyed, and that its Bus transaction ratio is low enough.
Data Stream:
Different data type:
Due to the different data types, data sizes vary. On a normal Bus, all data are transmitted on the 32-bit Bus even if the data size is smaller than 32 bits. In the 8-bit narrow Bus environment, the proposed methods can transmit small data in fewer Bus transactions without error.
General research on data size in time-critical environments focuses on two main points
- Insignificant Bits (sign extension)
- Repeated Bits (relationships between data)
Why I think the data stream needs no further encoding:
- The relationship between 2 data
  - Using repeated bits as a redundant type exploits the relationship between 2 data.
- The relationship between more than 2 data
  - It is highly program dependent. It is hard to find a common relationship across all programs.
- Some data values occur repeatedly
  - The probability is very low and the occurrences are hard to catch.
The data stream encoding uses only the most effective properties. The other properties are not obvious and are hard to exploit. I think it is good enough to use just insignificant bits and repeated bits as the redundant bit types in the data stream.
Q8:
Bus encoding is time-critical work. How can you convince us that the proposed methods won’t affect the pipeline? (chapter 4)
A:
I cannot guarantee that the proposed methods won’t affect the pipeline. I can only provide the coding (encoding and decoding) delays of the coding methods so that the system designer can estimate the effect.
The coding (encoding and decoding) delays of the coding methods:
Instruction Address
T0 | T0-C | DAT (NBDAT) | EB (Repeated
Variable stride (similar to T0) | Variable stride (similar to | Historical | EB (Repeated
Gate delay (encoder): No run time delay | No run time delay
Gate delay (decoder): 255 entries direct map cache look up | 2-to-1 MUX | 256 entries direct map cache look up | 2-to-1 MUX
Data (VL-Encoder)
Repeated bits | Insignificant bits | Insignificant bits
Handle Method: Copy | Embedded hint | Additional hint
Gate delay
鄭式勳 9317580