Chapter 1 Introduction
1.2 Method
1.2.2 Second year
To achieve high performance on embedded systems, 3D multi-core architectures have become a promising alternative. Energy efficiency is also crucial for enabling high-performance computing. Mapping and scheduling tasks onto many cores is known to be an NP-complete problem; thus, many heuristics have been proposed for energy-aware scheduling using various dynamic voltage and frequency scaling (DVFS) techniques, including an energy-efficient time-constrained task-scheduling algorithm that accounts for transmission cost to minimize the total energy consumption.
In this work, the core problem is to find the most energy-efficient schedule on a 3D multi-core architecture. Figure 1.4 shows the system overview of the timing- and resource-constrained scheduler. The inputs to the scheduler include a task graph, a timing constraint, a resource constraint, and an energy model.
After scheduling, every task must be assigned to one core in a valid execution order. Moreover, the objective of the scheduler is energy minimization, and the energy-saving rate is computed to estimate the energy efficiency of schedulers.
Figure 1.4: system overview of task scheduler
Wu et al. [41] proposed an energy-efficient task-scheduling algorithm built on [40] that applies DVFS at the system level, and formulated a priority gain function that weighs both gains and losses when selecting the tasks whose frequency is to be scaled down.
Building on the previous task-scheduling algorithm [41], two dynamic task-to-core mapping strategies, Dynamic Remapping (DR) and Iterative Dynamic Remapping (IDR), are proposed to reduce slack slots and improve the Energy-Saving Rate (ESR). Experimental results show that the ESR of the algorithm with the IDR strategy is, on average, 16 percent higher than that of the previous work [41]. Moreover, compared with an ILP solution, both proposed strategies run at least three orders of magnitude faster and achieve comparable energy savings. Figure 1.5 shows the flow of the DR or IDR strategy. The DR strategy has two rounds, and task-to-core mapping and voltage scaling are performed in both rounds.
Figure 1.5: Design flow of scheduling with dynamic remapping methods
Chapter 2
Fast scan-chain ordering for 3D-IC designs under through-silicon-via (TSV) constraints
2.1 Introduction
As technology scales, interconnect plays an important role in determining circuit performance. Structural Three-Dimensional (3D) integration is emerging as a promising solution to reduce the length of long interconnects across circuits [1].
Moreover, 3D integration provides many other advantages over the traditional Two-Dimensional (2D) implementation, such as better packaging efficiency and higher transistor density. These advantages, collectively, not only provide significant performance improvement but also alleviate the problems caused by long interconnects [2], [3], [4], [5]. Among all vertical-integration techniques, Through-Silicon Via (TSV) provides the best timing and power performance for interconnection. However, TSVs typically incur additional area overhead and may become another source of defects [6]. Therefore, considering yield loss and area cost, the number of TSVs in use is typically limited in a 3D Integrated Circuit (IC) design.
On the other hand, scan-chain design is the most prevalent Design-for-Testability (DFT) technique; it aims to reduce the difficulty of testing the Circuit Under Test (CUT). To guarantee high fault coverage on complex designs, the CUT is modified during synthesis to enhance its controllability and observability: all Flip-Flops (FFs) are replaced by multiplexed-input scan FFs that support multiple operation modes. In test mode, i.e., when the test signal is asserted, the values of one test pattern are shifted serially into the scan FFs of the scan chain. The pattern is then applied to the combinational logic, together with the primary inputs, in functional mode. Finally, the response values are captured into the scan FFs (the primary outputs are observed directly) and shifted out through the scan chain, again in test mode. Scan testing reduces a sequential test problem to a combinational one; thus, high coverage can be achieved efficiently.
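The shift-capture-shift sequence described above can be illustrated with a toy Python model of a single scan chain. This is a minimal sketch, not a DFT tool: the function name `scan_test`, the bit-list representation, and the `logic` callback standing in for the combinational logic are all hypothetical.

```python
from typing import Callable, List, Tuple

Bits = List[int]


def scan_test(pattern: Bits, logic: Callable[[Bits], Bits]) -> Tuple[Bits, Bits]:
    """Toy model of one scan-test cycle on a single scan chain.

    pattern[k] is the value intended for scan FF k; FF 0 is nearest to scan-in.
    logic (hypothetical) maps the loaded FF state to the captured response.
    Returns the loaded state and the bits observed at scan-out, in order.
    """
    n = len(pattern)
    state = [0] * n
    # Test mode: n shift clocks. The bit for the last FF must enter first,
    # so the pattern is fed in reverse order.
    for bit in reversed(pattern):
        state = [bit] + state[:-1]
    loaded = list(state)
    # Functional mode: a single capture clock loads the response into the FFs.
    state = logic(state)
    # Test mode again: n shift clocks move the response out of the last FF.
    observed = []
    for _ in range(n):
        observed.append(state[-1])
        state = [0] + state[:-1]
    return loaded, observed


def invert(bits: Bits) -> Bits:
    # Hypothetical combinational logic: bitwise inversion.
    return [1 - b for b in bits]


loaded, observed = scan_test([1, 0, 1, 1], invert)
print(loaded, observed)  # [1, 0, 1, 1] [0, 0, 1, 0]
```

The response leaves the chain last-FF first. On a real tester, the response of one pattern is shifted out while the next pattern is shifted in; the sketch keeps the two phases separate for clarity.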
Although scan FFs enhance the testability of the CUT, the stitching wire of a scan chain can be long and may degrade signal integrity or even violate timing constraints. Therefore, scan-chain ordering, i.e., deciding the order of the scan FFs based on physical information, has been widely studied. Many layout-based techniques [7], [8], [9] have been shown to reduce the scan-stitching wire effectively.
Test power has always been a concern in scan testing. It depends on the characteristics of the test patterns as well as on the shift operations. Higher switching activity in the combinational logic usually stems from ATPG patterns and the corresponding LFSR, both of which ignore the functionality of the circuit. The scan-shift operation also causes a high toggle rate during testing. Various methods have been reported to address power-related problems in the CUT, such as power-aware test pattern generation [10], test-pattern filling [11], scan-chain partitioning [12], and scan-chain ordering [13], [14], [15], [16]. Among these, scan-chain ordering offers several advantages: it has no negative effect on test application time or fault coverage, and it can easily be combined with other power-reduction techniques in the design flow.
Figure 2.1: Comparison between 2D and 3D scan-chain designs
To further study interconnects in 3D-IC designs, Yuan et al. [17] showed that the scan-stitching wire length in a multi-layer circuit is shorter than that in a planar circuit, as shown in Figure 2.1. Experimental results in [17] also suggested that the more TSVs the scan chain uses, the lower the scan-stitching wire cost. This observation, combined with TSV-induced yield loss, indicates an important tradeoff between the scan-stitching wire and the number of TSVs in use. Therefore, a constraint on the number of TSVs in use must be considered in 3D-IC design.
Both pre-bond and post-bond testing are important for improving the yield of 3D ICs. To enable pre-bond testability, Lewis et al. [18] proposed a scan-island-based design, and Kumar et al. [19] proposed a hypergraph-based partitioning for pre-bond 3D-IC testing. For post-bond testing, several scan-ordering approaches were proposed in [17]. VIA3D uses the fewest TSVs to alleviate the TSV impact on the scan-stitching wire.
MAP3D first maps all scan FFs onto a single layer and then applies a 2D scan-chain reordering technique. OPT3D considers the TSV impact when computing the scan-stitching wire cost, and it outperforms the other two in terms of total wire cost.
However, these approaches do not consider scan-induced power dissipation, which is also an important issue for 3D ICs. A Genetic Algorithm (GA) method was then proposed in [17], but its runtime remains an issue and its solution quality is unstable.
Hence, a fast 3D scan-chain design is presented in this work to simultaneously consider wire and power costs.
In this work, TSV-constrained scan-chain ordering is first analyzed and formulated as a Traveling Salesman Problem (TSP). Then, a fast algorithm is developed to minimize the scan-stitching wire and/or the scan-induced power dissipation while satisfying the constraint on the number of TSVs in use for 3D-IC designs. Our algorithm consists of two phases. First, we construct an initial simple path through all scan FFs using a modified greedy algorithm, the multiple fragment heuristic, supported by the dynamic closest-pair data structure FastPair. Second, we propose two new techniques, 3D planarization and 3D relaxation, to minimize the wire/power cost and to reduce the number of TSVs, respectively. Experiments demonstrate the practicality of our algorithm: it produces scan-stitching wire length (and total power dissipation) comparable to the GA method with a speedup of two orders of magnitude on average.
The contributions of this work can be summarized as follows:
• Formulate scan-chain ordering considering TSV constraints into a modified TSP problem.
• Propose a greedy algorithm for scan-chain ordering of 3D-IC designs to simultaneously minimize wire and power costs.
• Demonstrate that the proposed algorithm can be practically used while supporting multiple scan chains.
The rest of this work is organized as follows: In Section 2.2, we present problem formulations of TSV-constrained scan-chain ordering for 3D-IC designs, with three different objectives:
• Wire-cost minimization
• Power-cost minimization
• Wire-and-power cost minimization
In Section 2.3, a multiple fragment heuristic supported by FastPair is implemented to obtain a good initial solution. The 3D planarization process, which minimizes the scan-stitching wire cost (or the scan-induced power dissipation), and the 3D relaxation process, which reduces the number of TSVs, are then detailed. Section 4.1 presents the experimental results, including a comparison between our algorithm and a GA method under TSV constraints in terms of several performance metrics and runtime over a variety of benchmark circuits. Finally, Section 5 draws conclusions and outlines future work.
2.2 Problem formulation of scan-chain ordering for TSV-constrained 3D-IC designs
In this section, we formulate the scan-chain ordering problem for 3D-IC designs with three different objectives: (1) to minimize the scan-stitching wire cost to avoid routing congestion and timing violations; (2) to reduce the scan-induced power dissipation during testing to avoid damage to, and reliability degradation of, the CUT; and (3) to consider wire and power costs simultaneously. First, we briefly describe the traditional scan-ordering problem for wire minimization and then define a new model for TSV-constrained 3D-IC designs. We then review the literature on the power issue in scan reordering and define a new problem for 3D power-optimized scan ordering. Finally, the problem is formulated by considering the wire and power costs simultaneously.
2.2.1 Wire-cost minimization problem
The traditional problem of planar (2D) scan-chain ordering to minimize the scan-stitching wire cost can be formulated as follows:
Input: CUT C with n scan FFs {c0, c1, …, cn−1} and their locations {(x0, y0), (x1, y1), …, (xn−1, yn−1)}.
Output: A scan-FF ordering 〈cπ(0), cπ(1), …, cπ(n−1)〉 such that the total scan-stitching wire cost is minimized:
$$\sum_{i=1}^{n-1} \left( |x_{\pi(i)} - x_{\pi(i-1)}| + |y_{\pi(i)} - y_{\pi(i-1)}| \right) \tag{1}$$
In Equation (1), xπ(i) and yπ(i) denote the x and y coordinates of the ith scan FF in the scan-FF ordering, respectively. All scan FFs are placed on the same plane, and the scan-stitching wire cost is defined as the sum of the Manhattan distances between every two consecutive FFs, cπ(i−1) and cπ(i), in the ordering. However, since FFs can be located on different layers in 3D-IC designs, the TSV cost of connecting two cross-layer FFs must be considered, and the layer information of the FFs must be included in their locations:
$$\{(x_0, y_0, L_0), (x_1, y_1, L_1), \ldots, (x_{n-1}, y_{n-1}, L_{n-1})\}$$
where Li denotes the layer on which scan FF ci is placed.
The total scan-stitching wire cost is then modified as follows:
$$\sum_{i=1}^{n-1} \left( |x_{\pi(i)} - x_{\pi(i-1)}| + |y_{\pi(i)} - y_{\pi(i-1)}| + C_{TSV} \times |L_{\pi(i)} - L_{\pi(i-1)}| \right) \tag{2}$$
In Equation (2), CTSV denotes the equivalent scan-stitching wire cost of one TSV connecting two consecutive layers. Generally, CTSV can be defined as the height of one TSV. Moreover, considering manufacturability and yield loss, the total number of TSVs in use becomes a constraint of this problem; it can be expressed as
$$N_{TSV} = \sum_{i=1}^{n-1} |L_{\pi(i)} - L_{\pi(i-1)}| \tag{3}$$
For this modified formulation of the TSV-constrained scan-chain ordering problem, two approaches are proposed in [17]: one based on a Genetic Algorithm (GA) and the other based on Integer Linear Programming (ILP). Although the GA approach may find a near-optimal solution, the quality of any identified solution cannot be guaranteed. The ILP approach, in contrast, finds the optimal cost but may fail to produce a feasible solution within a limited time. The experimental results in [17] report a lower bound on the total scan-stitching wire cost, which was obtained quickly through the ILP approach without providing a detailed ordering of the scan FFs.
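Equations (1) to (3) can be evaluated directly for a given ordering. The Python sketch below assumes each scan FF is given as an `(x, y, layer)` tuple and an ordering as a list of FF indices; the function names and the `max_tsv` budget parameter are hypothetical. Setting every layer to the same value makes the TSV term vanish, so `wire_cost` then reduces to Equation (1).

```python
from typing import Sequence, Tuple

FF = Tuple[float, float, int]  # (x, y, layer) of one scan FF


def wire_cost(locs: Sequence[FF], order: Sequence[int], c_tsv: float) -> float:
    """Equation (2): Manhattan distance plus C_TSV per layer crossed."""
    total = 0.0
    for a, b in zip(order, order[1:]):
        (xa, ya, la), (xb, yb, lb) = locs[a], locs[b]
        total += abs(xb - xa) + abs(yb - ya) + c_tsv * abs(lb - la)
    return total


def tsv_count(locs: Sequence[FF], order: Sequence[int]) -> int:
    """Equation (3): number of TSVs used between consecutive FFs."""
    return sum(abs(locs[b][2] - locs[a][2]) for a, b in zip(order, order[1:]))


def meets_tsv_budget(locs: Sequence[FF], order: Sequence[int], max_tsv: int) -> bool:
    # max_tsv is a hypothetical name for the designer-given TSV limit.
    return tsv_count(locs, order) <= max_tsv


locs = [(0, 0, 0), (3, 4, 0), (3, 4, 1), (0, 0, 1)]  # hypothetical FF locations
print(wire_cost(locs, [0, 1, 2, 3], c_tsv=10.0))  # 7 + 10 + 7 = 24.0
print(tsv_count(locs, [0, 1, 2, 3]))              # 1
```

An FF pair that is two layers apart contributes two TSVs, which matches the absolute layer difference in Equation (3).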
Figure 2.2: Flow of proposed scan reordering algorithm
From a practical perspective, a fast algorithm is needed to overcome the runtime issue. Therefore, we propose a fast two-stage algorithm. In stage 1, we convert the 3D scan-chain ordering problem into a TSP. Then, a tour-construction heuristic [20] supported by a particular closest-pair data structure, FastPair [21], is used to stitch a simple path as the initial solution. In stage 2, local refinement by 3D planarization and constraint solving by 3D relaxation minimize the total cost and reduce the number of TSVs in use, respectively. Figure 2.2 shows the overall flow; additional details are given in Section 2.3.
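The stage-1 idea, a multiple fragment (greedy edge) heuristic that stitches a simple path, can be sketched as follows. This is a simplified illustration, not the implementation used in this work: it sorts all O(n²) candidate edges up front instead of maintaining them with FastPair [21], and `multiple_fragment_path` and its `cost` callback are hypothetical names. Any pair-wise cost can be plugged in, for example the per-pair term of Equation (2).

```python
from itertools import combinations
from typing import Callable, List


def multiple_fragment_path(n: int, cost: Callable[[int, int], float]) -> List[int]:
    """Greedy multiple-fragment heuristic for an open TSP path.

    Repeatedly takes the cheapest remaining edge that keeps every vertex at
    degree <= 2 and joins two different fragments (so no cycle forms), until
    all fragments merge into one simple path through the n vertices.
    Assumption: edges are pre-sorted instead of using FastPair.
    """
    if n == 1:
        return [0]
    parent = list(range(n))  # union-find over fragments

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    degree = [0] * n
    adj: List[List[int]] = [[] for _ in range(n)]
    edges = sorted(combinations(range(n), 2), key=lambda e: cost(*e))
    added = 0
    for a, b in edges:
        if degree[a] < 2 and degree[b] < 2 and find(a) != find(b):
            parent[find(a)] = find(b)  # merge the two fragments
            degree[a] += 1
            degree[b] += 1
            adj[a].append(b)
            adj[b].append(a)
            added += 1
            if added == n - 1:
                break
    # Walk the single remaining fragment from one of its two endpoints.
    start = next(v for v in range(n) if degree[v] == 1)
    path, prev = [start], -1
    while len(path) < n:
        cur = path[-1]
        path.append(next(u for u in adj[cur] if u != prev))
        prev = cur
    return path


xs = [0.0, 5.0, 1.0, 3.0]  # hypothetical 1-D scan-FF positions
print(multiple_fragment_path(len(xs), lambda a, b: abs(xs[a] - xs[b])))  # [0, 2, 3, 1]
```

Sorting all pairs costs O(n² log n) time and O(n²) memory; that overhead is what a dynamic closest-pair structure such as FastPair is used to reduce on large designs.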
2.2.2 Power-cost minimization problem
In the second problem, the goal of scan-chain ordering is to find an ordering of scan FFs with minimal power dissipation originating from scan-shift operations.
Integrating scan-chain ordering techniques into the current design flow, while maintaining the original fault coverage and test application time, is straightforward.
The only challenge is that power-optimized scan-chain ordering depends on a fixed set of test patterns generated by Automatic Test Pattern Generation (ATPG).
Therefore, in this section we briefly introduce the background of power consumption induced by scan testing and then formulate this problem for TSV-constrained 3D-IC designs.
1) Estimation of Power Dissipation: Previous power-optimized ordering techniques address both the total power and the peak power consumption. The total power consumption is the sum of the power consumed during testing, and the peak power consumption is the highest power consumption among all test patterns.
The dynamic power consumption can be expressed as:
$$P = 0.5 \cdot C_{ld} \cdot V_{dd}^{2} \cdot F \cdot S \tag{4}$$
where P is the dynamic power consumption, Cld is the load capacitance, Vdd is the supply voltage, F is the clock frequency, and S is the switching activity.
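Equation (4) can be evaluated with a one-line helper. The sketch below uses hypothetical numbers (1 pF, 1.0 V, 100 MHz, switching activity 0.2) purely to illustrate that, with the other factors fixed, P scales linearly with the switching activity S; this is why a transition count can stand in for power.

```python
def dynamic_power(c_ld: float, vdd: float, freq: float, s: float) -> float:
    """Equation (4): P = 0.5 * C_ld * Vdd^2 * F * S (farads, volts, hertz -> watts)."""
    return 0.5 * c_ld * vdd ** 2 * freq * s


p = dynamic_power(1e-12, 1.0, 100e6, 0.2)       # about 1e-5 W, i.e. 10 uW
p_half = dynamic_power(1e-12, 1.0, 100e6, 0.1)  # half the switching activity
print(p, p_half / p)
```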
According to Equation (4), the power consumption during scan-shift operations is highly correlated with the switching activity in the CUT. In practice, counting every switching event in the CUT exactly is time-consuming, but the number of transitions in the scan chain and the number of transitions triggered in the logic elements of the CUT were shown to be highly correlated in [11]. In other words, the number of transitions in the scan chain is a good estimate of the total switching activity in the CUT.
The total switching activity in the CUT during scan-shift operations depends on both the transitions in the scan chain and their positions. Thus, the number of Weighted Transitions (WT) can be defined as follows:
$$\mathrm{WT} = \sum_{\text{transitions}} (\mathit{size} - \mathit{position})$$
where WT represents the actual switching activity in the CUT, size is the total number of scan FFs, and position is the position of a transition in the scan chain, indexed from a different starting end for the input vector than for the output response. Hence, every transition in the input vector or the output response has its own weight, which reflects the actual condition. The following notations are used for the weighted transitions throughout the remainder of this chapter:
{c0, c1, …, cn−1}: n scan FFs in the CUT C.
O = 〈cπ(0), cπ(1), …, cπ(n−1)〉: scan-chain ordering of the n scan FFs.
V = {v0, v1, …, vn−1}: n-bit input pattern, where vi is scanned into scan FF ci during scan testing. Therefore, 〈vπ(0), vπ(1), …, vπ(n−1)〉 represents an input pattern with respect to a given scan-chain ordering.
R = {r0, r1, …, rn−1}: n-bit output response, where ri is scanned out of scan FF ci during scan testing. Therefore, 〈rπ(0), rπ(1), …, rπ(n−1)〉 represents an output response with respect to a given scan-chain ordering.
Given these notations, the weighted transitions of an input vector V and an output response R can be defined, respectively, as:
$$\mathrm{VWT}(V) = \sum_{i=1}^{n-1} i \cdot \left(v_{\pi(i)} \oplus v_{\pi(i-1)}\right) \tag{5}$$
$$\mathrm{RWT}(R) = \sum_{i=1}^{n-1} (n - i) \cdot \left(r_{\pi(i)} \oplus r_{\pi(i-1)}\right) \tag{6}$$
where VWT(V) and RWT(R) denote the weighted transitions of the input vector V and the output response R, respectively; the exclusive-or operator ⊕ detects a difference between two adjacent bits, and i and (n − i) represent the different weighting rules for scan-in and scan-out operations, respectively. Equations (5) and (6) extend directly to a set of m test patterns:
$$\mathrm{VWT}(V_1, V_2, \ldots, V_m) = \sum_{j=1}^{m} \sum_{i=1}^{n-1} i \cdot \left(v^{j}_{\pi(i)} \oplus v^{j}_{\pi(i-1)}\right) \tag{7}$$
$$\mathrm{RWT}(R_1, R_2, \ldots, R_m) = \sum_{j=1}^{m} \sum_{i=1}^{n-1} (n - i) \cdot \left(r^{j}_{\pi(i)} \oplus r^{j}_{\pi(i-1)}\right) \tag{8}$$
where Vj and Rj are the jth input vector and the jth output response in the set of m test patterns, respectively, and $v^{j}_{\pi(i)}$ ($r^{j}_{\pi(i)}$) is the bit scanned into (out of) the ith scan FF of the chain ordering for the jth input vector (jth output response).
In addition to scan-in and scan-out transitions, peak transitions are also taken into account in the total weighted transitions. A peak transition occurs when the last-out bit of the jth output response differs from the first-in bit of the (j + 1)th input vector. Since a peak transition causes all scan FFs to toggle, its weight is the length of the scan chain. The weighted peak transitions, denoted by PWT, are defined as:
$$\mathrm{PWT} = \sum_{j=1}^{m-1} n \cdot \left(r^{j}_{\pi(n-1)} \oplus v^{j+1}_{\pi(0)}\right) \tag{9}$$
Figure 2.3: Calculations for weighted transitions
Consequently, the total weighted transitions are TWT = VWT + RWT + PWT.
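Equations (7) to (9) translate directly into code. In this Python sketch, each input vector and output response is a list of bits indexed by FF (so `vec[k]` is the bit of scan FF ck), and an ordering is a list of FF indices with positions counted from 0 as in the equations; the function names and the two-pattern example data are hypothetical and are not the patterns of Figure 2.3.

```python
from typing import Sequence

Bits = Sequence[int]  # bits indexed by scan FF: bits[k] belongs to c_k


def vwt(vectors: Sequence[Bits], order: Sequence[int]) -> int:
    """Equation (7): a scan-in transition at position i has weight i."""
    return sum(i * (v[order[i]] ^ v[order[i - 1]])
               for v in vectors for i in range(1, len(order)))


def rwt(responses: Sequence[Bits], order: Sequence[int]) -> int:
    """Equation (8): a scan-out transition at position i has weight n - i."""
    n = len(order)
    return sum((n - i) * (r[order[i]] ^ r[order[i - 1]])
               for r in responses for i in range(1, n))


def pwt(vectors: Sequence[Bits], responses: Sequence[Bits], order: Sequence[int]) -> int:
    """Equation (9): last-out bit of R_j vs. first-in bit of V_(j+1), weight n."""
    n = len(order)
    return sum(n * (responses[j][order[n - 1]] ^ vectors[j + 1][order[0]])
               for j in range(len(vectors) - 1))


def twt(vectors, responses, order) -> int:
    """Total weighted transitions: TWT = VWT + RWT + PWT."""
    return vwt(vectors, order) + rwt(responses, order) + pwt(vectors, responses, order)


# Hypothetical example: 4 scan FFs, 2 test patterns.
V = [[0, 1, 1, 0], [0, 1, 0, 0]]
R = [[1, 0, 0, 1], [0, 0, 1, 1]]
print(vwt(V, [0, 1, 2, 3]), rwt(R, [0, 1, 2, 3]), pwt(V, R, [0, 1, 2, 3]))  # 7 6 4
print(twt(V, R, [0, 1, 2, 3]), twt(V, R, [3, 0, 1, 2]))                     # 17 13
```

In this made-up example, reordering the chain to (c3, c0, c1, c2) removes the peak transition and lowers TWT from 17 to 13, the same kind of effect that Figure 2.3 illustrates.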
Figure 2.3 shows two examples of calculating the total weighted transitions. The CUT with four scan FFs uses two different scan-chain orderings, and three test patterns are applied during scan testing. The transitions of the input vectors and output responses, the peak transitions, and the corresponding weights at different positions are shown in Figure 2.3. In Figure 2.3(a), the scan chain has an initial ordering (1, 2, 3, 4). Thus, VWT({V1, V2, V3}) = 1・1 + 1・2 + 3・3 = 12, RWT({R1, R2, R3}) = 2・3 + 1・2 + 1・1 = 9, and PWT = 2・4 = 8. The total weighted transitions TWT is 12 + 9 + 8 = 29. In contrast, Figure 2.3(b) shows a power-optimized ordering (2, 3, 4, 1) that scans in the same test patterns. Thus, VWT({V1, V2, V3}) = 1・1 + 3・2 + 1・3 = 10, RWT({R1, R2, R3}) = 1・3 + 1・2 + 2・1 = 7, and PWT = 0・4 = 0. The total weighted transitions TWT is 11 + 7 + 0 = 18. Therefore, the total power reduction rate is 38 percent, and the number of peak transitions is reduced from 2 to 0.
2) Formulation for TSV-constrained 3D-IC Designs: The problem of scan-chain ordering to minimize the scan-shift power dissipation can be formulated as follows:
Input: CUT C with n scan cells {c0, c1, . . . , cn−1}, their layer information {L0,L1, . . . ,Ln−1}, and a fixed set of m test patterns {V1, R1, V2, R2, . . . , Vm, Rm}.
Output: A scan-cell ordering 〈cπ(0), cπ(1), . . . , cπ(n−1)〉 such that the total weighted transitions TWT({V1, R1, V2, R2, . . . , Vm, Rm}) are minimized under a TSV constraint.
Compared with the wire-minimization problem, only the layer information of the scan FFs matters here, since their geometric locations do not affect the objective function. Therefore, only the total TSV count given by Equation (3) needs to be considered.
Regarding the power-minimization formulation for TSV-based 3D-IC designs, Giri et al. [22] also used a GA approach to solve this problem.
However, this approach is time-consuming and its solution quality is unstable.
Therefore, we propose a similar flow, as illustrated in Figure 2.2, to solve this power-optimization problem. At the beginning, we build a look-up table that stores the pair-wise costs to avoid repeated high-complexity calculations. Since the objective involves the transition positions in the scan chain, several modifications are made to the proposed algorithm. Further details are provided in Section 2.3.
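One plausible reading of the pair-wise look-up table is sketched below in Python under an explicit assumption: the pair-wise cost of FFs ca and cb is taken to be the unweighted number of input vectors and output responses in which their bits differ, i.e., how many transitions would appear if the two FFs were stitched next to each other. The positional weights of Equations (7) and (8) are not stored, because they depend on where the pair ends up in the chain. The function name, table layout, and example patterns are hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple

Bits = Sequence[int]  # bits indexed by scan FF


def pairwise_transition_table(vectors: Sequence[Bits],
                              responses: Sequence[Bits]) -> Dict[Tuple[int, int], int]:
    """Hypothetical LUT: unweighted transition count for every FF pair.

    table[(a, b)] counts the input vectors and output responses in which the
    bits of FFs a and b differ. Positional weights are deliberately omitted.
    """
    n = len(vectors[0])
    table: Dict[Tuple[int, int], int] = {}
    for a, b in combinations(range(n), 2):
        diff = sum(v[a] ^ v[b] for v in vectors) + sum(r[a] ^ r[b] for r in responses)
        table[(a, b)] = table[(b, a)] = diff  # store both directions
    return table


V = [[0, 1, 1, 0], [0, 1, 0, 0]]  # hypothetical input vectors
R = [[1, 0, 0, 1], [0, 0, 1, 1]]  # hypothetical output responses
lut = pairwise_transition_table(V, R)
print(lut[(0, 1)], lut[(0, 3)], lut[(1, 2)])  # 3 1 2
```

With such a table, a path-construction heuristic can look up the cost of any candidate pair in constant time instead of rescanning all m patterns.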
2.2.3 Wire-and-power cost minimization problem
Two 3D-IC scan-chain ordering problems with different objectives have been reviewed above: one minimizes the total scan-stitching wire cost, and the other minimizes the scan-induced power cost during testing. In a more advanced case, we would like to consider wire and power costs simultaneously. The cost function of this new problem combines the wire and power cost functions.
The problem of scan-chain ordering to minimize the power and wire costs simultaneously can be formulated as follows:
Input: CUT C with n scan cells {c0, c1, . . . , cn−1}, their locations and layer information {(x0, y0, L0), (x1, y1, L1), . . . , (xn−1, yn−1, Ln−1)}, and a fixed set of m test patterns {V1, R1, V2, R2, . . . , Vm, Rm}.
Output: A scan-cell ordering 〈cπ(0), cπ(1), . . . , cπ(n−1)〉 such that the combined cost ((1 − α) × wire cost + α × power cost), where α is a weighting factor, is minimized under a TSV constraint.
The same flow illustrated in Figure 2.2 is used again to solve the combined-cost optimization problem. Experimental results in Section IV will also show that the proposed algorithm can efficiently minimize the combined cost when ordering scan FFs.
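To make the combined formulation concrete, the Python sketch below scores an ordering by (1 − α) × wire cost + α × TWT and finds the best ordering that meets the TSV constraint by exhaustive search. It is only a reference for tiny n, useful for sanity-checking a heuristic; it is not the proposed algorithm. It assumes each FF is given as `(x, y, layer)`, combines the two costs on their raw scales exactly as written, and uses hypothetical function and parameter names (`best_order`, `alpha`, `c_tsv`, `max_tsv`).

```python
from itertools import permutations
from typing import Optional, Sequence, Tuple

FF = Tuple[float, float, int]  # (x, y, layer) of one scan FF


def wire_cost(locs: Sequence[FF], order: Sequence[int], c_tsv: float) -> float:
    # Equation (2): Manhattan distance plus C_TSV per layer crossed.
    return sum(abs(locs[b][0] - locs[a][0]) + abs(locs[b][1] - locs[a][1])
               + c_tsv * abs(locs[b][2] - locs[a][2])
               for a, b in zip(order, order[1:]))


def tsv_count(locs: Sequence[FF], order: Sequence[int]) -> int:
    # Equation (3).
    return sum(abs(locs[b][2] - locs[a][2]) for a, b in zip(order, order[1:]))


def twt(vectors, responses, order) -> int:
    # Equations (7)-(9): TWT = VWT + RWT + PWT.
    n = len(order)
    v = sum(i * (vec[order[i]] ^ vec[order[i - 1]]) for vec in vectors for i in range(1, n))
    r = sum((n - i) * (res[order[i]] ^ res[order[i - 1]]) for res in responses for i in range(1, n))
    p = sum(n * (responses[j][order[-1]] ^ vectors[j + 1][order[0]])
            for j in range(len(vectors) - 1))
    return v + r + p


def best_order(locs, vectors, responses, alpha: float, c_tsv: float,
               max_tsv: int) -> Optional[Tuple[float, Tuple[int, ...]]]:
    """Exhaustive search over all n! orderings (tiny n only)."""
    best = None
    for order in permutations(range(len(locs))):
        if tsv_count(locs, order) > max_tsv:
            continue  # violates the TSV constraint
        cost = (1 - alpha) * wire_cost(locs, order, c_tsv) + alpha * twt(vectors, responses, order)
        if best is None or cost < best[0]:
            best = (cost, order)
    return best  # None if no ordering meets the TSV budget


# Hypothetical 4-FF example: one FF per layer at x = 0 and one per layer at x = 5.
locs = [(0, 0, 0), (0, 0, 1), (5, 0, 0), (5, 0, 1)]
V, R = [[0, 1, 0, 1]], [[1, 1, 0, 0]]
print(best_order(locs, V, R, alpha=0.0, c_tsv=1.0, max_tsv=3)[0])  # 7.0
print(best_order(locs, V, R, alpha=0.0, c_tsv=1.0, max_tsv=1)[0])  # 11.0
```

Tightening the TSV budget from 3 to 1 forces a longer chain (11 instead of 7), which is the wire/TSV tradeoff discussed in Section 2.1. Setting `alpha` to 1.0 reduces the search to the power-minimization problem, and `alpha` = 0.0 to the wire-minimization problem.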
2.3 A fast scan-chain ordering
In this section, the proposed algorithm is elaborated for the different objectives: wire-cost minimization in Section 2.3.1, power-cost minimization in Section 2.3.2, and wire-and-power (combined) cost minimization in Section 2.3.3.
2.3.1 Minimizing wire cost