Module Selection Issue - Latency-Optimization Synthesis with Module Selection for Digital Microfluidic Biochips

Chapter 4 Motivations

4.2 Module Selection Issue

Most previous works bind every operation to the fastest (and usually also the largest) module during synthesis. This simplifies the problem and works well when the array is large enough. In most cases, however, and especially once the segregation area wrapped around each module is taken into account, this strategy can be ineffective because large modules overcrowd the array. For example, Figure 13 schedules five mixing operations with two different bindings, using the modules in Table 1. The left schedule binds all operations to the fastest module, mix24, and requires 11 cycles to finish. The right schedule binds the operations to different modules and needs only 9 cycles. This example shows that binding operations to the fastest module is not always a good policy. To explore more of the solution space and improve the utilization of the array area, it is necessary to integrate module selection into the synthesis flow so that suitable modules are selected dynamically.

Figure 13. Comparison between non-module selection and module selection
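The effect described above can be reproduced with a toy model. The sketch below is only an illustration: the module areas, durations, and array capacity are invented (they are not the values of Table 1), placement geometry is ignored, and the five operations are assumed to be independent.

```python
from typing import List, Tuple

# (area in cells, duration in cycles); hypothetical values, not taken from Table 1
Module = Tuple[int, int]

def greedy_latency(bindings: List[Module], capacity: int) -> int:
    """Latency of independent operations scheduled greedily under an area budget.

    In every cycle, each pending operation starts if its module still fits
    into the free area. Only the cell count is tracked, not the placement.
    """
    if any(area > capacity for area, _ in bindings):
        raise ValueError("a module is larger than the whole array")
    pending = list(bindings)
    running: List[Tuple[int, int]] = []  # (end cycle, area)
    t = 0
    latency = 0
    while pending:
        # Release the area of modules that have finished by cycle t.
        running = [(end, area) for end, area in running if end > t]
        free = capacity - sum(area for _, area in running)
        waiting = []
        for area, duration in pending:
            if area <= free:
                running.append((t + duration, area))
                free -= area
                latency = max(latency, t + duration)
            else:
                waiting.append((area, duration))
        pending = waiting
        t += 1
    return latency

FAST_LARGE = (12, 2)  # hypothetical fastest module
SLOW_SMALL = (8, 3)   # hypothetical slower but smaller module

all_fast = greedy_latency([FAST_LARGE] * 5, capacity=16)
mixed = greedy_latency([SLOW_SMALL] * 4 + [FAST_LARGE], capacity=16)
print(all_fast, mixed)  # prints "10 8"
```

With these assumed numbers, only one large module fits on the array at a time, whereas two small modules can run in parallel, so the mixed binding finishes earlier even though most of its modules are slower.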

Chapter 5 Proposed Algorithm

5.1 Overview

The problem addressed in this work is formulated as follows. Given three inputs, 1) a DMFB architecture, which consists of the chip size, the number of dispensing ports, and the number of detecting cells or other functional devices, 2) a sequencing graph G, which describes the biochemical application, and 3) a resource library L, which contains all feasible modules and the information of other devices, determine the binding and scheduling results. These results must be free of dependency and resource violations, and the latency of the biochemical assay must be minimized. The overall flow of the proposed algorithm is shown in Figure 14.

Figure 14. Overview of the proposed algorithm (inputs: sequencing graph, resource library, and architecture; steps: initial binding & scheduling, then iterative operation rebinding; output: scheduling & binding results)

At the beginning of the process, all operations are bound to the fastest module. Then a scheduler based on list scheduling, called storage minimization scheduling, is run to obtain an initial scheduling result. This initial solution is then passed to the iterative operation rebinding process, which refines it iteratively. The storage minimization scheduling and iterative operation rebinding processes are described in detail in section 5.2 and section 5.3.
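As a minimal sketch of the initial binding step, the resource library can be modelled as a list of module records and every operation bound to the shortest-duration module of its type. The class, field, and module names below are hypothetical; the thesis does not prescribe a data layout.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Module:
    name: str       # hypothetical identifier
    op_type: str    # kind of operation the module implements, e.g. "mix"
    area: int       # cells occupied on the array
    duration: int   # cycles needed to finish the operation

def fastest_module(library: List[Module], op_type: str) -> Module:
    """Return the shortest-duration module in the library for op_type."""
    candidates = [m for m in library if m.op_type == op_type]
    if not candidates:
        raise ValueError(f"no module for operation type {op_type!r}")
    return min(candidates, key=lambda m: m.duration)

def initial_binding(op_types: Dict[str, str], library: List[Module]) -> Dict[str, Module]:
    """Bind every operation to the fastest module of its type."""
    return {op: fastest_module(library, kind) for op, kind in op_types.items()}

# Assumed two-module library and two mixing operations.
library = [
    Module("mix_large", "mix", area=8, duration=2),
    Module("mix_small", "mix", area=4, duration=4),
]
binding = initial_binding({"m": "mix", "r": "mix"}, library)
print(binding["m"].name)  # prints "mix_large"
```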

5.2 Storage Minimization Scheduling

Storage minimization scheduling (SMS) is derived from the well-known list scheduling algorithm. First, all ready operations are put in the ready list (L_ready). There are four kinds of ready operations: 1) a dispensing operation, 2) a mixing operation whose parents are both dispensing operations, 3) a mixing operation whose parents are a dispensing operation and a finished mixing operation, and 4) a mixing operation whose parents are two finished mixing operations. Then, in each cycle, the operations in L_ready are scheduled one by one according to their priority.
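The four kinds of ready operations reduce to one test: a dispensing operation is always ready, and a mixing operation is ready when each parent is either a dispensing operation or a finished mixing operation. The sketch below assumes a simple dictionary encoding of the sequencing graph; the encoding and the function name are illustrative only.

```python
from typing import Dict, List, Set, Tuple

# op -> (kind, parent ops); kind is "dispense" or "mix" (assumed encoding)
Graph = Dict[str, Tuple[str, List[str]]]

def ready_operations(graph: Graph, finished: Set[str], scheduled: Set[str]) -> List[str]:
    """Return the operations that may enter the ready list."""
    ready = []
    for op, (kind, parents) in graph.items():
        if op in scheduled:
            continue  # already placed in the schedule
        if kind == "dispense":
            ready.append(op)  # kind 1
        elif all(graph[p][0] == "dispense" or p in finished for p in parents):
            ready.append(op)  # kinds 2, 3, and 4
    return ready

graph: Graph = {
    "d1": ("dispense", []),
    "d2": ("dispense", []),
    "d3": ("dispense", []),
    "m1": ("mix", ["d1", "d2"]),
    "m2": ("mix", ["m1", "d3"]),
}
print(ready_operations(graph, finished=set(), scheduled=set()))
# prints "['d1', 'd2', 'd3', 'm1']"
```

Once m1 has finished, m2 becomes ready as an operation of the third kind.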

Compared with conventional list-based scheduling for DMFBs, we schedule dispensing operations more flexibly and propose a new priority. Previous works, such as [12], [15], and [18], exclude dispensing operations from the ready list L_ready, since these operations can be scheduled at any cycle as long as enough dispensing ports are available. In those works, a dispensing operation is scheduled immediately before its successor, which guarantees that no st_d is needed. However, this can lead to long latency because dispensing operations are scheduled inflexibly, as described in section 4.2. We relax the rule by scheduling dispensing operations before their successors, but not necessarily right before them. Figure 15 gives an example of the scheduling rule. If an operation v can be scheduled at cycle t (here t = 3), the cycles from t-1 down to 1 are checked for available reservoirs and for enough area to store the dispensed droplets. The search starts from t-1 because we want to use as little st_d as possible. If a reservoir is available at cycle 1 and there is enough area to store the dispensed droplet at cycle 2, the dispensing operations of v are scheduled at cycle 1. Otherwise, another operation is scheduled instead.
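The backward search of this rule can be sketched as follows. The per-cycle availability tables and the one-cell storage requirement per droplet are assumptions made for the illustration; the source only states that a reservoir and enough storage area must be available.

```python
from typing import Dict, Optional

def latest_dispense_cycle(
    t: int,
    free_ports: Dict[int, int],
    free_cells: Dict[int, int],
    cells_per_droplet: int = 1,  # assumed storage footprint of one droplet
) -> Optional[int]:
    """Latest cycle c < t at which the input of an operation starting at t can be dispensed.

    Cycles are scanned from t-1 down to 1 so that the droplet waits in
    storage for as short a time as possible.
    """
    for c in range(t - 1, 0, -1):
        if free_ports.get(c, 0) < 1:
            continue  # no reservoir available in this cycle
        # A droplet dispensed at c occupies storage during cycles c+1 .. t-1.
        if all(free_cells.get(w, 0) >= cells_per_droplet for w in range(c + 1, t)):
            return c
    return None  # no feasible cycle: schedule another operation instead

# Situation of the example: v can start at t = 3, no reservoir is free at
# cycle 2, but one is free at cycle 1 and storage is available at cycle 2.
print(latest_dispense_cycle(3, free_ports={1: 1, 2: 0}, free_cells={2: 4}))  # prints "1"
```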

In section 4.1, we saw that PS performs poorly on complex graphs because it focuses entirely on minimizing st_m and ignores the critical path. Therefore, we propose the following priority:

(2)

The proposed priority considers the critical path and st_m minimization at the same time. Since sf(v) reflects the reduction of st_m, we use it to represent st_m minimization. However, because sf(v) is much smaller than the path lengths, we multiply sf(v) by a constant to balance the two terms. This constant is a sufficiently large number and is usually set to half of the critical path length. The example in Figure 16(a) demonstrates the importance of st_m minimization. Initially, all mixing operations, drawn as circles, are bound to the fastest module, whose area is 8 cells and whose duration is 2. Using a priority that considers only the critical path (e.g., the priority of operation m is 4 and that of operation r is 2), each mixing operation is assigned the scheduling order shown by the red numbers in Figure 16(a) and Figure 16(b).

Figure 15. An example of the proposed scheduling rule

Figure 16. (a) A sequencing graph of an assay (b) Scheduling result using the priority that considers only the critical path

Operations with the same priority are ordered randomly. The scheduling result of the assay is shown in Figure 16(b), and its latency is 15. As can be seen, this priority leads to a BFS-like scheduling order and a large amount of st_m. On the other hand, if the assay in Figure 16(a) is scheduled with the proposed priority, the scheduling order changes to the red numbers in Figure 17(a). Operations m and n are still scheduled first, as with the critical-path-only priority. Next, however, the scheduler selects the operations that save more storage units, such as operations r and s. As a result, the proposed priority leads to a DFS-like scheduling result, as shown in Figure 17(b), and potentially reduces the amount of st_m. The latency therefore drops from 15 to 13 when the operations are scheduled with the proposed priority. Figure 18 shows the overall flow of SMS.

Figure 17. (a) The changed scheduling order (b) Scheduling result using the proposed priority

Figure 18. Overall flow of SMS
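The exact form of the priority in (2) is not reproduced in this text. The sketch below therefore assumes one plausible reading of the description in this section: the priority of an operation is the length of the longest path from it to a sink plus the saving factor sf(v) scaled by a constant, which defaults to half of the critical path length. The additive form, the function names, and the sample saving factor are assumptions; sf(v) itself is defined in section 4.1 and is treated here as a given input.

```python
from functools import lru_cache
from typing import Dict, List, Optional

def path_lengths(children: Dict[str, List[str]], duration: Dict[str, int]) -> Dict[str, int]:
    """Longest duration-weighted path from each operation to a sink."""
    @lru_cache(maxsize=None)
    def longest(v: str) -> int:
        return duration[v] + max((longest(c) for c in children.get(v, [])), default=0)
    return {v: longest(v) for v in duration}

def priorities(
    children: Dict[str, List[str]],
    duration: Dict[str, int],
    saving_factor: Dict[str, float],
    alpha: Optional[float] = None,
) -> Dict[str, float]:
    """Assumed priority: path length + alpha * sf(v)."""
    pl = path_lengths(children, duration)
    if alpha is None:
        alpha = max(pl.values()) / 2  # half of the critical path length
    return {v: pl[v] + alpha * saving_factor.get(v, 0) for v in pl}

# Two chained mixing operations of duration 2: the critical-path-only
# priorities are m = 4 and r = 2. The saving factor of r is hypothetical.
p = priorities({"m": ["r"], "r": []}, {"m": 2, "r": 2}, {"r": 1})
print(p)  # prints "{'m': 4.0, 'r': 4.0}"
```

Under this assumed form, an operation that frees storage is raised toward the priority of operations on the critical path, which matches the DFS-like order described above.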

5.3 Iterative Operation Rebinding

As mentioned above, SMS potentially minimizes the latency by reducing the number of storage units. However, further improvement may still be achieved by rebinding, which explores more of the solution space and improves the utilization of the array area, as discussed in section 4.2. The overall algorithm of iterative operation rebinding is depicted in Figure 19. First, operation rebinding is performed to determine a new binding of all operations for latency minimization. If the latency of the new binding result is shorter than the previous one, the counter k is reset to M; if not, k is set to k-1. The process terminates when the counter k reaches zero.

Figure 19. Flow of iterative operation rebinding
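The counter logic of this loop can be sketched as below. The rebinding pass is passed in as a callback, and the sketch assumes that each pass restarts from the best binding found so far; the source does not state which binding a pass starts from, so this is an assumption.

```python
from typing import Callable, Tuple, TypeVar

B = TypeVar("B")  # whatever type represents a binding

def iterative_rebinding(
    binding: B,
    latency: int,
    rebind_pass: Callable[[B], Tuple[B, int]],  # hypothetical: returns (new binding, new latency)
    max_stall: int,                             # the constant M
) -> Tuple[B, int]:
    """Repeat rebinding passes until M consecutive passes bring no improvement."""
    best_binding, best_latency = binding, latency
    k = max_stall
    while k > 0:
        candidate, candidate_latency = rebind_pass(best_binding)
        if candidate_latency < best_latency:
            best_binding, best_latency = candidate, candidate_latency
            k = max_stall  # improvement: reset the counter to M
        else:
            k -= 1         # no improvement: k = k - 1
    return best_binding, best_latency
```

For example, with M = 2 and passes that return latencies 13, 13, 12, 12, 12 from an initial latency of 15, the loop stops after the fifth pass with a latency of 12.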

5.3.1 Operation Rebinding

The objective of operation rebinding is to find a new binding result that reduces the latency. The simplest way to achieve this is to enumerate all possible binding results and select the one with the largest latency improvement. However, this is not realistic: there are too many binding results to check, and the execution time would be too long. Our strategy is to rebind one operation at a time until all operations have been rebound. Here, a rebinding step in an iteration means changing the module of an operation v to another module m, and (v, m) is called a binding pair (BP). Take Figure 20(a) as an example. There are 3 unlocked mixing operations, c, f, and g, and each of them can be bound to 3 possible modules according to Table 1. Consequently, there are 9 feasible BPs that can be selected for rebinding, as shown in Figure 20(b). Operation rebinding can therefore be regarded as finding the BP with the highest latency improvement in each iteration.

Figure 20. (a) A graph with 3 mixing operations c, f, and g that have not been rebound (b) All BPs of the operations that have not been rebound
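The set of candidate BPs is the Cartesian product of the unlocked operations and the modules they can be bound to. A minimal sketch for the example with operations c, f, and g follows; the module names are placeholders, since Table 1 is not reproduced here.

```python
from itertools import product
from typing import List, Tuple

def binding_pairs(unlocked_ops: List[str], modules: List[str]) -> List[Tuple[str, str]]:
    """All (operation, module) pairs that can be tried in one iteration."""
    return list(product(unlocked_ops, modules))

# Placeholder module names standing in for the three mixers of Table 1.
pairs = binding_pairs(["c", "f", "g"], ["module_1", "module_2", "module_3"])
print(len(pairs))  # prints "9"
```

The pair that keeps an operation on its current module is part of this set, which is why a BP with zero primary gain always exists.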

Figure 21 shows the operation rebinding flow. The first step in the flow is gain calculation. It calculates the latency gains of all possible BPs with unlocked operations (i.e., operations that have not been rebound) to judge the latency improvement of each BP. There are two latency gains, the primary latency gain (G_p) and the secondary latency gain (G_s), which both represent latency improvement. Their physical meaning is described in section 5.3.2. After gain calculation, BP selection is performed. The BP with the highest G_p is selected first. Since keeping the current binding also counts as a rebinding (i.e., there is always a BP with zero G_p), a BP with negative G_p is never considered. If more than one BP has the highest G_p, the one with the highest G_s is selected. According to the selected BP (v, m), operation v is rebound to module m and then locked. Finally, rescheduling is performed by SMS to obtain the current latency. These steps repeat until all operations are locked.

Figure 21. Operation rebinding flow (gain calculation for all unlocked operations, BP selection, rebind & lock, rescheduling; repeated until all operations are locked)
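The BP selection step amounts to a lexicographic maximum over (G_p, G_s) after discarding BPs with negative G_p. The sketch below assumes the gains have already been computed and are stored in a dictionary; the gain values and names are invented for the illustration.

```python
from typing import Dict, Optional, Tuple

BP = Tuple[str, str]  # (operation, module)

def select_bp(gains: Dict[BP, Tuple[float, float]]) -> Optional[BP]:
    """Pick the BP with the highest G_p, breaking ties by the highest G_s.

    gains maps each BP to (G_p, G_s). BPs with negative G_p are ignored.
    """
    candidates = {bp: g for bp, g in gains.items() if g[0] >= 0}
    if not candidates:
        return None  # defensive only: a BP with zero G_p normally exists
    # Tuples compare lexicographically, so G_p decides first and G_s second.
    return max(candidates, key=lambda bp: candidates[bp])

gains = {
    ("c", "module_1"): (0, 0.0),
    ("f", "module_2"): (2, 0.5),
    ("g", "module_3"): (2, 1.5),
    ("g", "module_1"): (-1, 9.0),
}
print(select_bp(gains))  # prints "('g', 'module_3')"
```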

5.3.2 Latency Gains

Two latency gains, the primary latency gain (G_p) and the secondary latency gain (G_s), are used to determine the latency improvement of each BP. G_p is given by the following equation:

$G_p = T - T'$    (3)

In (3), T is the latency before rebinding, and T' is the latency obtained after operation v is bound to module m and SMS is performed. G_s is given by the following equation:

(4)

Equation (4) is composed of two parts: local latency improvement and storage unit reduction. Since G_p only represents the improvement of the critical path, some BPs with the potential to yield a shorter latency in the next iteration may be ignored. G_s therefore captures this potential through the sum of the end cycle improvements, t_i - t_i', over all operations. However, since the operations lie on different paths, the end cycle improvements are not equally important. To reflect these differences, each end cycle improvement is multiplied by the weight defined in (5) and (6).

(5) (6)

Equation (5) is the ratio between the length of the longest path that passes through an operation and the critical path length. Because the number of storage units affects the total latency, the storage unit reduction, nst - nst', should also be considered. Here nst is the average number of storage units per cycle before rebinding, as shown in (7), and nst' is the average number of storage units per cycle when operation v is bound to module m. In addition, we consider the storage unit reduction to be contributed by all operations. Therefore, the storage unit reduction is multiplied by the number of operations, N.

(7)
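Following the prose description of the two gains, a hedged sketch is given below. It assumes that the primary gain is the latency difference T - T', that the secondary gain is the weighted sum of end cycle improvements plus N times the reduction of the average storage count, and that the weights are supplied by the caller (the ratio of (5)); the role of (6) is not recoverable from this text and is not modelled. All names and sample numbers are illustrative.

```python
from typing import Dict, List

def primary_gain(latency_before: int, latency_after: int) -> int:
    """Assumed G_p: latency before rebinding minus latency after rebinding."""
    return latency_before - latency_after

def average_storage(storage_per_cycle: List[int]) -> float:
    """Average number of storage units per cycle (nst)."""
    return sum(storage_per_cycle) / len(storage_per_cycle)

def secondary_gain(
    end_before: Dict[str, int],    # t_i: end cycle of each operation before rebinding
    end_after: Dict[str, int],     # t_i': end cycle of each operation after rebinding
    weight: Dict[str, float],      # path ratio per operation, supplied by the caller
    storage_before: List[int],     # storage units in each cycle before rebinding
    storage_after: List[int],      # storage units in each cycle after rebinding
) -> float:
    """Assumed G_s: weighted end cycle improvements + N * (nst - nst')."""
    local = sum(weight[v] * (end_before[v] - end_after[v]) for v in end_before)
    n_ops = len(end_before)
    return local + n_ops * (average_storage(storage_before) - average_storage(storage_after))

# Arbitrary sample values for two operations.
g_p = primary_gain(15, 13)
g_s = secondary_gain(
    end_before={"a": 4, "b": 6},
    end_after={"a": 3, "b": 6},
    weight={"a": 0.5, "b": 1.0},
    storage_before=[1, 1, 2, 0],
    storage_after=[1, 0, 1, 0],
)
print(g_p, g_s)  # prints "2 1.5"
```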

Chapter 6 Experimental Results

The proposed algorithm has been implemented in C++ on a Linux machine. All experiments are conducted on a workstation with a 2.4 GHz Intel Xeon CPU and 72 GB of RAM. The ILP solver used is Gurobi Optimizer 5.0 [29]. Three real-life test cases, multiplexed in-vitro diagnostics, PCR, and Protein assay [13], and six random cases of sample preparation are used to evaluate our algorithm. Since previous works were implemented with different area constraints and resource libraries, their reported experimental results cannot be compared with each other directly. We therefore re-implemented them for comparison with LOSMOS. In addition, the other two versions of M-LS, M-LS (DEC) and M-LS (INC), are also implemented. They were proposed in [18] for comparison with PS. Both of them force each dispensing operation to be scheduled right before its successor, but M-LS (DEC) uses the increasing order of priority and M-LS (INC) uses the inverse order. However, we could not fully re-implement GA and HGA, and these two works report experimental results only for multiplexed in-vitro diagnostics.

In the first experiment, LOSMOS is compared with the methods mentioned above on multiplexed in-vitro diagnostics with the in-vitro resource library [13]. Table 2 shows the results. Since the execution time of every method is less than 5 sec, we do not report the execution times in a table. As can be seen, LOSMOS is 1.07 times faster than previous works on average. However, this improvement is quite small. We believe the reason is that the number of operations in these cases is too small (16~64 operations), so all of the solutions are near optimal.

In the second experiment, we therefore use cases with a larger number of operations. Because such cases and their related resource libraries are lacking, we randomly generate six cases of the multiple-target sample preparation reaction flow and use the in-vitro resource library. However, the library offers many sets of modules and detectors that could be used. For fairness, we choose the set of modules and the detector with the longest latency. The results of the second experiment are shown in Table 3. In this experiment, SMS is 1.21 times faster on average than most previous works, with ILP as the exception. This shows that a scheduling method that considers storage minimization greatly benefits latency minimization. In addition to SMS, LOSMOS further minimizes latency through its module selection ability. As the last column of Table 3 shows, LOSMOS is 1.35 times faster on average than previous works and even 1.04 times faster than ILP without module selection ability, with a short execution time (2~6 sec). These results show that LOSMOS achieves good performance on large cases with little run time.

Finally, in the third experiment, we use two real cases, Protein and PCR, to evaluate whether our synthesis algorithm can be applied to real-life assays. The resource library used here is the one proposed in [30]. Table 4 shows the results. Since PCR has only seven operations, every algorithm achieves the optimal solution. In contrast, Protein is a large case, and the latency of LOSMOS is equal to or better than that of previous works.

Table 2. Experiment 1: multiplexed in-vitro diagnostics

Table 3. Experiment 2: sample preparations

Latency
Case                   Area  ILP                            LOSMOS
Sample_preparation_61  100   169*   306   281   241   247   165
Sample_preparation_65  100   131    219   157   147   145   129
Sample_preparation_67  100   133    221   205   161   155   133
Sample_preparation_70  100   139*   191   177   179   153   129
Sample_preparation_78  100   -*     370   295   263   233   195
Sample_preparation_84  100   175*   291   223   -     203   163
Avg.                         1.04   1.73  1.45  1.31  1.24  1

“-” The method fails in that case
“*” ILP does not terminate in 24 hours; the current best result is reported

Execution time
Case                   Area  ILP                            LOSMOS
Sample_preparation_61  100   >24hr  <1s   <1s   <1s   <1s   3.2s
Sample_preparation_65  100   0.7hr  <1s   <1s   <1s   <1s   2.9s
Sample_preparation_67  100   3.1hr  <1s   <1s   <1s   <1s   3.1s
Sample_preparation_70  100   >24hr  <1s   <1s   <1s   <1s   3.3s
Sample_preparation_78  100   >24hr  <1s   <1s   <1s   <1s   6.1s
Sample_preparation_84  100   >24hr  <1s   <1s   <1s   <1s   5.4s

Table 4. Experiment 3: two real cases, PCR and Protein

Latency
Case                   Area
Protein_103            100   179    267   179   215   185   179

Execution time
Case                   Area
PCR_7                  100   <1s    <1s   <1s   <1s   <1s   <1s
Protein_103            100   4.3hr  <1s   <1s   <1s   <1s   10.07s
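The Avg. row of Table 3 can be reproduced from the latency rows under one assumption: each entry is the mean, over the cases in which the method did not fail, of the ratio between that method's latency and the latency in the last column (LOSMOS). The sketch below checks this reading; the asterisked ILP values are used as reported, and failed cases ("-") are encoded as None.

```python
from typing import Dict, List, Optional

# Latency rows of Table 3, in the column order of the table; None marks "-".
latency: Dict[str, List[Optional[int]]] = {
    "Sample_preparation_61": [169, 306, 281, 241, 247, 165],
    "Sample_preparation_65": [131, 219, 157, 147, 145, 129],
    "Sample_preparation_67": [133, 221, 205, 161, 155, 133],
    "Sample_preparation_70": [139, 191, 177, 179, 153, 129],
    "Sample_preparation_78": [None, 370, 295, 263, 233, 195],
    "Sample_preparation_84": [175, 291, 223, None, 203, 163],
}

def average_ratios(rows: Dict[str, List[Optional[int]]]) -> List[float]:
    """Mean latency ratio of every column against the last column."""
    n_cols = len(next(iter(rows.values())))
    averages = []
    for col in range(n_cols):
        # Skip the cases in which the method of this column failed.
        ratios = [r[col] / r[-1] for r in rows.values() if r[col] is not None]
        averages.append(round(sum(ratios) / len(ratios), 2))
    return averages

print(average_ratios(latency))  # prints "[1.04, 1.73, 1.45, 1.31, 1.24, 1.0]"
```

The computed values agree with the Avg. row, which supports reading the "times faster" figures in the text as averages of per-case latency ratios.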

Chapter 7 Conclusion

In this thesis, we proposed latency-optimization synthesis with module selection (LOSMOS) for DMFBs. LOSMOS consists of two major parts: storage minimization scheduling (SMS) and the iterative rebinding procedure. Because the storage count is closely related to the assay latency, the first part, SMS, takes the saving factor into consideration for storage minimization and achieves better results accordingly. The second part iteratively evaluates and improves the binding of each operation using latency gains. Thanks to its module selection ability, LOSMOS outperforms a state-of-the-art method, Path-scheduler, by 18.22% on average in terms of latency reduction, and even performs better than the optimal ILP method without module selection. This performance, combined with its short computation time, makes LOSMOS a promising option for DMFB synthesis.

References

[1] International Technology Roadmap for Semiconductors. Semiconductor Industry Association, 2011.

[2] T.-Y. Ho, J. Zeng, and K. Chakrabarty, “Digital microfluidic biochip: a vision for functional diversity and more than Moore,” in Proc. IEEE/ACM International Conference on Computer-Aided Design, 2010, pp. 578–585.

[3] M. G. Pollack, A. D. Shenderov, and R. B. Fair, “Electrowetting-based actuation of droplets for integrated microfluidics,” Lab. Chip, vol. 2, no. 2, pp. 96–101, Feb. 2002.

[4] V. Srinivasan, V. K. Pamula, M. G. Pollack, and R. B. Fair, “Clinical diagnostics on human whole blood, plasma, serum, urine, saliva, sweat, and tears on a digital microfluidic platform,” in Proc. Micro Total Analysis Systems, 2003, pp. 1287–1290.

[5] K. Chakrabarty, “Design automation and test solutions for digital microfluidic biochips,” IEEE Transactions on Circuits and Systems I, vol. 57, no. 1, pp. 4–17, Jan. 2010.

[6] R. Sista, Z. Hua, P. Thwar, A. Sudarsan, V. Srinivasan, A. Eckhardt, M. Pollack, and V. Pamula, “Development of a digital microfluidic platform for point of care testing,” Lab. Chip, vol. 8, no. 12, pp. 2091–2104, Dec. 2008.

[7] T.-Y. Ho, K. Chakrabarty, and P. Pop, “Digital microfluidic biochips: recent research and emerging challenges,” in Proc. IEEE/ACM/IFIP Hardware/Software Codesign and System Synthesis, 2011, pp. 335–343.

[8] S. Roy, B. B. Bhattacharya, and K. Chakrabarty, “Optimization of dilution and mixing of biochemical samples using digital microfluidic biochips,” IEEE Transactions on Computer-Aided Design, vol. 29, no. 11, pp. 1696–1708, Nov. 2010.

[9] Y.-L. Hsieh, T.-Y. Ho, and K. Chakrabarty, “On-chip biochemical sample preparation using digital microfluidics,” in Proc. IEEE Biomedical Circuits and Systems Conference, 2011, pp. 297–300.

[10] J. Ding, K. Chakrabarty, and R. B. Fair, “Scheduling of microfluidic operations for reconfigurable two-dimensional electrowetting arrays,” IEEE Transactions on Computer-Aided Design, vol. 20, no. 12, pp. 1463–1468, Dec. 2001.

[11] F. Su and K. Chakrabarty, “Architectural-level synthesis of digital microfluidics-based biochips,” in Proc. IEEE/ACM International Conference on Computer-Aided Design, 2004, pp. 223–228.

[12] A. J. Ricketts, K. Irick, N. Vijaykrishnan, and M. J. Irwin, “Priority scheduling in digital microfluidics-based biochips,” in Proc. IEEE/ACM Design, Automation & Test in Europe, 2006, pp. 329–334.

[13] F. Su and K. Chakrabarty, “High-level synthesis of digital microfluidic biochips,” ACM Journal on Emerging Technologies in Computing Systems, vol. 3, no. 4, pp. 16:1–16:32, Jan. 2008.

[14] F. Su and K. Chakrabarty, “Unified high-level synthesis and module placement for defect-tolerant microfluidic biochips,” in Proc. IEEE/ACM Design Automation Conference, 2005, pp. 825–830.

[15] E. Maftei, P. Pop, and J. Madsen, “Tabu search-based synthesis of dynamically reconfigurable digital microfluidic biochips,” in Proc. Compilers Architecture and Synthesis for Embedded Systems, 2009, pp. 195–203.

[16] M. Alistar, E. Maftei, P. Pop, and J. Madsen, “Synthesis of biochemical applications on digital microfluidic biochips with operation variability,” in Proc. Design, Test, Integration and Packaging conferences Symp., 2010, pp. 350–357.

[17] D. Grissom and P. Brisk, “Path scheduling on digital microfluidic biochips,” in Proc. IEEE/ACM Design Automation Conference, 2012, pp. 26–35.

[18] L. Luo and S. Akella, “Optimal scheduling of biochemical analyses on digital microfluidic systems,” IEEE Transactions on Automation Science and Engineering, vol. 8, no. 1, pp. 216–227, Jan. 2011.

[19] F. Su and K. Chakrabarty, “Module placement for fault-tolerant microfluidics-based biochips,” ACM Transactions on Design Automation of Electronic Systems, vol. 11, no. 3, pp. 682–710, Jul. 2006.

[20] P.-H. Yuh, C.-L. Yang, and Y.-W. Chang, “Placement of defect-tolerant digital microfluidic biochips using the T-tree formulation,” ACM Journal on Emerging Technologies in Computing Systems, vol. 3, no. 3, pp. 13:1–13:32, Nov. 2007.

[21] Z. Xiao and E. F. Y. Young, “Placement and routing for cross-referencing digital microfluidic biochips,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, no. 7, pp. 1000–1010, Jul. 2011.

[22] F. Su, W. Hwang, and K. Chakrabarty, “Droplet routing in the synthesis of digital microfluidic biochips,” in Proc. IEEE/ACM Design, Automation & Test in Europe, 2006, pp. 323–328.

[23] M. Cho and D. Z. Pan, “A high-performance droplet routing algorithm for digital microfluidic biochips,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 27, no. 10, pp. 1714–1724, Oct. 2008.

[24] P.-H. Yuh, C.-L. Yang, and Y.-W. Chang, “BioRoute: A network flow based routing algorithm for the synthesis of digital microfluidic biochips,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 27, no. 11, pp. 1928–1941, Nov. 2008.

[25] T.-W. Huang, C.-H. Lin, and T.-Y. Ho, “A contamination aware droplet routing algorithm for the synthesis of digital microfluidic biochips,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 11, pp. 1682–1695, Nov. 2010.

[26] C. C.-Y. Lin and Y.-W. Chang, “ILP-based pin-count aware design methodology for microfluidic biochips,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 9, pp. 1315–1327, Sep. 2010.

[27] T. Xu, K. Chakrabarty, and V. K. Pamula, “Defect-tolerant design and optimization of a digital microfluidic biochip for protein crystallization,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 4, pp. 552–565, Apr. 2010.

[28] Y. Zhao, K. Chakrabarty, R. Sturmer, and V. K. Pamula, “Optimization Techniques for the Synchronization of Concurrent Fluidic Operations in Pin-Constrained Digital Microfluidic Biochips,” IEEE Transactions on Very
