With Module Selection - Previous Works - 具有模組選擇能力之延遲最佳化數位微流體生物晶片合成技術

Chapter 3 Previous Works

3.2 With Module Selection

The methods mentioned above do not consider the module selection and the performance is thus limited. The extended GA [14] and the Tabu-search based algorithm (Tabu) [15] are existing algorithms which consider the module selection.

The extended GA inherits from [10] and adds binding information into the chromosome to achieve module selection. The latter method  Tabu starts with a solution in which operations are bound to modules at random. It then use Tabu searching to find the best binding result and use list scheduling to schedule operations.

Even if these two methods, the extended GA and the Tabu, permit module selection.

However, they are all stochastic optimization method and are difficult to apply in online scheduling. As a result, a fast algorithm equipped with module selection ability is indispensable.

Chapter 4 Motivations

4.1 Effects of Storage Units

A storage unit exists if two dependent operations are not sequentially scheduled.

Since the storage units occupy chip area, the number of storage units located on a chip is inversely proportional to the amount of mixing space. Therefore, how to minimize the storage units is a crucial issue for latency minimization. For elaborating the effect of storage units, we further classify a storage unit as a dispensing storage unit (st_d) or an intermediate storage unit (stm) in this work. The first one, std, appears between a dispensing operation and a reconfigurable operation. Previous works [12], [15], and [18] schedule the dispensing operation and the reconfigurable operation sequentially to ensure no st_d. However, this inflexible scheduling rule may lead to the worse latency. Take Figure 10 as an example. At first, we will schedule the assay like Figure 10(a) by the list scheduling and the scheduling rule mentioned above on a biochip consists of a 2-D array with 16 cells and two ports, S1 and R1. The scheduling result is shown in Figure 10(b). As we can see in Figure 10(c), if we schedule operation e earlier and permit one cycle delay between operation e and operation g, the latency can be reduced from 8 to 7. Accordingly, a more flexibility scheduling rule may be required.

The other one, st_m, is a more difficult problem than st_d in scheduling. Path-scheduler tries to reduce stm by scheduling an entire path, a sequence of dependent operations, instead of operations and thus has shorter latency than M-LS in high fan-out cases (e.g., protein assay). However, with high fan-in cases or even more complex reactions, it may lead to worse efficient. Figure 11. (a)Figure 11(a) shows a complex reaction.

As shown in Figure 11(b) and Figure 11(c), the result which is produced by Path-scheduler has more st_m and has longer latency than the result which is produced by M-LS.

(a) (b)

(b)

(a) (c)

Figure 10. An example of the inflexible scheduling rule

Figure 11. (a) A complex reaction (b) PS and (c) M-LS

(c)

Therefore, we try to minimize st_m in the other way. We observed that the number of storage units is highly related with operations’ in-/out-degree. In fact, the in-degree of an operation v, which is denoted as , implies how many droplets it required, and the out-degree of v, , means the number of resultant it produces.

As a result, to schedule a vertex with high in-degree may help to minimize storage units; on the other hand, to schedule a vertex with high out-degree implies that more storage units may be produced, and vice versa. However, since dispensing operations can be scheduled at any cycle, we only consider the in-degree contributed from mixing operations, . Therefore, the difference between and is called storage saving factor sf(v), as the following equation:

(1)

sf(v) can be regard as the possible usage amount of stm if an operation v is scheduled.

As an example in Figure 12, scheduling operation e (the most right graph) will save the most number of storage units, and scheduling operation a (the most left graph) saves the least.

Figure 12. All possible value of storage saving factor

4.2 Module Selection Issue

Most of previous works bind operations with the fastest (usually also the largest) module in synthesis. It helps to simplify the problem and is useful if the array size is large enough. However, in most cases, especially as segregation area is wrapped around, this strategy may be ineffective, and modules with large size may make the array overcrowded. For example, five mixing operations are scheduled with two different binding in Figure 13 using the modules in Table 1. The left one, bind all operations to the fastest module, mix24, requiring 11 cycles to finish. In contrast, the right one, bind to different module, only needs 9 cycles instead. This example shows that binding operations with the fastest module is not always a good policy. To explore more possibility and to enhance utilization of array area, integrate module selection process into synthesis flow to select the suitable modules dynamically is necessary.

Figure 13. Comparison between non-module selection and module selection

Chapter 5 Proposed Algorithm

5.1 Overview

The problem formulation of this work is as following: given three information, 1) DMFB architecture, which consists of chip size, the count of dispensing ports, number of detecting cells or other functional devices, 2) a sequencing graph G, which describe the biochemical application, and 3) a resource library L, which contains all feasible module and information of other devices, to determine the binding and the scheduling results. These results are without dependency and resource violation, and the latency of the biochemical assay is minimized. The overall flow of the proposed algorithm is shown in Figure 14.

Sequencing graph Resource library

Initial binding & scheduling

Iterative operation rebinding

Scheduling & binding results

Architecture

Figure 14. The overview of proposed algorithm

In the beginning of the process, bind all operations to the fastest module (all the operations are bound). Then, a scheduler, storage minimization scheduling, based on list scheduling is performed to obtain an initial scheduling result. Following, the initial solution will be sent to the iterative operation rebinding process to refine the solution iteratively. More details of the storage minimization scheduling and iteratively operation rebinding processes will be described in section 5.2 and section 5.3.

5.2 Storage Minimization Scheduling

Storage minimization scheduling (SMS) is derived from the well-known list scheduling algorithm. At first, all ready operations are put in the ready list (Lready).

There are four kinds of ready operations as following: 1) a dispensing operation, 2) a mixing operation whose parents are dispensing operation, 3) a mixing operation whose parents are a dispensing operation and a finished mixing operation and 4) a mixing operation whose parents are two finished mixing operation. Then, schedule operations in ready list (L_ready) one by one in each cycle according to their priority.

Compared to conventional list-based scheduling in DMFB, we schedule dispensing operations more flexible and propose a new priority. Previous works, like [12], [15]

and [18], discard dispensing operations from the ready list, Lready, since these operations can be scheduled at any cycle if there are enough resource ports. The dispensing operations and their successor operations are scheduled sequentially to avoid st_d. This scheduling rule guarantees no st_d. However, it may lead long latency due to inflexibility in scheduling dispensing operations, as describe in section 4.2. We modified the rule by scheduling dispensing operations before their successors but not necessary right before them. Figure 15 is an example of the scheduling rule. If an operation v can be scheduled at cycle t (t=3), check previous cycles from t-1 to 1 whether there are available reservoirs and enough area to save the resultant of reservoirs. The process starts form t-1, since we want to reduce the usage of st_d as more as possible. If there are available reservoirs at cycle 1 and enough area to save resultant at cycle 2, schedule dispensing operations of operation v at cycle 1.

Otherwise, schedule the other operation.

In section 4.1, we realize that PS has a bad result with the complex graph since it entirely minimizes st_m but ignores the critical path issue. Therefore, we proposed a priority as following:

(2)

The proposed priority considers the critical path and st_m minimization at the same time. Since sf(v) implies the reduction of stm, we use it to indicate stm minimization.

However, since sf(v) is quite smaller than each path length, we multiply sf(v) and a constant  to balance its value. This constant is a significant large number which is usually set a half of the critical path length. An example in Figure 16(a) will be demonstrated to show the importance of stm minimization. At first, all mixing operations which are presented by circular shape are bound to the fastest module whose area is 8 cells and duration is 2. According to the priority which only considers the critical path (e.g., priority of operation m = 4 and operation r = 2), assign the scheduling order to each mixing operation like the red number in both Figure 16(a) and Figure 16(b).

Exist

Schedule the other operation

Figure 15. An example of proposed scheduling rule

Figure 16. (a) An sequencing graph of an assay

(b) Scheduling result using the priority only consider critical path

If operations with the same priority, set their scheduling order randomly. The scheduling result of the assay is shown in Figure 16(b) and the latency is 15. As we can see, this priority will lead to a BFS scheduling order and lots of stm. On the other hand, if we schedule the assay in Figure 16(a) using the proposed priority, the scheduling order will changed to the red numbers in Figure 17(a). At first, operation m and operation n will be scheduled first like using the priority which only consider critical path. However, the scheduler using the proposed priority will select the operation which can save more storage units like operation r and operation s next. As a result, the proposed priority will lead a DFS scheduling result as shown in Figure 17(b) and potentially reduce the amount of stm. Therefore, the latency reduces from 15 to 13 when schedule operations using the proposed priority. Figure 18 shows the overall flow of SMS.

(a) (b)

Figure 17. (a) A changed scheduling order (b) Scheduling result using proposed priority

Figure 18. Overall flow of SMS

5.3 Iterative Operation Rebinding

As mentioned above, the scheduling algorithm for SMS potentially minimizes the latency by reducing the amount of storage units. However, further improvement may still be achieved by rebinding which explores more possibility and to enhance utilization of array area as mentioned in section 4.2. The overall algorithm of iterative operation rebinding is depicted in Figure 19. At first, operation rebinding is performed to determine the new binding of all operations for latency minimization. If the latency for the new binding result is shorter than the previous one, the counter k is reset to M;

if not, set k=k1. The process does not terminate until the counter k equals to zero.

Figure 19. Flow of iterative operation rebinding

5.3.1 Operation Rebinding

The objective in operation rebinding is to find the new binding result to reduce latency. The simplest way to achieve the objective is listing all possible binding results and selecting the one with the most latency improvement. However, it is not realistic. Since there are too many binding results to be checked, it will cost too much execution time. Our strategy is rebinding an operation once at a time until all operations are rebound. Here, a rebinding process in an iteration means changing the module of an operation v to another module m. (v, m) is called binding pair (BP). Take Figure 20 (a) for example. There are 3 unlocked mixing operation c, f, and g and each mixing operation can be bound to 3 possible modules according to Table 1. Due to the above conditions, there are 9 feasible BPs can be selected to rebind as shown in Figure 20(b). Therefore, operation rebinding can be regard as finding the BP with the highest latency improvement in each iteration.

(a) (b)

Figure 20. (a) A graph with 3 mixing operation c, f, and g without rebinding (b) All BPs of operations without rebinding

Figure 21 shows the operation rebinding flow. The first step in the flow is gain calculation. It calculates latency gains for all possible BPs with unlocked operations (i.e., operations are not rebound) to judge the latency improvement of each BP. There are two latency gains: the primary latency gain (Gp) and the secondary latency gain (G_s). They represent the latency improvement. The physical meaning of them will be described in section 5.3.2. After gain calculation, BP selection will be performed. The BP with the highest G_p will be selected first. Since the unchanging binding is also a kind of rebinding processes (i.e. existing a BP with zero Gp), a BP with negative Gp

will not be considered here. If there is more than one BP with the highest G_p, the BP with the highest Gs is selected. According to the selected BP (v, m), operation v will be rebound to module m, and m will be locked after. Finally, rescheduling is performed by SMS to get the current latency. These steps mentioned above does not terminate until all operations are locked.

Gain calculation

Rescheduling BP selection

All operations are locked All unlocked operations

Rebind & Lock

False True

Figure 21. Operation rebinding flow

5.3.2 Latency Gains

There are two latency gains, primary latency gain (G_p) and secondary latency gain (Gs) to determine the latency improvement of each BP. The first one Gp can be represented as the following equation:

(3)

In (3), T is the latency before rebinding, and T' means the latency while the operation v bound with the module m, and SMS is performed. The second one Gs is presented by the following equations:

(4)

Equation (4) is composed of two parts, local latency improvement and storage unit reduction. Since Gp only represents the improvement of the critical path, some BPs with the potential to make a shorter latency in the next iteration may be ignored.

Therefore, G_s indicates this potential by the sum of end cycle improvements, t_{i }t_i', for all operations. However, since operations locate on different paths, the importance of each end cycle improvement is not the same. To indicate differences of them, the weight  in (5)(6) multiplies each end cycle improvement together.

(5) (6)

(5) means the ratio between the longest path passes through an operation and the critical path length. Due to the fact that the number of storage units affects the total latency, storage unit reduction, nst  nst', should also be considered. nst means the average storage units count in each cycle before rebinding, as shown in (7). nst' implies the average storage units in each cycle when operation v is bound to module m. Besides, we consider that storage unit reduction is contributed by all operations.

Therefore, the number of operations, N, multiplies the storage unit reduction together.

(7)

Chapter 6 Experimental Results

The proposed algorithm has been implemented in C++ on a Linux machine. All experiments are conducted on workstation with an Intel Xeon 2.4GHz CPU with 72GB RAM. The ILP solver we used is Gurobi optimizer 5.0 [29]. Three real-life test cases: multiplexed in-vitro diagnostics, PCR, and Protein assay [13] and six random cases of sample preparation are used here to evaluate our algorithm. Since previous works are implemented with different area constraints and resource libraries, the experimental result they proposed can't compare with each other. Therefore, we want to re-implement them to compare with LOSMOS. Besides, the other two versions of M-LS, M-LS (DEC) and M-LS (INC) are also implemented. They are proposed in [18]

to compare with PS. Both of them force dispensing operation scheduling right before their successor, but M-LS (DEC) uses the increasing order of priority and M-LS (INC) uses the inversed one. However, we cannot re-implement GA and HGA entirely and these two works only report the experimental results using multiplexed in-vitro diagnostics. LOSMOS will compare with the methods mentioned above using multiplexed in-vitro diagnostics with in-vitro resource library [13] in the first experiment. Table 2 shows the experimental result of the first experiment. Since the execution time of all methods is less than 5 sec, we do not report the table of execution time. As we can see, LOSMOS is 1.07 times faster than previous works on average. However, the improvement in LOSMOS is quite small. The reason we thought is that the number of operations in above cases are too small (16~64 operations) thus those solutions are all near optimal. Therefore, we will use cases with larger number of operations to perform in the second experiment. Due to lack of cases

and related resource library, we randomly generate six cases of the multiple-targets sample preparation reaction flow and use the in-vitro resource library. However, there are many set of modules and detectors can be performed. For fairly, we choose the set of module and the detector with the longest latency. The results of the second experiment are shown in Table 3. In this experiment, we find that SMS is 1.21 times faster than most previous works on average except ILP. It shows that scheduling method considering storage minimization has a lot of benefits for latency minimization. In additional to SMS, we further minimize latency using module selection ability in LOSMOS. As the last column in Table 3, LOSMOS performs 1.35 times faster on average than previous works and even 1.04 times faster than ILP with no module selection ability in short execution time (2~6 sec). Therefore, these results prove that LOSMOS can achieve good performance in large cases with little run time.

Finally, we use two real cases: Protein and PCR. To evaluate our synthesis algorithm can apply in real-life in the third experiment. The resource library using here is proposed in [30]. Table 4 shows this experimental result. Since PCR only has seven operations, each algorithm achieves optimal solution. In contrast, protein is a large case. The latency of LOSMOS is equal or better than previous works.

Table 2. Experiment 1  multiplexed in-vitro diagnostics

Area

Table 3. Experiment 2  sample preparations

Table 4. Experiment 3  two real cases: PCR and Protein

Area

Sample_preparation_61 100 169* 306 281 241 247 165

Sample_preparation_65 100 131 219 157 147 145 129

Sample_preparation_67 100 133 221 205 161 155 133

Sample_preparation_70 100 139* 191 177 179 153 129

Sample_preparation_78 100 -* 370 295 263 233 195

Sample_preparation_84 100 175* 291 223 - 203 163

Avg. 1.04 1.73 1.45 1.31 1.24 1

“-” The method fails in that case

“*” ILP does not terminate in 24-hours; the current best result is reported

Area

Sample_preparation_61 100 >24hr <1s <1s <1s <1s 3.2s Sample_preparation_65 100 0.7hr <1s <1s <1s <1s 2.9s Sample_preparation_67 100 3.1hr <1s <1s <1s <1s 3.1s Sample_preparation_70 100 >24hr <1s <1s <1s <1s 3.3s Sample_preparation_78 100 >24hr <1s <1s <1s <1s 6.1s Sample_preparation_84 100 >24hr <1s <1s <1s <1s 5.4s

Area

Protein_103 100 179 267 179 215 185 179

Area

PCR_7 100 <1s <1s <1s <1s <1s <1s Protein_103 100 4.3hr <1s <1s <1s <1s 10.07s

Chapter 7 Conclusion

In this thesis, we proposed the latency-optimization synthesis with module selection (LOSMOS) on DMFBs. LOSMOS consists of two major parts: the storage minimization scheduling (SMS) and the iterative rebinding procedure. Because the storage count is highly related to the assay latency, the first part, SMS, takes the saving factor into consideration for storage minimization and achieves better results accordingly. The second part further iteratively evaluates and improves the binding of each operation with latency gains. According to the ability of module selection, LOSMOS outperforms a state-of-the-art method, Path-scheduler, by 18.22% in terms of latency reduction on average, and even performs better than the optimal ILP method without module selection. Undoubtedly, the good performance and short computation time make LOSMOS a promising option in DMFB synthesis.

References

[1] International Technology Roadmap for Semiconductors. Semiconductor Industry Association, 2011

[2] T.-Y. Ho, J. Zeng, and K. Chakrabarty, “Digital microfluidic biochip: a vision for functional diversity and more than Moore,” in Proc. IEEE/ACM International Conference on Computer-Aided Design, 2010, pp. 578-585

[3] M. G. Pollack, A. D. Shenderov and R. B. Fair, “Electrowetting-based actuation of droplets for integrated microfluidics,” Lab. Chip, vol. 2, no.2, pp. 96-101, Feb.

2002

[4] V. Srinivasan, V. K. Pamula, M. G. Pollack, and R. B Fair, “Clinical diagnostics on human whole blood, plama, serum, urine, saliva, sweat, and tears on a digital microfluidic platform,” in Proc. Micro Total Analysis Systems, 2003, pp.

1287-1290

[5] K. Chakrabarty, “Design automation and test solutions for digital microfluidic biochips,” IEEE Transactions on Circuits and Systems I, vol. 57, no. 1, pp. 4–17,

在文檔中具有模組選擇能力之延遲最佳化數位微流體生物晶片合成技術 (頁 21-0)