Our Algorithm - Our Proposed Method - 高階合成中使用臆測編碼轉換技術之排程等效驗證

Chapter 4 Our Proposed Method

4.3 Our Algorithm

According to the Theorem 2 and Definition 17, M_β is contained M_α if there exists a path cover P_β= { p_β0, ... , p_βn } of M_β and a set of path P_α= { p_α0, ... , p_αn } of M_α such that p_βi ≈

p_αi for all i, 0≤i≤n. If a set of path P_α of M_α is also a path cover of M_α, it implies that M_β≊ M_α.

Lemma 2 M_β≊ Mα if there exists a path cover P_β= { p_β0, ... , p_βn} of M_β and a path cover P_α= { p_α0, ... , p_αn} of M_α such that p_βi≈ pαi for all i, 0 ≤ i ≤ n.

Since extending a path means to find a new path cover, we insert the cutpoints into not only M_β but also M_α; then, we obtain initial path covers P_β and P_α of M_β and M_α, respectively.

For each path β^{of P}β , our algorithm attempts to find an equivalent path in M_α by finding all possible paths in M_α depending on CSP. Let β find an equivalent pathα^{" of M}α where α^{" =} α1α2^andα1^andα2 are initial paths. In this case, at first, β can’t find an equivalent path from the initial path cover of M_α, and then our algorithm extends α1 to find a new path cover, P_α. Therefore, β can find an equivalent path α1α2^{from P}α . If each path of the final P_β has an equivalent path of the final P_α and each path of the final P_α has an equivalent path of the final P_β, M_β≊ Mα .

Fig. 19 illustrates the overview of our algorithm and the detail is depicted in Fig. 20. Our algorithm reads two FSMDs, M_β and M_α, as the inputs and produces a set of equivalent paths, E, as an output. Initially, the set of equivalent paths E is empty and the set of working list L contains no path waiting to find an equivalent path in M_α. Let “(q_β0,q_α0)” be the only one member of corresponding state pair set, i.e. CSP_βα={(q_β0,q_α0)}. After the initialization, our algorithm inserts the cutpoints into both M_β and M_α. Subsequently, it finds initial path cover P_β0 of M_β and P_α0 of M_α. Next, it finds all possible available statements, i.e. the in sets of

each cutpoint in both M_β and M_α. Hence, it is ready for finding equivalent paths.

First, our algorithm finds “β ^{= q}βi⇒q_βj” from P_β0 depending on “(q_βi,q_αm)” which is in CSP_βα, and then it finds an equivalent path, starting from q_αm, of P_α0 in M_α. If an equivalent path “α ^=qαm⇒q_αn” is found, it records the equivalent path pair “(β^,α)” in E and adds

“(q_βj,q_αn)” to CSP_βα. Subsequently, it removes β^{from P}β0 and α^{from P}α0 . Checking two paths are equivalent or not may have three situations. In the first situation, they are exactly equivalent. In the second situation, if they are not equivalent because of some final values of variables, our algorithm checks those variables by CE Algorithm. If all of them are not effective variables, β finds an equivalent path. Otherwise, in the third situation, according to Lemma 1, our algorithm generates β' by concatenating in(q_βi) and β, and it also generates each α' by concatenating in(q_αm) and each α ^{in M}α. It compares β' with each α'. If the equivalent path of β' is not found, it extends β (go to step 11). Otherwise, β finds an equivalent path (go to step 10). Our algorithm repeats the process (GetEquivalentPath) until all paths of P_β0 finding an equivalent path in M_α. Finally, check if P_α0 is empty. If it is, it implies that all paths in E constitute path covers of M_β and M_α. Hence, M_β≊ Mα .

Fig. 19 An Overview of Our Algorithm

L =

Obviously, the termination of our algorithm depends on the number of paths needed to be checked and the number of CSPs of CSP_βα. Since there are finite cutpoints and finite initial paths in both M_β and M_α, the combinations of CSPs are finite and our algorithm can only extend finite times to generate finite concatenated paths. Therefore, our algorithm can always

terminate.

Since step 8, 9 and 10 of our algorithm in Fig. 20 ensure that E contains only pairs of equivalent paths of P_β and P_α and as extending a path, our algorithm concatenates all relevant paths to build a new path cover and removes the relevant cutpoint and paths from E, P_β, and P_α; therefore, all paths of E has no cutpoint be an internal state and all are the paths starting from a cutpoint and ending at a cutpoint. As a result, as our algorithm executes successfully, all equivalent paths are contained in E and they constitute the path covers of M_β and M_α.

Complexity

n = the number of states of M e = the number of edges of M

ki = the largest indegree of all states in M ko = the largest outdegree of all states in M

The complexities of FAS Algorithm takes O(e²*ki). Let the complexity of comparing two statements to be |F|. As findEqlPath, a path in M_α can extend at most n times and each path extension needs to concatenate (ki*ko) paths. Therefore, the complexity of findEqlPath is O(n*ki*ko*|F|). Note that |F| usually takes much longer time than CheckEffect and AddAvailableStatement. When check if the new CSP is in the CSP_βα, it has the n² possible combinations. In the worst case, one path is extended n times. There are ki*ko*(n-1) + ki*ko²*(n-1)*(n-2) + … + ki*ko^(n-1)*(n-1)*(n-2)…2.1 ≈ ki*ko^(n-1)*(n-1)^(n-1) number of paths.

Therefore, the complexity of our algorithm is O(nⁿ*ki²*koⁿ*|F|).

In the best case, each path in M_β can directly find an equivalent path in M_α. No path extension and CheckEffect are needed. Therefore, the complexity is O(e²*ki+n*|F|) ≈ O(n*|F|).

Chapter 5 Experimental Results

Our equivalence checking algorithm and Karfa’s algorithm [26] have been implemented in C. We compare two paths with symbolic execution. All benchmarks have been run on a 1.86 GHz Intel Core 2 CPU PC with 2 GB RAM. The run time of all benchmarks is less than 2 seconds. One benchmark, diffeq, is data intensive; some are control intensive, such as gcd, barcaode, tlc and lru ; some are control and data intensive, such as mode and kalman [32][33].

test1 and test2, which are built by ourselves, are control and data intensive. The results shown in tables are sorted by our run time.

Table I, Table III and Table V list the characteristics of the benchmarks in terms of the number of states in M_β, states in M_α, variables in V_β∩V_α, statements in M_β, and statements in M_α. Table II, Table IV and Table VI show the verification results. They list in terms of the number of initial cutpoints in M_β, initial paths in M_β, initial cutpoints in M_α, initial paths in M_α, path extensions in M_β with the iterations back to step3, and equivalent paths and in terms of the employed scheduling techniques for each case, the run time of our algorithm, and the run time of Karfa’s algorithm.

Table I and Table II are the benchmarks where those cases are all equivalent cases and scheduled by PBS or SPARK [15] with removing the SSp in manually. Therefore, our algorithm and Karfa’s can handle the cases. Table III and Table IV are the equivalent cases scheduled by SPARK except test1 and test2 which are transformed manually. Karfa’s algorithm does not support Sp, RSp, SSp and CSE; therefore, it fails to all the cases in Table IV. The cases in Table V and Table VI are not equivalent. Each case is added error manually.

barcode_err and kalman_err are transformed from barcode and kalman by adding an improper MU; mode_err removes one operation from M ; lru_err changes the end state of a

path of M_α; test2_err has an improper CSE. Our algorithm can find that they are not equivalent.

For most cases, run time depends mainly on the numbers of iterations and path extensions.

The run time of test2_err is larger than kalman_err because the algorithms extend 80 paths in kalman_err as they extend 524 paths in test2_err. The number of iterations contained in a pair of braces is the number of paths compared by the algorithms. The number of path extensions is the number of failed paths as finding equivalent paths. Since our algorithm only runs one pass, our run time of barcode of Table II is less than Karfa’s. However, if two paths are not equivalent in findEquivalentPath, our algorithm performs CheckEffect or/and AddAvaiableStatement, and then compares again. Besides, FindAvaiableStatement also consumes time. Therefore, our average run time is lager than Karfa’s about 2 times.

The experimental results, including one high complexity case kalman, indicate that our algorithm is usable for verifying BB-based scheduling, PBS, code motions and CSE.

Abbreviations of each scheduling technique:

PBS: Path-based scheduling,

MU: Merging up, MD: Merging down

DU: Duplicating up, DD: Duplicating down, UM: Useful move, Sp: Speculation, RSp: Reverse speculation, SSp: Safe-speculation, CSE: Common subexpression elimination.

Table I Characteristics of Equivalent Cases 1

Table II Results of Equivalent Cases 1

Ours [26]

Table III Characteristics of Equivalent Cases 2

Table IV Result of Equivalent Cases 2

Ours [26]

kalman 101 202 102 203 34(202) 168 SSp,DU,UM P 1437 F 182.8

Table V Characteristics of Not Equivalent Cases

Table VI Results of Not Equivalent Cases

Ours [26]

Chapter 6 Conclusion & Future Works

In this thesis, a formal verification method is proposed for the scheduling verification in HLS. This method is capable of BB-based scheduling and PBS. It is also well suited to verify some popular code transformation techniques: DD, DU, MD, MU, UM, Sp, RSp, SSp, and CSE. But it still not supports some code transformation techniques, such as loop invariant and copy propagation.

Fig. 21 shows an example of loop invariant. Since all variables of the statement “c⇐x+y”

are not modified in the loop (q_βi1 → q_βi2 __→ q_βi1), loop invariant technique moves

“c⇐x+y” out from the loop. Our algorithm fails to this situation. Since the path ending at q_βi1 and the path ending at q_αj1 are not equivalent, our algorithm needs to extend the path. After the path extension, i.e. removing the cutpoints q_βi1 and q_βi2, it becomes a walk which is not a path. Therefore, our algorithm fails.

Fig. 21 An Example of Loop Invariant

Copy propagation is derived from the compiler. The statements in “g⇐h” form are called copy statements where “g” and “h” are variables. Copy propagation replaces “g” with “h” in all the statements that have flow dependencies with “g⇐h”. It is illustrated in Fig. 22. Since

“a⇐x+y” and “c⇐x+y” have the common subexpression, CSE replaces the expression of

“c⇐x+y” with “a”. Then, copy propagation finds “d⇐c” having data dependence with

“c⇐a+b” and it replaces “c” with “a”. Our approach fails to this case. Since start from the CSP “(q_βi2,q_αj2)”, the final values of “d ” of “q_βi2 → q_βi3” and “q_αi2 → q_αi3” are not equivalent because “c ≠ a”.

Fig. 22 An Example of Copy Propagation

References

[1] D. D. Gajski, N.D. Dutt, A.C.-H. Wu, and S.Y.-L. Lin, “High-Level Synthesis:

Introduction to Chip and System Design,” Kluwer, 1992.

[2] A. A. Jerraya, H. Ding, P. Kission, and M. Rahmouni, “Behavioral Synthesis and Component Reuse with VHDL,” Kluwer, 1997.

[3] P. Coussy, D.D. Gajski, M. Meredith, and A. Takach, “An Introduction to High-Level Synthesis,” IEEE Design & Test of Computers, vol. 26, page 8-17, July-Aug. 2009.

[4] C. Y. Hitchcock and D.E. Thomas, “A Method of Automatic Data Path Synthesis,”

Design Automation Conference, page 484-489, Jun. 1983.

[5] B. M. Pangrle and D.D. Gajski, “Slicer: A State Synthesizer for Intelligent Silicon Compilation,” IEEE International Conference Computer Design: VLSI in Computers &

Processors, Oct. 1986.

[6] P. G. Paulin and J.P. Knight, “Force-directed Scheduling for the Behavioral Synthesis of ASIC’s,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 8, page 661-679, Jun. 1989.

[7] R. Camposano, “Path-based Scheduling for Synthesis,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 10, page 85-93, Jan.

1991.

[8] G. Lakshminarayana, A. Raghunathan, and N.K. Jha, “Incorporating Speculative Execution into Scheduling of Control-flow Intensive Behavioral Descriptions,” Design Automation Conference, page 108-113, Jun. 1998.

[9] L.C.V. dos Santos and J.A.G. Jess, “A Reordering Technique for Efficient Code Motion,”

Design Automation Conference, page 296-299, Jun. 1999.

[10] M. Rim, Y. Fann, and R. Jain, “Global Scheduling with Code-motions for High-level Synthesis Applications,” IEEE Transactions on VLSI Systems, vol. 3, page 379-392,

Sept. 1995.

[11] S. Gupta, N. Dutt, R. Gupta, and A. Nicolau, “Dynamic Conditional Branch Balancing during the High-level Synthesis of Control-intensive Designs,” Design, Automation and Test in Europe Conference and Exhibition, vol. 1, page 270-275, Dec. 2003.

[12] S. Gupta, N. Dutt, R. Gupta, and A. Nicolau, “Spark: A High-level Synthesis Framework for Applying Parallelizing Compiler Transformations,” International Conference on VLSI Design, page 461-466, Jan. 2003.

[13] S. Gupta, N. Dutt, R. Gupta, and A. Nicolau, “Loop Shifting and Compaction for the High-level Synthesis of Designs with Complex Control Flow,” Design, Automation and Test in Europe Conference and Exhibition, vol. 1, page 114-119, Feb. 2004.

[14] S. Gupta, N. Dutt, R. Gupta, and A. Nicolau, “Using Global Code Motions to Improve the Quality of Results for High-level Synthesis,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, page 302-312, Feb. 2004.

[15] S. Gupta, R. Gupta, N. Dutt, and A. Nicolau, “Spark: A Parallelizing Approach to the High-level Synthesis of Digital Circuits,” Kluwer, 2004.

[16] R. Ernst and J. Bhasker, “Simulation-based Verification for High-Level Synthesis Applications,” IEEE Design & Test of Computers, vol.8, page 14-20, Mar. 1991.

[17] R. A. Bergamaschi and S. Raje, “Observable Time Windows: Verifying High-Level Synthesis Results,” IEEE Design & Test of Computers, vol. 14, page 40-50, April-Jun.

1997.

[18] T.-H. Chiang and L.-R. Dung, “Verification Method of Dataflow Algorithms in High-Level Synthesis,” Journal of Systems and Software, vol. 80, page 1256-1270, Aug.

2007.

[19] N. Narasimhan, E. Teica, R. Radhakrishnan, S. Govindarajan, and R. Vemuri, “Theorem Proving Guided Development of Formal Assertions in a Resource-Constrained Scheduler for High-Level Synthesis,” Formal Methods in System Design, vol. 19, page 237-273,

Nov. 2001.

[20] R. Radhakrishnan, E. Teica, and R. Vemuri, “An Approach to High-Level Synthesis Validation using Formally Verified Transformations,” IEEE International High-Level Design Validation and Test Workshop, page 80-85, August-Oct. 2000.

[21] R. Radhakrishnan, E. Teica, and R. Vemuri, “Verification of Basic Block Schedules using RTL Transformations,” Lecture Notes in Computer Science, vol. 2144, page 173-178, Jan. 2001.

[22] N. Mansouri and R. Vemuri, “A Methodology for Automated Verification of Synthesized RTL Designs and Its Integration with a High-Level Synthesis Tool,” Lecture Notes in Computer Science, vol. 1522, page 204-221, Jan. 1998.

[23] Y. Kim , S. Kopuri, and N. Mansouri, “Automated Formal Verification of Scheduling Process using Finite State Machines with Datapath (FSMD),” International Symposium on Quality Electronic Design, page 110-115, Aug. 2004.

[24] H. Eveking, H. Hinrichsen, and G. Ritter, “Automatic Verification of Scheduling Results in High-Level Synthesis,” Design, Automation and Test in Europe Conference and Exhibition, page 59-64, Mar. 1999.

[25] Y. Kim and N. Mansouri, “Automated Formal Verification of Scheduling with Speculative Code Motions,” Great Lakes symposium on VLSI, page 95-100, 2008.

[26] C. Karfa, D. Sarkar, C. Mandal, and P. Kumar, “An Equivalence-Checking Method for Scheduling Verification in High-Level Synthesis,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 27, page 556-569, Mar.

2008.

[27] S. Owre, J. M. Rushby, and N. Shankar, “PVS: A Prototype Verification System,”

Lecture Notes in Computer Science, vol. 607, page 748-752, Jun. 1992.

[28] P. G. Paulin and J. P. Knight, “Scheduling and Binding Algorithms for High-Level Synthesis,” Design Automation Conference, page 1-6, Jun. 1989.

[29] E. Teica and R. Vemuri, “A Mechanical Proof of Completeness for a Set of Register-Level Transformation,” Technical Report 257/05/01/ECECS, University of Cincnnati, 2001.

[30] Z. Manna, “Mathematical Theory of Computation,” McGraw-Hill, 1974.

[31] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein,

“Introduction to Algorithms,” McGraw-Hill, 2001.

[32] P. R. Panda and N. D. Dutt, “1995 high level synthesis design repository,” International Symposium on System Synthesis, page 170-174, Sept. 1995.

[33] http://computing.ece.vt.edu/~mhsiao/hlsyn.html

在文檔中高階合成中使用臆測編碼轉換技術之排程等效驗證 (頁 52-0)