• 沒有找到結果。

Chapter 4 Our Proposed Method

4.3 Our Algorithm

According to the Theorem 2 and Definition 17, Mβ is contained Mα if there exists a path cover Pβ = { pβ0 , ... , pβn } of Mβ and a set of path Pα = { pα0 , ... , pαn } of Mα such that pβi

pαi for all i, 0≤i≤n. If a set of path Pα of Mα is also a path cover of Mα, it implies that Mβ Mα .

Lemma 2 Mβ ≊ Mα if there exists a path cover Pβ = { pβ0 , ... , pβn } of Mβ and a path cover Pα = { pα0 , ... , pαn } of Mα such that pβi ≈ pαi for all i, 0 ≤ i ≤ n.

Since extending a path means to find a new path cover, we insert the cutpoints into not only Mβ but also Mα ; then, we obtain initial path covers Pβ and Pα of Mβ and Mα , respectively.

For each path β of Pβ , our algorithm attempts to find an equivalent path in Mα by finding all possible paths in Mα depending on CSP. Let β find an equivalent pathα" of Mα where α" = α1α2 and α1 and α2 are initial paths. In this case, at first, β can’t find an equivalent path from the initial path cover of Mα , and then our algorithm extends α1 to find a new path cover, Pα . Therefore, β can find an equivalent path α1α2 from Pα . If each path of the final Pβ has an equivalent path of the final Pα and each path of the final Pα has an equivalent path of the final Pβ , Mβ ≊ Mα .

Fig. 19 illustrates the overview of our algorithm and the detail is depicted in Fig. 20. Our algorithm reads two FSMDs, Mβ and Mα , as the inputs and produces a set of equivalent paths, E, as an output. Initially, the set of equivalent paths E is empty and the set of working list L contains no path waiting to find an equivalent path in Mα . Let “(qβ0,qα0)” be the only one member of corresponding state pair set, i.e. CSPβα ={(qβ0,qα0)}. After the initialization, our algorithm inserts the cutpoints into both Mβ and Mα . Subsequently, it finds initial path cover Pβ0 of Mβ and Pα0 of Mα . Next, it finds all possible available statements, i.e. the in sets of

each cutpoint in both Mβ and Mα . Hence, it is ready for finding equivalent paths.

First, our algorithm finds “β = qβiqβj” from Pβ0 depending on “(qβi,qαm)” which is in CSPβα , and then it finds an equivalent path, starting from qαm , of Pα0 in Mα . If an equivalent path “α =qαmqαn” is found, it records the equivalent path pair “(β,α)” in E and adds

“(qβj,qαn)” to CSPβα . Subsequently, it removes β from Pβ0 and α from Pα0 . Checking two paths are equivalent or not may have three situations. In the first situation, they are exactly equivalent. In the second situation, if they are not equivalent because of some final values of variables, our algorithm checks those variables by CE Algorithm. If all of them are not effective variables, β finds an equivalent path. Otherwise, in the third situation, according to Lemma 1, our algorithm generates β' by concatenating in(qβi) and β, and it also generates each α' by concatenating in(qαm) and each α in Mα. It compares β' with each α'. If the equivalent path of β' is not found, it extends β (go to step 11). Otherwise, β finds an equivalent path (go to step 10). Our algorithm repeats the process (GetEquivalentPath) until all paths of Pβ0 finding an equivalent path in Mα . Finally, check if Pα0 is empty. If it is, it implies that all paths in E constitute path covers of Mβ and Mα . Hence, Mβ ≊ Mα .

Fig. 19 An Overview of Our Algorithm

L =

Obviously, the termination of our algorithm depends on the number of paths needed to be checked and the number of CSPs of CSPβα . Since there are finite cutpoints and finite initial paths in both Mβ and Mα , the combinations of CSPs are finite and our algorithm can only extend finite times to generate finite concatenated paths. Therefore, our algorithm can always

terminate.

Since step 8, 9 and 10 of our algorithm in Fig. 20 ensure that E contains only pairs of equivalent paths of Pβ and Pα and as extending a path, our algorithm concatenates all relevant paths to build a new path cover and removes the relevant cutpoint and paths from E, Pβ , and Pα ; therefore, all paths of E has no cutpoint be an internal state and all are the paths starting from a cutpoint and ending at a cutpoint. As a result, as our algorithm executes successfully, all equivalent paths are contained in E and they constitute the path covers of Mβ and Mα .

Complexity

n = the number of states of M e = the number of edges of M

ki = the largest indegree of all states in M ko = the largest outdegree of all states in M

The complexities of FAS Algorithm takes O(e2*ki). Let the complexity of comparing two statements to be |F|. As findEqlPath, a path in Mα can extend at most n times and each path extension needs to concatenate (ki*ko) paths. Therefore, the complexity of findEqlPath is O(n*ki*ko*|F|). Note that |F| usually takes much longer time than CheckEffect and AddAvailableStatement. When check if the new CSP is in the CSPβα , it has the n2 possible combinations. In the worst case, one path is extended n times. There are ki*ko*(n-1) + ki*ko2*(n-1)*(n-2) + … + ki*ko(n-1)*(n-1)*(n-2)…2.1 ≈ ki*ko(n-1)*(n-1)(n-1) number of paths.

Therefore, the complexity of our algorithm is O(nn*ki2*ko n*|F|).

In the best case, each path in Mβ can directly find an equivalent path in Mα . No path extension and CheckEffect are needed. Therefore, the complexity is O(e2*ki+n*|F|) O(n*|F|).

Chapter 5

Experimental Results

Our equivalence checking algorithm and Karfa’s algorithm [26] have been implemented in C. We compare two paths with symbolic execution. All benchmarks have been run on a 1.86 GHz Intel Core 2 CPU PC with 2 GB RAM. The run time of all benchmarks is less than 2 seconds. One benchmark, diffeq, is data intensive; some are control intensive, such as gcd, barcaode, tlc and lru ; some are control and data intensive, such as mode and kalman [32][33].

test1 and test2, which are built by ourselves, are control and data intensive. The results shown in tables are sorted by our run time.

Table I, Table III and Table V list the characteristics of the benchmarks in terms of the number of states in Mβ , states in Mα , variables in Vβ∩Vα , statements in Mβ , and statements in Mα . Table II, Table IV and Table VI show the verification results. They list in terms of the number of initial cutpoints in Mβ , initial paths in Mβ , initial cutpoints in Mα , initial paths in Mα , path extensions in Mβ with the iterations back to step3, and equivalent paths and in terms of the employed scheduling techniques for each case, the run time of our algorithm, and the run time of Karfa’s algorithm.

Table I and Table II are the benchmarks where those cases are all equivalent cases and scheduled by PBS or SPARK [15] with removing the SSp in manually. Therefore, our algorithm and Karfa’s can handle the cases. Table III and Table IV are the equivalent cases scheduled by SPARK except test1 and test2 which are transformed manually. Karfa’s algorithm does not support Sp, RSp, SSp and CSE; therefore, it fails to all the cases in Table IV. The cases in Table V and Table VI are not equivalent. Each case is added error manually.

barcode_err and kalman_err are transformed from barcode and kalman by adding an improper MU; mode_err removes one operation from M ; lru_err changes the end state of a

path of Mα ; test2_err has an improper CSE. Our algorithm can find that they are not equivalent.

For most cases, run time depends mainly on the numbers of iterations and path extensions.

The run time of test2_err is larger than kalman_err because the algorithms extend 80 paths in kalman_err as they extend 524 paths in test2_err. The number of iterations contained in a pair of braces is the number of paths compared by the algorithms. The number of path extensions is the number of failed paths as finding equivalent paths. Since our algorithm only runs one pass, our run time of barcode of Table II is less than Karfa’s. However, if two paths are not equivalent in findEquivalentPath, our algorithm performs CheckEffect or/and AddAvaiableStatement, and then compares again. Besides, FindAvaiableStatement also consumes time. Therefore, our average run time is lager than Karfa’s about 2 times.

The experimental results, including one high complexity case kalman, indicate that our algorithm is usable for verifying BB-based scheduling, PBS, code motions and CSE.

Abbreviations of each scheduling technique:

PBS: Path-based scheduling,

MU: Merging up, MD: Merging down

DU: Duplicating up, DD: Duplicating down, UM: Useful move, Sp: Speculation, RSp: Reverse speculation, SSp: Safe-speculation, CSE: Common subexpression elimination.

Table I Characteristics of Equivalent Cases 1

Table II Results of Equivalent Cases 1

Ours [26]

Table III Characteristics of Equivalent Cases 2

Table IV Result of Equivalent Cases 2

Ours [26]

kalman 101 202 102 203 34(202) 168 SSp,DU,UM P 1437 F 182.8

Table V Characteristics of Not Equivalent Cases

Table VI Results of Not Equivalent Cases

Ours [26]

Chapter 6

Conclusion & Future Works

In this thesis, a formal verification method is proposed for the scheduling verification in HLS. This method is capable of BB-based scheduling and PBS. It is also well suited to verify some popular code transformation techniques: DD, DU, MD, MU, UM, Sp, RSp, SSp, and CSE. But it still not supports some code transformation techniques, such as loop invariant and copy propagation.

Fig. 21 shows an example of loop invariant. Since all variables of the statement “c⇐x+y”

are not modified in the loop (qβi1 → qβi2  qβi1), loop invariant technique moves

“c⇐x+y” out from the loop. Our algorithm fails to this situation. Since the path ending at qβi1 and the path ending at qαj1 are not equivalent, our algorithm needs to extend the path. After the path extension, i.e. removing the cutpoints qβi1 and qβi2, it becomes a walk which is not a path. Therefore, our algorithm fails.

Fig. 21 An Example of Loop Invariant

Copy propagation is derived from the compiler. The statements in “g⇐h” form are called copy statements where “g” and “h” are variables. Copy propagation replaces “g” with “h” in all the statements that have flow dependencies with “g⇐h”. It is illustrated in Fig. 22. Since

“a⇐x+y” and “c⇐x+y” have the common subexpression, CSE replaces the expression of

“c⇐x+y” with “a”. Then, copy propagation finds “d⇐c” having data dependence with

“c⇐a+b” and it replaces “c” with “a”. Our approach fails to this case. Since start from the CSP “(qβi2,qαj2)”, the final values of “d ” of “qβi2 → qβi3” and “qαi2 → qαi3” are not equivalent because “c ≠ a”.

Fig. 22 An Example of Copy Propagation

References

[1] D. D. Gajski, N.D. Dutt, A.C.-H. Wu, and S.Y.-L. Lin, “High-Level Synthesis:

Introduction to Chip and System Design,” Kluwer, 1992.

[2] A. A. Jerraya, H. Ding, P. Kission, and M. Rahmouni, “Behavioral Synthesis and Component Reuse with VHDL,” Kluwer, 1997.

[3] P. Coussy, D.D. Gajski, M. Meredith, and A. Takach, “An Introduction to High-Level Synthesis,” IEEE Design & Test of Computers, vol. 26, page 8-17, July-Aug. 2009.

[4] C. Y. Hitchcock and D.E. Thomas, “A Method of Automatic Data Path Synthesis,”

Design Automation Conference, page 484-489, Jun. 1983.

[5] B. M. Pangrle and D.D. Gajski, “Slicer: A State Synthesizer for Intelligent Silicon Compilation,” IEEE International Conference Computer Design: VLSI in Computers &

Processors, Oct. 1986.

[6] P. G. Paulin and J.P. Knight, “Force-directed Scheduling for the Behavioral Synthesis of ASIC’s,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 8, page 661-679, Jun. 1989.

[7] R. Camposano, “Path-based Scheduling for Synthesis,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 10, page 85-93, Jan.

1991.

[8] G. Lakshminarayana, A. Raghunathan, and N.K. Jha, “Incorporating Speculative Execution into Scheduling of Control-flow Intensive Behavioral Descriptions,” Design Automation Conference, page 108-113, Jun. 1998.

[9] L.C.V. dos Santos and J.A.G. Jess, “A Reordering Technique for Efficient Code Motion,”

Design Automation Conference, page 296-299, Jun. 1999.

[10] M. Rim, Y. Fann, and R. Jain, “Global Scheduling with Code-motions for High-level Synthesis Applications,” IEEE Transactions on VLSI Systems, vol. 3, page 379-392,

Sept. 1995.

[11] S. Gupta, N. Dutt, R. Gupta, and A. Nicolau, “Dynamic Conditional Branch Balancing during the High-level Synthesis of Control-intensive Designs,” Design, Automation and Test in Europe Conference and Exhibition, vol. 1, page 270-275, Dec. 2003.

[12] S. Gupta, N. Dutt, R. Gupta, and A. Nicolau, “Spark: A High-level Synthesis Framework for Applying Parallelizing Compiler Transformations,” International Conference on VLSI Design, page 461-466, Jan. 2003.

[13] S. Gupta, N. Dutt, R. Gupta, and A. Nicolau, “Loop Shifting and Compaction for the High-level Synthesis of Designs with Complex Control Flow,” Design, Automation and Test in Europe Conference and Exhibition, vol. 1, page 114-119, Feb. 2004.

[14] S. Gupta, N. Dutt, R. Gupta, and A. Nicolau, “Using Global Code Motions to Improve the Quality of Results for High-level Synthesis,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, page 302-312, Feb. 2004.

[15] S. Gupta, R. Gupta, N. Dutt, and A. Nicolau, “Spark: A Parallelizing Approach to the High-level Synthesis of Digital Circuits,” Kluwer, 2004.

[16] R. Ernst and J. Bhasker, “Simulation-based Verification for High-Level Synthesis Applications,” IEEE Design & Test of Computers, vol.8, page 14-20, Mar. 1991.

[17] R. A. Bergamaschi and S. Raje, “Observable Time Windows: Verifying High-Level Synthesis Results,” IEEE Design & Test of Computers, vol. 14, page 40-50, April-Jun.

1997.

[18] T.-H. Chiang and L.-R. Dung, “Verification Method of Dataflow Algorithms in High-Level Synthesis,” Journal of Systems and Software, vol. 80, page 1256-1270, Aug.

2007.

[19] N. Narasimhan, E. Teica, R. Radhakrishnan, S. Govindarajan, and R. Vemuri, “Theorem Proving Guided Development of Formal Assertions in a Resource-Constrained Scheduler for High-Level Synthesis,” Formal Methods in System Design, vol. 19, page 237-273,

Nov. 2001.

[20] R. Radhakrishnan, E. Teica, and R. Vemuri, “An Approach to High-Level Synthesis Validation using Formally Verified Transformations,” IEEE International High-Level Design Validation and Test Workshop, page 80-85, August-Oct. 2000.

[21] R. Radhakrishnan, E. Teica, and R. Vemuri, “Verification of Basic Block Schedules using RTL Transformations,” Lecture Notes in Computer Science, vol. 2144, page 173-178, Jan. 2001.

[22] N. Mansouri and R. Vemuri, “A Methodology for Automated Verification of Synthesized RTL Designs and Its Integration with a High-Level Synthesis Tool,” Lecture Notes in Computer Science, vol. 1522, page 204-221, Jan. 1998.

[23] Y. Kim , S. Kopuri, and N. Mansouri, “Automated Formal Verification of Scheduling Process using Finite State Machines with Datapath (FSMD),” International Symposium on Quality Electronic Design, page 110-115, Aug. 2004.

[24] H. Eveking, H. Hinrichsen, and G. Ritter, “Automatic Verification of Scheduling Results in High-Level Synthesis,” Design, Automation and Test in Europe Conference and Exhibition, page 59-64, Mar. 1999.

[25] Y. Kim and N. Mansouri, “Automated Formal Verification of Scheduling with Speculative Code Motions,” Great Lakes symposium on VLSI, page 95-100, 2008.

[26] C. Karfa, D. Sarkar, C. Mandal, and P. Kumar, “An Equivalence-Checking Method for Scheduling Verification in High-Level Synthesis,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 27, page 556-569, Mar.

2008.

[27] S. Owre, J. M. Rushby, and N. Shankar, “PVS: A Prototype Verification System,”

Lecture Notes in Computer Science, vol. 607, page 748-752, Jun. 1992.

[28] P. G. Paulin and J. P. Knight, “Scheduling and Binding Algorithms for High-Level Synthesis,” Design Automation Conference, page 1-6, Jun. 1989.

[29] E. Teica and R. Vemuri, “A Mechanical Proof of Completeness for a Set of Register-Level Transformation,” Technical Report 257/05/01/ECECS, University of Cincnnati, 2001.

[30] Z. Manna, “Mathematical Theory of Computation,” McGraw-Hill, 1974.

[31] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein,

“Introduction to Algorithms,” McGraw-Hill, 2001.

[32] P. R. Panda and N. D. Dutt, “1995 high level synthesis design repository,” International Symposium on System Synthesis, page 170-174, Sept. 1995.

[33] http://computing.ece.vt.edu/~mhsiao/hlsyn.html

相關文件