### National Taiwan University
### College of Electrical Engineering and Computer Science
### Graduate Institute of Electronics Engineering
### Master Thesis

### Large Scale Ashenhurst Decomposition via SAT Solving and Craig Interpolation

### Hsuan-Po Lin (林炫伯)

### Advisor: Jie-Hong Roland Jiang, Ph.D.

### June 2009

### Acknowledgements

I am grateful to the many people who made this thesis possible. I would like to thank my advisor, Dr. Jie-Hong Roland Jiang, for his enthusiasm, his academic experience, and his inspiration. Throughout the past two years, he provided encouragement, useful advice, and lots of good ideas. I would have been lost without him.

The members of ALCom Lab, I-Hsin Chen, Wei-Lun Hung, Sz-Cheng Huang, Chia-Chao Kan, Ruei-Rung Lee, Chih-Fan Lai, Meng-Yan Li, Jane-Chi Lin, and Fu-Rong Wu, have given me great memories in my academic journey. We had many inspiring discussions and joyful cooperation. I would also like to thank all of my friends who helped me get through difficult times and provided invaluable support.

I cannot end these acknowledgements without thanking my parents and brother for their constant support and encouragement no matter what happened. To them I dedicate this thesis.

Hsuan-Po Lin

National Taiwan University, June 2009

### Large Scale Ashenhurst Decomposition via SAT Solving and Craig Interpolation

Student: Hsuan-Po Lin Advisor: Dr. Jie-Hong Roland Jiang

Graduate Institute of Electronics Engineering, National Taiwan University

## Chinese Abstract

Functional decomposition aims to decompose a Boolean function into a set of smaller sub-functions. In this thesis we focus on Ashenhurst decomposition, a common form of functional decomposition with many practical applications owing to its simplicity. We formulate the decomposition problem as SAT solving, and further apply Craig interpolation and functional dependency computation to derive the corresponding sub-functions. In our SAT-based approach, the partitioning of input variables can be automated and embedded in the decomposition algorithm. The algorithm also extends naturally to decomposition problems that allow shared input variables and multiple outputs, which were hard to solve in prior algorithms based on BDDs as the core data structure. Experimental results show that the proposed algorithm can effectively handle functions with as many as three hundred input variables.

Keywords: Boolean function, functional decomposition, SAT solving, Craig interpolation, functional dependency.

### Large Scale Ashenhurst Decomposition via SAT Solving and Craig Interpolation


Student: Hsuan-Po Lin Advisor: Dr. Jie-Hong Roland Jiang

Graduate Institute of Electronics Engineering National Taiwan University

## Abstract

Functional decomposition aims at decomposing a Boolean function into a set of smaller sub-functions. In this thesis, we focus on Ashenhurst decomposition, which has practical applications due to its simplicity. We formulate the decomposition problem as SAT solving, and further apply Craig interpolation and functional dependency computation to derive composite functions. In our pure SAT-based solution, variable partitioning can be automated and integrated into the decomposition procedure. Also we can easily extend our method to non-disjoint and multiple-output decompositions, which are hard to handle using BDD-based algorithms. Experimental results show the scalability of our proposed method, which can effectively decompose functions with up to 300 input variables.

Keywords: Boolean function, functional decomposition, SAT solving, Craig interpolation, functional dependency.

¹The preliminary version of this thesis appears in [18].

## Contents

Acknowledgements

Chinese Abstract

Abstract

List of Figures

List of Tables

1 Introduction
1.1 Thesis Overview
1.2 Related Work
1.3 Our Contributions
1.4 Thesis Organization

2 Preliminaries
2.1 Functional Decomposition
2.1.1 Decomposition Chart
2.2 Functional Dependency
2.3 Satisfiability and Interpolation
2.3.1 Conjunctive Normal Form
2.3.2 Circuit to CNF Conversion
2.3.3 Propositional Satisfiability
2.3.4 Refutation Proof
2.3.5 Craig Interpolation

3 Main Algorithms
3.1 Single-Output Ashenhurst Decomposition
3.1.1 Decomposition with Known Variable Partition
3.1.2 Decomposition with Unknown Variable Partition
3.2 Multiple-Output Ashenhurst Decomposition
3.2.1 Shared Variable Partition
3.3 Beyond Ashenhurst Decomposition
3.4 Decomposition under Don't-Cares
3.5 Implementation Issues
3.5.1 Minimal UNSAT Core Refinement
3.5.2 Balanced Partition
3.5.3 Elimination of Equality Constraint Clauses
3.6 A Complete Example

4 Experimental Results
4.1 Single- and Two-Output Ashenhurst Decomposition
4.2 Quality of Variable Partition
4.3 Fast Variable Partitioning
4.4 n-Output Ashenhurst Decomposition
4.5 Quality of Composite Functions

5 Conclusions and Future Work

Bibliography

## List of Figures

1.1 Ashenhurst decomposition

2.1 The diagonal decomposition chart due to the introduction of a common variable
2.2 Circuit to CNF representation
2.3 Refutation proof and the unsatisfiable core
2.4 Craig interpolation
2.5 Interpolant construction of an unsatisfiable formula ϕ = (¬c)(¬b + a)(c + ¬a)(b)

3.1 Circuit representation of Formulas (3.2) and (3.3)
3.2 (a) Relation characterized by ψ_{A}(X_{G}^{1}, X_{G}^{2}, c) for some c ∈ [[X_{C}]]; (b) relation after cofactoring ψ_{A}(X_{G}^{1} = a, X_{G}^{2}, c) with respect to some a ∈ [[X_{G}^{1}]]
3.3 Example circuit used in the complete example
3.4 The decomposition chart of the example circuit
3.5 (a) Relation of the derived interpolant function; (b) relation after cofactoring by (a^{1}, b^{1}) = (0, 0)
3.6 (a) The base functions; (b) the h function derived by functional dependency

4.1 Best variable partition found in 60 seconds, without minimal UNSAT core refinement
4.2 Best variable partition found in 60 seconds, with minimal UNSAT core refinement
4.3 Variable partition qualities under four different efforts
4.4 First-found valid partition, without minimal UNSAT core refinement
4.5 First-found valid partition, with minimal UNSAT core refinement
4.6 Comparison of disjointness between different partition efforts
4.7 Comparison of disjointness between different partition efforts
4.8 Disjointness versus total variables of the best partition found in 60 seconds with minimal UNSAT core operation
4.9 Runtime of multi-output Ashenhurst decomposition

## List of Tables

4.1 Single-output Ashenhurst decomposition
4.2 Two-output Ashenhurst decomposition
4.3 Variable distribution in different partition efforts
4.4 Circuit information of s9234

## Chapter 1

## Introduction

### 1.1 Thesis Overview

Functional decomposition [1, 7, 14, 22] plays an important role in the analysis and design of digital systems. It refers to the process of breaking a complex function into parts that are less complex than the original function and are relatively independent of each other, such that the behavior of the original function can be reconstructed by composing these parts. In logic synthesis, a complex function is decomposed into a set of sub-functions, each of which is easier to analyze, to understand, and to further synthesize. Functional decomposition has long played a pivotal role in LUT-based FPGA synthesis. It also has various applications in the minimization of circuit communication complexity.


Functional decomposition can be classified as follows:

• The function to be decomposed can be a completely or an incompletely specified function.

• The function to be decomposed can be a single- or multiple-output function.

• A decomposition is disjoint if the sub-functions do not share common input variables; otherwise, it is non-disjoint.

• A decomposition is called Ashenhurst decomposition or simple decomposition if the topology of the decomposition is f(X) = h(g(X_{G}), X_{H}), where X = X_{G} ∪ X_{H} and g is a single-output function; a decomposition is called bi-decomposition if the topology is f(X) = h(g_{1}(X_{A}), g_{2}(X_{B})), where X = X_{A} ∪ X_{B} and h is a two-input gate.

In this thesis, our method focuses on completely specified functions. In addition, the proposed method deals with multiple-output and non-disjoint decompositions. Furthermore, we pay particular attention to Ashenhurst decomposition.

The functional decomposition problem was first formulated by Ashenhurst in 1959 [1]. He visualized decomposition feasibility with a decomposition chart, a two-dimensional Karnaugh map with rows corresponding to variables in the free set and columns corresponding to variables in the bound set. The decomposition chart is used to determine whether a given function f can be simple-disjointly decomposed with respect to a given set of bound set variables. Ashenhurst reduced the chart by merging all identical columns, and showed that there exists a simple disjoint decomposition if and only if there are at most two distinct columns in the reduced decomposition chart. The disadvantage of Ashenhurst's method is that a chart must be constructed for every variable partition considered. If all variable partitions are considered, a function with n variables has O(2^{n}) charts, since every variable can be in either the bound set or the free set.

Roth and Karp proposed a cover-based method [22] to reduce the memory requirement of Ashenhurst's method. They used covers to represent functions. By cover manipulations, the minterms of the bound set variables are mapped into equivalence classes, which correspond one-to-one with the distinct columns in Ashenhurst's method. Nevertheless, the running time of the algorithm remains a problem unless the size of the bound set is bounded above by some constant k. A typical value of k is 5, the common input size of a LUT.

Recent progress in function manipulation using BDDs has made the BDD a popular data structure for handling the functional decomposition problem. Lai, Pedram, and Vrudhula [15] proposed a fast BDD-based implementation of the Roth-Karp algorithm. They used BDDs to represent Boolean functions, and showed that every variable ordering of a BDD implies a variable partition for decomposing the function: all bound set variables are ordered above the cut, and all free set variables on and below the cut. Based on Lai's method, Stanion and Sechen proposed a method [25] to enumerate all possible cuts in the BDD in order to find a good decomposition. The enumeration is achieved by constructing a characteristic function for the set of cuts and then applying a branch-and-bound procedure to find the cut that produces the best decomposition.

Identifying common sub-functions when decomposing a set of functions is an important issue in multiple-output functional decomposition. The first approach to multiple-output functional decomposition was proposed by Karp [14]; however, it works only for two-output functions. To overcome this limitation, researchers proposed several ways to handle the multiple-output decomposition problem. Wurth, Eckl, and Antreich presented an algorithm [29] based on the concept of a preferable decomposition function, that is, a decomposition function valid for a single output and with the potential to be shared by other output functions. They showed that all preferable decomposition functions can be constructed by partitioning the minterms of the bound set variables into global classes, and that this information can be used to identify the common preferable decomposition functions of a multiple-output function.

Sawada, Suyama, and Nagoya proposed a BDD-based algorithm [24] to deal with the issue of identifying common sub-functions. They proposed an effective Boolean resubstitution technique based on support minimization to identify common LUTs from a large number of candidates. However, when the number of support variables is large, the examination becomes time-consuming and sometimes fails due to memory explosion.

Most prior work on functional decomposition used the BDD as the underlying data structure. BDDs can handle the functional decomposition problem given a proper variable ordering that keeps the BDD size below an acceptable threshold. In some cases, however, BDD-based algorithms have several limitations:

• Firstly, BDD-based algorithms suffer from memory explosion. A BDD representing a Boolean function can be large; in the worst case, its size is exponential in the number of variables. When special variable ordering rules must be imposed on BDDs for functional decomposition, the memory needed in BDD computation cannot be reduced by switching to a more suitable variable ordering. Therefore it is typical that a function decomposed using BDDs as the underlying data structure can have just a few variables.

• Secondly, variable partitioning needs to be specified a priori and cannot be automated as an integrated part of the decomposition process. Decomposability under different variable partitions must be analyzed case by case. To enumerate different variable partitions effectively while keeping the BDD reasonably small, the set of bound set variables cannot be large; otherwise, the computation time easily exceeds the pre-specified threshold.


• Thirdly, BDD-based approaches cannot easily handle non-disjoint decomposition. In practice, decomposability must be analyzed through a number of cases exponential in the number of joint (or common) variables.

• Finally, even though multiple-output decomposition [30] can be converted to single-output decomposition [12], BDD sizes may grow substantially in this conversion.

Figure 1.1: Ashenhurst decomposition

The above limitations motivate the need for new data structures and computation methods for functional decomposition. This thesis shows that, when Ashenhurst decomposition [1] is considered, these limitations can be overcome through a satisfiability (SAT) based formulation. Ashenhurst decomposition is a special case of functional decomposition where, as illustrated in Figure 1.1, a function f(X) is decomposed into two sub-functions h(X_{H}, X_{C}, x_{g}) and g(X_{G}, X_{C}) with f(X) = h(X_{H}, X_{C}, g(X_{G}, X_{C})). For general functional decomposition, the function g can instead be a functional vector (g_{1}, . . . , g_{k}). It is this simplicity that makes Ashenhurst decomposition particularly attractive in practical applications.

The techniques we used in this thesis, in addition to SAT solving, include Craig interpolation [6] and functional dependency [16]. Specifically, the decomposability of function f is formulated as SAT solving, the derivation of function g is by Craig interpolation, and the derivation of function h is by functional dependency.

### 1.2 Related Work

Aside from prior work using BDDs as the underlying data structure, we compare some related work on functional decomposition and Boolean matching using SAT-based techniques. In bi-decomposition [17], a function f is decomposed into f(X) = h(g_{1}(X_{A}, X_{C}), g_{2}(X_{B}, X_{C})) under variable partition X = {X_{A}|X_{B}|X_{C}}, where the function h is known a priori and fixed to a special function type, such as a two-input or, and, or xor gate. The functions g_{1} and g_{2} are the unknowns to be computed. In contrast, Ashenhurst decomposition f(X) = h(X_{H}, X_{C}, g(X_{G}, X_{C})) treats both h and g as unknowns, with no fixed gate type specified. The Ashenhurst decomposition problem therefore needs to be formulated and solved differently, although the basic technique used in this thesis is similar to that in [17].

FPGA Boolean matching, see, e.g., [3], is a subject closely related to functional decomposition. In [19], Boolean matching was achieved with SAT solving by converting quantified Boolean formulas into CNF formulas. To eliminate the universal quantifiers, the CNF formulas are duplicated a number of times exponential in the number of input variables. This intrinsic exponential explosion in formula size limits the scalability of the approach; typically, it cannot handle functions with more than 10 input variables. To overcome this problem, a two-stage SAT-based Boolean matching algorithm [26] and an implicant-based method [4] were proposed. While the above algorithms mainly focus on completely specified functions, Wang and Chan proposed a SAT-based Boolean matching method [28] that handles functions with don't-cares. Our method, on the other hand, may provide a partial solution to the Boolean matching problem, at least for some special PLB configurations similar in topology to Ashenhurst decomposition.

### 1.3 Our Contributions

Compared with BDD-based methods, the proposed SAT-based algorithm is advantageous in the following aspects.

• Firstly, it does not suffer from the memory explosion problem and is scalable to large functions. Experimental results show that Boolean functions with more than 300 input variables can be decomposed effectively.


• Secondly, variable partitioning need not be specified a priori; it can be automated and derived on the fly during decomposition. Hence the bound set X_{G} need not be small: it can be as large as the free set X_{H}, yielding a more balanced variable partition.

• Thirdly, it works for non-disjoint decomposition naturally. The automated variable partitioning in our method can indeed generate a non-disjoint partition. In practice we wish the set of common (non-disjoint) variables X_{C} to be small, so we further propose an UNSAT core refinement process that heuristically reduces the number of common variables as much as possible.

• Finally, it can easily be extended to multiple-output Ashenhurst decomposition under the assumption that all output functions f_{i} share the same variable partition X = {X_{H}|X_{G}|X_{C}}.

However, when generalizing our method to functional decomposition beyond Ashenhurst's special case, both the size of the SAT formula and the computation time of SAT solving grow. This is the limitation of our proposed method.

As interconnects become a dominating concern in modern nanometer IC designs, scalable decomposition methods play a pivotal role in minimizing circuit communication. As functional decomposition breaks the original function into smaller and relatively independent sub-functions, the communication complexity between these sub-functions, proportional to the number of interconnecting wires between them, is greatly reduced.

With the advantages of the proposed method, hierarchical logic decomposition could be made feasible in practice. In addition, our results may provide a new path toward scalable Boolean matching for heterogeneous FPGAs as well as topologically constrained logic synthesis.

### 1.4 Thesis Organization

This thesis is organized as follows. Chapter 2 introduces essential preliminaries. Our main algorithms are presented in Chapter 3 and evaluated with experimental results in Chapter 4. Finally, Chapter 5 concludes the thesis and outlines future work.

## Chapter 2

## Preliminaries

In this chapter, we provide the background needed in this thesis. As conventional notation, sets are denoted by upper-case letters, e.g., S; set elements by lower-case letters, e.g., e ∈ S. The cardinality of S is denoted |S|. A partition of a set S into subsets S_{i} ⊆ S for i = 1, . . . , k (with S_{i} ∩ S_{j} = ∅ for i ≠ j, and ∪_{i} S_{i} = S) is denoted {S_{1}|S_{2}| . . . |S_{k}}. For a set X of Boolean variables, its set of valuations (or truth assignments) is denoted [[X]], e.g., [[X]] = {(0, 0), (0, 1), (1, 0), (1, 1)} for X = {x_{1}, x_{2}}.
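As a quick illustration of the [[X]] notation, the valuation set can be enumerated mechanically. The helper name `valuations` below is ours, not from the thesis; the sketch reproduces the example above:

```python
from itertools import product

def valuations(variables):
    """Enumerate [[X]]: all truth assignments over the given variables."""
    return list(product((0, 1), repeat=len(variables)))

# [[X]] for X = {x1, x2}, matching the example in the text.
print(valuations(["x1", "x2"]))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```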


### 2.1 Functional Decomposition

Definition 2.1 Given a completely specified Boolean function f, a variable x is a support variable of f if f_{x} ≠ f_{¬x}, where f_{x} and f_{¬x} are the positive and negative cofactors of f on x, respectively.

Definition 2.2 A set {f_{1}(X), . . . , f_{m}(X)} of completely specified Boolean functions is (jointly) decomposable with respect to some variable partition X = {X_{H}|X_{G}|X_{C}} if every function f_{i}, i = 1, . . . , m, can be written as

f_{i}(X) = h_{i}(X_{H}, X_{C}, g_{1}(X_{G}, X_{C}), . . . , g_{k}(X_{G}, X_{C}))

for some functions h_{i}, g_{1}, . . . , g_{k} with k < |X_{G}|. The decomposition is called disjoint if X_{C} = ∅, and non-disjoint otherwise.

For m = 1, this is known as single-output decomposition; for m > 1, multiple-output decomposition. Note that in multiple-output decomposition h_{1}, . . . , h_{m} share the same functions g_{1}, . . . , g_{k}. The so-called Ashenhurst decomposition [1] is the case k = 1, in which the g function has only one output bit.

Note that, for |X_{G}| = 1, there is no successful decomposition because the criterion k < |X_{G}| is violated. On the other hand, the decomposition trivially holds if X_{C} ∪ X_{G} or X_{C} ∪ X_{H} equals X; the corresponding variable partition is called trivial. This thesis is concerned with decomposition under non-trivial variable partitions, and we focus on Ashenhurst decomposition.


The decomposability of a set {f_{1}, . . . , f_{m}} of functions under the variable partition X = {X_{H}|X_{G}|X_{C}} can be analyzed through the so-called decomposition chart, consisting of a set of matrices, one for each member of [[X_{C}]]. The rows and columns of a matrix are indexed by {1, . . . , m} × [[X_{H}]] and [[X_{G}]], respectively. For i ∈ {1, . . . , m}, a ∈ [[X_{H}]], b ∈ [[X_{G}]], and c ∈ [[X_{C}]], the entry with row index (i, a) and column index b of the matrix of c has value f_{i}(X_{H} = a, X_{G} = b, X_{C} = c).

Proposition 2.1 [1, 7, 14] A set {f_{1}, . . . , f_{m}} of Boolean functions is decomposable as

f_{i}(X) = h_{i}(X_{H}, X_{C}, g_{1}(X_{G}, X_{C}), . . . , g_{k}(X_{G}, X_{C}))

for i = 1, . . . , m under variable partition X = {X_{H}|X_{G}|X_{C}} if and only if, for every c ∈ [[X_{C}]], the corresponding matrix of c has at most 2^{k} column patterns (i.e., at most 2^{k} different kinds of column vectors).

### 2.1.1 Decomposition Chart

Given a variable partition X = {X_{1}|X_{2}}, we want to check whether a decomposition f = h(g(X_{1}), X_{2}) exists. To do this, we build a table called the decomposition chart, which is rearranged from the K-map of the given function f. The table has 2^{|X_{1}|} columns corresponding to the input vectors of X_{1} and 2^{|X_{2}|} rows corresponding to the input vectors of X_{2}. Each entry in the decomposition chart is the function value whose input is the combination of the indices of the corresponding row and column. Moreover, if we allow some variables to be common to both the rows and the columns, the resulting decomposition chart is formed by sub-charts arranged along the diagonal.

In our Ashenhurst decomposition, the decomposition chart of the given function f uses the input variables of g as its column index variables and the input variables of h as its row index variables. The whole decomposition chart of f is composed of sub-charts arranged along the diagonal, each corresponding to one valuation of X_{C}. Hence if there are k common variables, there are 2^{k} sub-charts. For example, with one variable each in X_{C}, X_{G}, and X_{H}, the corresponding decomposition chart is shown in Figure 2.1.

Figure 2.1: The diagonal decomposition chart due to the introduction of a common variable


### 2.2 Functional Dependency

Definition 2.3 Given a Boolean function f : B^{m} → B and a vector of Boolean functions G = (g_{1}(X), . . . , g_{n}(X)) with g_{i} : B^{m} → B for i = 1, . . . , n, over the same variable vector X = (x_{1}, . . . , x_{m}), we say that f functionally depends on G if there exists a Boolean function h : B^{n} → B, called the dependency function, such that f(X) = h(g_{1}(X), . . . , g_{n}(X)). We call f, G, and h the target function, base functions, and dependency function, respectively.

Note that functions f and G are over the same domain in the definition; h need not depend on all of the functions in G.

The necessary and sufficient condition of the existence of the dependency function h is given as follows.

Proposition 2.2 [11] Given a target function f and base functions G, let h^{0} = {a ∈ B^{n} : a = G(b) and f(b) = 0 for some b ∈ B^{m}} and h^{1} = {a ∈ B^{n} : a = G(b) and f(b) = 1 for some b ∈ B^{m}}. Then h is a feasible dependency function if and only if h^{0} ∩ h^{1} is empty. In this case, h^{0}, h^{1}, and B^{n} \ (h^{0} ∪ h^{1}) are the off-set, on-set, and don't-care set of h, respectively.

By Proposition 2.2, one can not only determine the existence of a dependency function, but also deduce a feasible one. A SAT-based formulation of functional dependency was given in [16]. It forms an important ingredient in part of our computation.


### 2.3 Satisfiability and Interpolation

### 2.3.1 Conjunctive Normal Form

A formula is said to be in Conjunctive Normal Form (CNF) if it is a conjunction of clauses. Each clause is a disjunction of one or more literals, where a literal is the occurrence of a variable x or its complement ¬x. Without loss of generality, we assume that no clause contains repeated or complementary literals. For example, the formula (x + ¬y)(¬x + y) is a CNF representation of the equality function for variables x and y; it has two clauses, each with two literals.

CNF is similar to the Product of Sums (POS) representation in circuit theory. It is one of the major contributing factors to the recent success of Boolean Satisfiability (SAT) solvers: the CNF representation of a SAT problem provides a data structure for efficient implementation of the various techniques used in most popular SAT solvers.

### 2.3.2 Circuit to CNF Conversion

The CNF formula of a combinational circuit can be constructed in linear time by introducing intermediate variables. It is the conjunction of the CNF formulas for all gate outputs, where the CNF formula for each gate encodes the valid input and output assignments of that gate. Details on converting a circuit to CNF representation can be found in [27].

(a) Consistent assignments:

ϕ = (a + ¬c)(b + ¬c)(¬a + ¬b + c)(c + ¬e)(d + ¬e)(¬c + ¬d + e)

(b) With property e = 0:

ϕ′ = (a + ¬c)(b + ¬c)(¬a + ¬b + c)(c + ¬e)(d + ¬e)(¬c + ¬d + e)(¬e)

Figure 2.2: Circuit to CNF representation

Figure 2.2 shows an example of a simple circuit with the corresponding CNF formula encoding its truth assignments, as well as the CNF formula with a given property added. As can be seen, the CNF formula can be divided into two parts, each a set of clauses for one particular gate. The two gates share the intermediate variable c, so the CNF representation of the circuit is the conjunction of the sets of clauses for each gate. Hence, given a combinational circuit, it is straightforward to construct the CNF formula for the circuit as well as the CNF formula for proving a given property of the circuit.
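Following this gate-by-gate construction, the clauses of Figure 2.2 can be generated programmatically. In this sketch (the function name is ours; literals follow the common signed-integer convention, where a negative integer denotes a negated variable), the two AND gates of the example circuit are encoded and conjoined:

```python
def and_gate_cnf(a, b, c):
    """Clauses asserting c <-> (a AND b), matching (a + -c)(b + -c)(-a + -b + c)."""
    return [[a, -c], [b, -c], [-a, -b, c]]

# The two-AND circuit of Figure 2.2: c = a AND b, e = c AND d.
a, b, c, d, e = 1, 2, 3, 4, 5
phi = and_gate_cnf(a, b, c) + and_gate_cnf(c, d, e)

# Adding the property e = 0 as a unit clause yields the formula of Figure 2.2(b).
phi_with_property = phi + [[-e]]
print(phi_with_property)
```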

### 2.3.3 Propositional Satisfiability

We first define some terminology of the propositional satisfiability problem in the scope of modern SAT solvers. Let V = {v_{1}, . . . , v_{k}} be a finite set of Boolean variables, where each v_{i} ∈ B = {0, 1}. A SAT instance is a set of Boolean formulas in CNF. An assignment over V gives each variable v_{i} ∈ V a Boolean value, either True (1) or False (0). A SAT instance is satisfiable if there exists an assignment over V under which the CNF formula evaluates to True, that is, every clause of the CNF formula evaluates to True; otherwise, it is unsatisfiable. The SAT problem is the decision problem asking whether a given SAT instance is satisfiable, and a SAT solver is designed to answer it.

SAT was the first known NP-complete problem, as proved by Stephen Cook in 1971 [5]. The problem remains NP-complete even if the instance is written in conjunctive normal form with three variables per clause, yielding the 3SAT problem. Most popular modern SAT solvers use the Davis-Putnam-Logemann-Loveland (DPLL) procedure [8, 9] as their basic algorithm, which has proven able to solve large propositional satisfiability problems efficiently.


The basic idea of the DPLL procedure is to branch on the variables in V until a conflict arises or a satisfying assignment is derived. Once a conflict arises, DPLL chooses another branch to continue testing satisfiability. If the values on a certain branch yield a satisfying assignment, the procedure stops branching immediately and returns satisfiable together with the satisfying assignment; otherwise, DPLL must exhaust all branches to report unsatisfiability. In this thesis, we use MiniSat [10], developed by Niklas Eén and Niklas Sörensson, as the underlying SAT solver in the experiments.
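The branch-and-backtrack behavior described above can be sketched as a minimal recursive DPLL. This toy version (our own illustration, not MiniSat's algorithm) omits the unit propagation, clause learning, and heuristics that make modern solvers fast:

```python
def dpll(clauses, assignment=None):
    """Return a satisfying assignment (dict var -> bool) or None.
    Clauses are lists of signed ints; pure branching, no propagation."""
    assignment = assignment or {}
    simplified = []
    for clause in clauses:
        if any(assignment.get(abs(l)) == (l > 0) for l in clause):
            continue  # clause already satisfied under the partial assignment
        rest = [l for l in clause if abs(l) not in assignment]
        if not rest:
            return None  # conflict: clause falsified
        simplified.append(rest)
    if not simplified:
        return assignment  # every clause satisfied
    v = abs(simplified[0][0])  # branch on some unassigned variable
    for value in (True, False):
        result = dpll(simplified, {**assignment, v: value})
        if result is not None:
            return result
    return None  # both branches failed

print(dpll([[1, 2], [-1, 2], [-2, 3]]))  # a satisfying assignment
print(dpll([[1], [-1]]))                 # None: unsatisfiable
```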

### 2.3.4 Refutation Proof

Some modern SAT solvers generate a refutation proof to demonstrate the unsatisfiability of a SAT instance. A refutation proof is a sequence of resolution steps showing that the unsatisfiable SAT instance eventually implies the empty clause. Each step in the sequence is called a resolution. The detailed definitions of resolution and related terminology are given in this subsection.

Definition 2.4 Let (v ∨ c_{1}) and (¬v ∨ c_{2}) be two clauses, where v is a Boolean variable, v and ¬v are literals, and c_{1} and c_{2} are disjunctions of literals over variables other than v. The resolution of these two clauses on variable v is the new clause (c_{1} ∨ c_{2}). The variable v is called the pivot variable, and (c_{1} ∨ c_{2}) the resolvent. The resolvent exists only when exactly one pivot variable exists; in other words, the resolvent cannot be a tautological clause.

For example, the clauses (a ∨ ¬b) and (¬a ∨ b) have no resolvent on variable a, since the clause (¬b ∨ b) is tautological. Also, resolution on variable v over two given clauses corresponds to existential quantification of v. That is, ∃v. (v ∨ c_{1}) ∧ (¬v ∨ c_{2}) = (c_{1} ∨ c_{2}).
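Definition 2.4 translates almost directly into code. In this sketch (names are ours; clauses are sets of signed integers), the resolvent is produced only when it is non-tautological:

```python
def resolve(clause1, clause2, pivot):
    """Resolve two clauses (sets of signed ints) on a pivot variable.
    Returns the resolvent, or None if none exists (tautological result)."""
    if pivot not in clause1 or -pivot not in clause2:
        return None  # the pivot does not occur with opposite phases
    resolvent = (clause1 - {pivot}) | (clause2 - {-pivot})
    if any(-l in resolvent for l in resolvent):
        return None  # tautological: a second complementary pair remains
    return resolvent

# Resolving (v + c1) and (-v + c2) on v yields (c1 + c2):
print(resolve({1, 2}, {-1, 3}, 1))   # {2, 3}
# (a + -b) and (-a + b) have no resolvent on a, since (-b + b) is tautological:
print(resolve({1, -2}, {-1, 2}, 1))  # None
```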

Proposition 2.3 The conjunction of (v ∨ c_{1}) and (¬v ∨ c_{2}) implies its resolvent (c_{1} ∨ c_{2}):

(v ∨ c_{1}) ∧ (¬v ∨ c_{2}) ⇒ (c_{1} ∨ c_{2})

Theorem 2.1 For an unsatisfiable SAT instance, there exists a sequence of resolution steps that leads to the empty clause.

Proof. Every unsatisfiable SAT instance must imply a contradiction, namely the empty clause. Hence Theorem 2.1 follows directly from Proposition 2.3: there must exist resolution steps leading to the empty clause. □

Often, only a subset of the clauses of an unsatisfiable SAT instance participates in the resolution steps that lead to the empty clause. This subset is called an unsatisfiable core; its detailed definition is given below.

Definition 2.5 A refutation proof of an unsatisfiable formula ϕ, given as a SAT instance with clause set C, is a directed acyclic graph (V_{ϕ}, E_{ϕ}), where V_{ϕ} is a set of clauses, such that

• for each vertex k ∈ Vϕ, either

– k ∈ C, and k is a root, or

– k has exactly two predecessors, k_{1} and k_{2}, and k is the resolvent of k_{1}
and k_{2}, and

• there exists an empty clause be the unique leaf of (V_{ϕ}, E_{ϕ}).

Theorem 2.2 If there is a refutation proof of unsatisfiability for clause set C, C must be unsatisfiable.

Definition 2.6 For an unsatisfiable SAT instance of formula ϕ, an unsatisfiable core of the formula is a subset of its clauses whose conjunction is still unsatisfiable.

An unsatisfiable core is called minimal if every proper subset of it is satisfiable, and minimum if it contains the smallest number of the original clauses while remaining unsatisfiable.

The intuition behind the unsatisfiable core is that this subset of the original clauses suffices to make the whole CNF formula constant False. The assignment of variables not in the unsatisfiable core does not affect the unsatisfiability of the formula. Note that the unsatisfiable core of an unsatisfiable formula is not necessarily unique; a particular unsatisfiable formula may have one or many different unsatisfiable cores. From the point of view of a resolution refutation proof, the simplest unsatisfiable core is the subset of root clauses that have a path to the unique empty leaf clause.

Figure 2.3 illustrates the resolution refutation proof and one of the simplest unsatisfiable cores of the unsatisfiable formula ϕ = (¬c)(¬b + a)(c + ¬a)(b)(d + e)(¬d).

Figure 2.3: Refutation proof and the unsatisfiable core
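The refutation and core of this example can be reproduced by the following sketch (an assumed brute-force resolution saturation with parent tracking; a real SAT solver instead logs the proof during conflict analysis):

```python
def refute(roots):
    """Saturate resolution until the empty clause appears; return the set of
    root clauses reachable from it, i.e., an unsatisfiable core."""
    parents = {c: None for c in roots}
    progress = True
    while progress:
        progress = False
        clauses = list(parents)
        for c1 in clauses:
            for c2 in clauses:
                for lit in c1:
                    if -lit not in c2:
                        continue
                    r = (c1 - {lit}) | (c2 - {-lit})
                    if any(-l in r for l in r):
                        continue  # skip tautological resolvents
                    if r not in parents:
                        parents[r] = (c1, c2)
                        progress = True
                    if not r:  # empty clause: trace back to the roots
                        core, stack = set(), [r]
                        while stack:
                            c = stack.pop()
                            if parents[c] is None:
                                core.add(c)
                            else:
                                stack.extend(parents[c])
                        return core
    return None  # saturated without an empty clause: satisfiable

# a = 1, b = 2, c = 3, d = 4, e = 5
phi = [frozenset(s) for s in ({-3}, {-2, 1}, {3, -1}, {2}, {4, 5}, {-4})]
core = refute(phi)  # excludes (d ∨ e) and (¬d), which never reach the empty clause
```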

### 2.3.5 Craig interpolation

Theorem 2.3 (Craig Interpolation Theorem) [6]

Given two Boolean formulas ϕ_{A} and ϕ_{B} with ϕ_{A} ∧ ϕ_{B} unsatisfiable, there exists a Boolean formula ψ_{A} satisfying the following three properties:

• ψ_{A} refers only to the common variables of ϕ_{A} and ϕ_{B},

• ϕ_{A} ⇒ ψ_{A}, and

• ψ_{A} ∧ ϕ_{B} is unsatisfiable.

The Boolean formula ψ_{A} is referred to as the interpolant of ϕ_{A} with respect to
ϕ_{B}. Some modern SAT solvers, e.g., MiniSat [10], are capable of constructing an
interpolant from an unsatisfiable SAT instance. Figure 2.4 illustrates the solution
space of ϕ_{A}, ϕ_{B} and the interpolant ψ_{A}. Note that the smallest interpolant is ϕ_{A}
itself and the largest interpolant is ¬ϕ_{B}.

Figure 2.4: Craig interpolation

Several works [20, 21] show that an interpolant can be built from the resolution refutation proof of an unsatisfiable SAT instance in time linear in the proof size. In this thesis, we use the method described in [20] to derive interpolants.

Suppose we are given a pair of clause sets (A, B) derived from the formula ϕ and
a refutation proof of unsatisfiability (V_{ϕ}, E_{ϕ}) of A ∪ B. For the clause sets (A, B), a
variable is said to be global if it appears in both A and B, and local to A if it appears
only in A. Other variables are considered irrelevant in the interpolant construction
procedure. Similarly, a literal is global or local depending on the variable it contains.

Moreover, given any clause c, we use g(c) to denote the disjunction of the global literals in c.

The linear-time algorithm for interpolant construction in [20] consists of the following rules.

• Let f (c) be a Boolean formula for each vertex c ∈ V_{ϕ}.

• If c is a root, then

– if c ∈ A, then f (c) = g(c);

– if c ∈ B, then f (c) is the constant True.

• If c is an intermediate vertex, let c_{1} and c_{2} be the predecessors of c, and v be their pivot variable:

– if v is local to A, then f (c) = f (c_{1}) ∨ f (c_{2});
– else, f (c) = f (c_{1}) ∧ f (c_{2}).


Since every unsatisfiable SAT instance implies the constant False, the last step of the resolution refutation proof resolves to the empty clause, i.e., the constant False. Hence f (False), the Boolean function constructed from the above rules, is the interpolant of the unsatisfiable clause sets A ∪ B under the corresponding refutation proof (V_{ϕ}, E_{ϕ}).

Figure 2.5 illustrates the resolution refutation proof and the detailed construction steps for the unsatisfiable formula ϕ = (¬c)(¬b + a)(c + ¬a)(b).

Note that the interpolant construction can be done in O(N + L) time by traversing the refutation proof, where N is the number of vertices in V_{ϕ} and L is the total number of literals in the resolution refutation proof (V_{ϕ}, E_{ϕ}). The L term arises because each time a resolvent is generated, we must scan all literals in the predecessor clauses to find the pivot variable.


A = (¬b + a)(c + ¬a) B = (b)(¬c)

local = a global = b, c

Figure 2.5: Interpolant construction of an unsatisfiable formula ϕ = (¬c)(¬b + a)(c + ¬a)(b)
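The construction of Figure 2.5 can be replayed with a small sketch (the refutation DAG is hard-coded here as an assumption; in practice it comes from a proof-logging SAT solver). Partial interpolants are kept as truth tables over the global variables b and c:

```python
from itertools import product

A = [frozenset({-2, 1}), frozenset({3, -1})]  # a = 1, b = 2, c = 3
B = [frozenset({2}), frozenset({-3})]
global_vars = {2, 3}                           # variables shared by A and B

# All assignments to the global variables, as hashable keys.
assigns = [((2, vb), (3, vc)) for vb, vc in product([False, True], repeat=2)]

def eval_lits(lits, assign):
    """Evaluate a disjunction of literals under an assignment dict."""
    return any(assign[abs(l)] == (l > 0) for l in lits)

def leaf(clause, in_A):
    """f(c) for a root clause: g(c) if c ∈ A, the constant True if c ∈ B."""
    if in_A:
        g = frozenset(l for l in clause if abs(l) in global_vars)
        return {asg: eval_lits(g, dict(asg)) for asg in assigns}
    return {asg: True for asg in assigns}

def combine(f1, f2, pivot):
    """f(c) for a resolvent: ∨ if the pivot is local to A, ∧ otherwise."""
    if pivot not in global_vars:  # local to A in this example
        return {asg: f1[asg] or f2[asg] for asg in assigns}
    return {asg: f1[asg] and f2[asg] for asg in assigns}

# Refutation: resolve the A-clauses on a, then with (b) on b, then (¬c) on c.
step1 = combine(leaf(A[0], True), leaf(A[1], True), 1)  # (¬b) ∨ (c)
step2 = combine(step1, leaf(B[0], False), 2)            # ∧ True
itp = combine(step2, leaf(B[1], False), 3)              # interpolant: ¬b ∨ c
```

The final table is exactly ¬b ∨ c: it is implied by A and contradicts B = (b)(¬c), as Theorem 2.3 requires.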

## Chapter 3

## Main Algorithms

We show that Ashenhurst decomposition of a set of Boolean functions {f_{1}, . . . , f_{m}}, also called m-output Ashenhurst decomposition, can be achieved by SAT solving, Craig interpolation, and functional dependency. Whenever a non-trivial decomposition exists, we derive functions h_{i} and g automatically for f_{i}(X) = h_{i}(X_{H}, X_{C}, g(X_{G}, X_{C})) under the corresponding variable partition X = {X_{H}|X_{G}|X_{C}}.

### 3.1 Single-Output Ashenhurst Decomposition

We first consider single-output Ashenhurst decomposition for a Boolean function f (X) = h(XH, XC, g(XG, XC)).


### 3.1.1 Decomposition with Known Variable Partition

Proposition 2.1 in the context of Ashenhurst decomposition of a single function can be formulated as satisfiability solving as follows.

Proposition 3.1 A completely specified Boolean function f (X) can be expressed
as h(X_{H}, X_{C}, g(X_{G}, X_{C})) for some functions g and h if and only if the Boolean
formula

(f (X_{H}^{1}, X_{G}^{1}, X_{C}) ≢ f (X_{H}^{1}, X_{G}^{2}, X_{C})) ∧
(f (X_{H}^{2}, X_{G}^{2}, X_{C}) ≢ f (X_{H}^{2}, X_{G}^{3}, X_{C})) ∧
(f (X_{H}^{3}, X_{G}^{3}, X_{C}) ≢ f (X_{H}^{3}, X_{G}^{1}, X_{C})) (3.1)

is unsatisfiable, where a superscript i in Y^{i} denotes the i^{th} copy of the instantiation
of variables Y .

Observe that Formula (3.1) is satisfiable if and only if there exist more than two distinct column patterns in some matrix of the decomposition chart. Hence unsatisfiability means there are at most two different kinds of column patterns in every matrix of the decomposition chart. Since the g function has only one output bit, the unsatisfiability of Formula (3.1) is exactly the existence condition of Ashenhurst decomposition.

The intuition why at most two different kinds of column patterns are allowed in a matrix can be seen from the decomposition chart. Function g can be treated as a mapping from an input assignment a ∈ [[X_{G}, X_{C}]] to the Boolean value 0 or 1. Since every column index of a matrix in the decomposition chart is exactly an input assignment a ∈ [[X_{G}, X_{C}]], if there are more than two different kinds of column patterns, they cannot all be distinguished by the binary values 0 and 1.
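This view of the decomposition chart can be checked directly by brute force for small functions, as the following sketch shows (the example function is hypothetical; Formula (3.1) performs the same check symbolically via SAT):

```python
from itertools import product

def ashenhurst_decomposable(f, n_h, n_g, n_c):
    """Return True iff every matrix of the decomposition chart of f has at
    most two distinct column patterns (f maps bit-tuples (xh, xg, xc) to 0/1)."""
    for xc in product([0, 1], repeat=n_c):          # one matrix per X_C valuation
        patterns = set()
        for xg in product([0, 1], repeat=n_g):      # one column per X_G valuation
            column = tuple(f(xh, xg, xc) for xh in product([0, 1], repeat=n_h))
            patterns.add(column)
            if len(patterns) > 2:                   # Formula (3.1) would be satisfiable
                return False
    return True

# f(x1, x2, x3) = x1 XOR (x2 AND x3) decomposes as h(x1, g) = x1 XOR g.
f = lambda xh, xg, xc: xh[0] ^ (xg[0] & xg[1])
print(ashenhurst_decomposable(f, 1, 2, 0))  # True
```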

Note that, unlike BDD-based counterparts, the above SAT-based formulation of Ashenhurst decomposition naturally extends to non-disjoint decomposition. This is because the unsatisfiability check of Formula (3.1) essentially asserts that, under every valuation of variables X_{C}, the corresponding matrix of the decomposition chart has at most two column patterns. In contrast, BDD-based methods have to check the decomposability under every valuation of X_{C} separately.

Now that we know the decomposability of function f can be checked through SAT solving of Formula (3.1), the derivations of functions g and h can be realized through Craig interpolation and functional dependency, respectively, as shown below.

To derive function g, we partition Formula (3.1) into two sub-formulas

ϕ_{A} = f (X_{H}^{1}, X_{G}^{1}, X_{C}) ≢ f (X_{H}^{1}, X_{G}^{2}, X_{C}), and (3.2)

ϕ_{B} = (f (X_{H}^{2}, X_{G}^{2}, X_{C}) ≢ f (X_{H}^{2}, X_{G}^{3}, X_{C})) ∧
(f (X_{H}^{3}, X_{G}^{3}, X_{C}) ≢ f (X_{H}^{3}, X_{G}^{1}, X_{C})). (3.3)

Figure 3.1 shows the corresponding circuit representation of Formulas (3.2) and (3.3). The circuit representation can be converted into a CNF formula in linear time [27], and thus can be checked for satisfiability.
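A minimal sketch of this linear-time conversion (a Tseitin-style encoding, assumed here rather than taken from the thesis): each gate gets a fresh output variable and contributes a constant number of clauses, so the CNF size is linear in the circuit size. The ≢ checks of Figure 3.1 reduce to XOR gates whose outputs are asserted:

```python
def tseitin_and(out, a, b):
    """Clauses asserting out ↔ (a ∧ b)."""
    return [(-a, -b, out), (a, -out), (b, -out)]

def tseitin_xor(out, a, b):
    """Clauses asserting out ↔ (a ⊕ b); f1 ≢ f2 is an XOR with out asserted."""
    return [(-a, -b, -out), (a, b, -out), (a, -b, out), (-a, b, out)]

# Encode (x1 ≢ x2) by an XOR gate with output variable 3, asserted true:
cnf = tseitin_xor(3, 1, 2) + [(3,)]
```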

Figure 3.1: Circuit representation of Formulas (3.2) and (3.3)

Figure 3.2: (a) Relation characterized by ψ_{A}(X_{G}^{1}, X_{G}^{2}, c) for some c ∈ [[X_{C}]] (b)
Relation after cofactoring ψ_{A}(X_{G}^{1} = a, X_{G}^{2}, c) with respect to some a ∈ [[X_{G}^{1}]]

Lemma 3.1 For function f (X) decomposable under Ashenhurst decomposition with variable partition X = {X_{H}|X_{G}|X_{C}}, the interpolant ψ_{A} with respect to ϕ_{A} of Formula (3.2) and ϕ_{B} of Formula (3.3) corresponds to a characteristic function such that:

1. for ϕ_{A} satisfiable under some c ∈ [[X_{C}]], there exist b_{1} ∈ [[X_{G}^{1}]] and b_{2} ∈ [[X_{G}^{2}]] such that ψ_{A}(b_{1}, b_{2}, c) = 1 if and only if the column patterns indexed by b_{1} and b_{2} in the matrix of c of the decomposition chart of f are different;

2. for ϕ_{A} unsatisfiable under some c ∈ [[X_{C}]], there is only one column pattern in
the matrix of c of the decomposition chart of f ;

3. for ϕ_{A} unsatisfiable under all c ∈ [[X_{C}]], in other words, for ϕ_{A} unsatisfiable, variables X_{G} are not support variables of f and thus {X_{H}|X_{G}|X_{C}} is a trivial variable partition for f .

Proof. For f decomposable, ϕ_{A}∧ ϕ_{B} is unsatisfiable by Proposition 3.1. Moreover
from Theorem 2.3, we know the interpolant ψ_{A} of the unsatisfiability proof is a
function that refers only to the common variables, X_{G}^{1} ∪ X_{G}^{2} ∪ X_{C}, of ϕ_{A} and ϕ_{B}.

We analyze what the characteristic function ψ_{A} means for ϕ_{A} satisfiable under
some c ∈ [[X_{C}]]. Let a_{i} ∈ [[X_{H}^{i} ]], b_{i} ∈ [[X_{G}^{i} ]], for i = 1, 2, 3, in the following discussion.

Let R_{c}⊆ [[X_{G}^{1}]] × [[X_{G}^{2}]] be the relation that ψ_{A}(X_{G}^{1}, X_{G}^{2}, X_{C} = c) characterizes. On
the one hand, since ϕ_{A} ⇒ ψ_{A}, we know that ψ_{A} is an over-approximation of the
solution space of ϕ_{A} projected to the common variables X_{G}^{1} ∪ X_{G}^{2} ∪ X_{C}. So R_{c} must contain the set

{(b_{1}, b_{2}) | f (a_{1}, b_{1}, c) ≠ f (a_{1}, b_{2}, c) for some a_{1}}.

That is, a pair (b_{1}, b_{2}) must be in R_{c} if the columns indexed by b_{1} and b_{2} of the
matrix of c in the decomposition chart of f are of different patterns.

On the other hand, since ψ_{A} ∧ ϕ_{B} is unsatisfiable, the solution space of ψ_{A} is disjoint from that of ϕ_{B} projected to the common variables. Suppose that valuations a_{2}, a_{3}, b_{1}, b_{2}, b_{3} satisfy ϕ_{B} under c, that is, f (a_{2}, b_{2}, c) ≠ f (a_{2}, b_{3}, c) and f (a_{3}, b_{3}, c) ≠ f (a_{3}, b_{1}, c). Then the columns indexed by b_{2} and b_{3} of the matrix of c in the decomposition chart of f are of different patterns, and so are the columns indexed by b_{3} and b_{1}. Since f is decomposable under Ashenhurst decomposition, there are at most two different kinds of column patterns in every matrix of c; hence the columns indexed by b_{1} and b_{2} of the matrix of c must be of the same pattern. Because ψ_{A} ∧ ϕ_{B} is unsatisfiable, (b_{1}, b_{2}) cannot be in R_{c} if the columns indexed by b_{1} and b_{2} are of the same pattern. We can now conclude that the interpolant ψ_{A}(X_{G}^{1}, X_{G}^{2}, c) characterizes exactly the pairs (b_{1}, b_{2}), with b_{1} ∈ [[X_{G}^{1}]] and b_{2} ∈ [[X_{G}^{2}]], that index different column patterns in the matrix of c.

Figure 3.2(a) illustrates the relation that ψ_{A}(X_{G}^{1}, X_{G}^{2}, c) characterizes for some c ∈ [[X_{C}]]. The left and right sets of gray dots denote the elements of [[X_{G}^{1}]] and [[X_{G}^{2}]], respectively. For function f to be decomposable, there are at most two equivalence classes for the elements of [[X_{G}^{i}]] for i = 1, 2. In Figure 3.2(a), the two clusters of elements in [[X_{G}^{i}]] signify the two equivalence classes of column patterns indexed by [[X_{G}^{i}]]. An edge (b_{1}, b_{2}) between b_{1} ∈ [[X_{G}^{1}]] and b_{2} ∈ [[X_{G}^{2}]] represents that b_{1} is not in the same equivalence class as b_{2}, i.e., they index different column patterns; in this case, ψ_{A}(b_{1}, b_{2}, c) = 1. Since all pairs (b_{1}, b_{2}) of different column patterns in the matrix of c are characterized by the interpolant ψ_{A}(X_{G}^{1}, X_{G}^{2}, c), every element in one equivalence class of [[X_{G}^{1}]] is connected to every element in the other equivalence class of [[X_{G}^{2}]], as Figure 3.2(a) shows.

According to the above analysis, we conclude that, for ϕ_{A} satisfiable under
c ∈ [[X_{C}]], relation R_{c} characterizes exactly the set of index pairs (b_{1}, b_{2}) whose
corresponding column patterns are different.

For ϕ_{A} unsatisfiable under some c ∈ [[X_{C}]], there are no pairs of different column patterns in the matrix of c, so the matrix of c of the decomposition chart of f contains only one column pattern. Hence, under X_{C} = c, the valuation of f is independent of the truth assignments of X_{G}: whatever the assignment of X_{G}, the value of f under the same assignment of X_{H} is the same. (In this case, ψ_{A} under X_{C} = c can be an arbitrary function because the solution space of ϕ_{A} is empty.) If ϕ_{A} is unsatisfiable under every c ∈ [[X_{C}]], then the valuation of function f does not depend on X_{G}, that is, X_{G} are not support variables of f .

Since the analysis holds for every c ∈ [[X_{C}]], the lemma follows. □


Next, we show how to derive function g from the interpolant ψ_{A}.

Lemma 3.2 For an arbitrary a ∈ [[X_{G}^{1}]], the cofactored interpolant ψ_{A}(X_{G}^{1} = a, X_{G}^{2}, X_{C}) is a legal implementation of function g(X_{G}^{2}, X_{C}).

Proof. By Lemma 3.1, the interpolant ψ_{A}(X_{G}^{1}, X_{G}^{2}, X_{C}) characterizes all pairs of different column patterns in every matrix that has exactly two different kinds of column patterns. Let γ_{a}(X_{G}^{2}, X_{C}) = ψ_{A}(X_{G}^{1} = a, X_{G}^{2}, X_{C}) for some arbitrary a ∈ [[X_{G}^{1}]]. Then, under every such valuation c ∈ [[X_{C}]], γ_{a}(X_{G}^{2}, c) characterizes the set of indices whose corresponding column patterns differ from the column pattern indexed by a. In other words, γ_{a}(X_{G}^{2}, c) characterizes either of the two equivalence classes of column patterns in the matrix of c.

Taking Figure 3.2(b) as an example, assume that a is the topmost element in
[[X_{G}^{1}]]. After cofactoring, all the edges in Figure 3.2(a) will disappear except for the
edges connecting a with the elements in the other equivalence class of [[X_{G}^{2}]].

On the other hand, consider c ∈ [[X_{C}]] whose corresponding matrix contains only one kind of column pattern. In this case X_{C} = c is a don't-care condition for function g, and thus γ_{a}(X_{G}^{2}, c) can be arbitrary.

From the above analysis, we conclude that γ_{a}(X_{G}^{2}, X_{C}) characterizes either the onset or the offset of function g(X_{G}^{2}, X_{C}). So γ_{a}(X_{G}^{2}, X_{C}) can be taken as either the function g(X_{G}^{2}, X_{C}) or its negation. After renaming X_{G}^{2} to X_{G}, we get the desired g(X_{G}, X_{C}). □


So far we have derived function g by interpolation and cofactoring. Next we need to compute function h. The problem can be formulated as computing functional dependency as follows. Let f (X) be our target function; let the function g(X_{G}, X_{C}) we just derived and the identity functions ı_{x}(x) = x, one for each variable x ∈ X_{H} ∪ X_{C}, be our base functions. The computed dependency function is then exactly our desired h. Since functional dependency can be formulated using SAT solving and interpolation [16], it fits well in our SAT-based computation framework.

Remark: For disjoint decomposition, i.e., X_{C} = ∅, we can derive the function h in a simpler way than by functional dependency.

Given two functions f (X) and g(X_{G}) with variable partition X = {X_{H}|X_{G}}, our goal is to find a function h(X_{H}, x_{g}) such that f (X) = h(X_{H}, g(X_{G})), where x_{g} is the output variable of function g(X_{G}). Let a, b ∈ [[X_{G}]] with g(a) = 0 and g(b) = 1. Then by Shannon expansion

h(X_{H}, x_{g}) = (¬x_{g} ∧ h_{¬x_{g}}(X_{H})) ∨ (x_{g} ∧ h_{x_{g}}(X_{H})),

where h_{¬x_{g}}(X_{H}) = f (X_{H}, X_{G} = a) and h_{x_{g}}(X_{H}) = f (X_{H}, X_{G} = b). The derivation of the offset and onset minterms of g is easy: we pick an arbitrary minterm c in [[X_{G}]] and see whether g(c) equals 0 or 1. We then perform SAT solving on either g(X_{G}) or ¬g(X_{G}), depending on the value of g(c), to derive the other necessary minterm.
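A brute-force sketch of this disjoint-case derivation (enumeration stands in for the SAT calls; the example functions are hypothetical):

```python
from itertools import product

def derive_h(f, g, n_g):
    """Build h(X_H, xg) from cofactors of f at an offset and an onset minterm of g."""
    a = next(m for m in product([0, 1], repeat=n_g) if g(m) == 0)  # g(a) = 0
    b = next(m for m in product([0, 1], repeat=n_g) if g(m) == 1)  # g(b) = 1
    def h(xh, xg):
        # Shannon expansion: (¬xg ∧ f(X_H, a)) ∨ (xg ∧ f(X_H, b))
        return f(xh, b) if xg else f(xh, a)
    return h

# f(x1, x2, x3) = x1 XOR (x2 OR x3) with g(x2, x3) = x2 OR x3.
f = lambda xh, xg: xh[0] ^ (xg[0] | xg[1])
g = lambda xg: xg[0] | xg[1]
h = derive_h(f, g, 2)
# f(X) == h(X_H, g(X_G)) on all assignments:
assert all(f(xh, xg) == h(xh, g(xg))
           for xh in product([0, 1], repeat=1)
           for xg in product([0, 1], repeat=2))
```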


The above derivation of function h, however, does not scale well for decomposition with large |X_{C}| because we may need to compute h(X_{H}, X_{C} = c, x_{g}) for every valuation c ∈ [[X_{C}]], that is, 2^{|X_{C}|} cases to analyze. So when common variables exist, functional dependency may be a better way to compute function h.

The correctness of the so-derived Ashenhurst decomposition follows from Lemma 3.2 and Proposition 2.1, as the following theorem states.

Theorem 3.1 Given a function f decomposable under Ashenhurst decomposition with variable partition X = {X_{H}|X_{G}|X_{C}}, then f (X) = h(X_{H}, X_{C}, g(X_{G}, X_{C})) for functions g and h obtained by the above derivation.

Algorithm 1 Derive g and h with a given variable partition
Input: f and a given variable partition X = {X_{H}|X_{G}|X_{C}}
Output: g and h

1: F ⇐ CircuitInstantiation(f, X_{H}, X_{G}, X_{C})

2: (ϕ_{A}, ϕ_{B}) ⇐ CircuitPartition(F )

3: P (X_{G}^{1}, X_{G}^{2}, X_{C}) ⇐ Interpolation(ϕ_{A}, ϕ_{B})

4: g(X_{G}, X_{C}) ⇐ Cofactor(P, a ∈ [[X_{G}^{1}]])

5: h ⇐ FunctionalDependency(f, g)

6: return (g, h)

Algorithm 1 summarizes the procedure for deriving functions g and h. Lines 1–2 duplicate the original function f into six copies and partition them into two parts, as Figure 3.1 shows. Lines 3–4 use interpolation to derive a relation containing the pairs of distinct column patterns, and then cofactoring to obtain the desired function g. In line 5, we formulate the derivation of function h as a functional dependency computation.

### 3.1.2 Decomposition with Unknown Variable Partition

The construction in the previous subsection assumes that a variable partition X =
{X_{H}|X_{G}|X_{C}} is given. In this subsection, we will show how to automate the variable
partition process within the decomposition process of function f . A similar approach
was used in [17] for bi-decomposition of Boolean functions.

We introduce two control variables α_{x}_{i} and β_{x}_{i} for each variable x_{i} ∈ X. In
addition we instantiate the original input variables X into six copies X^{1}, X^{2}, X^{3},
X^{4}, X^{5}, and X^{6}. Let

ϕ_{A} = (f (X^{1}) ≢ f (X^{2})) ∧ ∧_{i}((x^{1}_{i} ≡ x^{2}_{i}) ∨ β_{x_{i}}) and (3.4)

ϕ_{B} = (f (X^{3}) ≢ f (X^{4})) ∧ (f (X^{5}) ≢ f (X^{6})) ∧

∧_{i}(((x^{2}_{i} ≡ x^{3}_{i}) ∧ (x^{4}_{i} ≡ x^{5}_{i}) ∧ (x^{6}_{i} ≡ x^{1}_{i})) ∨ α_{x_{i}}) ∧

∧_{i}(((x^{3}_{i} ≡ x^{4}_{i}) ∧ (x^{5}_{i} ≡ x^{6}_{i})) ∨ β_{x_{i}}), (3.5)

where x^{j}_{i} ∈ X^{j} for j = 1, . . . , 6 are the instantiated versions of x_{i} ∈ X. The variables α_{x_{i}} and β_{x_{i}} are called control variables because each of them can enable or disable its corresponding clause. For example, a clause (a + b) associated with a control variable α results in a new clause (a + b + α). When α = 1, the clause