Verifiable computation can be used to delegate computations to potentially untrusted ma-chines. In this thesis we will focus on a particular approach of verifiable computation –
zkSNARKs. Currently, there are a couple of potential applications for zkSNARKs, like private transactions or smart contracts in cryptocurrencies and verifiable computation. In a zkSNARK, there are two parties, a prover and a verifier. The prover is the party per-forming the computation, and the one producing a cryptographic proof π that the verifier can use to determine with high probability whether or not the computation is performed correctly.
A zkSNARK consists of a set of probabilistic algorithms: KeyGen (K), Prove (P), and Verify (V). Given a security parameter λ and a program C, a zkSNARK protocol goes as follows:
(kp, kv)← K(λ, C)
π ← P(C, kp, public, witness) {true, false} ← V(C, kv, public, π)
where kp is the proving key, kv is the verifcation key, public is the public variables, witness is the non-public (private) variables, andC is the program.
In this thesis, the programC fed to the keygen, the prover and the verifier will be the R1CS constraints (defined below) generated by our compiler. The variables inC include the input and output variables of the program together with all the intermediate values.
And public together with witness constitute the variables inC (the purpose of witness will be discussed later in this section).
Definition 2.2.1. A rank 1 constraint system[2] (R1CS) S over a field F with Ng con-straints is a set of vectors
{(ai, bi, ci)|i ∈ [1, Ng], ai ∈ F1+Nv, bi ∈ F1+Nv, ci ∈ F1+Nv}.
and a non-negative integer Ni, where
• F is a field
• Nv is the number of variables
• Ni ≤ Nv is the number of public variables.
If there is some public values x∈ FNi and private values (witness) w ∈ FNv−Ni such that
⟨ai, (1, x, w)⟩⟨bi, (1, x, w)⟩ = ⟨ci, (1, x, w)⟩
for all i∈ [1, Ng] (where⟨m, n⟩ denotes the dot product of m and n, and the left hand side of the equation multiplies the two dot products together), then S is said to be satisfiable with public values x and witness w.
The definition of R1CS can be seen as a charaterization of NP problems as NP-complete problems like SAT can be reduced to R1CS satisfaction problems in polynomial time, and this is used to define what it means to “know” something in the zkSNARK proof of knowl-edge definition.
Suppose that a programmer A has written a program P rog(a, b) = (a + b)∗ b where a, b ∈ F are the inputs to P rog and A wants to outsource this computation P rog to an untrusted cloud server CS. A can choose to encode P rog as R1CS constraints (which can then be further processed into a quadratic arithmetic program (QAP)[8]).
In this example, P rog can be encoded with a single R1CS constraint with three R1CS variables. Besides a and b, we need another variable out to represent the output of the program, that is, out = (a + b)∗ b. So if we fix the order of the variables (including the constant 1) to be [1, a, b, out], the R1CS constraint would be a singleton set consisting of the element ([0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 1]). What if we require a to be a boolean?
This can be accomplised by adding another constraint ([0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]) (which says that a2 = a) to the set of constraints. If a is equal to 0, this constraint is satisfied. Otherwise, if a ̸= 0, we multiple both sides of the equation by a−1, and we get that it must be the case that a = 1, and thus, a is a boolean. In this scenario, A wants to know the result out, and the three variables a, b, and out are all public variable in the zkSNARK.
After the program is encoded into R1CS, the resulting constraints are then sent to CS,
where a key pair is first generated by A and then the programmer has to decide the input to be fed into P rog (say, fixing a = 2 and b = 3).
Then the transformed arithmetic constraints along with the fixed inputs 2 and 3 are then sent to CS (which will then try to solve the variable out in the transformed constraints and execute the prover algorithmP with a, b, and out to generate the proof π). CS then gives π and the variables that are considered to be public to A (which plays the role of the verifier). A then checks if the combination of the public variables and π is valid, and if it is valid, then A can have a high confidence that CS has indeed performed the required computations, and that the output is correct provided that the transformation of P rog is correct.
In some variants of the above scenario, the prover might want to hide some information from the verifier, and this is where the private witness in S comes into play. For example, the prover might want to hide the intermediate values resulting from the execution of a program from the verifier (and only the input and output of the program is made public), and from the zero knowledge guarantee from the zk-SNARK backend, we can know that apart from the public variables, the proof π doesn’t tell us additional information about the private witness in V C. The properties that a zkSNARK system has to satisfy is made more precise in the following paragraph.
A zkSNARK[2][3] (which stands for zero knowledge succinct non-interactive argu-ment of knowledge) satisfies the following properties:
• completeness: If public together with witness constitutes a solution to a program C, and
(kp, kv)← K(λ, C)
π ← P(C, kp, public, witness)
then
V(C, kv, public, π) = true.
• succintness: the size of the proof π generated by P is Oλ(1) (independent of the size of C).
• proof of knowledge: For any probabilistic polynomial time (PPT) adversary A, there is a PPT extractor E such that for every constant c > 0, large enough λ, auxiliary input z (where|z| = poly(λ)) and every program C of size λc, (public, witness) not a solution ofC
(kp, kv) ← K(λ, C)
• zero knowledge: meaning that the proof π does not leak any information about witness.
Completeness says that someone with a solution (public, witness) to a program C can always produce a proof π that convinces a verifier who follows the zkSNARK protocol.
Sunccinctness says that the proof π is small. Proof of knowledge says that if a PPT ad-versary produces public and a proof π thatV checks to be valid, then with a large enough λ, there is a high probability that there is a PPT knowledge extractor E that can “extract”
the knowledge witness from A so that (public, witness) is a solution toC.
Since we will be building the constraints in Agda, following the work of Stewart et al[14], we define the target compilation type R1CS as follows (parameterized over a type f):
data R1CS : Set where
IAdd : f → List (f × Var) → R1CS -- sums to zero
IMul : (a : f) → (b : Var) → (c : Var)
→ (d : f) → (e : Var) → R1CS
-- a * b * c = d * e
Hint : (Map Var ℕ → Map Var ℕ) → R1CS Log : String → R1CS
where IAdd f1 ((f2, i2) ∷ (f3, i3) ... ∷ []) expresses an additive constraint f1 + f2vi2 + f3vi3... = 0, fi ∈ F, vik ∈ F, and IMul fa b c fd e expresses a multiplicative constraint favbvc = fdvewhere fa, vb, bc, fd, ve ∈ F. The vectors in an R1CS constraint system are usually sparse and this is why the R1CS datatype is not defined as a tuple of vectors. A list of constraints of type [R1CS] in Agda can be easily transformed into the regular tuple of vectors representation. Since a set of R1CS constraints represents an NP-complete problem in general, the definition of R1CS includes hints to help the solver solve these R1CS constraints (and also Log to help with debugging).
To date, there have been numerous attempts at compiling existing programming lan-guages like C, or specially designed domain specific lanlan-guages into zkSNARK systems.
ZoKrates[16], SNARKs for C[2], Snårkl[14], and the formally verified compiler made by Fournet et al[7] are all examples of this. However, the source languages of these existing compilers lack an expressive type system. Inspired by the work of Snårkl, we attempt to construct and integrate a dependently typed embedded domain specific language into a verifiable computation system.
Chapter 3
Constructing an Embedded Type Universe
In this chapter, we are going to construct a dependently typed embedded type universe and determine R1CS variable allocation for an element in our embedded type universe. First we are going to describe how the embedded type universe is constructed for our EDSL.
The constructed type universe will have the property that each type in the type universe will only have finitely many inhabitants (when instantiated with a finite base type). For such instantiations of our type universe, we can construct an enumeration function enum that enumerates elements for any embedded type u such that every element in enum u only appears once.
Given a function occ that counts the number of occurrences of a given element in a list and a comparison function dec that tells us whether or not two elements with type u are equal, the fact that every element in enum u is unique can be expressed as follows: for any element val with type u, occ dec val (enum u) ≡ 1.
Towards the end of this chapter, we will prove that this proposition indeed holds for all type code u, and we will show how enum can be used to determine the number of R1CS variables that will be allocated in the compilation process for an element of type u.
3.1 Type Code
In this section, we are going to define the main datatype for the embedded type codes.
Type codes are used to define the types that are allowed in the embedded DSL and is defined as an inductive-recursive definition[6] in Agda. Given any type f, the type codes are defined in an Agda module parameterized over f as follows:
Definition 3.1.1 (Type Code).
data U : Set
⟦_⟧ : U → Set
data U where
`One : U
`Two : U
`Base : U
`Vec : (S : U) → ℕ → U
`Σ `Π : (S : U) → (⟦ S ⟧ → U) → U
⟦ `One ⟧ = ⊤
⟦ `Two ⟧ = Bool
⟦ `Base ⟧ = f
⟦ `Vec ty x ⟧ = Vec ⟦ ty ⟧ x
⟦ `Σ fst snd ⟧ = Σ ⟦ fst ⟧ (λ f → ⟦ snd f ⟧)
⟦ `Π fst snd ⟧ = (x : ⟦ fst ⟧) → ⟦ snd x ⟧
By interpreting the type codes in U through⟦_⟧, a subset of Agda types that are allowed in our EDSL can be obtained. In this thesis, we will instantiate f with types that represent finite fields since we will be compiling the EDSL into finite field constraints. Later on in this section we are going to define what it means for a type together with a set of operators to be a finite field.
For example, suppose that we have an Agda function fromBits : {n : ℕ} → Vec Bool n →ℕ that transforms an n bit-encoded number into ℕ, then the type code ‘Σ ‘Two (λ x₁
→ ‘Σ ‘Two (λ x₂ → ... ‘Σ ‘Two (λ xₙ → ‘Vec ‘Base (fromBits (x₁ ∷ x₂ ∷ ... ∷ xₙ ∷ []))))) expresses the type of vectors with their lengths encoded in n bits. Similarly, we can have matrices with the number of rows and columns encoded in m + n bits respectively.
Intuitively, one can see that if the type f has only finitely many inhabitants, then for any u : U,⟦ u ⟧ also only has finitely many inhabitants, and as such, it is possible to obtain an enumeration of the elements in⟦ u ⟧. Formally, we define a type A to be finite or have finitely many inhabitants if there is an enumeration l : List A of elements in A such that for any x : A, x∈ l and that any x in l only occurs once in l. We now define the pieces (list membership and element counting) that will be put together to form the definition of our finite type in Agda.