### On the Satisfiability of Two-Variable Logic over Data Words

Claire David, Leonid Libkin, and Tony Tan

School of Informatics, University of Edinburgh

Abstract. Data trees and data words have been studied extensively in connection with XML reasoning. These are trees or words that, in addition to labels from a finite alphabet, carry labels from an infinite alphabet (data). While in general logics such as MSO or FO are undecidable for such extensions, decidability results for their fragments have been obtained recently, most notably for the two-variable fragments of FO and existential MSO. The proofs, however, are very long and nontrivial, and some of them come with no complexity guarantees. Here we give a much simplified proof of the decidability of two-variable logics for data words with the successor and data-equality predicates. In addition, the new proof provides several new fragments of lower complexity. The proof mixes database-inspired constraints with encodings in Presburger arithmetic.

### 1 Introduction

The classical theory of automata and formal languages deals primarily with finite alphabets. Nonetheless, there are several models of formal languages, regular or context-free, that permit an infinite alphabet, e.g., [3, 6, 14, 17, 24, 25]. Most of the models, however, lack the usual nice decidability properties of automata over finite alphabets, unless strong restrictions are imposed.

Recently the subject of languages over infinite alphabets received much attention due to its connection with the problems of reasoning about XML [2, 4, 5, 8, 9]. The structure of XML documents is usually modeled by labeled unranked trees [15, 21, 27], and thus standard automata techniques can be used to reason about the structure of XML documents. However, XML documents carry data, which is typically modeled as labeling nodes by letters from a different, infinite alphabet.

Thus, one needs to look for decidable formalisms in the presence of a second,
infinite alphabet. Such formalisms are hard to come by, and tend to be of very
high complexity. Nonetheless, some significant progress has been made recently
[4]. Namely, it was shown that the restriction of first-order logic to its two-
variable fragment, FO^{2}, remains decidable over trees with labels coming from
an infinite alphabet (we refer to them as data trees). This is the best possible
restriction in terms of the number of variables: the three-variable fragment FO^{3}
is undecidable [4].

The result is true even if the sentence is preceded by a sequence of existential
monadic second-order quantifiers, i.e., for the logic ∃MSO^{2}. The proof of this
decidability result, however, was very nontrivial, relying on a complicated automaton
model, and gave no insight into the complexity of fragments, nor into the possibility
of extending it to more expressive logics.

However, we do want to have tools to reason about languages over infinite alphabets, so there is a search for easier tools. One direction is to look for restrictions, both in terms of structures and logics [2, 10]. As for structures, the most natural idea appears to be to look for tools over words, rather than trees.

This has been done in [5, 8], which provided decidable formalisms for data words,
i.e., words labeled by both a finite and an infinite alphabet. In fact, [5] showed
that the two-variable logic is decidable. Specifically, it showed that in the pres-
ence of both ordering and successor relations on the domain, the logic ∃MSO^{2}
is decidable; no upper bound on the complexity is known, however. The proof,
however, is again highly nontrivial and not particularly modular.

Our main goal is to give a much simpler and completely self-contained proof
of the decidability of satisfiability of the two-variable logic over data words. We
do it for the case when the successor relation is available on the domain (as, with
the successor relation, the logic ∃MSO^{2} can already define all regular languages).

The proof relies on two key ideas: reducing the problem to reasoning about some specific constraints (similar to those used in database theory [1]), and using tools based on Presburger arithmetic to reason about those.

Organization. In Section 2 we give the key definitions of data words and logics on them and state the main result. In Section 3 we provide additional definitions and notation. In Section 4 we provide the machinery needed for the main proof.

In Section 5 we present the proof of the decidability result. In conclusion, we analyse the complexity of our decision procedures.

### 2 Data words and the main result

2.1 Data words

Let Σ be a finite alphabet and D be an infinite set of data values. To be concrete,
we assume that D contains N, the set of natural numbers. A data word is simply
an element of (Σ × D)^{∗}. We usually write w = ^{a_1}_{d_1} · · · ^{a_n}_{d_n} for data words, where a_i ∈ Σ and d_i ∈ D. We define the Σ-projection of w as Proj(w) = a_1 · · · a_n.

An a-position is a position labeled with the symbol a. The set of data values
found in a-positions of a data word w is denoted by V_{w}(a), while the number of
a-positions in w is denoted by #_{w}(a).

The following notion is used throughout the paper. For a set S ⊆ Σ,

[S]_w = ⋂_{a∈S} V_w(a) − ⋃_{b∉S} V_w(b).

That is, [S]_w is the set of data values that are found in a-positions for all a ∈ S but are not found in any b-position for b ∉ S. Note that the sets [S]_w are pairwise disjoint, and that V_w(a) = ⋃_{S∋a} [S]_w for each a ∈ Σ.
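These definitions are straightforward to mirror in code. The following Python sketch (the helper names are ours) computes Proj(w), V_w(a), #_w(a), and [S]_w for a small data word, modeled as a list of (letter, data value) pairs:

```python
# A data word is modeled as a list of (letter, data value) pairs.
def proj(w):
    """Sigma-projection Proj(w): drop the data values."""
    return [a for a, _ in w]

def values(w, a):
    """V_w(a): the set of data values found in a-positions."""
    return {d for x, d in w if x == a}

def count(w, a):
    """#_w(a): the number of a-positions."""
    return sum(1 for x, _ in w if x == a)

def cls(w, S, sigma):
    """[S]_w: values occurring under every letter of S and no letter outside S."""
    result = set.intersection(*(values(w, a) for a in S))
    for b in sigma - S:
        result -= values(w, b)
    return result

sigma = {"a", "b"}
w = [("a", 1), ("b", 1), ("a", 2), ("b", 3), ("a", 3)]
print(cls(w, {"a"}, sigma))       # {2}: value 2 occurs only in a-positions
print(cls(w, {"a", "b"}, sigma))  # {1, 3}: values occurring under both letters
```

Note how V_w(a) is recovered as the disjoint union of the [S]_w over the sets S containing a, exactly as stated above.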

We say that a data word is locally different if every position has a data value different from those of its left- and right-neighbors.

2.2 Logics over data words

For the purpose of logical definability, we view data words of length n as struc- tures

w = ⟨{1, . . . , n}, +1, {a(·)}_{a∈Σ}, ∼⟩, (1)

where {1, . . . , n} is the domain of positions, +1 is the successor relation (i.e., +1(i, j) iff i + 1 = j), the a(·)'s are the labeling predicates, and i ∼ j holds iff positions i and j have the same data value.

We let FO stand for first-order logic, MSO for monadic second-order logic (which extends FO with quantification over sets of positions), and ∃MSO for existential monadic second-order logic, i.e., sentences of the form ∃X_1 . . . ∃X_m ψ, where ψ is an FO formula over the vocabulary extended with the unary predicates X_1, . . . , X_m. We let FO^{2} stand for FO with two variables, i.e., the set of FO formulae that only use two variables x and y. The set of all sentences of the form ∃X_1 . . . ∃X_m ψ, where ψ is an FO^{2} formula, is denoted by ∃MSO^{2}.

To emphasize that we are talking about a logic over data words we
write (+1, ∼) after the logic: e.g., FO^{2}(+1, ∼) and ∃MSO^{2}(∼, +1). Note that

∃MSO^{2}(+1) is equivalent in expressive power to MSO over the usual (not data)
words, i.e., it defines precisely the regular languages [26].

It was shown in [5] that ∃MSO^{2}(+1, <, ∼) is decidable over data words. In
terms of complexity, the satisfiability of this logic is shown to be at least as hard
as reachability in Petri nets. Without the +1 relation, the complexity drops to
Nexptime-complete; however, without +1 the logic is not sufficiently expressive
to capture regular relations on the data-free part of the word.

Our main goal is to give a transparent and self-contained proof of the follow- ing:

Theorem 1. The satisfiability problem is decidable for ∃MSO^{2}(∼, +1) over data
words. Moreover, the complexity of the decision procedure is elementary.

The result itself can already be inferred from the decidability proof of the logic with local navigation over data trees given in [4], which yields a 4-exponential complexity bound. However, this proof does not give any hints toward understanding the difficulty of the problem. Our proof yields a 5-exponential bound.

Neither of these bounds is, of course, even remotely practical. The primary
goal of these results is to delineate the boundary of decidability, so that later
we could search for efficient subclasses of decidable classes of formulae. And
for such a search, it is crucial to have simple and well-defined tools for proving
decidability; providing such tools is precisely our goal here. Indeed, a few
fragments of lower complexity are already provided here. Furthermore, in [9] a
fragment whose satisfiability is decidable in NP is obtained.^{1} With our proof

^{1} This fragment consists of context-free languages with constraints on data values of the forms ∀x∀y (a(x) ∧ a(y) ∧ x ∼ y → x = y) and ∀x∃y (a(x) → b(y) ∧ x ∼ y).

we gain some insight on how the complexity “moves up continuously” from NP to 5-exponential.

### 3 Additional notations

3.1 Disjunctive constraints for data words

We consider two types of constraints on data words, which are slight generaliza- tions of keys and inclusion constraints used in relational databases [1]. They are defined as the following logical sentences.

1. A disjunctive key constraint (dk) is a sentence of the form:

∀x ∀y ( ⋁_{a∈Σ′} a(x) ∧ ⋁_{a∈Σ′} a(y) ∧ x ∼ y → x = y ),

where Σ′ ⊆ Σ. We denote such a sentence by V(Σ′) ↦ Σ′.

2. A disjunctive inclusion constraint (dic) is a sentence of the form:

∀x ∃y ( ⋁_{a∈Σ_1} a(x) → ( ⋁_{b∈Σ_2} b(y) ) ∧ x ∼ y ),

where Σ_1, Σ_2 ⊆ Σ. We denote such a sentence by V(Σ_1) ⊆ V(Σ_2).

For a set C of dk’s and dic’s, the data word w satisfies C, written as w |= C, if w satisfies all sentences in C.
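Both constraint types can be tested directly on concrete data words; the following Python sketch (the function names are ours) follows the two definitions literally:

```python
def satisfies_dk(w, sigma_p):
    """w |= V(S') -> S': no data value occurs in two distinct positions
    labeled by letters of S'."""
    seen = set()
    for a, d in w:
        if a in sigma_p:
            if d in seen:
                return False
            seen.add(d)
    return True

def satisfies_dic(w, sigma1, sigma2):
    """w |= V(S1) <= V(S2): every data value found in a Sigma1-position
    also occurs in some Sigma2-position."""
    v1 = {d for a, d in w if a in sigma1}
    v2 = {d for a, d in w if a in sigma2}
    return v1 <= v2

w = [("a", 1), ("b", 2), ("a", 3), ("c", 2)]
print(satisfies_dk(w, {"a"}))          # True: values 1, 3 are distinct
print(satisfies_dk(w, {"b", "c"}))     # False: value 2 repeats under {b, c}
print(satisfies_dic(w, {"b"}, {"c"}))  # True: {2} is a subset of {2}
```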

In [9], the constraints considered are those where all the cardinalities |Σ′|, |Σ_1|, |Σ_2| equal one; these are known simply as key and inclusion constraints.

3.2 Existential Presburger formulae

Atomic Presburger formulae are of the form x_1 + x_2 + · · · + x_n ≤ y_1 + · · · + y_m, or x_1 + · · · + x_n ≤ K, or x_1 + · · · + x_n ≥ K, for some constant K ∈ N. Existential Presburger formulae are Presburger formulae of the form ∃x̄ ϕ, where ϕ is a Boolean combination of atomic Presburger formulae.

We shall be using Presburger formulae defining Parikh images of words. Let Σ = {a_1, . . . , a_k}, and let v ∈ Σ^{∗}. By Parikh(v) we mean the Parikh image of v, i.e., the k-tuple (#_v(a_1), . . . , #_v(a_k)) of integers, whose i-th component is the number of occurrences of a_i in v.
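Computing a Parikh image is a one-liner; a quick Python illustration:

```python
from collections import Counter

def parikh(v, alphabet):
    """Parikh image of v with respect to the enumeration a_1, ..., a_k."""
    c = Counter(v)
    return tuple(c[a] for a in alphabet)

print(parikh("abacab", ["a", "b", "c"]))  # (3, 2, 1)
```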

With the alphabet letters, we associate variables x_{a_1}, . . . , x_{a_k}. Given a Presburger formula ϕ(x_{a_1}, . . . , x_{a_k}), we say that a word v ∈ Σ^{∗} satisfies it, written as v |= ϕ(x_{a_1}, . . . , x_{a_k}), if and only if ϕ(Parikh(v)) holds. It is well-known that for every regular language L, one can construct an existential Presburger formula ϕ_L(x_{a_1}, . . . , x_{a_k}) so that a word v satisfies it iff it belongs to L [20]; moreover, the formula can be constructed in polynomial time [23].

3.3 Presburger automata

A Presburger automaton is a pair (A, ϕ), where A is a finite state automaton and ϕ is an existential Presburger formula. A word w is accepted by (A, ϕ) if w ∈ L(A) (the language of A) and ϕ(Parikh(w)) holds; the language of accepted words is denoted by L(A, ϕ).

Theorem 2. [23] The emptiness problem for Presburger automata is decidable in NP.
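To make the acceptance condition concrete, here is a toy Presburger automaton in Python: the automaton part is a membership test for the regular language (ab)^{∗}, and the formula part is an ordinary Python predicate standing in for an existential Presburger formula over the Parikh image (the example and all names are ours):

```python
from collections import Counter

def in_ab_star(v):
    """The automaton part: membership in the regular language (ab)*."""
    return len(v) % 2 == 0 and all(
        c == ("a" if i % 2 == 0 else "b") for i, c in enumerate(v))

def phi(counts):
    """The formula part: 'x_a = x_b and x_a >= 2', a Presburger condition."""
    return counts["a"] == counts["b"] and counts["a"] >= 2

def accepted(v):
    """(A, phi) accepts v iff v is in L(A) and phi holds on Parikh(v)."""
    return in_ab_star(v) and phi(Counter(v))

print(accepted("abab"))  # True
print(accepted("ab"))    # False: in (ab)*, but x_a < 2
print(accepted("aabb"))  # False: Parikh image fits, but not in (ab)*
```

The point of the model is exactly this conjunction: a regular condition on the word itself plus an arithmetic condition on its letter counts.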

3.4 Profile automata for data words
Given a data word w = ^{a_1}_{d_1} · · · ^{a_n}_{d_n}, the profile word of w, denoted by Profile(w), is the word

Profile(w) = (a_1, L_1, R_1), . . . , (a_n, L_n, R_n) ∈ (Σ × {∗, ⊤, ⊥} × {∗, ⊤, ⊥})^{∗}

such that for each position i = 1, . . . , n, the values of L_i and R_i are either ⊤, or ⊥, or ∗. If L_i = ⊤ and i > 1, it means that the position on the left, i − 1, has the same data value as position i; otherwise L_i = ⊥. If i = 1 (i.e., there is no position on the left), then L_i = ∗. The meaning of the R_i's is similar with respect to the position on the right of i.

Definition 1. A profile automaton A is a finite state automaton over the alphabet Σ × {∗, ⊤, ⊥} × {∗, ⊤, ⊥}. It defines a set L_data(A) of data words as follows: w ∈ L_data(A) if and only if A accepts Profile(w) in the standard sense.
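Profile words are easy to compute in a single pass; the sketch below uses the strings "T", "F", "*" in place of ⊤, ⊥, ∗:

```python
def profile(w):
    """Profile word of a data word w, given as a list of (letter, value) pairs.

    Each position records whether its left and right neighbors carry the same
    data value ("T"), a different one ("F"), or do not exist ("*")."""
    out = []
    for i, (a, d) in enumerate(w):
        left = "*" if i == 0 else ("T" if w[i - 1][1] == d else "F")
        right = "*" if i == len(w) - 1 else ("T" if w[i + 1][1] == d else "F")
        out.append((a, left, right))
    return out

w = [("a", 1), ("b", 1), ("a", 2)]
print(profile(w))  # [('a', '*', 'T'), ('b', 'T', 'F'), ('a', 'F', '*')]
```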

A profile automaton A and a set C of disjunctive constraints define a set of data words as follows.

L(A, C) = {w | w ∈ Ldata(A) and w |= C}.

3.5 A normal form for ∃MSO^{2}(∼, +1)

Decidability proofs for two-variable logics typically follow this pattern: first, in an easy step, a syntactic normal form is established; then the hard part is combinatorial, where decidability is proved for that normal form (by establishing the finite-model property, or by automata techniques, for example).

A normal form for ∃MSO^{2}(∼, +1) was already given in [4], and we shall use
it with just a small modification. In [4] it was shown that every ∃MSO^{2}(∼, +1)
formula over data words is equivalent to a formula

∃X_1 . . . ∃X_k ( χ ∧ ⋀_i ϕ_i ∧ ⋀_j ψ_j )

where

1. χ describes the behavior of a profile automaton (i.e., it can be viewed as an FO^{2}(+1) formula over the extended alphabet Σ × {∗, ⊤, ⊥} × {∗, ⊤, ⊥});

2. each ϕ_i is of the form ∀x∀y (α(x) ∧ α(y) ∧ x ∼ y → x = y), where α is a conjunction of labeling predicates, X_k's, and their negations; and

3. each ψ_j is of the form ∀x∃y (α(x) → (x ∼ y ∧ α′(y))), with α, α′ as in item 2.

The number of the unary predicates X’s is single exponential in the size of the original input sentence.

If we extend the alphabet to Σ × 2^{k} so that each label also specifies the family of the X_i's the node belongs to, then formulae in items 2 and 3 can be encoded by disjunctive constraints: formulae in item 2 become dk's V(Σ′) ↦ Σ′, and formulae in item 3 become dic's V(Σ_1) ⊆ V(Σ_2), where Σ′, Σ_1, Σ_2 ⊆ Σ × 2^{k}.

Indeed, consider, for example, the constraint ∀x∀y (α(x) ∧ α(y) ∧ x ∼ y → x = y). Let Σ′ be the set of all symbols (a, b̄) ∈ Σ × 2^{k} consistent with α. That is, a is the labeling symbol used in α (if α uses one) or an arbitrary letter (if α does not use a labeling predicate), and the Boolean vector b̄ has 1 in the positions of the X_i's used positively in α and 0 in the positions of the X_j's used negatively in α. Then the original constraint is equivalent to V(Σ′) ↦ Σ′. The transformation of type-3 constraints into dic's is similar. The details of this straightforward construction can be found in the Appendix.

Hence, [4] and the above imply the following. Let SAT-profile be the problem:

Problem: SAT-profile

Input: a profile automaton A and

a collection C of disjunctive constraints

Question: is there a data word w ∈ Ldata(A) such that w |= C?

Then:

Lemma 1. Given an ∃MSO^{2}(∼, +1) sentence ϕ, one can construct, in triple
exponential time, an instance (A, C) of SAT-profile over a new alphabet Σ so
that SAT-profile(A, C) returns true iff ϕ is satisfiable. However, the size of
(A, C) and Σ is double exponential in the size of ϕ.

Thus, our main goal now is to prove:

Theorem 3. SAT-profile is decidable with elementary complexity.

The main result, Theorem 1, is an immediate consequence of Theorem 3 and Lemma 1.

### 4 Some preliminary results

Proposition 1. For every data word w, the following holds.

1. w |= V(Σ′) ↦ Σ′ if and only if #_w(a) = |V_w(a)| for each a ∈ Σ′, and [S]_w = ∅ whenever |S ∩ Σ′| ≥ 2.

2. w |= V(Σ_1) ⊆ V(Σ_2) if and only if [S]_w = ∅ for all S such that S ∩ Σ_1 ≠ ∅ and S ∩ Σ_2 = ∅.

Proof. Part 1 is trivial. For part 2, note that ⋃_{a∈Σ_1} V_w(a) ⊆ ⋃_{b∈Σ_2} V_w(b) if and only if ⋃_{a∈Σ_1} V_w(a) ∩ ⋂_{b∈Σ_2} (D − V_w(b)) = ∅, which, of course, is equivalent to [S]_w = ∅ whenever S ∩ Σ_1 ≠ ∅ and S ∩ Σ_2 = ∅. □
Lemma 2. For every set C of disjunctive constraints, one can construct, in single-exponential time, a Presburger formula ϕ_C(x_{a_1}, . . . , x_{a_k}) such that for every data word w, we have w |= C if and only if ϕ_C(Parikh(Proj(w))) holds.

Proof. Let S_1, . . . , S_m be an enumeration of the non-empty subsets of Σ, where m = 2^{|Σ|} − 1. The formula ϕ_C is of the form ∃z_{S_1} · · · ∃z_{S_m} ψ, where ψ is the conjunction of the following quantifier-free formulas:

P1. x_a ≥ ∑_{S∋a} z_S, for every a ∈ Σ;

P2. if V(Σ′) ↦ Σ′ ∈ C, the conjunction:

⋀_{|S∩Σ′|≥2} z_S = 0 ∧ ⋀_{a∈Σ′} ( x_a = ∑_{S∋a} z_S )

P3. if V(Σ_1) ⊆ V(Σ_2) ∈ C, the conjunction:

⋀_{S∩Σ_1≠∅ and S∩Σ_2=∅} z_S = 0

We claim that for every data word w, w |= C if and only if ϕC(Parikh(Proj(w))) holds.

Let w be a data word such that w |= C. We need to show that ϕ_C(Parikh(Proj(w))) holds. As witnesses for the z_S, for each S ⊆ Σ, we pick z_S = |[S]_w|. Now we need to show that all the conjunctions P1–P3 above are satisfied. P1 is certainly satisfied, as for each a ∈ Σ, ∑_{S∋a} z_S = |V_w(a)| ≤ #_w(a). P2 and P3 follow from Proposition 1.

– If w |= V(Σ′) ↦ Σ′, then #_w(a) = |V_w(a)| for each a ∈ Σ′ and [S]_w = ∅ whenever |S ∩ Σ′| ≥ 2. So P2 is satisfied.

– If w |= V(Σ_1) ⊆ V(Σ_2), then [S]_w = ∅ for all S such that S ∩ Σ_1 ≠ ∅ and S ∩ Σ_2 = ∅. Obviously then P3 is satisfied.

Now suppose that v is a word such that ϕ_C(Parikh(v)) holds. We show that we can assign data values to v such that the resulting data word w satisfies every constraint in C. Let z_S = m_S be witnesses that ϕ_C(Parikh(v)) holds, and let K = ∑_S m_S. We are going to assign the data values {1, . . . , K} to v as follows. Define a function

ξ : {1, . . . , K} → 2^{Σ} − {∅}

such that |ξ^{−1}(S)| = m_S. We then assign to the a-positions in v the data values from ⋃_{S∋a} ξ^{−1}(S), for each a ∈ Σ, resulting in a data word w. Such an assignment is possible since ∑_{S∋a} |ξ^{−1}(S)| = ∑_{S∋a} m_S ≤ #_v(a). By the definition of the function ξ, we obtain that [S]_w = ξ^{−1}(S). That w |= C follows immediately from Proposition 1. □
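The second direction of this proof is constructive and can be mirrored in code. The sketch below (the helper names are ours) carves {1, . . . , K} into blocks playing the role of ξ^{−1}(S), and cycles each letter through its pool of admissible values; assuming the witnesses satisfy condition P1, every value of a block then appears under exactly the letters of its set S:

```python
def assign_from_witnesses(v, z):
    """Given a word v and witnesses z: frozenset S -> m_S satisfying P1,
    assign data values so that [S]_w has exactly m_S values, as in Lemma 2."""
    # Carve {1, ..., K} into blocks playing the role of xi^{-1}(S).
    blocks, nxt = {}, 1
    for S, m in z.items():
        blocks[S] = list(range(nxt, nxt + m))
        nxt += m
    # The pool of letter a is the union of the blocks of all S containing a.
    pools = {a: [d for S, b in blocks.items() if a in S for d in b]
             for a in set(v)}
    w, used = [], {a: 0 for a in set(v)}
    for a in v:
        d = pools[a][used[a] % len(pools[a])]  # cycle so each value appears
        used[a] += 1
        w.append((a, d))
    return w

z = {frozenset("a"): 1, frozenset("ab"): 1, frozenset("b"): 1}
w = assign_from_witnesses("aabb", z)
print(w)  # [('a', 1), ('a', 2), ('b', 2), ('b', 3)]
```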

Lemma 2 immediately implies the decidability of a slightly simpler version of SAT-profile. Consider the following problem:

Problem: SAT-automaton

Input: a finite state automaton A and

a collection C of disjunctive constraints

Question: is there a data word w such that Proj(w) ∈ L(A) and w |= C?

By Lemma 2, we can construct in exponential time a Presburger formula ϕ_C of exponential size such that for all data words w, we have w |= C if and only if ϕ_C(Parikh(Proj(w))) holds. Combining this with Theorem 2, we immediately obtain the decidability of the above problem:

Corollary 1. SAT-automaton is decidable with elementary complexity.
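The quantifier-free conditions P1–P3 from Lemma 2 are plain arithmetic checks on the Parikh image, which makes them easy to evaluate once candidate witnesses for the z_S are supplied. A Python sketch (the encoding of constraints and witnesses as dictionaries is ours):

```python
from collections import Counter

def psi_holds(v, dks, dics, z):
    """Check P1-P3 for word v, keys dks (sets Sigma'), inclusions dics
    (pairs (Sigma1, Sigma2)), and witnesses z: frozenset S -> value of z_S."""
    x = Counter(v)
    # P1: x_a >= sum of z_S over the sets S containing a
    for a in set(v):
        if x[a] < sum(m for S, m in z.items() if a in S):
            return False
    # P2: for a key on Sigma', z_S = 0 when |S & Sigma'| >= 2, and
    #     x_a equals the sum of z_S over S containing a, for a in Sigma'
    for sp in dks:
        if any(m != 0 for S, m in z.items() if len(S & sp) >= 2):
            return False
        if any(x[a] != sum(m for S, m in z.items() if a in S) for a in sp):
            return False
    # P3: for an inclusion (Sigma1, Sigma2), z_S = 0 when S meets Sigma1
    #     but misses Sigma2
    for s1, s2 in dics:
        if any(m != 0 for S, m in z.items() if S & s1 and not S & s2):
            return False
    return True

z = {frozenset("a"): 1, frozenset("ab"): 1, frozenset("b"): 0}
print(psi_holds("aab", [{"a"}], [({"b"}, {"a"})], z))  # True
```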

The following lemma is crucial in our proof of Theorem 3.

Lemma 3. Let v be a word over Σ. Suppose that for each a ∈ Σ, we are given a set V_a of data values such that

– if V_a = ∅, then #_v(a) = 0; and
– #_v(a) ≥ |V_a| ≥ |Σ| + 3 otherwise.

Then we can assign a data value to each position in v such that the resulting data word w is locally different and, for each a ∈ Σ, V_w(a) = V_a.

Proof. Let v = a_1 · · · a_n. First we assign data values in the following manner. Let a ∈ Σ. Assign each of the data values from V_a to a distinct a-position in v, one data value per position. Since #_v(a) ≥ |V_a|, such an assignment is possible, and moreover, if #_v(a) > |V_a|, then some a-positions are left without data values. We do this for each a ∈ Σ.

Let w = ^{a_1}_{d_1} · · · ^{a_n}_{d_n} be the resulting data word, where we write d_i = ] to denote that position i is still without a data value. In the data word w, for each a ∈ Σ, we already have V_w(a) = V_a.

However, after assigning data values in this way, the data word w may not be locally different. There may exist i ∈ {1, . . . , n − 1} such that d_i = d_{i+1} and d_i, d_{i+1} ≠ ]. We call such a position a conflict position. Now, we show that we can always rearrange the data values in w such that the resulting data word has no conflict positions. Suppose position i is a conflict position labeled a. Since each data value is assigned to at most one position per symbol and there are only |Σ| symbols, the data value d_i can occur at most |Σ| times in w. Since |V_a| ≥ |Σ| + 3 > |Σ|, there exists a position j such that

– a_j = a and d_j ≠ ];
– d_{j−1}, d_{j+1} ≠ d_i.

Indeed, there are at least |Σ| + 3 − |Σ| = 3 such positions. From all such positions, pick one position j whose data value d_j ≠ d_{i−1}, d_{i+1}. We can then swap the data values d_i and d_j, resulting in fewer conflict positions in w. We repeat this process until no conflict positions remain in w.

The final step is to assign data values to the positions in w that still do not have one. This is easy. Since for each a ∈ Σ, |V_a| ≥ |Σ| + 3 ≥ 3, if the data value d_i = ], then we can choose a data value from V_{a_i} that is different from those of its left- and right-neighbors. This still ensures that we get a locally different data word at the end. This completes the proof. □
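The whole proof is constructive, and small instances can be worked through in code. The following Python sketch (our own) implements the three steps, the initial assignment, the conflict-repair swaps, and the final fill; it assumes the cardinality conditions of the lemma, under which a safe swap partner always exists:

```python
def assign_locally_different(v, vals):
    """v: list of letters; vals: letter -> set of data values, with
    #_v(a) >= |vals[a]| >= |Sigma| + 3 whenever vals[a] is non-empty."""
    n, d = len(v), [None] * len(v)
    # Step 1: give each value of vals[a] to a distinct a-position.
    for a, V in vals.items():
        positions = [i for i, x in enumerate(v) if x == a]
        for i, val in zip(positions, sorted(V)):
            d[i] = val
    # Step 2: repair conflict positions (equal assigned neighbors) by swapping
    # with a same-letter position whose value is safe on both sides.
    def conflicts():
        return [i for i in range(n - 1) if d[i] is not None and d[i] == d[i + 1]]
    while conflicts():
        i = conflicts()[0]
        for j in (j for j, x in enumerate(v) if x == v[i] and d[j] is not None):
            near_i = {d[i - 1] if i > 0 else None, d[i + 1] if i + 1 < n else None}
            near_j = {d[j - 1] if j > 0 else None, d[j + 1] if j + 1 < n else None}
            if j != i and d[j] not in near_i and d[i] not in near_j:
                d[i], d[j] = d[j], d[i]  # strictly fewer conflicts remain
                break
    # Step 3: fill the still-unassigned positions, avoiding both neighbors.
    for i in range(n):
        if d[i] is None:
            near = {d[i - 1] if i > 0 else None, d[i + 1] if i + 1 < n else None}
            d[i] = next(x for x in sorted(vals[v[i]]) if x not in near)
    return list(zip(v, d))

w = assign_locally_different(list("ababababababa"),
                             {"a": {1, 2, 3, 4, 5}, "b": {1, 2, 3, 4, 5, 6}})
assert all(w[i][1] != w[i + 1][1] for i in range(len(w) - 1))  # locally different
```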

### 5 Proof of Theorem 3

For the sake presentation, we divide it into a few subsections. In Subsection 5.1, we present our algorithm for deciding SAT-profile over locally different data words. Then, we explain how our algorithm can be extended to the general case in Subsection 5.2.

5.1 Satisfiability over locally different data words

In this subsection we give an elementary algorithm to decide the problem SAT-locally-different defined below. This problem is still a more restricted version of SAT-profile, but more general than SAT-automaton.

Problem: SAT-locally-different

Input: a finite state automaton A and

a collection C of disjunctive constraints

Question: is there a locally different data word w such that Proj(w) ∈ L(A) and w |= C?

We further divide the satisfiability proof for SAT-locally-different into two cases:

– First, we show how to decide SAT-locally-different over data words with “many” data values.

– Second, we settle SAT-locally-different in the general case.

We say that a data word w has “many” data values if for all S ⊆ Σ, the
cardinality |[S]_{w}| is either 0 or ≥ |Σ| + 3. Notice that if a data word w has many
data values, then either |V_{w}(a)| = 0 or |V_{w}(a)| ≥ |Σ| + 3 for all a ∈ Σ.

The case of data words with many data values. By Lemma 2, we can construct a Presburger formula ϕC such that for every data word w,

w |= C if and only if ϕC(Parikh(Proj(w))) holds.

So, for every data word w, w ∈ L(A, C) if and only if Proj(w) ∈ L(A, ϕC).

Recall that the formula ϕ_C is of the form ∃z_{S_1} · · · ∃z_{S_m} ψ_C, where S_1, . . . , S_m is the enumeration of the non-empty subsets of Σ, and the intention of each z_{S_i} is to represent |[S_i]_w| for data words w for which ϕ_C(Parikh(Proj(w))) holds.

The idea is as follows: given a set F ⊆ 2^{Σ} − {∅}, we can decide the existence of a locally different data word w ∈ L(A, C) such that |[S]_w| = 0 if S ∈ F, and |[S]_w| ≥ |Σ| + 3 if S ∉ F.

Now, to decide the existence of a locally different data word with many data values in L(A, C), we do the following.

1. Guess a set F ⊆ 2^{Σ}− {∅}.

2. Construct the formula ϕ_C from C according to Lemma 2. Let ϕ_C be of the form ∃z_{S_1} · · · ∃z_{S_m} ψ_C.
3. Define the formula ϕ_{C,F} as:

∃z_{S_1} · · · ∃z_{S_m} ( ψ_C ∧ ⋀_{S_i∈F} z_{S_i} = 0 ∧ ⋀_{S_i∉F} z_{S_i} ≥ |Σ| + 3 )

4. Test the emptiness of L(A, ϕC,F).

To show that this algorithm is correct, we claim the following.

Claim 4. For every word v ∈ Σ^{∗}, v ∈ L(A, ϕ_{C,F}) for some F ⊆ 2^{Σ} if and only if there exists a locally different data word w ∈ L(A, C) with many data values such that Proj(w) = v.

Proof. If v ∈ L(A, ϕ_{C,F}) for some F, then there exist witnesses z_{S_i} = m_{S_i} such that ϕ_{C,F}(Parikh(v)) holds. By the construction of ϕ_{C,F}, we have m_{S_i} = 0 if S_i ∈ F, and m_{S_i} ≥ |Σ| + 3 if S_i ∉ F. As in the proof of Lemma 2, we can assign data values to each position of v, resulting in a data word w such that for each S ⊆ Σ, |[S]_w| = m_S, which is either ≥ |Σ| + 3 or 0. This means that each |V_w(a)| is either ≥ |Σ| + 3 or 0. (If |V_w(a)| = 0, it means that the symbol a does not appear in v.) By Lemma 3, we can rearrange the data values in w to obtain a locally different data word. This data word is in L(A, C).

The converse is straightforward. If w ∈ L(A, C) has many data values, then v = Proj(w) immediately satisfies ϕ_{C,F}, where F = {S_i | |[S_i]_w| = 0}. Thus, v ∈ L(A, ϕ_{C,F}). □

The general case of SAT-locally-different. The algorithm is more or less the same as above. The only extra care needed is for the case where there exists a locally different data word w ∈ L(A, C) such that |[S]_w| ≤ |Σ| + 2 for some S ⊆ Σ.

As before, the idea is to decide, given a set F ⊆ 2^{Σ} − {∅}, whether there
exists a locally different data word w ∈ L(A, C) such that |[S]w| ≤ |Σ| + 2, if
S ∈ F and ≥ |Σ| + 3, otherwise. Again, we reduce the problem to the emptiness
of a Presburger automaton.

The main difference is the way we deal with the sets S ∈ F: now S ∈ F does not imply that [S]_w is empty, but only that its cardinality is bounded by the constant |Σ| + 2. For all these S ∈ F, we can assume that the data values in [S]_w are constant symbols, since |[S]_w| ≤ |Σ| + 2. We denote these sets of constant symbols by Γ_S, for S ∈ F. Then we embed those constant symbols into the finite alphabet Σ and extend the automaton A to handle the constraints on them.

The details of the algorithm are as follows. It consists of four main steps.

1. The guessing of the set F and the constants Γ_{S}’s.

a) Guess a set F ⊆ 2^{Σ}− {∅}.

b) For each S ∈ F , guess an integer mS ≤ |Σ|+2 according to the following rule.

– If V (Σ^{0}) 7→ Σ^{0} ∈ C, then mS = 0, if |S ∩ Σ^{0}| ≥ 2.

– If V (Σ_{1}) ⊆ V (Σ_{2}) ∈ C, then m_{S} = 0, if S ∩ Σ_{1}6= ∅ and S ∩ Σ_{2}= ∅.

c) For each S ∈ F , fix a set ΓS = {α^{S}_{1}, . . . , α^{S}_{m}_{S}} of constants such that
ΓS’s are disjoint, and ΓS∩ N = ∅. Let ΓF=S

S∈FΓS.
2. Embedding the constants of the Γ_S's into A.

Construct a finite state automaton A′ (from the automaton A) over the alphabet Σ ∪ (Σ × Γ_F) as follows. A′ accepts the word v = b_1 · · · b_n over Σ ∪ (Σ × Γ_F) if and only if the following hold.

– A symbol (a, d) ∈ Σ × Γ_F can appear in v if and only if a ∈ S and d ∈ Γ_S for some S ∈ F.

– Let u = a_1 · · · a_n be the word over Σ where

a_i = b_i if b_i ∈ Σ, and a_i = c if b_i = (c, d) ∈ Σ × Γ_F.

Then u ∈ L(A).

– For i = 1, . . . , n − 1, if b_i = (a_i, d_i) ∈ Σ × Γ_F and b_{i+1} = (a_{i+1}, d_{i+1}) ∈ Σ × Γ_F, then d_i ≠ d_{i+1}.

– If V(Σ′) ↦ Σ′ is in C, then for each a ∈ Σ′ and α ∈ Γ_S, where a ∈ S, the symbol (a, α) appears exactly once in v.

Note that A′ is defined from A with the parameters F and {Γ_S | S ∈ F}. The construction is straightforward and can be found in the Appendix.

3. Constructing the Presburger formula for C.

a) Construct the formula ϕ_C from C according to Lemma 2. Let ϕ_C be of the form ∃z_{S_1} · · · ∃z_{S_m} ψ_C.

b) Denote by ϕ_{C,F} the formula:

∃z_{S_1} · · · ∃z_{S_m} ( ψ_C ∧ ⋀_{S_i∈F} z_{S_i} = 0 ∧ ⋀_{S_i∉F} z_{S_i} ≥ |Σ| + 3 )

4. The decision step. Test the emptiness of L(A′, ϕ_{C,F}).

The correctness of the algorithm follows from Claim 5 below.

Claim 5. There exists a locally different data word w ∈ L(A, C) if and only if there exist a set F ⊆ 2^{Σ} and sets {Γ_S | S ∈ F and |Γ_S| ≤ |Σ| + 2} such that L(A′, ϕ_{C,F}) ≠ ∅, where A′ and ϕ_{C,F} are as defined in our algorithm and the constants Γ_S respect Step 1.b) above.

Proof. We prove “only if” part first. Let w ∈ L(A, C) be a locally different data word. The set F is defined as follows.

– S ∈ F , if the cardinality |[S]_{w}| ≤ |Σ| + 2.

– S /∈ F , if the cardinality |[S]w| ≥ |Σ| + 3.

Without loss of generality, we assume that [S]_w = Γ_S, for S ∈ F. Let w = ^{a_1}_{d_1} · · · ^{a_n}_{d_n}. We construct the word v = b_1 · · · b_n as follows: for each i = 1, . . . , n, b_i = (a_i, d_i) if d_i is in some Γ_S, and otherwise b_i = a_i.

The rest of the data values are in [S]_w for some S ∉ F. So z_S = |[S]_w| ≥ |Σ| + 3 serve as witnesses for S ∉ F, and z_S = 0 for S ∈ F. Thus, v ∈ L(A′, ϕ_{C,F}).

Now we prove the "if" part. Suppose there exist some Γ_S's and a word v over the alphabet Σ ∪ (Σ × ⋃_{S∈F} Γ_S) such that v ∈ L(A′, ϕ_{C,F}).

Let v = b_1 · · · b_n. If b_i = (a, α) ∈ Σ × Γ_S, then we simply view α as the data value in that position. For the other positions, where b_i = a ∈ Σ, we assign the data values as before in Lemma 2.

Let z_S = m_S be the witnesses that v ∈ L(A′, ϕ_{C,F}) holds, and let K = ∑_S m_S. Define the following function:

ξ : {1, . . . , K} → 2^{Σ} − {∅},

where |ξ^{−1}(S)| = m_S.

For each a ∈ Σ, we assign to the a-positions in v the data values from ⋃_{S∋a} ξ^{−1}(S). If necessary, we can apply Lemma 3 to obtain a locally different data word. The data values from the Γ_S's do not prevent us from applying Lemma 3, since Γ_F ∩ {1, . . . , K} = ∅. □

5.2 Satisfiability over general data words

Now we extend our ideas above to prove Theorem 3. For that we need some auxiliary terms. Let w = ^{a_1}_{d_1} · · · ^{a_n}_{d_n} be a data word over Σ. A zone is a maximal interval [i, j] with the same data value, i.e., d_i = d_{i+1} = · · · = d_j, and d_{i−1} ≠ d_i (if i > 1) and d_j ≠ d_{j+1} (if j < n). Obviously, any two consecutive zones have different data values. The zone [i, j] is called an S-zone if S is the set of labels occurring in the zone.

The zonal partition of w is a sequence (k_1, . . . , k_l), where 1 ≤ k_1 < k_2 < · · · < k_l < n, such that [1, k_1], [k_1 + 1, k_2], . . . , [k_l + 1, n] are the zones in w. Let the zone [1, k_1] be an S_1-zone, [k_1 + 1, k_2] an S_2-zone, [k_2 + 1, k_3] an S_3-zone, and so on. The zonal word of w is a data word over Σ ∪ 2^{Σ} defined as follows.

Zonal(w) = a_1 · · · a_{k_1} ^{S_1}_{d_{k_1}} a_{k_1+1} · · · a_{k_2} ^{S_2}_{d_{k_2}} · · · a_{k_l+1} · · · a_n ^{S_{l+1}}_{d_n}.

That is, the zonal word of a data word is a word in which each zone is succeeded by a label S ∈ 2^{Σ}, if the zone is an S-zone.
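A zonal word can be computed with a single pass; a Python sketch of our own, representing each 2^{Σ}-labeled position as a (frozenset, value) pair so that only those positions carry data values:

```python
def zonal(w):
    """Zonal word of a data word w (a list of (letter, value) pairs): each
    zone is followed by the pair (S, d), where S is the set of labels in the
    zone and d is its data value."""
    out, i = [], 0
    while i < len(w):
        j = i
        while j + 1 < len(w) and w[j + 1][1] == w[i][1]:
            j += 1  # extend the zone while the data value stays the same
        out.extend(a for a, _ in w[i:j + 1])
        out.append((frozenset(a for a, _ in w[i:j + 1]), w[i][1]))
        i = j + 1
    return out

w = [("a", 1), ("b", 1), ("a", 2), ("a", 2), ("b", 3)]
z = zonal(w)
assert z == ["a", "b", (frozenset("ab"), 1),
             "a", "a", (frozenset("a"), 2),
             "b", (frozenset("b"), 3)]
```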

Moreover, it is sufficient to assume that only the positions labeled with sym-
bols from 2^{Σ} carry data values, i.e., data values of their respective zones. Since
two consecutive zones have different data values, two consecutive positions (in
Zonal(w)) labeled with symbols from 2^{Σ} also have different data values.

Furthermore, if w is a data word over Σ, then for each a ∈ Σ,

V_w(a) = ⋃_{S∋a} V_{Zonal(w)}(S).

Proposition 2 below shows that disjunctive constraints for data words over the
alphabet Σ can be converted into disjunctive constraints for the zonal data words
over the alphabet Σ ∪ 2^{Σ}.

Proposition 2. For every data word w over Σ, the following holds.

– For Σ′ ⊆ Σ, w |= V(Σ′) ↦ Σ′ if and only if

K1. Zonal(w) |= V(Q) ↦ Q, where Q = {S | S ∩ Σ′ ≠ ∅};

K2. in Zonal(w), every zone contains at most one symbol from Σ′.

(By a zone in Zonal(w), we mean a maximal interval in which every position is labeled with a symbol from Σ.)

– For Σ_1, Σ_2 ⊆ Σ, w |= V(Σ_1) ⊆ V(Σ_2) if and only if Zonal(w) |= V(Q_1) ⊆ V(Q_2), where Q_1 = {S | S ∩ Σ_1 ≠ ∅} and Q_2 = {S | S ∩ Σ_2 ≠ ∅}.

Now, given a profile automaton A over the alphabet Σ, we can effectively construct an automaton A^{zonal} such that for all data words w,

Profile(w) ∈ L(A) if and only if Proj(Zonal(w)) ∈ L(A^{zonal}).

Such an automaton A^{zonal} is called a zonal automaton of A. Moreover, if the dk V(Σ′) ↦ Σ′ is in C, we can impose the condition K2 of Proposition 2 inside the automaton A^{zonal}.

This, together with Proposition 2, implies that the instance (A, C) of SAT-profile can be reduced to an instance of the following problem.

Problem: SAT-locally-different-for-zonal-words
Input: • a zonal automaton A^{zonal}

• a collection C^{zonal} of disjunctive constraints over the alphabet 2^{Σ}
Question: is there a zonal word w such that

• Proj(w) ∈ L(A^{zonal}) and w |= C^{zonal} and

• in which two consecutive positions labeled with symbols from 2^{Σ}
have different data values?

The proof in the previous subsection can then be easily adapted to SAT-locally-different-for-zonal-words. The details can be found in the Appendix.

### 6 Analysis of the complexity

As a conclusion, we provide the complexity of our algorithms.

– SAT-automaton: NExpTime; NP if the alphabet Σ is fixed.

– SAT-locally-different: NExpTime; NP if the alphabet Σ is fixed.

– SAT-profile: 2-NExpTime; NP if the alphabet Σ is fixed.

In our algorithms, all three problems are reduced to the emptiness problem for Presburger automata, which is decidable in NP (Theorem 2).

In SAT-automaton, the exponential blow-up occurs when reducing the dk's and dic's in C to the existential Presburger formula ϕ_C (Lemma 2). This formula ϕ_C has exponentially many variables: one variable z_S for every non-empty S ⊆ Σ. Of course, if the alphabet Σ is fixed, then the reduction is polynomial; hence the NP membership for SAT-automaton. The complexity is the same for SAT-locally-different.

For SAT-profile the additional exponential blow-up occurs when translating the dk's and dic's over the alphabet Σ to the dk's and dic's over the alphabet 2^{Σ}. Combining this with Lemma 1, we obtain the 4-NExpTime upper bound for the satisfiability of ∃MSO^{2}(∼, +1).

Acknowledgement. We acknowledge the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under the FET-Open grant agreement FOX, number FP7-ICT-233599.

### References

1. S. Abiteboul, R. Hull, V. Vianu. Foundations of Databases. Addison-Wesley, 1995.

2. H. Björklund, M. Bojanczyk. Bounded depth data trees. In ICALP'07, pages 862–874.

3. L. Boasson. Some applications of CFL's over infinite alphabets. In Theoretical Computer Science, LNCS vol. 104, 1981, pages 146–151.

4. M. Bojanczyk, A. Muscholl, T. Schwentick, L. Segoufin. Two-variable logic on data trees and XML reasoning. J. ACM 56(3) (2009).

5. M. Bojanczyk, C. David, A. Muscholl, T. Schwentick, L. Segoufin. Two-variable logic on words with data. In LICS'06, pages 7–16.

6. P. Bouyer, A. Petit, D. Thérien. An algebraic characterization of data and timed languages. In CONCUR'01, pages 248–261.

7. S. Dal-Zilio, D. Lugiez, C. Meyssonnier. A logic you can count on. In POPL'04, pages 135–146.

8. S. Demri, R. Lazic. LTL with the freeze quantifier and register automata. ACM TOCL 10(3) (2009).

9. W. Fan, L. Libkin. On XML integrity constraints in the presence of DTDs. J. ACM 49(3): 368–406 (2002).

10. D. Figueira. Satisfiability of downward XPath with data equality tests. In PODS'09, pages 197–206.

11. S. Ginsburg, E. H. Spanier. Semigroups, Presburger formulas, and languages. Pacific J. Math. 16 (1966), 285–296.

12. E. Grädel, Ph. Kolaitis, M. Vardi. On the decision problem for two-variable first-order logic. BSL 3(1): 53–69 (1997).

13. M. Jurdzinski, R. Lazic. Alternation-free modal mu-calculus for data trees. In LICS'07, pages 131–140.

14. M. Kaminski, T. Tan. Tree automata over infinite alphabets. In Pillars of Computer Science, 2008, pages 386–423.

15. L. Libkin. Logics for unranked trees: an overview. In ICALP'05, pages 35–50.

16. F. Neven. Automata, logic, and XML. In CSL'02, pages 2–26.

17. F. Neven, Th. Schwentick, V. Vianu. Finite state machines for strings over infinite alphabets. ACM TOCL 5(3) (2004), 403–435.

18. M. Otto. Two variable first-order logic over ordered domains. J. Symb. Log. 66(2): 685–702 (2001).

19. C. Papadimitriou. On the complexity of integer programming. J. ACM 28(4): 765–768 (1981).

20. R. Parikh. On context-free languages. J. ACM 13(4): 570–581 (1966).

21. Th. Schwentick. Automata for XML – a survey. JCSS 73 (2007), 289–315.

22. H. Seidl, Th. Schwentick, A. Muscholl. Numerical document queries. In PODS'03, pages 155–166.

23. H. Seidl, Th. Schwentick, A. Muscholl, P. Habermehl. Counting in trees for free. In ICALP'04, pages 1136–1149.

24. T. Tan. Graph reachability and pebble automata over infinite alphabets. In LICS'09, pages 157–166.

25. T. Tan. On pebble automata for data languages with decidable emptiness problem. In MFCS'09, pages 712–723.

26. W. Thomas. Languages, automata, and logic. In Handbook of Formal Languages, Vol. 3, Springer, 1997, pages 389–455.

27. V. Vianu. A Web Odyssey: from Codd to XML. In PODS'01, pages 1–15.

### A Proof of Lemma 1

In [4] it was shown that every ∃MSO^{2}(∼, +1) formula over data words is equivalent to a formula

θ := ∃R_{1} · · · ∃R_{k} (χ ∧ ∧_{i} ϕ_{i} ∧ ∧_{j} ψ_{j})

where

1. χ describes the behavior of a profile automaton and is of the form

∀x∀y ((x + 1 = y → ϑ_{1}(x, y)) ∧ (First(x) → ϑ_{2}(x)) ∧ (Last(x) → ϑ_{3}(x)))

where the formula ϑ_{1} is a boolean combination of labeling predicates – R_{l}'s, (a, L, R)'s, and their negations – for x and y, the formulae ϑ_{2} and ϑ_{3} are boolean combinations of labeling predicates for x, the predicate First(x) says that x is the first position, and Last(x) says that x is the last position. (Note that First and Last are definable in FO^{2}.)

2. each ϕ_{i} is of the form ∀x∀y (α(x) ∧ α(y) ∧ x ∼ y → x = y), where α is a conjunction of labeling predicates – R_{l}'s, a's, and their negations – and

3. each ψ_{j} is of the form ∀x∃y (α(x) → (x ∼ y ∧ α′(y))), with α, α′ as in item 2.

Moreover, the number of unary predicates R_{l}’s is exponential in the size of the
original input formula.

Let U(θ) = {R_{1}, . . . , R_{k}}. We will show how to convert θ into a profile automaton A and a set C of disjunctive constraints over the alphabet

Σ̂ = {S_{R,a} | R ⊆ U(θ) and a ∈ Σ}.

First, we show how to convert the formulae ϕ_{i} and ψ_{j} into a collection C of dk's and dic's over Σ̂.

1. Replace each predicate R_{l}(x) by the disjunction ∨ { S_{R,a}(x) | R_{l} ∈ R ⊆ U(θ) and a ∈ Σ }.

2. Replace each predicate a(x) by the disjunction ∨ { S_{R,a}(x) | R ⊆ U(θ) }.

For the formulae ϕ_{i} and ψ_{j}, we replace each type α(x) that occurs in them by a formula α̂(x) constructed as follows. Let α be

∧_{R ∈ R_{1}} R(x) ∧ ∧_{R ∈ R_{2}} ¬R(x),

for some R_{1}, R_{2} ⊆ U(θ) (we can assume that R_{1} ∩ R_{2} = ∅). Then α̂(x) is

∨ { S_{R,a}(x) | R_{1} ⊆ R ⊆ U(θ) − R_{2} and a ∈ Σ }.

It is straightforward to check that the formulae ϕ_{i} and ψ_{j} become dk's and dic's over the alphabet Σ̂.
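As an illustration (our own sketch; the function and names are not from the paper), the index pairs (R, a) of the disjunction α̂(x) can be enumerated mechanically from R_{1}, R_{2}, U(θ) and Σ:

```python
from itertools import combinations

def hat_alpha_disjuncts(R1, R2, U, sigma):
    """Enumerate the pairs (R, a) indexing the disjuncts S_{R,a}(x) of
    hat-alpha, for alpha = /\_{R in R1} R(x) /\ /\_{R in R2} not R(x):
    every R with R1 <= R <= U - R2, paired with every letter a in Sigma."""
    free = sorted((set(U) - set(R1)) - set(R2))  # predicates that may or may not hold
    pairs = []
    for r in range(len(free) + 1):
        for extra in combinations(free, r):
            R = frozenset(R1) | frozenset(extra)
            for a in sorted(sigma):
                pairs.append((R, a))
    return pairs

# alpha = R1(x) /\ not R2(x) with U(theta) = {R1, R2}: the only admissible
# set is {R1}, paired with each letter of Sigma = {a, b}.
ds = hat_alpha_disjuncts({"R1"}, {"R2"}, {"R1", "R2"}, {"a", "b"})
```

The number of disjuncts is 2^{|U| − |R_{1}| − |R_{2}|} · |Σ|, which is the source of the exponential blow-up mentioned above.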

The construction of the profile automaton A from χ is similar. The difference is that the labeling predicates are of the form (a, L, R) ∈ Σ × {⊤, ⊥, ∗} × {⊤, ⊥, ∗}, instead of a ∈ Σ. So we define the following alphabet:

Σ̂_{profile} = {S_{R,a,l,r} | R ⊆ U(θ) and (a, l, r) ∈ Σ × {⊤, ⊥, ∗} × {⊤, ⊥, ∗}}.

In a similar manner as above, we do the following.

1. Replace each predicate R_{l}(x) by the disjunction ∨ { S_{R,a,l,r}(x) | R_{l} ∈ R ⊆ U(θ) and (a, l, r) ∈ Σ × {⊤, ⊥, ∗}^{2} }.

2. Replace each predicate (a, l, r)(x) by the disjunction ∨ { S_{R,a,l,r}(x) | R ⊆ U(θ) }.

Then we do some clean-up of the formulae ϑ_{1}, ϑ_{2}, ϑ_{3}. We remove the negations by replacing each predicate ¬S_{R,a,l,r}(x) with ∨_{(R′,a′,l′,r′) ≠ (R,a,l,r)} S_{R′,a′,l′,r′}(x). After the clean-up, we can assume that ϑ_{1} is of the form

∨ ( S_{R,a,l,r}(x) ∧ S_{R′,a′,l′,r′}(y) ),

and ϑ_{2}, ϑ_{3} are of the form

∨ S_{R,a,l,r}(x).

We build a profile automaton A = ⟨Σ̂, Q, I, F, δ⟩ that accepts the profile language defined by χ.

– The set of states is Q = Σ̂_{profile}.

– The set of final states is F = {S_{R,a,l,r} | S_{R,a,l,r} appears in ϑ_{3}}.

– The set of initial states is I = {S_{R,a,l,r} | S_{R,a,l,r} appears in ϑ_{2}}.

– The transition function δ is as follows. If S_{R,a,l,r}(x) appears in ϑ_{1}, then

δ(S_{R,a,l,r}, (a, l, r)) = {S_{R′,a′,l′,r′} | S_{R,a,l,r}(x) ∧ S_{R′,a′,l′,r′}(y) appears in ϑ_{1}}.

By construction, this automaton captures the original formula χ.
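As a small sketch (ours, with hypothetical state names), the transition table can be tabulated directly from the disjuncts of ϑ_{1}: each disjunct S(x) ∧ S′(y) adds S′ to δ(S, ℓ), where ℓ is the profile letter (a, l, r) of S.

```python
def build_delta(theta1_pairs):
    """Tabulate the transition function of the profile automaton.

    theta1_pairs: iterable of (s, s2), where each state s = (R, a, l, r)
    stands for the symbol S_{R,a,l,r}, and (s, s2) records that the
    disjunct S_s(x) /\ S_{s2}(y) occurs in theta_1.  The letter consumed
    in state s is its own profile (a, l, r).
    """
    delta = {}
    for s, s2 in theta1_pairs:
        _, a, l, r = s
        delta.setdefault((s, (a, l, r)), set()).add(s2)
    return delta

# Hypothetical two-state example:
s1 = (frozenset({"R1"}), "a", "*", "T")
s2 = (frozenset(), "b", "T", "*")
delta = build_delta([(s1, s2)])
```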

### B Construction of A′ in Subsubsection “The general case of SAT-locally-different”

Let A = ⟨Σ, Q, q_{0}, µ, F⟩ be the automaton, where

– Σ is the finite alphabet;

– Q, q_{0} and F are the set of states, the initial state, and the set of final states, respectively;

– µ is the set of transitions.

First, we define the automaton Ā = ⟨Σ̄, Q̄, q̄_{0}, μ̄, F̄⟩ as follows.

– Σ̄ = Σ ∪ (Σ × ∪_{S∈F} Γ_{S}).

– Q̄ = Q, q̄_{0} = q_{0}, F̄ = F.

– μ̄ is µ plus the following transitions: (q_{1}, (a, α^{S}_{i})) → q_{2}, whenever a ∈ S and the transition (q_{1}, a) → q_{2} ∈ µ.

The desired automaton A′ is simply the intersection of Ā with the automaton B, which is defined as follows: B accepts a word w over Σ ∪ (Σ × ∪_{S∈F} Γ_{S}) if and only if w satisfies the following conditions.

1. The symbol (a, α^{S}_{i}) can appear in w only if a ∈ S and α^{S}_{i} ∈ Γ_{S}.

2. If two neighboring positions are labeled with (a_{1}, β_{1}) and (a_{2}, β_{2}), for some β_{1}, β_{2} ∈ ∪_{S∈F} Γ_{S}, then β_{1} ≠ β_{2}.

3. If V(Σ′) ↦ Σ′ is in C, then for each a ∈ Σ′ and α ∈ Γ_{S}, where a ∈ S, the symbol (a, α) appears exactly once in w.

The intuitive meaning of these items is as follows. The symbols α ∈ Γ_{S} are supposed to represent the data values from [S]_{w}; hence item 1. Item 2 is required to guarantee that the data values α ∈ ∪_{S∈F} Γ_{S} in w are locally different. Finally, item 3 is needed to guarantee that the disjunctive key constraints in C are respected.
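The three conditions can be checked mechanically on an explicit word; the following Python sketch (ours; the data layout and names are assumptions) mirrors what the automaton B enforces:

```python
from collections import Counter

def satisfies_B(word, Gamma, dks):
    """Check the three conditions enforced by automaton B.

    word:  list of symbols, each a plain letter a or an annotated pair (a, alpha);
    Gamma: dict mapping each set S (frozenset of letters) to its constants Gamma_S;
    dks:   list of sets Sigma' such that V(Sigma') -> Sigma' is in C.
    """
    # 1. (a, alpha) may appear only with a in S and alpha in Gamma_S, for some S.
    for sym in word:
        if isinstance(sym, tuple):
            a, alpha = sym
            if not any(a in S and alpha in G for S, G in Gamma.items()):
                return False
    # 2. Neighboring annotated positions must carry different constants.
    for x, y in zip(word, word[1:]):
        if isinstance(x, tuple) and isinstance(y, tuple) and x[1] == y[1]:
            return False
    # 3. For each dk, every admissible (a, alpha) with a in Sigma' occurs exactly once.
    count = Counter(sym for sym in word if isinstance(sym, tuple))
    for sigma_p in dks:
        for S, G in Gamma.items():
            for a in S & set(sigma_p):
                for alpha in G:
                    if count[(a, alpha)] != 1:
                        return False
    return True

Gamma = {frozenset({"a"}): {"x1"}}
ok = satisfies_B([("a", "x1"), "b"], Gamma, [{"a"}])           # all three hold
bad = satisfies_B([("a", "x1"), ("a", "x1")], Gamma, [{"a"}])  # item 2 fails
```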

### C The details of the proof in Subsection 5.2

For the sake of precise presentation, we introduce a few notations. The notations #^{zone}_{w}(S), V^{zone}_{w}(S) and [P]^{zone}_{w} below are the zone analogs of #_{w}(a), V_{w}(a) and [S]_{w}, respectively, defined in Subsection 2.1. For S ⊆ Σ,

#^{zone}_{w}(S) = the number of S-zones in w,

V^{zone}_{w}(S) = the set of data values found in the S-zones in w,

and for P ⊆ 2^{Σ},

[P]^{zone}_{w} = ∩_{S∈P} V^{zone}_{w}(S) − ∪_{T∉P} V^{zone}_{w}(T).

Now it is immediate that for each a ∈ Σ,

V_{w}(a) = ∪_{S ∋ a} V^{zone}_{w}(S) = ∪_{S ∋ a} ∪_{P ∋ S} [P]^{zone}_{w}.
It is also immediate that for every data word w,

#^{zone}_{w}(S) = #_{Zonal(w)}(S),

V^{zone}_{w}(S) = V_{Zonal(w)}(S),

[P]^{zone}_{w} = [P]_{Zonal(w)}.
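In code, assuming the sets V^{zone}_{w}(S) have already been extracted from w (our own sketch, not the paper's notation), [P]^{zone}_{w} is a straightforward set computation:

```python
def bracket(P, Vzone):
    """[P]^zone: data values lying in V^zone(S) for every S in P and in
    no V^zone(T) for any T outside P.

    P:     collection of frozensets of letters (the label sets in P);
    Vzone: dict mapping each occurring label set S to V^zone_w(S).
    """
    P = set(P)
    if not P:
        return set()
    inter = set.intersection(*(set(Vzone.get(S, set())) for S in P))
    outside = set()
    for T, vals in Vzone.items():
        if T not in P:
            outside |= set(vals)
    return inter - outside

# Two kinds of zones: {a}-zones with values {1, 2}, {a,b}-zones with {2, 3}.
# Value 2 occurs in both kinds of zones; values 1 and 3 in exactly one kind.
Vz = {frozenset({"a"}): {1, 2}, frozenset({"a", "b"}): {2, 3}}
```

The non-empty brackets partition the data values, which is what makes the counting in the Presburger encoding possible.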

The following two propositions are the zone analogs of Proposition 1.

Proposition 3. For every data word w and Σ′ ⊆ Σ, w |= V(Σ′) ↦ Σ′ if and only if the following holds.

1. Every zone has at most one symbol from Σ′.

2. [P]^{zone}_{w} = [P]_{Zonal(w)} = ∅ whenever |{S ∈ P | S ∩ Σ′ ≠ ∅}| ≥ 2.

3. For each a ∈ Σ′, |V_{w}(a)| = #_{w}(a) = Σ_{S ∋ a} #^{zone}_{w}(S) = Σ_{S ∋ a} #_{Zonal(w)}(S).

Proof. If w |= V(Σ′) ↦ Σ′, then it is immediate that K1 holds, hence also K3. Now for all a, b ∈ Σ′ with a ≠ b, we have

V_{w}(a) ∩ V_{w}(b) = ∅

⇕

∪_{S ∋ a} V^{zone}_{w}(S) ∩ ∪_{T ∋ b} V^{zone}_{w}(T) = ∅

⇕

∪_{S ∋ a, T ∋ b} V^{zone}_{w}(S) ∩ V^{zone}_{w}(T) = ∅

⇕

[P]^{zone}_{w} = ∅ whenever |{S ∈ P | S ∩ Σ′ ≠ ∅}| ≥ 2.

For the converse, it is immediate that if K1 and K3 hold, then w |= V({a}) ↦ {a} for each a ∈ Σ′. As we have shown above, K2 is equivalent to V_{w}(a) ∩ V_{w}(b) = ∅ for all a, b ∈ Σ′ with a ≠ b. Thus, w |= V(Σ′) ↦ Σ′. □
Proposition 4. For every data word w and Σ_{1}, Σ_{2} ⊆ Σ, w |= V(Σ_{1}) ⊆ V(Σ_{2}) if and only if [P]^{zone}_{w} = [P]_{Zonal(w)} = ∅ whenever

|{S ∈ P | S ∩ Σ_{1} ≠ ∅}| ≥ 1 and |{S ∈ P | S ∩ Σ_{2} ≠ ∅}| = 0.

Proof. w |= V(Σ_{1}) ⊆ V(Σ_{2}) if and only if

∪_{a∈Σ_{1}} V_{w}(a) − ∪_{b∈Σ_{2}} V_{w}(b) = ∅

⇕

∪_{S∩Σ_{1}≠∅} V^{zone}_{w}(S) − ∪_{T∩Σ_{2}≠∅} V^{zone}_{w}(T) = ∅

⇕

[P]^{zone}_{w} = ∅ whenever |{S ∈ P | S ∩ Σ_{1} ≠ ∅}| ≥ 1 and |{S ∈ P | S ∩ Σ_{2} ≠ ∅}| = 0. □

Let A^{zonal} be the zonal automaton of A and C^{zonal} be a set of dk's and dic's over the alphabet 2^{Σ}. Let Σ = {a_{1}, . . . , a_{k}}, let S_{1}, . . . , S_{l} be an enumeration of the non-empty subsets of Σ, and let P_{1}, . . . , P_{m} be an enumeration of the non-empty subsets of 2^{Σ} such that for each i, ∅ ∉ P_{i}. That is, m = 2^{(2^{|Σ|} − 1)} − 1.

1. The guessing of the set Q and the constants Γ_{P}'s.

a) Guess a set Q ⊆ 2^{2^{Σ} − {∅}} − {∅}.

b) For each P ∈ Q, guess an integer m_{P} ≤ 2^{|Σ|} + 2 according to the following rules.

– If V(Σ′) ↦ Σ′ ∈ C, then m_{P} = 0 whenever |{S ∈ P | S ∩ Σ′ ≠ ∅}| ≥ 2. (This is to satisfy the condition in Proposition 3.)

– If V(Σ_{1}) ⊆ V(Σ_{2}) ∈ C, then m_{P} = 0 whenever |{S ∈ P | S ∩ Σ_{1} ≠ ∅}| ≥ 1 and |{S ∈ P | S ∩ Σ_{2} ≠ ∅}| = 0. (This is to satisfy the condition in Proposition 4.)

c) For each P ∈ Q, fix a set Γ_{P} = {α^{P}_{1}, . . . , α^{P}_{m_{P}}} of constants such that the Γ_{P}'s are pairwise disjoint and Γ_{P} ∩ N = ∅. Let Γ_{Q} = ∪_{P∈Q} Γ_{P}.
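The rules in step 1(b) are easy to operationalize; the sketch below (ours) decides, for a candidate P, whether the guessed m_{P} is forced to be 0 by the constraints in C:

```python
def forced_zero(P, dks, dics):
    """Return True iff m_P must be 0 under the rules of step 1(b).

    P:    set of frozensets of letters (a candidate member of Q);
    dks:  list of sets Sigma', one per dk V(Sigma') -> Sigma' in C;
    dics: list of pairs (Sigma1, Sigma2), one per dic V(Sigma1) <= V(Sigma2) in C.
    """
    for sp in dks:  # rule from Proposition 3
        if sum(1 for S in P if S & set(sp)) >= 2:
            return True
    for s1, s2 in dics:  # rule from Proposition 4
        meets1 = sum(1 for S in P if S & set(s1))
        meets2 = sum(1 for S in P if S & set(s2))
        if meets1 >= 1 and meets2 == 0:
            return True
    return False

# Two members of P meet Sigma' = {a, b}, so a dk on {a, b} forces m_P = 0.
P = {frozenset({"a"}), frozenset({"b"})}
```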
2. Embedding the constants of the Γ_{P}'s into A^{zonal}.

Construct a finite state automaton A′ (from the zonal automaton A^{zonal}) as follows. A′ accepts a word

v = a_{1} · · · a_{k_{1}} R_{1} a_{k_{1}+1} · · · a_{k_{2}} R_{2} · · · a_{k_{l}+1} · · · a_{n} R_{l}

over Σ ∪ 2^{Σ} ∪ (2^{Σ} × Γ_{Q}) if and only if the following holds.

– A symbol (S, d) ∈ 2^{Σ} × Γ_{Q} can appear in v only if S ∈ P and d ∈ Γ_{P} for some P ∈ Q.

– Let u = a_{1} · · · a_{k_{1}} S_{1} a_{k_{1}+1} · · · a_{k_{2}} S_{2} · · · a_{k_{l}+1} · · · a_{n} S_{l} be the word over Σ ∪ 2^{Σ} such that

S_{i} = R_{i} if R_{i} ∈ 2^{Σ}, and S_{i} = R if R_{i} = (R, d) ∈ 2^{Σ} × Γ_{Q}.

Then u ∈ L(A^{zonal}).

– For i = 1, . . . , l − 1, if R_{i} = (S_{i}, d_{i}) ∈ 2^{Σ} × Γ_{Q} and R_{i+1} = (S_{i+1}, d_{i+1}) ∈ 2^{Σ} × Γ_{Q}, then d_{i} ≠ d_{i+1}.

– If V(Σ′) ↦ Σ′ is in C, then

• for each S ∈ 2^{Σ} and α ∈ Γ_{P}, where S ∈ P and S ∩ Σ′ ≠ ∅, the symbol (S, α) appears exactly once in v;

• each symbol a ∈ Σ′ appears at most once in a zone.

3. Constructing the Presburger formula for C.

a) Construct the formula ϕ^{zonal}_{C} from C^{zonal} according to Lemma 2. Let ϕ^{zonal}_{C} be of the form ∃z_{P_{1}} · · · ∃z_{P_{m}} ψ^{zonal}_{C}.

b) Denote by ϕ^{zonal,Q}_{C} the formula

∃z_{P_{1}} · · · ∃z_{P_{m}} ( ψ^{zonal}_{C} ∧ ∧_{P_{i} ∈ Q} z_{P_{i}} = 0 ∧ ∧_{P_{i} ∉ Q} z_{P_{i}} ≥ 2^{|Σ|} + 3 ).

4. The decision step. Test the emptiness of L(A′, ϕ^{zonal,Q}_{C}).

The proof of correctness follows the same reasoning as that of Claim 5.