An Automata Model for Trees with Ordered Data Values

(1)

An Automata Model for Trees with Ordered Data Values

Tony Tan

University of Edinburgh University of Warsaw

Edinburgh, UK Warsaw, Poland

Email: ttan@inf.ed.ac.uk

Abstract—Data trees are trees in which each node, besides carrying a label from a finite alphabet, also carries a data value from an infinite domain. They have been used as an abstraction model for reasoning tasks on XML and verification. However, most existing approaches consider the case where only equality test can be performed on the data values.

In this paper we study data trees in which the data values come from a linearly ordered domain, and in addition to equality test, we can test whether the data value in a node is greater than the one in another node. We introduce an automata model for them which we call ordered-data tree automata (ODTA), provide its logical characterisation, and prove that its emptiness problem is decidable in 3-NEXPTIME. We also show that the two- variable logic on unranked trees, studied by Bojanczyk, Muscholl, Schwentick and Segoufin in 2009, corresponds precisely to a special subclass of this automata model.

Then we define a slightly weaker version of ODTA, which we call weak ODTA, and provide its logical characterisation. The complexity of the emptiness problem drops to NP. However, a number of existing formalisms and models studied in the literature can be captured already by weak ODTA. We also show that the definition of ODTA can be easily modified, to the case where the data values come from a tree-like partially ordered domain, such as strings.

Index Terms—Automata, data unranked trees, logic, two- variable logic, ordered data values.

I. INTRODUCTION

Classical automata theory studies words and trees over finite alphabets. Recently there has been a growing interest in the so-called “data” words and trees, that is, words and trees in which each position, besides carrying a label from a finite alphabet, also carries a data value from an infinite domain.

Interest in such structures with data springs due to their connection to XML [1], [2], [5], [12], [17], [18], [32], as well as system specifications [9], [13], [36], where many properties simply cannot be captured by finite alphabets. This has motivated various works on data words [3], [8], [14], [22], [25], [33], as well as on data trees [4], [6], [20], [21], [24]. The common feature of these works is the addition of equality test on the data values to the logic on trees. While for finitely-labeled trees many logical formalisms (e.g., the monadic second-order logic MSO) are decidable by converting formulae to automata, even FO (first-order logic) on data words extended with data-equality is already undecidable. See, e.g., [8], [17], [33].

Thus, there is a need for expressive enough, while computa- tionally well-behaved, frameworks to reason about structures with data values. This has been quite a common theme in XML and system specification research. It has largely followed two

routes. The first takes a specific reasoning task, or a set of similar tasks, and builds algorithms for them (see, e.g., [2], [5], [34], [17], [18]). The second looks for sufficiently general automata models that can express reasoning tasks of interest, but are still decidable (see, e.g., [14], [6], [24], [36]).

Both approaches usually assume that data values come from an abstract set equipped only with the equality predicate. This is already sufficient to capture a wide range of interesting applications both in databases and verification. However, it has been advocated in [15] that comparisons based on a linear order over the data values could be useful in many scenarios, including data centric applications built on top of a database.

So far, not many works have been done in this direction.

A few works such as [31], [35], [36] are on words, while in most applications we need to consider trees. Moreover, these works are incomparable to some interesting existing formalisms [17], [6], [2], [12], [24], [14], [29] known to be able to capture various interesting scenarios common in practice. On top of that many useful techniques, notably those introduced in [17], [8], [6], [24], can deal only with data equality, and are highly dependent on specific combinatorial properties of the formalisms. They are rather hard to adapt to other more specific tasks, let alone being generalised to include more relations on data values, and they tend to produce extremely high complexity bounds, such as non-primitive- recursive, or at least as hard as the reachability problem in Petri nets. Furthermore, most known decidability results are lost as soon as we add the order relation on data values. See, e.g., [8].

In this paper we study the notion of data trees in which the data values come from a linearly ordered domain, which we call ordered-data trees. In addition to equality tests on the data values, in ordered-data trees we are allowed to test whether the data value in a node is greater than the data value in another node. To the extent it is possible, we aim to unify various ad hoc methods introduced to reason about data trees, and generalise them to ordered-data trees to make them more accessible and applicable in practice. This paper is the first step, where we introduce an automata model for ordered-data trees, provide its logical characterisation, and prove that it has decidable emptiness problem. Moreover, we also show that it can capture various well known formalisms.

Brief description of the results in this paper: The trees that we consider are unranked trees where there is no a priori bound in the number of children of a node. Moreover, we also have an order on the children of each node. We consider

(2)

a natural logic for ordered-data trees, which consists of the following relations.

• The parent relationE↓, whereE↓(x, y) means that node x is the parent of node y.

• The next-sibling relation E→, where E→(x, y) means that nodes x and y have the same parent and y is the next sibling ofx.

• The labeling predicates a(·)’s, where a(x) means that nodex is labeled with symbol a.

• The data equality predicate∼, where x ∼ y means that nodesx and y have the same data value.

• The order relation on data ≺, where x ≺ y means that the data value in node x is less than the one in node y.

• The successive order relation on data ≺suc, where x≺suc y means that the data value in node y is the minimal data value in the tree greater than the one in nodex.

We introduce an automata model for ordered-data trees, which we call ordered-data tree automata (ODTA), and pro- vide its logical characterisation. Namely, we prove that the class of languages accepted by ODTA corresponds precisely to those expressible by formulas of the form:

∃X1· · · ∃Xn ϕ ∧ ψ, (1) where X1, . . . , Xn are monadic second-order predicates; the formula ϕ is an FO formula restricted to two variables and using only the predicates E↓, E→, ∼, as well as the unary predicates X1, . . . , Xn anda’s; and the formula ψ is an FO formula using only the predicates ∼, ≺, ≺suc, as well as the unary predicates X1, . . . , Xn anda’s.

We show that the logic ∃MSO²(E↓, E→, ∼), first studied in [6], corresponds precisely to a special subclass of ODTA, where∃MSO²(E↓, E→, ∼) denotes the set of formulas of the form (1) in whichψ is a true formula. We then prove that the emptiness problem of ODTA is decidable in 3-NEXPTIME. Our main idea here is to show how to convert the ordered- data trees back to a string over finite alphabets. (See our notion of string representation of data values in Section III.) Such conversion enables us to use the classical finite state automata to reason about data values.

Then we define a slightly weaker version of ODTA, which we call weak ODTA. Essentially the only feature of ODTA missing in weak ODTA is the ability to test whether two adjacent nodes have the same data value. Without such simple feature, the complexity of the emptiness problem surprisingly drops three-fold exponentially to NP. We provide its logical characterisation by showing that it corresponds precisely to the languages expressible by the formulas of the form (1) where ϕ does not use the predicate ∼. We show that a number of existing formalisms and models can be captured already by weak ODTA, i.e. those in [17], [12], [31].

We should remark that [12] studies a formalism which consists of tree automata and a collection of set and linear constraints.^∗ It is shown that the satisfaction problem of such

∗We will later define formally what set and linear constraints are.

formalism is NP-complete. In fact, it is also shown in [12]

that a single set constraint (without tree automaton and linear constraint) already yields NP-hardness. Weak ODTA are essentially equivalent to the formalism in [12] extended with the full expressive power of the first-order logic FO(∼, ≺, ≺suc).

It is worth to note that despite such extension, the emptiness problem remains in NP.

Finally we also show that the definition of ODTA can be easily modified to the case where the data values come from a partially ordered domain, such as strings. This work can be seen as a generalisation of the works in [11] and [27].

However, it must be noted that [11], [27] deal only with data words, where only equality test is allowed on the data values and there is no order on them.

Related works: Most of the existing works in this area are on data words. In the paper [8] the model data automata was introduced, and it was shown that it captures the logic

∃MSO²(∼, <, +1), the fragment of existential monadic second order logic in which the first order part using two variables only and the predicates: the data equality ∼, as well as the order< and the successor +1 on the domain.

An important feature of data automata is that their emptiness problem is decidable, even for infinite words, but is at least as hard as reachability for Petri nets. It was also shown that the satisfiability problem for the three-variable first order logic is undecidable. Later in [11] an alternative proof was given for the decidability of the weaker logic∃MSO²(+1, ∼). The proof gives a decision procedure with an elementary upper bound for the satisfaction problem of ∃MSO²(+1, ∼) on strings.

Recently in [27] an automata model that captures precisely the logic∃MSO²(+1, ∼), both on finite and infinite words, is proposed.

Another logical approach is via the so called linear temporal logic with freeze quantifier, introduced in [14]. Intuitively, these are LTL formulas equipped with a finite number of registers to store the data values. We denote by LTL^↓_n[X,U], the LTL with freeze quantifier, wheren denotes the number of registers and the only temporal operators allowed are the neXt operator X and the Until operator U. It was shown that alternating register automata withn registers (RAn) accept all LTL^↓_n[X,U]

languages and the emptiness problem for alternating RA₁ is decidable. However, the complexity is non primitive recursive.

Hence, the satisfiability problem for LTL^↓₁(X,U) is decidable as well. Adding one more register or past time operators, such asX⁻¹orU⁻¹, to LTL^↓₁(X,U) makes the satisfiability problem undecidable. In [29] a weaker version of alternating RA1, called safety alternating RA1, is considered, and the emptiness problem is shown to be EXPSPACE-complete.

A model for data words with linearly ordered data values was proposed in [36]. The model consists of an automaton equipped with a finite number of registers, and its transitions are based on constraints on the data values stored in the registers. It is shown that the emptiness problem for this model is decidable in PSPACE. However, no logical characterisation is provided for such model.

In [7] another type of register automata for words was in-

(3)

troduced and studied, which is a generalisation of the original register automata introduced by Kaminski and Francez [25], where the data values also can come from a linearly ordered domain. Thus, the order comparison, not just equality, can be performed on data values. This model is based on the notion of monoid for data words, and is incomparable with our model here.

It is shown in the paper [31] that the satisfaction problem for FO²(+1, ≺suc) over text is decidable. A text is simply a data word in which all the data values are different and they range over the positive integers from 1 to n, for some n ≥ 1. We will see later that the satisfaction problem for FO²(+1, ≺suc) can be reduced to the emptiness problem of our model.

In [35] it is shown that the satisfaction problem of the logic FO²(<, ≺) on words is decidable. This logic is incomparable with our model. However, it should be noted that FO²(<) cannot capture the whole class of regular languages.

The work on data trees that we are aware of is in [6], [24].

In [6] it was shown that the satisfaction problem for the logic

∃MSO²(E↓, E→, ∼) over unranked trees is decidable in 3- NEXPTIME. However, no automata model is provided. We will see later how this logic corresponds precisely to a special subclass of ODTA.

In [24] alternating tree register automata were introduced for trees. They are essentially the generalisation of the alternating RA1to the tree case. It was shown that this model captures the forward XPath queries. However, no logical characterisation is provided and the emptiness problem, though decidable, is non primitive recursive.

Organisation: This paper is organised as follows. In Section II we give some preliminary background. In Section III we formally define the logic for ordered-data trees and present a few examples as well as notations that we need in this paper.

In Section IV we present two lemmas that we are going to need later on. We prove them in a quite general setting, as we think they are interesting in their own. We introduce the ordered-data tree automata (ODTA) in Section V and weak ODTA in Section VI. In Section VII we discuss a couple of the undecidable extensions of weak ODTA. In Section VIII we describe how to modify the definition of ODTA when the data values are strings, that is, when they come from a partially ordered domain. Finally we conclude with some concluding remarks in Section IX.

II. PRELIMINARIES

In this section we review some definitions that we are going to use later on. We usually use Γ and Σ to denote finite alphabets. We write 2^Γ to denote an alphabet in which each symbol corresponds to a subset of Γ. In some cases, we may need the alphabet 2²^Γ – an alphabet in which each symbol corresponds to a set of subsets of Γ. We denote the set of natural numbers{0, 1, 2, . . .} by N.

Usually we write L to denote a language, for both string and tree languages. When it is clear from the context, we use the term language to mean either a string language, or a tree language.

A. Finite state automata over strings and commutative regular languages

We usually writeM to denote a finite state automaton on strings. The language accepted by the automatonM is denoted byL(M).

LetΣ = {a1, . . . , aℓ}. For a word w ∈ Σ^∗, the Parikh image ofw isParikh(w) = (n1, . . . , nℓ), where ni is the number of appearances ofaiinw. For a vector ¯n, the inverse of the Parikh image ofn is¯ Parikh⁻¹(¯n) = {w | w ∈ Σ^∗andParikh(w) =

¯ n}.

For1 ≤ i ≤ ℓ, a vector ¯v = (n1, . . . , nℓ) ∈ N^ℓ is called an i-base, if ni 6= 0 and nj = 0, for all j 6= i. A language L is periodic, if there exist (ℓ + 1) vectors ¯u, ¯v1, . . . , ¯vℓ such that

¯

u ∈ N^ℓ and eachv¯i is an i-base and

L = [

h1,...,hℓ≥0

Parikh⁻¹(¯u + h1v¯1+ · · · + hℓv¯ℓ).

We denote such languageL by L(¯u, ¯v1, . . . , ¯vℓ).

A languageL is commutative if it is closed under reordering.

That is, if w = b1· · · bm ∈ L, and σ is a permutation on {1, . . . , m}, then bσ(1)· · · bσ(m)∈ L.

Theorem 1: [16, Corollary 2.2] A language is commutative and regular if and only if it is a finite union of periodic languages.

B. Unranked trees, tree automata and transducers

An unranked finite tree domain is a prefix-closed finite subset D of N^∗ (words over N) such that u · i ∈ D implies u · j ∈ D for all j < i and u ∈ N^∗. Given a finite labeling alphabetΣ, a Σ-labeled unranked tree t is a structure

hD, E↓, E→, {a(·)}a∈Σi, where

• D is an unranked tree domain,

• E↓is the child relation:(u, u ·i) ∈ E↓for allu, u ·i ∈ D,

• E→ is the next-sibling relation:(u · i, u · (i + 1)) ∈ E→

for all u · i, u · (i + 1) ∈ D, and

• the a(·)’s are labeling predicates, i.e. for each node u, exactly one of a(u), with a ∈ Σ, is true.

We write Dom(t) to denote the domain D. The label of a node u in t is denoted by ℓabt(u). If ℓabt(u) = a, then we say thatu is an a-node.

An unranked tree automaton [10], [38] overΣ-labeled trees is a tuple A = hQ, Σ, δ, F i, where Q is a finite set of states, F ⊆ Q is the set of final states, and δ : Q × Σ → 2^(Q^∗⁾is a transition function; we requireδ(q, a)’s to be regular languages overQ for all q ∈ Q and a ∈ Σ.

A run of A over a tree t is a function ρA:Dom(t) → Q such that for each nodeu with n children u · 0, . . . , u · (n − 1), the word ρA(u · 0) · · · ρA(u · (n − 1)) is in the language δ(ρA(u), ℓabt(u)). For a leaf u labeled a, this means that u could be assigned a stateq if and only if the empty word ǫ is inδ(q, a). A run is accepting if ρA(ǫ) ∈ F , i.e., if the root is assigned a final state. A treet is accepted by A if there exists

(4)

an accepting run of A on t. The set of all trees accepted by A is denoted by L(A).

An unranked tree (letter-to-letter) transducer with the input alphabet Σ and output alphabet Γ is a tuple T = hA, µi, where A is a tree automaton with the set of states Q, and µ ⊆ Q × Σ × Γ is an output relation. We call such T a transducer fromΣ to Γ.

Let t be a Σ-labeled tree, and t^′ aΓ-labeled tree such that Dom(t) =Dom(t^′). We say that a tree t^′ is an output ofT on t, if there is an accepting run ρA ofA on t and for each u ∈ Dom(t), it holds that (ρA(u), ℓabt(u), ℓabt^′(u)) ∈ µ.

We call T an identity transducer, if ℓabt(u) = ℓabt^′(u) for all u ∈ Dom(t). We will often view an automaton A as an identity transducer.

C. Automata with Presburger constraints (APC)

An automaton with Presburger constraints (APC) is a tuple hA, ξi, where A is an unranked tree automaton with states q0, . . . , qmandξ is an existential Presburger formula with free variablesx0, . . . , xm. A treet is accepted by hA, ξi, denoted by t ∈ L(A, ξ), if there is an accepting run ρA of A on w such that ξ(n0, . . . , nm) is true, where ni is the number of appearances ofqi inρA.

Theorem 2: [37], [40] The emptiness problem for APC is decidable in NP.

It is worth noting also that the class of languages accepted by APC is closed under union and intersection.

Oftentimes, instead of counting the number of states in the accepting run, we need to count the number of occurrences of alphabet symbols in the tree. Since we can easily embed the alphabet symbols inside the states, we always assume that the Presburger formulaξ has the free variables xa’s to denote the number of appearances of the symbol a in the tree.

As in the word case, we let Parikh(t) denote the Parikh image of the tree t. We will need the following proposition.

Proposition 3: [37], [40] Given an unranked tree automa- ton A, one can construct, in polynomial time, an existential Presburger formula ξA(x1, . . . , xℓ) such that

• for every treet ∈ L(A), ξA(Parikh(t)) holds;

• for every n = (n¯ 1, . . . , nℓ) such that ξA(¯n) holds, there exists a tree t ∈ L(A) with Parikh(t) = ¯n.

III. ORDERED-DATATTREES ANDTHEIRLOGIC

An ordered-data tree over the alphabetΣ is a tree in which each node, besides carrying a label from the finite alphabetΣ, also carries a data value from N= {0, 1, . . .}.^†

Let t be an ordered-data tree over Σ and u ∈Dom(t). We write vaℓt(u) to denote the data value in the node u. The set of all data values in thea-nodes in t is denoted by by Vt(a).

That is, Vt(a) = {vaℓt(u) | ℓabt(u) = a and u ∈Dom(t)}.

We write Vtto denote the set of data values found in the tree t. We also write #t(a) to denote the number of a-nodes in t.

†Here we use the natural numbers as data values just to be concrete. The results in our paper applies trivially for any linearly ordered domain.

The profile of a nodeu is a triplet (l, p, r) ∈ {⊤, ⊥, ∗} × {⊤, ⊥, ∗} × {⊤, ⊥, ∗}, where l = ⊤ and l = ⊥ indicate that the node u has the same data value and different data value as its left sibling, respectively; l = ∗ indicates that u does not have a left sibling. Similarly, p = ⊤, p = ⊥, and p = ∗ have the same meaning in relation to the parent of the node u, while r = ⊤, r = ⊥, and r = ∗ means the same in relation to the right sibling of the nodeu. For an ordered-data tree t over Σ, the profile tree of t, denoted by Profile(t), is a tree overΣ × {⊤, ⊥, ∗}³obtained by augmenting to each node of t its profile.

We writeProj(t) to denote the Σ projection of the ordered- data treet, that is,Proj(t) is t without the data values. When we say that an ordered-data treet is accepted by an automaton A, we mean that Proj(t) is accepted by A. An ordered-data tree t^′ is an output of a transducerT on an ordered-data tree t, if Proj(t^′) is an output of T on Proj(t), and for all u ∈ Dom(t^′), we have vaℓt^′(u) = vaℓt(u).

Figure 1 shows an example of an ordered-data treet over the alphabet {a, b, c} with its profile tree. The notation ^a_d means that the node is labeled witha and has data value d.

A. String representations of data values

Lett be an ordered-data tree over Γ. For a set S ⊆ Γ, let [S]t= \

a∈S

Vt(a) ∩ \

b /∈S

Vt(b).

Note that for eacha ∈ Γ,

Vt(a) = [

S s.t. a∈S

[S]t.

Since the sets[S]t’s are disjoint, it is immediate that|Vt(a)| = P

S s.t. a∈S |[S]t|.

Let d1 < · · · < dm be all the data values found in t. The string representation of the data values int, denoted by VΓ(t), is the string S1· · · Sm over the alphabet 2^Γ− {∅} of length m such that di ∈ [Si]t, for each i = 1, . . . , m. The notation [S]t is already introduced in [11], [12], but notVΓ(t).

Consider the example of the tree t in Figure 1. The data values int are 1, 2, 4, 6, 7, where

[{b, c}]t = {1}, [{a, b, c}]t = {2},

[{a, b}]t = {4, 7}, [{a, c}]t = {6},

[S]t = ∅, for all the other S’s.

The string VΓ(t) is S1 S2 S3 S4 S5, where S1 = {b, c}, S2= {a, b, c}, S3= S5= {a, b} and S4= {a, c}.

B. A logic for ordered-data trees

An ordered-data treet over the alphabet Σ can be viewed as a structure

t = hD, {a(·)}a∈Σ, E↓, E→, ∼, ≺, ≺suci, where

(5)

a 2

/ w j

b 1

_c

2

_a

4

_a

6

/ w j

b 2

b

4

a

7

? ? ?

c 1

c

6

b

7

a,(∗,∗,∗) 2

/ w j

b,(∗,⊥,⊥) 1

c,(⊥,⊤,⊥) 2

a,(⊥,⊥,⊥) 4

a,(⊥,⊥,∗) 6

/ w j

b,(∗,⊤,⊥) 2

_{b,(⊥,⊥,∗)}

4

a,(⊥,⊥,∗) 7

? ? ?

c,(∗,⊥,∗) 1

_{c,(∗,⊥,∗)}

6

_{b,(∗,⊤,∗)}

7

Fig. 1. An example of an ordered-data tree (on the left) and its profile (on the right).

• the relations{a(·)}a∈Σ, E↓, E→are as defined before in Subsection II-B,

• u ∼ v holds, if vaℓt(u) = vaℓt(v),

• u ≺ v holds, if vaℓt(u) < vaℓt(v),

• u≺suc v holds, if vaℓt(v) is the minimal data value in t greater than vaℓt(u).

Obviously, x≺suc y can be expressed equivalently as x ≺ y ∧ ∀z(¬(x ≺ z ∧ z ≺ y)). We include ≺suc for the sake of convenience. We also assume that we have the predicates root(x), first-sibling(x), last-sibling(x), and leaf(x) which stand for ∀y(¬E↓(y, x)), ∀y(¬E→(y, x)), ∀y(¬E→(x, y)), and ∀y(¬E↓(x, y)), respectively. We also write x ≁ y to denote¬(x ∼ y).

For O ⊆ {E↓, E→, ∼, ≺, ≺suc}, we let FO(O) stand for the first-order logic with the vocabulary O, MSO(O) for its monadic second-order logic (which extends FO(O) with quantification over sets of nodes), and ∃MSO(O) for its existential monadic second order logic, i.e., formulas of the form∃X1. . . ∃Xmψ, where ψ is an FO(O) formula over the vocabularyO extended with the unary predicates X1, . . . , Xm. We let FO²(O) stand for FO(O) with two variables, i.e., the set of FO(O) formulae that only use two variables x and y. The set of all formulae of the form ∃X1. . . ∃Xm ψ, where ψ is an FO²(O) formula is denoted by ∃MSO²(O).

Note that∃MSO²(E↓, E→) is equivalent in expressive power to MSO(E↓, E→) over the usual (without data) trees. That is, it defines precisely the regular tree languages [39].

As usual, we define Ldata(ϕ) as the set of ordered-data trees that satisfy the formulaϕ. In such case, we say that the formulaϕ expresses the language Ldata(ϕ).^‡

The following theorem is well known. It shows how even extending FO(E↓, E→) with equality test on data values immediately yields undecidability.

‡To avoid confusion, we put the subscript data on Ldata to denote a language of ordered-data trees. We use the symbol L without the subscript data to denote the usual language of trees/strings without data.

Theorem 4: (See, for example, [8], [17], [33]) The satisi- faction problem for FO(E↓, E→, ∼) is undecidable.

One of the deepest results in this area is the following decidability result for the logic∃MSO²(E↓, E→, ∼).

Theorem 5: [6] The satisfaction problem for

∃MSO²(E↓, E→, ∼) is decidable.

C. A few examples

In this subsection we present a few examples of properties of ordered-data trees expressible in our logic. Some of the examples are special cases of more general techniques that will be used later on.

Example 6: Let Σ = {a, b}. Consider the language L^a_data of ordered-data trees over Σ where an ordered-data tree t ∈ L^a_dataif and only if there exist twoa-nodes u and v such that u is an ancestor of v and either v ∼ u or v ≺ u. This language can be expressed with the formula∃X∃Y ∃Z ϕ, where ϕ states thatX contains only the node u, Y contains only the node v, Z contains precisely the nodes in the path from u to v, and v ∼ u or v ≺ u.

Example 7: For a fixed setS ⊆ Σ and an integer m ≥ 1, we consider the language L^S,m_data such that t ∈ L^S,m_data if and only if|[S]t| = m.

We pick an arbitrary symbola ∈ S. The language L^S,m_data can be expressed in∃MSO²(E↓, E→, ∼) with the formula of the form ∃X1 · · · ∃Xm ϕ, where ϕ is a conjunction of the following.

• That the predicatesX1, . . . , Xmare disjoint and each of them contains exactly one node, which is ana-node.

• That the data values found in nodes in X1, . . . , Xm are all different.

• That for eachi ∈ {1, . . . , m}, if a data value is found in a node inXi, then it must also be found in someb-node, for every b ∈ S.

(6)

• That for eachi ∈ {1, . . . , m}, if a data value found in a node inXi, then it must not be found in any b-node, for everyb /∈ S.

• That for everya-node (recall that a ∈ S) that does not belong to the Xi’s, either it has the same data value as the data value in a node belongs to one of theXi’s, or it has the data value not in [S]t.

That its data value does not belong to[S]tcan be stated as the negation of

– for eachb ∈ S, there is a b-node with the same data value; and

– the data value cannot be found in any b-node, for everyb /∈ S.

Example 8: For a fixed setS ⊆ Σ and an integer m ≥ 1, we consider the languageL^{S, (mod m)}_data such thatt ∈ L^{S, (mod m)}_data if and only if |[S]t| ≡ 0 (mod m).

This language L^{S, (mod m)}_data can be expressed in

∃MSO²(E↓, E→, ∼) with a formula of the form

∃X0· · · ∃Xm−1∃Y0· · · ∃Ym−1∃Z ψ,

where the intended meanings of

X0, . . . , Xm−1, Y0, . . . , Ym−1, Z are as follows. For a node u in an ordered-data tree t ∈ Ldata,

• the number of nodes belonging to Z is precisely |[S]t|;

and ifZ(u) holds in t, then the data value in the node u belongs to [S]t;

• Xi(u) holds in t if and only if in the subtree t^′rooted in u we have |[S]t^′| ≡ i (mod m);

• if v1, . . . , vk are all the left-siblings of u, and Xi1(v1), . . . , Xik(vk) holds, then Yi(u) holds if and only if i1+ · · · + ik≡ i (mod m).

To express all these intended meanings, it is sufficient that ψ ∈ FO²(E↓, E→, ∼).

Example 9: Let Σ = {a, b}. Consider the language L^a∗_data of ordered-data trees over Σ where an ordered-data tree t ∈ L^a∗_dataif and only if all thea-nodes with data values different from the ones in their parents satisfy the following conditions:

• the data values found in these nodes are all different;

• one of the these data values must be the largest in the tree t.

The language L^a∗_data can be expressed in ∃MSO²(E↓, ∼, ≺) with the following formula:

∃X

∀x X(x) ⇐⇒ a(x) ∧ ∃y(E↓(y, x) ∧ y ≁ x)

∧ ∀x∀y(X(x) ∧ X(y) ∧ x ∼ y → x = y)

∧ ∃x X(x) ∧ ∀y(y ≺ x ∨ x ∼ y) .

IV. TWOUSEFULLEMMAS

In this section we prove two lemmas which will be used later on. The first is combinatorial by nature, and we will use it in our proof of the decidability of ODTA. The second is an

Ehrenfeucht-Fra¨ıss´e type lemma for ordered-data trees, and we will use it in our proof of the logical characterization of ODTA.

A. A combinatorial lemma

Let G be an (undirected and finite) graph. For simplicity, we consider only the graph without self-loop. We denote by V (G) the set of vertices in G and E(G) the set of edges. For a node u ∈ V (G), we write deg(u) to denote the degree of the nodeu and deg(G) to denote max{deg(u) | u ∈ V (G)}.

A data graph over the alphabet Γ is a graph G in which each node carries a label fromΓ and a data value from N. A nodeu ∈ V (G) is called an a-node, if its label is a, in which case we write ℓabG(u) = a. We denote by vaℓG(u) the data value found in nodeu, and VG(a) the set of data values found ina-nodes in G.

Lemma 10: Let G be a data graph over Γ. Suppose for each a ∈ Γ, we have |VG(a)| ≥ deg(G)|Γ| + deg(G) + 1.

Then we can reassign the data values in the nodes in G to obtain another data graphG^′ such thatV (G) = V (G^′) and E(G) = E(G^′) and

(1) for eachu ∈ V (G^′), ℓabG(u) = ℓabG^′(u);

(2) for eacha ∈ Γ, VG(a) = VG^′(a);

(3) for eachu, v ∈ V (G), if (u, v) ∈ E(G^′), then vaℓG^′(u) 6=

vaℓG^′(v).

Note that in Lemma 10, the data graphG^′ differs fromG only in the data values on the nodes, where we require that adjacent nodes inG^′ have different data values.

B. An Ehrenfeucht-Fra¨ıss´e type lemma

We need the following notation. Ak-characteristic function on the alphabetΓ, is a function f : Γ → {0, 1, 2, . . . , k}. Let FΓ,k be the set of all suchk-characteristic functions on Γ. A function f ∈ FΓ,k is a k-characteristic function for a set S, if f (a) ∈ {1, 2, . . . , k}, for all a ∈ S, and f (a) = 0, for all a /∈ S.

Let t be an ordered-data tree and d1 < · · · < dm be the data values found int. The k-extended representation of t is the string V^k_Γ(t) = (S1, f1) · · · (Sm, fm) ∈ 2^Γ× FΓ,k such that S1· · · Sm = VΓ(t) and for each i ∈ {1, 2, . . . , m} and for eacha ∈ Γ,

1) fi is ak-characteristic function for the set Si,

2) if1 ≤ fi(a) ≤ k − 1, then there are fi(a) number of a-nodes in t with data value di,

3) iffi(a) = k, then there are at least k number of a-nodes int with data value di.

We assume that in every formula in MSO(∼, ≺, ≺suc) all the monadic second-order quantifiers precede the first-order part. That is, sentences in MSO(∼, ≺, ≺suc) are of the form:

ϕ := Q1X1· · · QsXsψ, where the Xi’s are monadic second- order variables, theQi’s are∃ or ∀ and ψ ∈ FO(∼, ≺, ≺suc) extended with the unary predicates X1, . . . , Xs. We call the integer s, the MSO quantifier rank of ϕ, denoted by MSO-qr(ϕ) = s, while we write FO-qr(ϕ) to denote the

(7)

quantifier rank of ψ, that is the quantifier rank of the first- order part ofϕ.

Lemma 11: Let t1 and t2 be ordered-data trees over Γ such that V^k2_Γ^s(t1) = V^k2_Γ^s(t2). For any MSO(∼, ≺, ≺suc) sentence ϕ such that MSO-qr(ϕ) ≤ s and FO-qr(ϕ) ≤ k, t1|= ϕ if and only if t2|= ϕ.

V. AUTOMATA FORORDERED-DATATREE

In this section we are going to introduce an automata model for ordered-data trees and study its expressive power.

Definition 12: An ordered-data tree automaton, in short ODTA, over the alphabetΣ is a triplet S = hT , M, Γ0i, where T is a letter-to-letter transducer from Σ × {⊤, ⊥, ∗}³ to the output alphabet Γ; M is an automaton on strings over the alphabet2^Γ; andΓ0⊆ Γ.

An ordered-data tree t is accepted by S, denoted by t ∈ Ldata(S), if there exists an ordered-data tree t^′ over Γ such that

• on input Profile(t), the transducer T outputs t^′;

• the automatonM accepts the string VΓ(t^′); and

• for everya ∈ Γ0, all thea-nodes in t^′have different data values.

We describe a few examples of ODTA that accept the languages described in Examples 6, 7, 8 and 9.

Example 13: An ODTA S^a = hT , M, Γ0i that accepts the language L^a_data in Example 6 can be defined as follows. The output alphabet of the transducerT is Γ = {α, β, γ}. On an input treet, the transducer T marks the nodes in t as follows.

There is only one node marked withα, one node marked with β, and that the α-node is an ancestor of β. The automaton M accepts all the strings in which the position labeled with S ∋ β is less than or equal to the position labeled with S^′ ∋ α.

(These two positions can be equal, which means S = S^′.) Finally, Γ0= ∅.

Example 14: An ODTA S^S,m = hT , M, Γ0i that accepts the language L^S,m_data in Example 7 can be defined as follows.

The transducerT is an identity transducer. The automaton M accepts all the strings in which the symbolS appears exactly m times, and Γ0= ∅.

Example 15: An ODTA S^{S, (mod m)} = hT , M, Γ0i that accepts the language L^{S, (mod m)}_data in Example 8 can be defined as follows. The transducer T is an identity transducer.

The automaton M accepts a string in which the number of appearances of the symbolS is a multiple of m, and Γ0= ∅.

Example 16: An ODTA S^a∗ = hT , M, Γ0i that accepts the language L^a∗_data in Example 9 can be defined as follows.

The output alphabet of the transducer T is Γ = {α, β}. The transducerT marks the nodes as follows. A node is marked withα if and only if it is an a-node and it has different data

value from the one of its parent. All the other nodes are marked with β. The automaton M accepts a string v if and only if the last symbol inv contains the symbol α, while Γ0= {α}.

The following proposition states that ODTA languages are closed under union and intersection, but not under negation.

We would like to remark that being not closed under negation is rather common for decidable models for data trees. Often- times models that are closed under negation have undecidable emptiness/satisfaction problem.

Proposition 17: The class of languages accepted by ODTA is closed under union and intersection, but not under negation.

Proof: For closure under union and intersection, let S1 = hT1, M1, Γ¹₀i and S2= hT2, M2, Γ²₀i be ODTA. The unionLdata(S1) ∪ Ldata(S2) is accepted by an ODTA which non-deterministically chooses to simulate eitherS1 orS2 on the input ordered-data tree. The ODTA for the intersection Ldata(S1) ∩ Ldata(S2) can be obtained by the standard cross product betweenS1 andS2.

That ODTA languages are not closed under negation follows from the fact that the negation of the language in Example 13 is not accepted by ODTA. The proof is rather straightforward, thus, omitted.

We should remark that in Section VII we will discuss that extending ODTA with the complement of languages of the form in Example 13 will immediately yield undecidability.

Next we give the ODTA characterisation of the logic

∃MSO²(E↓, E→, ∼).

Theorem 18: A language Ldata is expressible with an

∃MSO²(E↓, E→, ∼) formula if and only if it is accepted by an ODTA S = hT , M, Γ0i, where L(M) is a commutative language.

The next result is the logical characterisation of ODTA.

Theorem 19: A language Ldata is accepted by an ODTA if and only if it is expressible with a formula of the form: ∃X1· · · ∃Xm ϕ ∧ ψ, where ϕ is a formula from FO²(E↓, E→, ∼), and ψ from FO(∼, ≺, ≺suc), both extended with the unary predicatesX1, . . . , Xm.

Finally, we show that the emptiness problem for ODTA is decidable. The decision procedure in Theorem 20 runs in 3-NEXPTIME, while the decision procedure for

∃MSO²(E↓, E→, ∼) proposed in [6] also runs in 3- NEXPTIME. However, we should remark that if we use our algorithm for the satisfaction problem of∃MSO²(E↓, E→, ∼) via the translation in Theorem 18, the complexity will jump to 5-NEXPTIME, since there is a double exponential blow-up in the translation.

Theorem 20: The emptiness problem for ODTA is decid- able.

(8)

VI. WEAKODTA

A weak ODTA over Σ is a triplet S = hT , M, Γ0i where T is a letter-to-letter transducer from Σ to the output alphabet Γ, and M is a finite state automaton over 2^Γ andΓ0⊆ Γ. An ordered-data treet is accepted by S, denoted by t ∈ Ldata(S), if there exists an ordered-data tree t^′ overΓ such that

• on input Proj(t), the transducer T outputs t^′;

• the automatonM accepts the string VΓ(t^′); and

• for everya ∈ Γ0, all thea-nodes in t^′have different data values.

Note that the only difference between weak ODTA and ODTA is the equality test on the data values in neighboring nodes.

Such difference is the cause of the triple exponential leap in complexity, as stated in the following theorem.

Theorem 21: The emptiness problem for weak ODTA is in NP.

Next, we give the logical characterisation of weak ODTA.

Theorem 22: A language L is accepted by a weak ODTA if and only if L is expressible with a formula of the form:

∃X1· · · ∃Xmϕ∧ψ, where ϕ is a formula from FO²(E↓, E→), and ψ is a formula from FO(∼, ≺, ≺suc), extended with the unary predicates X1, . . . , Xm.

The proof of Theorem 22 is the same as the proof of The- orem 19. The difference is that to simulate the FO²(E↓, E→) formulaϕ, the profile information is not necessary.

A. Extending weak ODTA with Presburger constraints Like in the case of APC, we can extend weak ODTA with Presburger constraints without increasing the complexity of its emptiness problem. Let S = hT , M, Γ0i be a weak ODTA, where Σ and Γ are the input and output alphabets of T , respectively. LetΓ = {α1, . . . , αℓ}.

A weak ODTA S = hT , M, Γ0i extended with Presburger constraint is a tuple hS, ξi, where ξ(x1, . . . , xℓ, y1, . . . , y2^ℓ) is an existential Presburger formula with the free variables x1, . . . , xℓ, y1, . . . , y2^ℓ−1. A ordered-data treet is accepted by hS, ξi, if there exists an output t^′ of T on t, the automaton M accepts VΓ(t^′), for each a ∈ Γ0, all a-nodes in t^′ have different data values andξ(Parikh(t^′),Parikh(VΓ(t^′))) holds.

We write Ldata(S, ξ) to denote the set of languages accepted by hS, ξi.

It should be immediate that the emptiness problem of weak ODTA extended with Presburger constraint is still decidable in NP.

B. Comparison with other known decidable formalisms We are going to compare the expressiveness of weak ODTA with other known models with decidable emptiness.

1) DTD with integrity constraints: An XML document is typically viewed as a data tree. The most common XML formalism is Document Type Definition (DTD). In short, a DTD is a context free grammar and a tree t conforms to a DTD D, if it is a derivation tree of a word accepted by the context free grammar.

The most commonly used XML constraints are integrity constraints which are of two types.

• The key constraints are constraints of the form:

∀x∀y(a(x) ∧ a(y) ∧ x ∼ y → x = y), denoted bykey(a).

• The inclusion constraints are constraints of the form:

∀x∃y(a(x) → b(y) ∧ x ∼ y), denoted byV (a) ⊆ V (b).

The satisfiability problem of a given DTDD and a collection C of integrity constraints asks whether there exists an ordered- data tree t that conforms to the DTD that satisfies all the constraints inC. In [17] it is shown that this problem is NP- complete.

Theorem 23: Given a DTD D and a collection C of in- tegrity constraints, one can construct a weak ODTA S such that Ldata(S) is precisely the set of ordered-data trees that conforms to D and satisfies all constraints in C.

It must be noted that our construction in Theorem 23 outputs an automatonM of exponential size. This blow-up is tight, as the following example shows. Consider the case where C does not contain inclusion constraints. That is,C contains only key constraints. Then any equivalent ODTAS = hT , M, Σ0i will haveL(M) = (2^Σ− {∅})^∗. Thus, we have exponential blow-up in the size of M. Nevertheless, if we are concerned only with satisfiability, then we can lower the complexity to NP as stated in the following theorem.

Theorem 24: Given a DTDD and a collection C of integrity constraints, one can construct a weak ODTA S in non- deterministic polynomial time such thatLdata(S) 6= ∅ if and only if there exists an ordered-data treet that conforms to D and satisfies all the constraints inC.

2) Set and linear constraints for data trees: In the pa- per [12] the set and linear constraints are introduced for data trees. As argued there, those constraints, together with automata, are able to capture many interesting properties commonly used in XML practice. We review those constraints and show how they can be captured by weak ODTA extended with Presburger constraints.

Data-terms (or just terms) are given by the grammar τ := V (a) | τ ∪ τ | τ ∩ τ | τ fora ∈ Σ.

The semantics ofτ is defined with respect to a data word t:

JV (a)K_t= Vt(a) Jτ K_t= Vt− Jτ K_t

Jτ1∩ τ2K_t= Jτ1K_t∩ Jτ2K_t Jτ1∪ τ2K_t= Jτ1K_t∪ Jτ2K_t Recall thatVt=S

a∈ΣVt(a) – the set of data values found in the data treet.

A set constraint is either τ = ∅ or τ 6= ∅, where τ is a term. A data tree t satisfies τ = ∅, written as t |= τ = ∅, if and only ifJτ K_t= ∅ (and likewise for τ 6= ∅).

A linear constraint ξ over the alphabet Σ is a linear constraint on the variablesxa, for eacha ∈ Σ and zS, for each

(9)

S ⊆ Σ. A data tree t satisfies ξ, if ξ holds by interpreting xa

as the number ofa-nodes in t, and zS the cardinality|[S]t|.

Theorem 25: Given a tree automaton A and a set C of set and linear constraints, there exists a weak ODTA hS, ϕi extended with Presburger constraints such that Ldata(S, ϕ) is precisely the set of ordered-data trees accepted by A that satisfies all the constraints in C.

Its proof is simply a restatement of the proof in [12] into a language of weak ODTA.

3) FO²(+1, ≺suc) over text: Here we focus our attention on ordered-data words, which can be viewed as trees where each node has at most one child. We write w = ^a_d¹₁ · · · ^a_dⁿ_n to denote ordered-data word in which positioni has label ai

and data valuedi. It is called a text, if all the data values are different and the set of data values {d1, . . . , dn} is precisely {1, . . . , n}.

It is shown in the paper [31] that the satisfaction problem for FO²(+1, ≺suc) over text is decidable.^§The following theorem shows that this decidability can be obtained via weak ODTA.

Theorem 26: For every formula ϕ ∈ FO²(+1, ≺suc), one can construct effectively a weak ODTA S such that

• for every text w, if w ∈ Ldata(ϕ), then w ∈ Ldata(S);

• for every ordered-data word w ∈ Ldata(S), there exists a text w^′∈ Ldata(ϕ) such thatProj(w) =Proj(w^′).

VII. ANUNDECIDABLEEXTENSION

In this section we would like to remark on an undecidable extension of weak ODTA. Recall the language in Example 6.

It has already noted in the proof of Proposition 17 that its complement is not accepted by any ODTA. Formally, the complement of the language in Example 6 can be expressed with formula of the form:

∀x ∀y _

a∈Σ0

a(x) ∧ _

a∈Σ0

a(y) ∧ E↓∗(x, y) → x ≺ y, (2)

whereΣ0⊆ Σ and E↓∗ denotes the transitive closure ofE↓. It can already be deduced from [8, Proposition 29] that given an ODTA and a collection C of formulas of the form (2), it is undecidable to check whether there is an ordered-data tree t ∈ Ldata(S) such that t |= ψ, for all ψ ∈ C.

At this point we would also like to point out that extending ODTA with operation such as addition on data values will immediately yield undecidability. This can be deduced immediately from [23] where we know that together with unary predicates, addition yields undecidability.

VIII. WHEN THEDATAVALUES ARESTRINGS

In this section we discuss data trees where the data values are strings from {0, 1}^∗, instead of natural numbers. We call such trees string data trees. There are two kinds of order for

§The definition of text in [31] is slightly different, but it is equivalent to our definition. However, it turns out that the key lemma proved in [31] has a serious gap, which is filled later on in [19]. The final result is still correct though.

strings: the prefix order, and the lexicographic order. Strings with lexicographic order are simply linearly ordered domain, thus, ODTA can be applied directly in such case.

For the prefix order, we have to modify the definition of ODTA. Consider a string data treet over the alphabet Σ. Let Vtbe the set of data values found int. We define VΣ(t) as a tree over the alphabet2^Σ, where

• Dom(VΣ(t)) is Vt∪ {ǫ};

• foru, v ∈Dom(VΣ(t)), u is a parent of v if u is a prefix ofv and there is no w ∈ Dom(VΣ(t)) such that u is a prefix of w and w is a prefix of v;

• foru ∈Dom(VΣ(t)) the label of u is S, if u ∈ [S]t; and

ROOT, if u = ǫ.

We callVΣ(t) the tree representation of the data values in t.

Consider an example of a string data tree in Figure 2. We have [{a}]t= {0101} [{b}]t= {0100}

[{c}]t= {01011} [{a, b}]t= {01}

[{b, c}]t= {01000} [{a, b, c}]t= {010011}.

SoDom(VΣ(t)) = {01, 0100, 0101, 010011, 010000, 01011}, and

• 01 is the parent of 0100 and 0101;

• 0100 is the parent of 010011 and 010000; and

• 0101 is the parent of 01011.

Now an ODTA for string data trees is S = hT , A, Γ0i, whereT is a letter-to-letter transducer from Σ × {⊤, ⊥, ∗}³ toΓ; A is an unranked tree automaton over the alphabet 2^Γ; Γ0 ⊆ Γ. The requirement for acceptance is the same as in Section V, except thatA takes a tree over the alphabet 2^Γ as the input. All the results in Sections V and VI can be carried over immediately to this model.

IX. CONCLUDINGREMARKS

In this paper we study data trees in which the data values come from a linearly ordered domain, where in addition to equality test, we can test whether the data value in one node is greater than the other. We introduce ordered-data tree automata (ODTA), provide its logical characterisation, and prove that its emptiness problem is decidable. We also show the logic

∃MSO²(E↓, E→, ∼) can be captured by ODTA.

Then we define weak ODTA, which essentially are ODTA without the ability to perform equality test on data values on two adjacent nodes. We provide its logical characterisation.

We show that a number of existing formalisms and models studied in the literature so far can be captured already by weak ODTA. We also show that the definition of ODTA can be easily modified, to the case where the data values come from a partially ordered domain, such as strings.

We believe that the notion of ODTA provides new techniques to reason about ordered-data values on unranked trees, and thus, can find potential applications in practice. We also prove that ODTA capture various formalisms on data trees studied so far in the literature. As far as we know this is the first formalism for data trees with neat logical and automata characterisations.

(10)

a 01

/ w j

b 0100

_c

01011

_a

010011

_a

0101

/ w j

b 01

b

010011

a

0101

? ? ?

c 010011

c

010000

b

010000

ROOT

? {a, b}

/ w

{b} {a}

/ w ?

{a, b, c} {b, c} {c}

Fig. 2. An example of a string data tree (on the left) and the tree representation of its data values (on the right).

Acknowledgment. Work is supported by FET-Open Project FoX, grant agreement 233599. The author would like to thank Egor V. Kostylev for careful proof reading of this paper and for many useful suggestions to improve it. The author also thanks Leonid Libkin and Claire David for helpful discussions, and Nadime Francis for pointing out the reference [23]. The author was also supported by the Querying and Managing Navigational Databases project realized within the Homing Plus programme of the Foundation for Polish Science, co- financed by the European Union from the Regional Devel- opment Fund within the Operational Programme Innovative Economy (“Grants for Innovation”). Finally, the author also thanks the anonymous referees for their careful reading and comments.

REFERENCES

[1] N. Alon, T. Milo, F. Neven, D. Suciu, V. Vianu. XML with data values:

typechecking revisited. J. Comput. Syst. Sci. 66(4): 688-727 (2003).

[2] M. Arenas, W. Fan, L. Libkin. On the complexity of verifying consistency of XML specifications. SIAM J. Comput. 38(3): 841-880 (2008).

[3] M. Benedikt, C. Ley, G. Puppis. Automata vs. Logics on Data Words.

In CSL 2010.

[4] H. Bj¨orklund, M. Bojanczyk. Bounded depth data trees. In ICALP 2007.

[5] H. Bj¨orklund, W. Martens, T. Schwentick. Optimizing conjunctive queries over trees using schema information. In MFCS 2008.

[6] M. Bojanczyk, A. Muscholl, T. Schwentick, L. Segoufin. Two-variable logic on data trees and XML reasoning. J. ACM 56(3), 2009.

[7] M. Bojanczyk, B. Klin, S. Lasota. Automata with Group Actions. In LICS 2011.

[8] M. Bojanczyk, C. David, A. Muscholl, T. Schwentick, L. Segoufin. Two- variable logic on data words. ACM TOCL 2011.

[9] P. Bouyer, A. Petit, D. Th´erien. An algebraic characterization of data and timed languages. In CONCUR 2001.

[10] H. Comon, M. Dauchet, R. Gilleron, F. Jacquemard, C. L ¨oding, D. Lugiez, S. Tison, M. Tommasi. Tree Automata: Techniques and Applications. 2007.

[11] C. David, L. Libkin, T. Tan. On the Satisfiability of Two-Variable Logic over Data Words. In LPAR 2010.

[12] C. David, L. Libkin, T. Tan. Efficient Reasoning about Data Trees via Integer Linear Programming. In ICDT 2011.

[13] S. Demri, D. D’Souza, R. Gascon. A Decidable Temporal Logic of Repeating Values. In LFCS 2007.

[14] S. Demri, R. Lazic. LTL with the freeze quantifier and register automata.

ACM TOCL 10(3), 2009.

[15] A. Deutsch, R. Hull, F. Patrizi, V. Vianu. Automatic verification of data-centric business processes. In ICDT 2009.

[16] A. Ehrenfeucht, G. Rozenberg. Commutative Linear Languages. Tech- nical Report CU-CS-209-81, June 1981.

[17] W. Fan, L. Libkin. On XML integrity constraints in the presence of DTDs. J. ACM 49(3): 368–406 (2002).

[18] D. Figueira. Satisfiability of downward XPath with data equality tests.

In PODS 2009.

[19] D. Figueira. Forward-XPath and extended register automata on data- trees. Inhttp://arxiv.org/abs/1204.2495.

[20] D. Figueira. Satisfiability for two-variable logic with two successor relations on finite linear orders. In ICDT 2010.

[21] D. Figueira, L. Segoufin. Bottom-up automata on data trees and vertical XPath. In STACS 2011.

[22] O. Grumberg, O. Kupferman, S. Sheinvald. Variable Automata over Infinite Alphabets. In LATA 2010.

[23] J. Y. Halpern. Presburger Arithmetic with Unary Predicates is Π¹₁ Complete. Journal of Symbolic Logic, 56(2), 1991.

[24] M. Jurdzinski, R. Lazic. Alternation-free modal mu-calculus for data trees. In LICS 2007.

[25] M. Kaminski, N. Francez. Finite-memory automata. Theoretical Computer Science, 134(2): 329–363 (1994).

[26] M. Kaminski, T. Tan. Tree automata over infinite alphabets. In Pillars of Computer Science, 2008.

[27] A. Kara, T. Schwentick, T. Tan. Feasible Automata for Two-Variable Logic with Successor on Data Words. In LATA 2012.

[28] A. Kara, T. Schwentick, T. Zeume. Temporal Logics on Words with Multiple Data Values. In FSTTCS 2010.

[29] R. Lazic. Safety alternating automata on data words. ACM TOCL (12)2, 2011.

[30] L. Libkin. Elements of Finite Model Theory. Springer 2004.

[31] A. Manuel. Two orders and two variables. In MFCS 2010.

[32] F. Neven. Automata, logic, and XML. In CSL 2002.

[33] F. Neven, Th. Schwentick, V. Vianu. Finite state machines for strings over infinite alphabets. ACM TOCL 5(3), 403–435 (2004).

[34] T. Schwentick. XPath query containment. SIGMOD Record 33(1): 101- 109 (2004).

[35] T. Schwentick, T. Zeume. Two-Variable Logic with Two Order Rela- tions. In CSL 2010.

[36] L. Segoufin, S. Torunczyk. Automata based verification over linearly ordered data domains. In STACS 2011.

[37] H. Seidl, Th. Schwentick, A. Muscholl, P. Habermehl. Counting in trees for free. In ICALP 2004.

[38] J. Thatcher. Characterizing derivation trees of context-free grammars through a generalization of finite automata theory. JCSS 1, 1967.

[39] W. Thomas. Languages, automata, and logic. In Handbook of Formal Languages, Vol. 3, Springer, 1997, pages 389–455.

[40] K. Verma, H. Seidl, T. Schwentick. On the complexity of equational horn clauses. In CADE 2005, pages 337–352.