Journal of Computer and System Sciences

(1)

Contents lists available atScienceDirect

Journal of Computer and System Sciences

www.elsevier.com/locate/jcss

On pebble automata for data languages with decidable emptiness problem ^✩ ^, ^✩✩

Tony Tan

School of Informatics, University of Edinburgh, United Kingdom

a r t i c l e i n f o a b s t r a c t

Article history:

Received 6 November 2009

Received in revised form 16 February 2010 Available online 17 March 2010

Keywords:

Finite state automata Inﬁnite alphabet Decidability

In this paper we study a subclass of pebble automata (PA) for data languages for which the emptiness problem is decidable. Namely, we show that the emptiness problem for weak 2-pebble automata is decidable, while the same problem for weak 3-pebble automata is undecidable. We also introduce the so-called top view weak PA. Roughly speaking, top view weak PA are weak PA where the equality test is performed only between the data values seen by the two most recently placed pebbles. The emptiness problem for this model is still decidable. It is also robust: alternating, non-deterministic and deterministic top view weak PA have the same recognition power; and are strong enough to accept all data languages expressible in Linear Temporal Logic with the future-time operators, augmented with one register freeze quantiﬁer.

1. Introduction

Regular languages are clearly one of the most important concepts in computer science. They have applications in basically all branches of computer science. It can be argued that the following properties contributed to their success.

1. Expressiveness: In many settings regular languages are powerful enough to capture the kinds of patterns that have to be described.

2. Decidability: Unlike many general computational models, the mechanisms associated with regular languages allow one to perform automated semantic analysis.

3. Eﬃciency: The model checking problem, that is, testing whether a given string is accepted by a given automaton can be solved in polynomial time.

4. Closure properties: The class of regular languages possesses all important closure properties.

5. Robustness: The class of regular languages has many characterizations, which include ﬁnite state automata, regular expressions, monoids and monadic second-order logic.

Moreover, similar notions of regularity have been successfully generalized to other kinds of structures, including infinite strings and finite, as well as infinite, ranked or unranked, trees. Most recent applications of regular languages (on infinite strings and finite, unranked trees, respectively) are in model checking and XML processing.¹

✩ This work was done while the author was in the Department of Computer Science in Technion – Israel Institute of Technology.

✩✩ Its extended abstract version has also been published in Tan (2009) [17].

E-mail address:ttan@inf.ed.ac.uk.

1 Much of the materials in Section 1 are taken from [3].

doi:10.1016/j.jcss.2010.03.004

(2)

•

In model checking a system is a finite state one and properties are specified in a logic like LTL. Satisfiability of a formula in a system is checked on the structure that is the product of the system automaton and an automaton corresponding to the formula. The step from the “real” system to its finite state representation usually involves many abstraction, especially with respect to data values (variables, process numbers, etc.). Often their range is restricted to a finite domain.

Even though this approach has been successful and found its way into large scale industrial applications, the ﬁnite abstraction has some inherent shortcomings. As an example, n identical processes with m states each give rise to an overall model of size mⁿ. If the number of processes is unbounded or unknown in advance, the ﬁnite state approach fails. Previous work has shown that even in such setting decidability can be obtained by restricting the problem in various ways [1,7].

•

In XML document processing, regular concepts occur in various contexts. For example, the structural specification of XML documents are usually given as regular languages, like DTD or XML schema [12]. However, such specifications usually ignore the attributes and data values. From a database point of view, this is not completely satisfactory, because a schema should allow one to describe not only the structure of the data, but also to define restrictions on the data values via integrity constraints such as key or inclusion constraints. There exists a work addressing this problem [2], but like in the case of model checking, the methods rely heavily on a case-to-case analysis.

So, in the above settings, the ﬁnite state abstraction leads to interesting results, but does not address all problems arising in applications. In both cases, it would already be a big advance, if each position, in either a string or a tree, could carry a data value, in addition to its label.

This paper is part of a broader research program which aims at studying such extensions in a systematic way. As any kind of operations on the infinite domain quickly leads to undecidability of basic processing tasks (even a linear order on the domain is harmful), we concentrate on the setting, where data values can only be tested for equality. Furthermore, in this paper we only consider finite data strings, that is, finite strings, where each position carries a label from a finite alphabet and a data value from an infinite domain. A data language is a language consisting of finite data strings. Recently, there has been a significant amount of work in this direction. See, for example, [3,4,6,9,13,15].

Roughly speaking, there are two approaches to studying data languages: logic and automata. Below is a brief survey on both approaches. For a more comprehensive survey, we refer the reader to [15]. The study of data languages, which can also be viewed as languages over inﬁnite alphabets, starts with the introduction of ﬁnite-memory automata (FMA) in [9], which are also known as register automata (RA). The study of FMA was continued and extended in [13], in which pebble automata (PA) were also introduced. Each of these models has its own advantages and disadvantages. Languages accepted by FMA are closed under standard language operations: intersection, union, concatenation, and Kleene star. In addition, from the computational point of view, FMA are a much easier model to handle. Their emptiness problem is decidable, whereas the same problem for PA is not. However, the PA languages possess a very nice logical property: closure under all boolean operations, whereas FMA languages are not closed under complementation.

Later in [4] first-order logic for data languages was considered, and, in particular, the so-called data automata was intro- duced. It was shown that data automata define the fragment of existential monadic second-order logic for data languages in which the first-order part is restricted to two variables only. An important feature of data automata is that their emptiness problem is decidable, even for the infinite words, but is at least as hard as reachability for Petri nets. The automata themselves always work nondeterministically and seemingly cannot be determinized, see [3]. It was also shown that the satisfiability problem for the three-variable first-order logic is undecidable.

Another logical approach is via the so-called linear temporal logic with n register freeze quantiﬁer, denoted LTL_n^↓

(X, U)

, see [6].

It was shown that one-way alternating n register automata accept all LTL^↓_n

(X, U)

languages and the emptiness problem for one-way alternating one register automata is decidable. Hence, the satisﬁability problem for LTL^↓₁

(X, U)

is decidable, as well.

Adding one more register or past time operators to LTL^↓₁

(X, U)

makes the satisﬁability problem undecidable.

In this paper we continue the study of PA, which are finite state automata with a finite number of pebbles. The pebbles are placed on/lifted from the input word in the stack discipline – first in last out – and are intended to mark positions in the input word. One pebble can only mark one position and the most recently placed pebble serves as the head of the automaton. The automaton moves from one state to another depending on the current label and the equality tests among data values in the positions currently marked by the pebbles, as well as, the equality tests among the positions of the pebbles.

Furthermore, as deﬁned in [13], there are two types of PA, according to the position of the new pebble placed. In the ﬁrst type, the ordinary PA, also called strong PA, the new pebbles are placed at the beginning of the string. In the second type, called weak PA, the new pebbles are placed at the position of the most recent pebble. Obviously, two-way weak PA is just as expressive as two-way ordinary PA. However, it is known that one-way non-deterministic weak PA are weaker than one-way ordinary PA, see [13, Theorem 4.5].

We show that the emptiness problem for one-way weak 2-pebble automata is decidable, while the same problem for one-way weak 3-pebble automata is undecidable. We also introduce the so-called top view weak PA. Roughly speaking, top view weak PA are one-way weak PA where the equality test is performed only between the data values seen by the two most recently placed pebbles. Top view weak PA are quite robust: alternating, non-deterministic and deterministic top view weak PA have the same recognition power. To the best of our knowledge, this is the ﬁrst model of computation for

(3)

data language with such robustness. It is also shown that top view weak PA can be simulated by one-way alternating one-register RA. Therefore, their emptiness problem is decidable. Another interesting feature is that top view weak PA can simulate all LTL^↓₁

(X, U)

languages, and the number of pebbles needed to simulate such LTL sentences corresponds linearly to the so-called free quantiﬁer rank of the sentences, the depth of the nesting level of the freeze operators in the sentence.

This paper is organized as follows. In Section 2 we review the models of computations for data languages considered in this paper. We present the proof of the equivalence between alternating and deterministic weak 2-PA in Section 3. Section 4 and Section 5 deals with the decidability and the complexity issues of weak PA, respectively. In Section 6 we introduce top view weak PA. We also introduce a simple extension to top view weak PA, called unbounded top view weak PA, in which the number of pebbles is unbounded in Section 7. Finally, we end our paper with a brief remark in Section 8.

2. Models of computation

In this section we recall the deﬁnitions of weak PA from [13] and of register automata (RA) from [6,9]. We will use the following notation. We always denote by

Σ

a ﬁnite alphabet of labels and by

D

an inﬁnite set of data values. A

Σ

-data word w

=

_σ₁

a1

_σ₂

a2

· · ·

_σ_n

an

is a ﬁnite sequence over

Σ × D

^{, where}

σ

i

∈ Σ

^{and a}i

∈ D

^{. A}

Σ

-data language is a set of

Σ

-data words.

We will also use the following notations.

Proj_Σ

(

w

) = σ

1

· · · σ

n

Proj_D

(

w

) =

^a1

· · ·

^an

Cont_Σ

(

w

) = { σ

1

, . . . , σ

n

}

Cont_D

(

w

) = {

^a1

, . . . ,

a_n

}

We assume that neither of

Σ

and

D

contain the left-end marker

or the right-end marker

. The input word to the automaton is of the form

^w

^{, where}

^and

mark the left-end and the right-end of the input word. Finally, the symbols

ν , ϑ, σ , . . .

, possibly indexed, denote labels in

Σ

and the symbols a

,

b

,

c

,

d

, . . .

, possibly indexed, denote data values in

D

. 2.1. Pebble automata

Deﬁnition 1. (See [13, Deﬁnition 2.3].) A one-way alternating weak k-pebble automaton (k-PA) over

Σ

is a system

A =

Σ,

^Q

,

q₀

,

F

, μ ,

U

whose components are deﬁned as follows.

•

^{Q , q}0

∈

^{Q and F}

⊆

Q are a ﬁnite set of states, the initial state, and the set of ﬁnal states, respectively;

•

^U

⊆

^Q

−

F is the set of universal states; and

• μ ⊆ C × D

is the set of transitions, where

–

C

is a set whose elements are of the form

(

i

, σ ,

V

,

q

)

, where 1

ⁱ

^k,

σ _{∈ Σ}

, V

⊆ {

ⁱ

+

¹

, . . . ,

k

}

^{and q}

∈

^{Q ; and} –

D

is a set whose elements are of the form

(

q

, act)

, where q

∈

^{Q and}

act

is either

stay

,

right

,

place

-

pebble

or

lift

-

pebble

.

Elements of

μ

will be written as

(

i

, σ ,

V

,

q

) → (

^p

, act)

. Given a word w

=

_σ₁

a1

· · ·

_σ_n

an

∈ (Σ × D)

^∗, a conﬁguration of

A

on

^w

is a triple

[

ⁱ

,

q

, θ ]

^{, where i}

∈ {

¹

, . . . ,

k

}

^{, q}

∈

^{Q ,} and

θ : {

ⁱ

,

i

+

¹

, . . . ,

k

} → {

⁰

,

1

, . . . ,

n

,

n

+

¹

}

, where 0 and n

+

1 are positions of the end markers

^and

, respectively. The function

θ

deﬁnes the position of the pebbles and is called the pebble assignment. The initial conﬁguration is

γ

0

= [

^k

,

q₀

, θ

0

]

^, where

θ

0

(

k

) =

0 is the initial pebble assignment. A conﬁguration

[

ⁱ

,

q

, θ]

^{with q}

∈

F is called an accepting conﬁguration.

A transition

(

i

, σ ,

V

,

p

) → β

applies to a conﬁguration

[

^j

,

q

, θ ]

^{, if}

(

1

)

i

=

^{j and p}

=

^q,

(

2

)

V

= {

^l

>

i: a_{θ (}l)

=

^aθ (i)

}

^{, and}

(

3

) σ

θ (i)

= σ

.

Note that in a conﬁguration

[

ⁱ

,

q

, θ ]

, pebble i is in control, serving as the head pebble.

Next, we deﬁne the transition relation

Aas follows:

[

ⁱ

,

q

, θ ]

A

[

ⁱ

,

q

, θ

]

, if there is a transition

α → (

^p

, act) ∈ μ

that applies to

[

ⁱ

,

q

, θ ]

such that q

=

^p,

θ

(

j

) = θ(

^j

)

, for all j

>

i, and

– if

act = stay

^{, then i}

=

^{i and}

θ

(

i

) = θ(

ⁱ

)

; – if

act = right

^{, then i}

=

^{i and}

θ

(

i

) = θ(

ⁱ

) +

^1;

– if

act = lift

^-

pebble

, then i

=

ⁱ

+

^1;

– if

act = place

^-

pebble

, then i

=

ⁱ

−

^1,

θ

(

i

−

¹

) = θ

(

i

) = θ(

ⁱ

)

.

As usual, we denote the reﬂexive transitive closure of

A by

^∗_A. When the automaton

A

is clear from the context, we will omit the subscript

A

^.

(4)

Remark 2. Note the pebble numbering that differs from that in [13]. In the above deﬁnition we adopt the pebble numbering from [5] in which the pebbles placed on the input word are numbered from k to i and not from 1 to i as in [13]. The reason for this reverse numbering is that it allows us to view the computation between placing and lifting pebble i as a computation of an

(

i

−

¹

)

-pebble automaton.

Furthermore, the automaton is no longer equipped with the ability to compare positional equality, in contrast with the ordinary PA introduced in [13]. Such ability no longer makes any difference because of the “weak” manner in which the new pebbles are placed.

The acceptance criteria are based on the notion of leads to acceptance below. For every conﬁguration

γ = [

ⁱ

,

q

, θ]

^,

•

^{if q}

∈

^{F , then}

γ

leads to acceptance;

•

^{if q}

∈

^{U , then}

γ

leads to acceptance if and only if for all conﬁgurations

γ

such that

γ γ

,

γ

leads to acceptance;

•

^{if q}

∈ /

^F

∪

^{U , then}

γ

leads to acceptance if and only if there is at least one conﬁguration

γ

such that

γ γ

and

γ

leads to acceptance.

A

Σ

-data word w

∈ (Σ × D)

^∗ is accepted by

A

^{, if}

γ

0 leads to acceptance. The language L

( A)

consists of all data words accepted by

A

.

The automaton

A

is non-deterministic, if the set U

= ∅

, and it is deterministic, if there is exactly one transition that applies for each conﬁguration. It turns out that weak PA languages are quite robust.

Theorem 3. For all k

1, alternating, non-deterministic and deterministic weak k-PA have the same recognition power.

The proof is quite standard. We will sketch the proof for the case of k

=

2 in the next section. The extension to the general case is straightforward, thus, omitted. In view of Theorem 3, we will always assume that our weak k-PA is deter- ministic.

We end this subsection with an example of a language accepted by weak 2-PA. This example will be useful in the subsequent section.

Example 4. Consider a

Σ

-data language L_∼ deﬁned as follows. A

Σ

-data word w

=

_σ₁

a1

· · ·

_σ_n

an

∈

^L∼ if and only if for all i

,

j

=

¹

, . . . ,

n, if a_i

=

^aj, then

σ

i

= σ

j. That is, w

∈

^L∼if and only if whenever two positions in w carry the same data value, their labels are the same.

The language L_∼is accepted by weak 2-PA which works in the following manner. Pebble 2 iterates through all possible positions in w. At each iteration, the automaton stores the label seen by pebble 2 in the state and places pebble 1. Then, pebble 1 scans through all the positions to the right of pebble 2, checking whether there is a position with the same data value as pebble 2. If there is such a position, then the label seen by pebble 1 must be the same as the label seen by pebble 2, which has previously been stored in the state. After that, pebble 1 is lifted, and the control returns to pebble 2 and the iteration continues.

2.2. Register automata

We are only going to sketch roughly the deﬁnition of register automata. Readers interested in its more formal treatment can consult [6,9]. In essence, k register automaton, or, shortly k-RA, is a ﬁnite state automaton equipped with a header to scan the input and k registers, numbered from 1 to k. Each register can store exactly one data value from

D

. The automaton is two-way if the header can move to the left or to the right. It is alternating if it is allowed to branch into a ﬁnite number of parallel computations.

More formally, a two-way alternating k-RA over the label

Σ

is a tuple

A = Σ,

^Q

,

q0

,

u0

, μ ,

F

^where

•

^Q0, q₀

∈

^{Q and F}

⊆

Q are the ﬁnite state of states, the initial state and the set of ﬁnal states, respectively.

•

^u0

=

^a1

· · ·

^akis the initial content of the registers.

• μ

is a set of transitions of the following form.

(i)

(

q

, σ ) →

^q^where

σ _{∈ {, }}

and q

,

q

∈

^{Q .}

That is, if the automaton

A

is in state q and the header is currently reading either of the symbols

,

^{, then the} automaton can enter the state q.

(ii)

(

q

, σ ,

V

) →

^q^where

σ ∈ Σ

^{, V}

⊆ {

¹

, . . . ,

k

}

^{and q}

,

q

∈

^{Q .}

A

is in state q and the header is currently reading a position labeled with

σ

and V is the set of all registers containing the current data value, then the automaton can enter the state q.

(iii) q

→ (

^q

,

I

)

where I

⊆ {

¹

, . . . ,

k

}

^{and q}

,

q

∈

^{Q .}

A

is in state q, then the automaton can enter the state qand store the current data value into the registers whose indices belong to I.

(5)

(iv) q

→ (

^q1

∧ · · · ∧

^qi

)

and q

→ (

^q1

∨ · · · ∨

^qi

)

where i

^{1 and q}

,

q

∈

^{Q .}

A

is in state q, then it can decide to perform conjunctive or disjunctive branching into the states q1

, . . . ,

q_i.

(v) q

→ (

^q

, act)

where

act ∈ {left, right}

^{and q}

,

q

∈

^{Q .}

A

is in state q, then it can enter the state qand move to the next or the previous word position.

A register automaton is called non-deterministic if the branchings of state (in item (iv)) are all disjunctive. It is called one-way if the header is not allowed to move to the previous word position.

A conﬁguration

γ _{= [}

j

,

q

,

b1

· · ·

^bk

]

of the automaton

A

consists of the current position of the header in the input word j, the state of the automaton q and the content of the registers b1

· · ·

^bk. The conﬁguration

γ

is called accepting if the state is a ﬁnal state in F .

From each conﬁguration

γ

, the automaton performs legitimate computation according to the transition relation and enters another conﬁguration

γ

. If the transition is branching, then it can split into several conﬁgurations

γ

₁

, . . . , γ

_m.

Similarly, we can deﬁne the notion of leads to acceptance for a conﬁguration

γ

as in previous subsection. An input word w is accepted by

A

if the initial conﬁguration leads to acceptance. As usual, L

( A)

denotes the language accepted by

A

^. 3. Equivalence between alternating and deterministic one-way weak 2-PA

For every one-way alternating weak 2-PA, we will construct its equivalent one-way deterministic weak 2-PA. This is done in two steps.

1. First, we transform the one-way alternating weak 2-PA into its equivalent one-way non-deterministic weak 2-PA.

2. Then, we transform the one-way non-deterministic weak 2-PA into its equivalent one-way deterministic weak 2-PA.

We present step 2 ﬁrst.

3.1. From non-deterministic to deterministic

Let

A =

^Q

,

q₀

,

F

, μ

be a non-deterministic weak 2-PA. We start by normalizing the behavior of

A

as follows.

N1. For every conﬁguration

γ

of

A

, there exists a transition in

μ

that applies to it.

N2. There is no

stay

transition in

A

. On each transition the automaton

A

moves the head pebble to the right, lifts the current head pebble, or places a new pebble.

N3. The automaton can only enter a ﬁnal state when the control is in pebble 2. Furthermore, it does so only after it reads the right-end marker

^.

N4. Immediately after pebble 2 moves right, pebble 1 is placed.

N5. Pebble 1 is lifted only when it reaches the right-end marker

^.

Such normalization can be done by adding some extra states to

A

(or, extra transitions in the case of N2). The normalization N5 is especially crucial, as it implies that non-deterministism on pebble 1 is now limited only to deciding which state to enter. There is no non-deterministism in choosing which action to take, i.e. either to lift pebble 1 or to keep on moving right.

Next, we note that immediately after pebble 1 is lifted, there can be two choices of actions for pebble 2:

– to place pebble 1 again; or – to move pebble 2 to the right.

The following sixth normalization handles this situation:

N6. Immediately after pebble 1 is lifted, pebble 2 moves right.

In other words, while pebble 2 is reading a speciﬁc position, pebble 1 makes exactly one pass, from the position of pebble 2 to the right end of the input, instead of making several rounds of passes by placing pebble 1 again immediately after it is lifted.

Since there are only ﬁnitely many states, there can only be ﬁnitely many passes. So, the normalization N6 can be achieved by simultaneously simulating all possible passes in one pass.

With the normalizations N1–N6, there is no non-deterministism in choosing which action to take for pebble 2. Similar to pebble 1, non-deterministism of pebble 2 is now limited only in deciding which states to take. This is summed up in the following remark.

(6)

Remark 5. For each i

=

¹

,

2, if

(

i

,

P

,

V

,

p

) → (

^q1

, act

1

)

and

(

i

,

P

,

V

,

p

) → (

^q2

, act

2

)

, then

act

₁

= act

2.

Now that the non-deterministism is reduced to deciding which state to enter, the determinization of

A

becomes straightforward. Similar to the classical proof of the equivalence between non-deterministic and deterministic ﬁnite state automata, we can take the power set of the states of

A

to deterministically simulate

A

^.

Remark 6. We note that the normalization steps N1–N5 can be performed similarly for weak k-PA

A

, for every k

=

¹

,

2

, . . .

. The determinization of non-deterministic weak k-PA can then be done in a similar manner.

3.2. From alternating to non-deterministic

Let

A = Σ,

^Q

,

q0

, μ ,

F

,

U

be a one-way alternating weak 2-PA. Adding some extra states, we can normalize

A

^as follows.

A1. For every p

∈

^{U , if}

(

i

, σ ,

V

,

p

) → (

^q

, act) ∈ μ

, then

act = stay

^. A2. Every pebble can be lifted only after it reads the right-end marker

^.

A3. Only pebble 2 can enter a ﬁnal state and it does so only after it reads the right-end marker

^.

We assume that Q is partitioned into Q1

∪

^Q2where Qiis the set of states, where pebble i is the head pebble, for each i

=

¹

,

2. We can further partition each Qi into four sets of states: Qi,stay, Qi,right, Qi,place, Qi,liftsuch that for every i

=

¹

,

2,

σ ∈ Σ

^{, V}

⊆ {

¹

,

2

}

^{, and q}

,

p

∈

^{Q ,}

A4. If q

∈

^Qi,stayand

(

i

, σ ,

V

,

q

) → (

^p

, act) ∈ μ

, then

act = stay

^. A5. If q

∈

^Qi,rightand

(

i

, σ ,

V

,

q

) → (

^p

, act) ∈ μ

, then

act = right

^.

A6. If q

∈

^Qi,placeand

(

i

, σ ,

V

,

q

) → (

^p

, act) ∈ μ

, then

act = place

^-

pebble

. A7. If q

∈

^Qi,liftand

(

i

, σ ,

V

,

q

) → (

^p

, act) ∈ μ

, then

act = lift

^-

pebble

.

The intuitive meaning of these states are clear. We partition the states of

A

according to all possible actions of

A

. Especially, by restricting the set U of universal states to be inside Q1,stay

∪

^Q2,stay, the non-determinization process of

A

^{will be} much easier.

The non-determinization process itself is a pretty straightforward simulation of all possible computation paths of

A

^{. On} an input w

=

_σ₁

a1

· · ·

_σ_n

an

, due to universal branching, the automaton

A

can be in several states when it reaches a certain position i, where 1

ⁱ

n. Since the number of states is ﬁnite, to simulate the run of

A

on w, it is then suﬃcient to remember all these states and simulates all possible transitions from these states.

Formally, we deﬁne the non-deterministic weak 2-PA

A

= Σ,

^Q

,

q₀

, μ

,

F

^{, where}

•

^Q

=

²^Q²

∪ (

²^Q²

×

²^Q¹

)

;

•

^q₀

= {

^q0

}

^;

•

^F

=

²^F

− {∅}

^.

The set

μ

contains the following transitions.

•

The transitions, when pebble 2 is the head, are as follows. For every S

⊆

^Q2and for every

σ ∈ Σ

, we have the following transitions.

– If S contains a state q

∈

^{U , then}

(

2

, σ , ∅,

^S

) →

S

− {

^q

}

∪

^Uq

, stay

∈ μ

where U_q

= {

^p

| (

²

, σ , ∅,

^q

) → (

^p

, stay) ∈ μ }

^. – If S contains a state q

∈

^Q2,stayand S

∩

^U

= ∅

^{, then}

(

2

, σ ,

V

,

S

) →

S

− {

^q

}

∪

^Nq

, stay

∈ μ

for every Nq

⊆ {

^p

| (

ⁱ

, σ , ∅,

^q

) → (

^p

, stay) ∈ μ _}

and Nq

= ∅

^. – If S contains a state q

∈

^Q2,placeand S

∩

^Q2,stay

= ∅

^{, then}

(

2

, σ , ∅,

S

) →

S

− {

q

}, {

p

}

, place

-

pebble

∈ μ

where

(

2

, σ , ∅,

^q

) → (

^p

, place

-

pebble) ∈ μ

. – If S

⊆

^Q2,right, then

(

2

, σ , ∅,

^S

) →

S

, right

∈ μ

where S

= {

^p

| (

²

, σ , ∅,

^q

) → (

^p

, right) ∈ μ

and q

∈

^S

}

^.

(7)

•

The transitions, when pebble 1 is the head, are as follows. For every S2

⊆

^Q2, S1

⊆

^Q1, V

⊆ {

²

}

and for every

σ _{∈ Σ}

, we have the following transitions.

– If S1 contains a state q

∈

^{U , then}

1

, σ ,

V

, (

S2

,

S1

)

→

S2

,

S1

− {

^q

}

∪

^Uq

, stay

∈ μ

where Uq

= {

^p

| (

¹

, σ ,

V

,

q

) → (

^p

, stay) ∈ μ _}

. – If S₁ contains a state q

∈

^Q1,stayand S

∩

^U

= ∅

^{, then}

1

, σ ,

V

, (

S₂

,

S₁

)

→

S₁

,

S₁

− {

^q

}

∪

^Nq

, stay

∈ μ

for every Nq

⊆ {

^p

| (

¹

, σ ,

V

,

q

) → (

^p

, stay) ∈ μ }

^{and N}q

= ∅

^. – If S1

⊆

^Q1,right, then

1

, σ ,

V

, (

S1

,

S1

)

→

S1

,

S₁

, right

∈ μ

where S₁

= {

^p

| (

¹

, σ ,

V

,

q

) → (

^p

, right) ∈ μ

and q

∈

^S

∩

^Q1

}

^. – If S1

⊆

^Q1,lift, then

1

, ,

^V

, (

S₂

,

S₁

)

→ (

^S2

∪

^R

, lift

-

pebble ) ∈ μ

where R

= {

^p

| (

¹

, σ ,

V

,

q

) → (

^p

, lift

-

pebble) ∈ μ

and q

∈

^S1

}

^. The proof that L

(A) =

^L

(A

)

is pretty straightforward, thus, omitted.

Remark 7. We note that the normalizations A1–A7 can be performed similarly on alternating weak k-PA, for every k

=

1

,

2

, . . .

. Converting an alternating weak k-PA

A

to the non-deterministic version

A

can be done as follows. The set of states of

A

for general k is the set

2^Q^k

∪

2^Q^k

×

²^Q^k⁻¹

∪ · · · ∪

2^Q^k

×

²^Q^k⁻¹

× · · · ×

²^Q¹

The initial state and the set of ﬁnal states are the same

{

^q0

}

^{and 2}^F

− {∅}

, respectively. The set of transitions can be deﬁned in a similar manner as above.

4. Decidability and undecidability of weak PA

In this section we will discuss the decidability issue of weak PA. We show that the emptiness problem for weak 3-PA is undecidable, while the same problem for weak 2-PA is decidable. The proof of the decidability of the emptiness problem for weak 2-PA will be the basis of the proof of the decidability of the same problem for top view weak PA.

Theorem 8 is similar to the proof of the undecidability of the emptiness problem for weak 5-PA in [13]. The main technical step in the proof of the undecidability of the emptiness problem of weak 3-PA is to show that the following

Σ

-data language

Lord

=

σ

a1

· · ·

σ

an

$ d

σ

a1

· · ·

σ

an

: a1

, . . . ,

anare pairwise different

is accepted by weak 3-PA, where

Σ = { σ ,

$

}

. This is similar to the proof of the same problem for weak 5-PA. Our main observation is that weak 3-PA are suﬃcient to accept the language Lord. From this step, the undecidability can be easily obtained via a reduction from the Post Correspondence Problem (PCP). The formal details are presented in the following theorem.

Theorem 8. The emptiness problem for weak 3-PA is undecidable.

Proof. As mentioned above, the proof uses a reduction from the Post Correspondence Problem (PCP), which is well known to be undecidable [8].

An instance of PCP is a sequence of pairs

(

x1

,

y1

), . . . , (

xn

,

yn

)

, where each x1

,

y1

, . . . ,

xn

,

yn

∈ { α , β }

^∗. This instance has a solution if there exist indexes i1

, . . . ,

im

∈ {

¹

, . . . ,

n

}

such that xi1

· · ·

^xim

=

^yi1

· · ·

^yim. The PCP asks whether a given instance of the problem has a solution.

In the following we show how to encode a solution of an instance of PCP into a data word which possesses properties that can be checked by a weak 3-PA. Let

Σ = {

¹

, . . . ,

n

, α , β,

$

}

^{and x}i

= ν

i,1

· · · ν

i,li, for each i

=

¹

, . . . ,

n. Each string x_i is encoded as Enc

(

x_i

) =

_ν_i,1

ai,1

· · ·

_ν

ai,li_i,li

where a_i_,₁

, . . . ,

a_i_,_l_i are pairwise different.

The string xi1xi2

· · ·

^xim can be encoded as Enc

(

x_i₁

,

x_i₂

, . . . ,

x_i_m

) =

i₁ b1

Enc

(

x_i₁

)

i₂ b2

Enc

(

x_i₂

) · · ·

i_m bm

Enc

(

x_i_m

)

(8)

where all the data values that appear in it are pairwise different. Note that even if ij

=

ⁱj for some j

,

j, the data values that appear in Enc

(

x_i_j

)

do not appear in Enc

(

x_i_j

)

and vice versa. The idea is that each data value is used to mark a place in the string.

Similarly, the string y_j₁y_j₂

· · ·

^yjl can be encoded as Enc

(

y_j₁

,

y_j₂

, . . . ,

y_j_l

) =

j₁ c1

Enc

(

y_j₁

)

j₂ c2

Enc

(

y_j₂

) · · ·

j_l c_l

Enc

(

y_j_l

)

where the data values that appear in it are pairwise different.

The data word

i₁ b₁

Enc

(

x_i₁

) · · ·

i_m b_m

Enc

(

x_i_m

)

$ d

j₁ c₁

Enc

(

y_j₁

) · · ·

j_l c_l

Enc

(

y_j_l

)

constitutes a solution to the instance of PCP if and only if

i₁i₂

· · ·

ⁱm

=

^j1j₂

· · ·

^jl (1)

Proj_Σ

Enc

(

xi₁

) · · ·

Enc

(

xim

)

=

Proj_Σ

Enc

(

yj₁

) · · ·

Enc

(

yj_l

)

(2) Now, in order to be able to check such property with weak 3-PA, we demand the following additional criteria.

1. b1

· · ·

^bm

=

^c1

· · ·

^cl.

2. Proj_D

(

Enc

(

x_i₁

) · · ·

^Enc

(

x_i_m

)) =

^ProjD

(

Enc

(

y_j₁

) · · ·

^Enc

(

y_j_l

)).

3. For any two positions h1 and h2 where h1 is to the left of the delimiter

_$

d

and h2 is to the right of the delimiter

_$

d

, if both of them have the same data value, then both of them are labeled with the same label.

All the criteria 1–3 imply Eqs. (1) and (2). Also note that criteria 1–2 resemble the language L_ord.

Because the data values that appear in Proj_D

(

Enc

(

xi₁

), . . . ,

Enc

(

xim

))

are pairwise different, all of them are checkable by three pebbles in the “weak” manner. For example, to check criterion 1, the automaton does the following.

•

Check that b1

=

^c1.

•

Check that for each i

=

¹

, . . . ,

m

−

1, there exists j such that aiai+¹

=

^bjbj+¹.

It can be done by placing pebble 3 to read a_i and pebble 2 to read a_i₊₁, then using pebble 1 to search on the other side of $ for the index j.

•

Finally, check that bm

=

^cl.

Criterion 2 can be checked similarly and criterion 3 is straightforward. The reduction is now complete and we prove that the emptiness problem for weak 3-PA is undecidable.

Now we are going to show that the emptiness problem for weak 2-PA is decidable. The proof is by simulating weak 2-PA by one-way alternating one register automata (1-RA). See [6] for the deﬁnition of alternating RA.

In fact, the simulation can be easily generalized to arbitrary number of pebbles. That is, weak k-PA can be simulated by one-way alternating

(

k

−

¹

)

-RA. (Here

(

k

−

¹

)

-RA stands for

(

k

−

¹

)

-register automata.) This result settles a question left open in [13]: Can weak PA be simulated by alternating RA?

Theorem 9. For every weak k-PA

A

, there exists a one-way alternating

(

k

−

¹

)

-RA

A

such that L

( A) =

^L

( A

)

. Moreover, the con- struction of

A

^from

A

is effective.

Now, by Theorem 9, we immediately obtain the decidability of the emptiness problem for weak 2-PA because the same problem for one-way alternating 1-RA is decidable [6, Theorem 4.4].

Corollary 10. The emptiness problem for weak 2-PA is decidable.

We devote the rest of this section to the proof of Theorem 9 for k

=

2. Its generalization to arbitrary k

3 is pretty straightforward.

Recall that we may always assume that weak PA are deterministic. Let

A = Σ,

Q

,

q0

, μ ,

F

be a weak 2-PA. As in Section 3.1, we normalize the behavior of

A

as follows.

•

Pebble 1 is lifted only after it reads the right-end marker

^.

•

The automaton can only enter a ﬁnal state when the control is in pebble 2. Furthermore, it does so only after pebble 2 reads the right-end marker

^.

(9)

q0

q1

_σ₁

d₁

q2

qn

σn

dn

qn+1

qf

p1,1

_σ₁

d₁

_p₁

,2

_p

1,n

_σ_n

dn

_p₁_,_n

₊₁

_p₁

_p

_n_,_n

_σ_n

dn

_p_n

,n+¹

_p_n

Fig. 1. The tree representation of a run ofAon the data word w=_σ₁

d₁

· · ·_σ_n

d_n

.

•

Immediately after pebble 2 moves right, pebble 1 is placed.

•

Immediately after pebble 1 is lifted, pebble 2 moves right.

On input word w

=

_σ₁

d1

· · ·

_σ_n

dn

, the run of

A

on

^w

can be depicted as a tree shown in Fig. 1.

The meaning of the tree is as follows.

•

^q0

,

q₁

, . . . ,

q_n

,

q_n₊₁are the states of

A

when pebble 2 is the head pebble reading the positions 0

,

1

, . . . ,

n

,

n

+

^{1, respec-} tively, that is, the symbols

,

_σ₁

d1

, . . . ,

_σ_n

dn

,

, respectively.

•

^qf is the state of

A

after pebble 2 reads the symbol

^.

•

^{For 1}

ⁱ

^j

^{n, p}i,j is the state of

A

when pebble 1 is the head pebble reading the position j, while pebble 2 is above the position i.

•

^{For 1}

ⁱ

n, the state pi is the state of

A

immediately after pebble 1 is lifted and pebble 2 is above the position i.

It must be noted that there is a transition

(

2

, σ

i

, ∅,

^pi

) → (

^qi+1

, right)

applied by

A

that is not depicted in the ﬁgure.

Now the simulation of

A

by a one-way alternating 1-RA

A

becomes straightforward if we view the tree in Fig. 1 as a tree depicting the computation of

A

on the same word w. Fig. 2 shows the corresponding run of

A

on the same word.

Roughly, the automaton

A

is deﬁned as follows.

•

The states of

A

are elements of Q

∪ (

^Q

×

^Q

)

;

•

the initial state is q₀; and

•

the set of ﬁnal states is F

∪ {(

^p

,

p

)

: p

∈

^Q

}

^.

For each placement of pebble 1 on position i, the automaton performs the following “Guess–Split–Verify” procedure which consists of the following steps.

1. From the state qi, the automaton

A

“guesses” (by disjunctive branching) the state in which pebble 1 is eventually lifted, i.e. the state p_i. It stores p_iin its internal state and simulates the transition

(

2

, σ

i

, ∅, ∅,

^qi

) → (

^pi,i

, place

-

pebble) ∈ μ

to enter into the state

(

p_i_,_i

,

p_i

)

.

2. The automaton

A

“splits” its computation (by conjunctive branching) into two branches.

•

In one branch, assuming that the guess pi is correct,

A

moves right and enters into the state qi+¹, simulating the transition

(

2

, ∅,

^pi

) → (

^qi+1

, right)

.

After this, it recursively performs the Guess–Split–Verify procedure for the next placement of pebble 1 on position

(

i

+

¹

)

.

•

In the other branch

A

stores the data value d_i in its register and simulates the run of pebble 1 on

_σ_i

di

· · ·

_σ_n

d_n

, starting from the state pi,i, to “verify” that the guess piis correct.

Journal of Computer and System Sciences