


Regular Expressions for Languages over Infinite Alphabets

Michael Kaminski

Department of Computer Science Technion – Israel Institute of Technology Haifa 32000, Israel

kaminski@cs.technion.ac.il

Tony Tan

Department of Computer Science National University of Singapore 3 Science Drive 2

Singapore 117543

Abstract. In this paper we introduce a notion of a regular expression over infinite alphabets and show that a language is definable by an infinite alphabet regular expression if and only if it is accepted by a finite-state unification based automaton, a model of computation that is tightly related to other models of automata over infinite alphabets.

Keywords: Finite state automata, infinite alphabets, regular expressions

1. Introduction

A new model of finite-state automata dealing with infinite alphabets, called finite-state datalog automata (FSDA), was introduced in [16]. These automata were intended for the abstract study of relational languages. Since the character of relational languages requires the use of infinite alphabets of names of variables, in addition to a finite set of states, FSDA are equipped with a finite set of “registers” capable of retaining a variable name (out of an infinite set of names). The equality test, which is performed in ordinary finite-state automata (FA), was replaced with unification, which is a crucial element of relational languages.

Address for correspondence: Department of Computer Science, Technion – Israel Institute of Technology, Haifa 32000, Israel

Later, FSDA were extended in [7] to a more general model dealing with infinite alphabets, called finite-memory automata (FMA). FMA were designed to accept the infinite alphabet counterpart of the ordinary regular languages. Similarly to FSDA, FMA are equipped with a finite set of registers which are either empty or contain a symbol from the infinite alphabet but, contrary to FSDA, registers in FMA cannot contain symbols currently stored in other registers. Since the automaton can only copy a symbol into a register and compare the content of a register with an input symbol, without the ability to perform any functions, it is able to “remember” only a finite set of input symbols. Thus, the languages accepted by FMA possess many of the properties of regular languages.

Whereas deciding emptiness and containment for FMA- (and, consequently, for FSDA-) languages is relatively simple, the problem of inclusion for FMA-languages is undecidable, see [11, 12].

An extension of FSDA to a general infinite alphabet, called finite-state unification based automata (FSUBA), was proposed in [17]. These automata are similar in many ways to FMA, but are a bit weaker, because a register of an FSUBA may contain a symbol currently stored in other registers. It was shown in [17] that FSDA can be simulated by FSUBA and that the problem of inclusion for FSUBA-languages is decidable.

While the study of finite automata over infinite alphabets started as purely theoretical, since the appearance of [7] and [8] it seems to have turned toward more practical directions. The key to applicability is finding practical interpretations of the infinite alphabet and of the languages over it.

In [11, 12], members of the (infinite) alphabet $\Sigma$ are interpreted as records of communication actions, “send” and “receive” of messages during inter-process communication. Words in a language $L$ over this alphabet are MSCs (message sequence charts), capturing behaviors of the communication network.

In [4], members of $\Sigma$ are interpreted as URL addresses of internet sites, and a word in $L$ is interpreted as a “navigation path” in the internet, the result of some finite sequence of clicks.

In [5] there is another internet-oriented interpretation of $\Sigma$, namely, XML mark-ups of pages in a site.

In this paper we introduce a notion of a regular expression for languages over infinite alphabets and show that a language is definable by an infinite alphabet regular expression if and only if it is accepted by an FSUBA.

The paper is organized as follows. In the next section we recall the definition of FSDA from [16], and in Section 3 we recall the definition of FSUBA from [17]. In Section 4 we present the main result of our paper: unification based regular expressions for languages over infinite alphabets, whose equivalence to FSUBA is proven in Sections 5 and 6. Section 7 contains the proof of a modification of a technical lemma from [17]. Finally, Section 8 deals with the complexity of intertranslations between FSUBA and unification based regular expressions.

2. Finite-state datalog automata

We start with examples of relational languages which can and cannot be defined by finite-state datalog automata (FSDA). Relational languages are languages over infinite alphabets whose symbols are of the form $R(x, y)$, where $R$ belongs to a finite alphabet $\Gamma$ of binary relation symbols and $x$ and $y$ come from an infinite alphabet $V$ of variables. For example, the relational language [formula lost in extraction] generated by the Horn grammar [grammar lost in extraction] is definable by finite-state datalog automata, see [16, Section 2], whereas the relational language [formula lost in extraction] generated by the Horn grammar [grammar lost in extraction] is not, because the restrictions of FSDA-languages to finite alphabets are regular, see [8, Proposition 1].

Next we recall the definition of FSDA from [16].

A finite-state datalog automaton, or, shortly, FSDA, is a system $\mathcal{A} = \langle \Gamma, V, Q, q_0, F, r, \mu \rangle$, where

$\Gamma$ and $V$ are a finite alphabet of binary relation symbols and an infinite alphabet of variables, respectively, $\Gamma \cap V = \emptyset$ and $\natural \notin V$,¹ whereas the input alphabet of $\mathcal{A}$ is $\Gamma \times V \times V$. That is, an input symbol is a relation $R(x_1, x_2)$, where $R \in \Gamma$ is a binary relation symbol and $x_1, x_2 \in V$ are variables.

$Q$, $q_0 \in Q$, and $F \subseteq Q$ are a finite set of states, the initial state, and the set of final states, respectively.

$r$ is the number of registers of $\mathcal{A}$, which are capable of either being empty or retaining a variable from $V$.

$\mu \subseteq Q \times \Gamma \times \{1, \dots, r\} \times \{1, \dots, r\} \times 2^{\{1, \dots, r\}} \times Q$ is the transition relation, whose elements are called transitions. The intuitive meaning of the transition relation is as follows. If the automaton is in state $q$ reading relation $R(x_1, x_2)$ and there is a transition $(q, R, k_1, k_2, S, q') \in \mu$ such that the register $k_j$ either contains $x_j$ or is empty, $j = 1, 2$, then the automaton can enter state $q'$, copy $x_j$ into the $k_j$th register, if the latter is empty, and empty (reset) the registers whose indices belong to $S$. The above registers $k_1$ and $k_2$ are referred to as the transition registers.

An actual state of an FSDA $\mathcal{A}$ is an element of $Q$ together with the contents of all registers of the automaton. Thus, $\mathcal{A}$ has infinitely many states,² which are pairs $(q, w)$, where $q \in Q$ and $w \in (V \cup \{\natural\})^r$. Such pairs are called configurations of $\mathcal{A}$, and the set of all configurations is denoted $Q^c$. The pair $(q_0, \natural^r)$, denoted $q_0^c$, is the initial configuration, and the configurations with the first component in $F$ are called final configurations. The set of final configurations is denoted $F^c$.

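The transition relation $\mu$ induces the following relation $\mu^c$ on $Q^c \times \Gamma \times V \times V \times Q^c$. Let $q, q' \in Q$ and $w = w_1 \cdots w_r,\ w' = w'_1 \cdots w'_r \in (V \cup \{\natural\})^r$. Then $((q, w), R(x_1, x_2), (q', w')) \in \mu^c$ if and only if there is a transition $(q, R, k_1, k_2, S, q') \in \mu$ such that the following four conditions are satisfied.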

¹In this paper we reserve $\natural$ to denote an empty register.

²This is the major difference between ordinary finite-state automata and finite-state automata over infinite alphabets.

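1. $w_{k_j} \in \{x_j, \natural\}$, $j = 1, 2$. That is, the transition register $k_j$ either contains $x_j$ or is empty.

2. If $k_j \notin S$, then $w'_{k_j} = x_j$, $j = 1, 2$. That is, if the transition register $k_j$ is not reset in the transition, its content is $x_j$.

3. For all $k \in S$, $w'_k = \natural$.

4. For all $k \notin S \cup \{k_1, k_2\}$, $w'_k = w_k$.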

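Let $\sigma_1 \cdots \sigma_n$ be a word over $\Gamma \times V \times V$. A run of the automaton $\mathcal{A}$ on $\sigma_1 \cdots \sigma_n$ consists of a sequence of configurations $c_0, c_1, \dots, c_n$ such that $c_0$ is the initial configuration $q_0^c$ and $(c_{i-1}, \sigma_i, c_i) \in \mu^c$, $i = 1, \dots, n$.

We say that $\mathcal{A}$ accepts $\sigma_1 \cdots \sigma_n$ if there exists a run $c_0, c_1, \dots, c_n$ of $\mathcal{A}$ on $\sigma_1 \cdots \sigma_n$ such that $c_n \in F^c$. The set of all words accepted by $\mathcal{A}$ is denoted by $L(\mathcal{A})$ and is referred to as an FSDA-language. We refer the reader to [16] for additional examples of FSDA-languages and their relation to DATALOG.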
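The run semantics of an FSDA can be exercised with a small simulator. The Python sketch below is our own illustration, not code from [16]. It makes four assumptions: the empty-register symbol is encoded as `None`, registers are numbered from 1, an input relation $R(x_1, x_2)$ is the tuple `(R, x1, x2)`, and a transition $(q, R, k_1, k_2, S, q')$ is a tuple with `S` given as a `frozenset`. Because an FSDA is nondeterministic, `fsda_accepts` tracks the set of all reachable configurations instead of a single run. The automaton `PATH_MU` is a hypothetical two-register example that accepts chains $E(x_1, x_2) E(x_2, x_3) \cdots$.

```python
from typing import FrozenSet, Iterable, Optional, Sequence, Set, Tuple

EMPTY: Optional[str] = None  # encodes the empty-register symbol

# (q, R, k1, k2, S, q') with 1-based register indices k1, k2 and reset set S.
Transition = Tuple[str, str, int, int, FrozenSet[int], str]
Config = Tuple[str, Tuple[Optional[str], ...]]
Letter = Tuple[str, str, str]  # R(x1, x2) is encoded as (R, x1, x2)


def fsda_step(config: Config, letter: Letter,
              mu: Iterable[Transition]) -> Set[Config]:
    """Return every configuration c2 with (config, letter, c2) in mu^c."""
    q, w = config
    rel, x1, x2 = letter
    successors: Set[Config] = set()
    for (p, R, k1, k2, S, q2) in mu:
        if p != q or R != rel:
            continue
        # Condition 1: register k_j contains x_j or is empty.
        if w[k1 - 1] not in (x1, EMPTY) or w[k2 - 1] not in (x2, EMPTY):
            continue
        new = list(w)
        consistent = True
        # Condition 2: a transition register that is not reset ends up holding
        # x_j; this is impossible when k1 == k2 is kept and x1 != x2.
        for k, x in ((k1, x1), (k2, x2)):
            if k not in S:
                consistent = consistent and new[k - 1] in (x, EMPTY)
                new[k - 1] = x
        if not consistent:
            continue
        # Condition 3: registers in S are emptied; condition 4: the rest is kept.
        for k in S:
            new[k - 1] = EMPTY
        successors.add((q2, tuple(new)))
    return successors


def fsda_accepts(word: Sequence[Letter], q0: str, final: Set[str], r: int,
                 mu: Iterable[Transition]) -> bool:
    """Simulate all runs at once, starting from the initial configuration."""
    mu = list(mu)
    configs: Set[Config] = {(q0, (EMPTY,) * r)}
    for letter in word:
        configs = set().union(*(fsda_step(c, letter, mu) for c in configs))
    return any(q in final for q, _ in configs)


# Hypothetical FSDA: the second variable of each relation is kept in a register
# and must reappear as the first variable of the next relation.
PATH_MU = {
    ("s", "E", 1, 2, frozenset({1}), "a"),
    ("a", "E", 2, 1, frozenset({2}), "b"),
    ("b", "E", 1, 2, frozenset({1}), "a"),
}

print(fsda_accepts([("E", "u", "v"), ("E", "v", "w")], "s", {"a", "b"}, 2, PATH_MU))  # True
```

The two registers of `PATH_MU` alternate roles. The state records which register currently holds the last variable, and the other register is reset after each step.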

3. Finite-state unification based automata

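Till the end of this paper, $\Sigma$ is an infinite alphabet not containing $\natural$. For a word $w = w_1 \cdots w_r$ over $\Sigma \cup \{\natural\}$, we define the content of $w$, denoted $[w]$, by $[w] = \{w_k \neq \natural : k = 1, \dots, r\}$. That is, $[w]$ consists of all symbols of $\Sigma$ which appear in $w$.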

Definition 3.1. ([17]) A finite-state unification based automaton (over $\Sigma$), or, shortly, FSUBA, is a system $\mathcal{A} = \langle \Sigma, Q, q_0, F, u, \Theta, \mu \rangle$, where

$Q$, $q_0 \in Q$, and $F \subseteq Q$ are a finite set of states, the initial state, and the set of final states, respectively.

$u = u_1 \cdots u_r \in (\Sigma \cup \{\natural\})^r$, $r \geq 1$, is the initial assignment, that is, the register initialization: the symbol in the $i$th register is $u_i$. Recall that $\natural$ is reserved to denote an empty register. That is, if $u_i = \natural$, then the $i$th register is empty.

$\Theta \subseteq [u]$ is the “read only” alphabet whose symbols cannot be copied into empty registers.³ One may think of $\Theta$ as a set of the language constants which cannot be unified, cf. [16].

$\mu \subseteq Q \times \{1, \dots, r\} \times 2^{\{1, \dots, r\}} \times Q$ is the transition relation, whose elements are called transitions. The intuitive meaning of $\mu$ is as follows. If the automaton is in state $q$ reading symbol $\sigma$ and there is a transition $(q, k, S, q') \in \mu$ such that the $k$th register either contains $\sigma$ or is empty, then the automaton can enter state $q'$, write $\sigma$ in the $k$th register (if it is empty), and erase the content of the registers whose indices belong to $S$. The $k$th register will be referred to as the transition register.

As in the case of FSDA, an actual state of $\mathcal{A}$ is an element of $Q$ together with the contents of all registers of the automaton. That is, $\mathcal{A}$ has infinitely many states, which are pairs $(q, w)$, where $q \in Q$ and $w \in (\Sigma \cup \{\natural\})^r$. These pairs are called configurations of $\mathcal{A}$. The set of all configurations of $\mathcal{A}$ is denoted $Q^c$. The pair $(q_0, u)$, denoted $q_0^c$, is called the initial configuration,⁴ and the configurations with the first component in $F$ are called final configurations. The set of final configurations is denoted $F^c$.

³Of course, we could let $\Theta$ be any subset of $\Sigma$. However, since the elements of $\Theta$ cannot be copied into empty registers, the automaton can make a move with the input from $\Theta$ only if the symbol already appears in one of the automaton registers, i.e., belongs to the initial assignment.

The transition relation $\mu$ induces the following relation $\mu^c$ on $Q^c \times \Sigma \times Q^c$.

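Let $q, q' \in Q$, $w = w_1 \cdots w_r$, and $w' = w'_1 \cdots w'_r$. Then the triple $((q, w), \sigma, (q', w'))$ belongs to $\mu^c$ if and only if there is a transition $(q, k, S, q')$ in $\mu$ such that the following conditions are satisfied.

Either $w_k = \natural$ (i.e., the transition register is empty, in which case $\sigma$ is copied into it) and $\sigma \notin \Theta$, or $w_k = \sigma$ (i.e., the transition register contains $\sigma$).

If $k \notin S$, then $w'_k = \sigma$, i.e., if the transition register is not reset in the transition, its content is $\sigma$.

For all $k' \in S$, $w'_{k'} = \natural$.

For all $k' \notin S \cup \{k\}$, $w'_{k'} = w_{k'}$.

Let $\sigma_1 \cdots \sigma_n$ be a word over $\Sigma$. A run of $\mathcal{A}$ on $\sigma_1 \cdots \sigma_n$ consists of a sequence of configurations $c_0, c_1, \dots, c_n$ such that $c_0$ is the initial configuration $q_0^c$ and $(c_{i-1}, \sigma_i, c_i) \in \mu^c$, $i = 1, \dots, n$.

We say that $\mathcal{A}$ accepts $\sigma_1 \cdots \sigma_n$ if there exists a run $c_0, c_1, \dots, c_n$ of $\mathcal{A}$ on $\sigma_1 \cdots \sigma_n$ such that $c_n \in F^c$. The set of all words accepted by $\mathcal{A}$ is denoted by $L(\mathcal{A})$ and is referred to as an FSUBA-language.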
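The FSUBA semantics (the conditions on $\mu^c$ together with acceptance) can be sketched in the same way. This is a minimal illustration under our own encoding, not code from [17]. `None` plays the role of the empty-register symbol, registers are numbered from 1, a transition $(q, k, S, q')$ is a tuple with `S` a `frozenset`, and `theta` is the read-only alphabet. The two automata at the end are hypothetical examples. The first accepts exactly the words $\sigma\sigma$. The second keeps the read-only symbol `"c"` in a register that no transition uses, so no word containing `"c"` is accepted.

```python
from typing import FrozenSet, Iterable, Optional, Sequence, Set, Tuple

EMPTY: Optional[str] = None  # encodes the empty-register symbol

Transition = Tuple[str, int, FrozenSet[int], str]  # (q, k, S, q'), 1-based k
Config = Tuple[str, Tuple[Optional[str], ...]]


def fsuba_step(config: Config, sigma: str, mu: Iterable[Transition],
               theta: Set[str]) -> Set[Config]:
    """Return every configuration c2 with (config, sigma, c2) in mu^c."""
    q, w = config
    successors: Set[Config] = set()
    for (p, k, S, q2) in mu:
        if p != q:
            continue
        held = w[k - 1]
        # First condition: sigma is copied into an empty register (only if it is
        # not read-only) or is already stored in the transition register.
        if not ((held is EMPTY and sigma not in theta) or held == sigma):
            continue
        new = list(w)
        if k not in S:
            new[k - 1] = sigma            # second condition
        for j in S:
            new[j - 1] = EMPTY            # third condition
        successors.add((q2, tuple(new)))  # fourth: all other registers unchanged
    return successors


def fsuba_accepts(word: Sequence[str], q0: str, final: Set[str],
                  u: Sequence[Optional[str]], theta: Set[str],
                  mu: Iterable[Transition]) -> bool:
    """Simulate all runs at once, starting from the initial configuration (q0, u)."""
    mu = list(mu)
    configs: Set[Config] = {(q0, tuple(u))}
    for sigma in word:
        configs = set().union(*(fsuba_step(c, sigma, mu, theta) for c in configs))
    return any(q in final for q, _ in configs)


# Hypothetical one-register FSUBA accepting exactly the words sigma sigma.
TWICE_MU = {("q0", 1, frozenset(), "q1"), ("q1", 1, frozenset(), "q2")}
# Hypothetical FSUBA with read-only symbol "c": it accepts the words without "c".
NO_C_MU = {("q0", 2, frozenset({2}), "q0")}

print(fsuba_accepts("aa", "q0", {"q2"}, (EMPTY,), set(), TWICE_MU))     # True
print(fsuba_accepts("ab", "q0", {"q2"}, (EMPTY,), set(), TWICE_MU))     # False
print(fsuba_accepts("ac", "q0", {"q0"}, ("c", EMPTY), {"c"}, NO_C_MU))  # False
```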

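Example 3.1. Let $\mathcal{A}$ = [formula lost in extraction] be an $r$-register FSUBA, where $\mu$ consists of a single transition [formula lost in extraction]. Alternatively, $\mathcal{A}$ can be described by a diagram [diagram lost in extraction].

Obviously, $L(\mathcal{A})$ = [formula lost in extraction] if [condition lost in extraction], and $L(\mathcal{A})$ = [formula lost in extraction] otherwise.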

Example 3.2. ([17]) Let $\mathcal{A}$ = [formula lost in extraction] be a one-register FSUBA, where $\mu$ consists of the following two transitions: [formulas lost in extraction]; see the diagram [diagram lost in extraction].

⁴Recall that $q_0$ and $u$ denote the initial state and the initial assignment, respectively.

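Then $L(\mathcal{A})$ = [formula lost in extraction]: an accepting run of $\mathcal{A}$ on [formula lost in extraction] is [formula lost in extraction]. In contrast, the language $L = \{\sigma\tau : \sigma, \tau \in \Sigma,\ \sigma \neq \tau\}$ is not an FSUBA-language.⁵ To prove this, assume to the contrary that $L = L(\mathcal{A})$ for some FSUBA $\mathcal{A} = \langle \Sigma, Q, q_0, F, u, \Theta, \mu \rangle$. Since $\Sigma$ is infinite and $[u]$ is finite, $\Sigma \setminus [u]$ contains two different symbols $\sigma$ and $\tau$. By the definition of $L$, it contains the word $\sigma\tau$. Let $(q_0, u) = (q_0, w_0), (q_1, w_1), (q_2, w_2)$, where $w_i = w_{i,1} \cdots w_{i,r}$, $i = 0, 1, 2$, be an accepting run of $\mathcal{A}$ on $\sigma\tau$, and let $k$ be the transition register between configurations $(q_1, w_1)$ and $(q_2, w_2)$. Since neither $\sigma$ nor $\tau$ belongs to $[u]$ (and, hence, to $\Theta$) and $\sigma \neq \tau$, we have $w_{1,k} = \natural$. Then, replacing $\tau$ with $\sigma$ in $w_2$, we obtain an accepting run of $\mathcal{A}$ on $\sigma\sigma$, which contradicts $\sigma\sigma \notin L$.⁶ The following example shows how FSDA can be simulated by FSUBA.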

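Example 3.3. ([17]) Let $\mathcal{A} = \langle \Gamma, V, Q, q_0, F, r, \mu \rangle$, [formula lost in extraction], be an FSDA. Consider an FSUBA $\mathcal{A}' = \langle \Sigma', Q', q'_0, F', u', \Theta', \mu' \rangle$ such that [definitions of $\Sigma'$, $Q'$, $q'_0$, $F'$, $u'$, and $\Theta'$ lost in extraction], and $\mu'$ consists of all transitions of the form 1, 2, or 3 below:

1. [formula lost in extraction],

2. [formula lost in extraction], or

3. [formula lost in extraction],

where $(q, R, k_1, k_2, S, q') \in \mu$. That is, we break each transition $(q, R, k_1, k_2, S, q')$ of $\mu$ into three “consecutive” transitions of $\mu'$.

A straightforward induction on the word length shows that $R_1(x_1, y_1) \cdots R_n(x_n, y_n) \in L(\mathcal{A})$ if and only if $R_1 x_1 y_1 \cdots R_n x_n y_n \in L(\mathcal{A}')$.

⁵It can be readily seen that $L$ is accepted by a finite-memory automaton introduced in [7].

⁶The decision procedure for the inclusion of FSUBA-languages in [17] is based on a refined version of this argument.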
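The idea of Example 3.3 can be tested mechanically: each input relation $R(x_1, x_2)$ is read as the three symbols $R$, $x_1$, $x_2$. The encoding below is our own assumption, not necessarily the one in [17]. The FSUBA keeps every relation symbol of $\Gamma$ in an extra read-only register, uses the intermediate states `(t, 1)` and `(t, 2)` for each FSDA transition `t`, and postpones the reset $S$ to the third step. The sketch also assumes $k_1 \neq k_2$ in every transition. With $k_1 = k_2 \in S$ and $x_1 \neq x_2$, this particular encoding would reject a move that the FSDA allows. `compare` runs both automata on every word up to a given length and counts disagreements. `PATH_MU` is a hypothetical two-register FSDA that accepts chains $E(x_1, x_2)E(x_2, x_3)\cdots$.

```python
from itertools import product

EMPTY = None  # encodes the empty-register symbol; registers are numbered from 1

def fsda_step(config, letter, mu):
    """Successors of an FSDA configuration on the relation letter = (R, x1, x2)."""
    (q, w), (rel, x1, x2) = config, letter
    out = set()
    for (p, R, k1, k2, S, q2) in mu:
        if p != q or R != rel:
            continue
        if w[k1 - 1] not in (x1, EMPTY) or w[k2 - 1] not in (x2, EMPTY):
            continue
        new, ok = list(w), True
        for k, x in ((k1, x1), (k2, x2)):
            if k not in S:
                ok = ok and new[k - 1] in (x, EMPTY)
                new[k - 1] = x
        for k in S:
            new[k - 1] = EMPTY
        if ok:
            out.add((q2, tuple(new)))
    return out

def fsuba_step(config, sigma, mu, theta):
    """Successors of an FSUBA configuration on the symbol sigma."""
    q, w = config
    out = set()
    for (p, k, S, q2) in mu:
        if p == q and ((w[k - 1] is EMPTY and sigma not in theta) or w[k - 1] == sigma):
            new = list(w)
            if k not in S:
                new[k - 1] = sigma
            for j in S:
                new[j - 1] = EMPTY
            out.add((q2, tuple(new)))
    return out

def accepts(step, start, word, final):
    configs = {start}
    for a in word:
        configs = set().union(*(step(c, a) for c in configs))
    return any(q in final for q, _ in configs)

def fsda_to_fsuba(gamma, q0, final, r, mu):
    """Hypothetical encoding: register r + i + 1 permanently holds gamma[i]."""
    slot = {R: r + 1 + i for i, R in enumerate(gamma)}
    u = (EMPTY,) * r + tuple(gamma)
    mu2 = set()
    for t in mu:
        q, R, k1, k2, S, q2 = t
        if k1 == k2:
            raise ValueError("this sketch assumes k1 != k2")
        mu2.add((q, slot[R], frozenset(), (t, 1)))   # read R via its read-only register
        mu2.add(((t, 1), k1, frozenset(), (t, 2)))   # read x1 via register k1
        mu2.add(((t, 2), k2, frozenset(S), q2))      # read x2 via register k2, reset S
    return q0, set(final), u, set(gamma), mu2

def compare(gamma, variables, q0, final, r, mu, max_len):
    """Count disagreements and FSDA-accepted words among all words up to max_len."""
    q0b, final_b, u, theta, mu2 = fsda_to_fsuba(gamma, q0, final, r, mu)
    letters = list(product(gamma, variables, variables))
    mismatches = accepted = 0
    for n in range(max_len + 1):
        for word in product(letters, repeat=n):
            a = accepts(lambda c, x: fsda_step(c, x, mu), (q0, (EMPTY,) * r), word, final)
            flat = [s for letter in word for s in letter]
            b = accepts(lambda c, x: fsuba_step(c, x, mu2, theta), (q0b, u), flat, final_b)
            mismatches += a != b
            accepted += a
    return mismatches, accepted

PATH_MU = {
    ("s", "E", 1, 2, frozenset({1}), "a"),
    ("a", "E", 2, 1, frozenset({2}), "b"),
    ("b", "E", 1, 2, frozenset({1}), "a"),
}
print(compare(["E", "F"], ["u", "v", "w"], "s", {"a", "b"}, 2, PATH_MU, 3))
```

Exhaustive comparison on short words is only a sanity check, not a proof. The induction mentioned in Example 3.3 is what establishes the equivalence.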

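Example 3.4. Let $\mathcal{A}$ = [formula lost in extraction] be a 2-register FSUBA, where $\mu$ consists of the following three transitions: [formulas lost in extraction]; see the diagram [diagram lost in extraction; the initialization leaves both registers empty]. It can be easily seen that $L(\mathcal{A})$ = [formula lost in extraction].

Example 3.5. ([17], cf. [8, Example 1].) Let $\mathcal{A}$ = [formula lost in extraction] be an FSUBA with two registers, where $\mu$ consists of the following five transitions: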

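[formulas lost in extraction]; see the diagram [diagram lost in extraction; the initialization leaves both registers empty].

It can be easily seen that

$$L(\mathcal{A}) = \{\sigma_1 \cdots \sigma_n \in \Sigma^* : \text{there exist } i \neq i' \text{ such that } \sigma_i = \sigma_{i'}\}.$$

That is, $L(\mathcal{A})$ consists of all words over $\Sigma$ in which some symbol appears twice or more. For example, an accepting run of $\mathcal{A}$ on [formula lost in extraction] is [formula lost in extraction].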
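One FSUBA that matches the description of Example 3.5 is sketched below. It has two initially empty registers, five transitions, and an empty read-only alphabet. This choice of states and transitions is our own assumption and should not be read as the automaton printed in [17]. Register 1 stores a guessed symbol, and register 2 is a scratch register that is reset after every use. The script compares the automaton with a direct check of the defining property on all words of length at most 5 over a three-letter sample alphabet.

```python
from itertools import product

EMPTY = None  # encodes the empty-register symbol; registers are numbered from 1


def fsuba_step(config, sigma, mu, theta):
    """Successors of an FSUBA configuration on the symbol sigma."""
    q, w = config
    out = set()
    for (p, k, S, q2) in mu:
        if p == q and ((w[k - 1] is EMPTY and sigma not in theta) or w[k - 1] == sigma):
            new = list(w)
            if k not in S:
                new[k - 1] = sigma
            for j in S:
                new[j - 1] = EMPTY
            out.add((q2, tuple(new)))
    return out


def fsuba_accepts(word, q0, final, u, theta, mu):
    configs = {(q0, tuple(u))}
    for sigma in word:
        configs = set().union(*(fsuba_step(c, sigma, mu, theta) for c in configs))
    return any(q in final for q, _ in configs)


# Hypothetical transitions (q, k, S, q'): skip symbols through register 2,
# store a guess in register 1, skip again, then meet the stored symbol again.
REPEAT_MU = {
    ("p0", 2, frozenset({2}), "p0"),
    ("p0", 1, frozenset(), "p1"),
    ("p1", 2, frozenset({2}), "p1"),
    ("p1", 1, frozenset(), "p2"),
    ("p2", 2, frozenset({2}), "p2"),
}


def has_repeat(word):
    return len(set(word)) < len(word)


ok = all(
    fsuba_accepts(w, "p0", {"p2"}, (EMPTY, EMPTY), set(), REPEAT_MU) == has_repeat(w)
    for n in range(6)
    for w in product("abc", repeat=n)
)
print(ok)  # True
```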

Example 3.6. ([17]) Let $\mathcal{A} = \langle \Sigma, Q, q_0, F, u, \Theta, \mu \rangle$ be an FSUBA such that $\natural$ does not appear in $u$ and, for all $(q, k, S, q') \in \mu$, $S = \emptyset$. Then $L(\mathcal{A})$ is a regular language over $[u]$. In general, since the restriction of a set of configurations to a finite alphabet is finite, the restrictions of FSUBA-languages to finite alphabets are regular, cf. [8, Proposition 1].
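The finiteness argument at the end of Example 3.6 can be turned into a construction. Over a finite alphabet $\Delta$, every reachable configuration stores only symbols from $\Delta \cup [u]$. The subset construction over configurations therefore terminates and yields an ordinary DFA. The sketch below is our own illustration of this argument, and names such as `restrict_to_dfa` are hypothetical. As a test case it uses a two-register FSUBA of our own that accepts the words in which some symbol occurs twice.

```python
from itertools import product

EMPTY = None  # encodes the empty-register symbol; registers are numbered from 1


def fsuba_step(config, sigma, mu, theta):
    """Successors of an FSUBA configuration on the symbol sigma."""
    q, w = config
    out = set()
    for (p, k, S, q2) in mu:
        if p == q and ((w[k - 1] is EMPTY and sigma not in theta) or w[k - 1] == sigma):
            new = list(w)
            if k not in S:
                new[k - 1] = sigma
            for j in S:
                new[j - 1] = EMPTY
            out.add((q2, tuple(new)))
    return out


def restrict_to_dfa(q0, final, u, theta, mu, delta):
    """Subset construction over configurations, reading only symbols from delta.

    Every reachable register content lies in delta, [u] or EMPTY, so only
    finitely many configurations (and sets of them) can ever be produced.
    """
    mu = list(mu)
    start = frozenset({(q0, tuple(u))})
    table = {}
    todo = [start]
    while todo:
        current = todo.pop()
        if current in table:
            continue
        row = table[current] = {}
        for d in delta:
            nxt = frozenset(set().union(*(fsuba_step(c, d, mu, theta) for c in current)))
            row[d] = nxt
            if nxt not in table:
                todo.append(nxt)
    accepting = {s for s in table if any(q in final for q, _ in s)}
    return start, table, accepting


def dfa_accepts(dfa, word):
    state, table, accepting = dfa
    for d in word:
        state = table[state][d]
    return state in accepting


# Hypothetical two-register FSUBA accepting the words in which a symbol repeats.
REPEAT_MU = {
    ("p0", 2, frozenset({2}), "p0"),
    ("p0", 1, frozenset(), "p1"),
    ("p1", 2, frozenset({2}), "p1"),
    ("p1", 1, frozenset(), "p2"),
    ("p2", 2, frozenset({2}), "p2"),
}
dfa = restrict_to_dfa("p0", {"p2"}, (EMPTY, EMPTY), set(), REPEAT_MU, "ab")
print(len(dfa[1]), "DFA states")
```

Over $\Delta = \{a, b\}$, register 1 holds $\natural$, $a$, or $b$, and register 2 is always empty between steps. This leaves at most nine configurations, so the DFA has at most $2^9$ states.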

4. Regular expressions for FSUBA languages

In this section we introduce an alternative description of FSUBA-languages by the so-called unification based regular expressions, which are the infinite alphabet counterpart of the ordinary regular expressions.

Definition 4.1. Let $X = \{x_1, \dots, x_r\}$ be a set of variables such that $X \cap \Sigma = \emptyset$, and let $\Theta$ be a finite subset of $\Sigma$. Unification based regular expressions over $(X, \Theta)$, or, shortly, UB-expressions, if $(X, \Theta)$ is understood from the context, are defined as follows.

$\emptyset$, $\varepsilon$, and each element of $X \cup \Theta$ are UB-expressions.

If $\alpha$ and $\beta$ are UB-expressions, then so is $(\alpha + \beta)$.

If $S \subseteq X$ and $\alpha$ and $\beta$ are UB-expressions, then so are $(\alpha \cdot_S \beta)$ and $(\alpha_S)^*$.

The intuition behind the above definition is as follows. Each variable in $X$ corresponds to a register of the automaton, and an assignment of symbols from $\Sigma$ to variables in $X$ is the register assignment. Finally, the subscripts $S$ indicate the set of registers reset by the automaton.

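The definition of languages defined by UB-expressions is based on the observation that the set of all sequences of $r$-register FSUBA diagram labels corresponding to its accepting runs is a regular language over $\{1, \dots, r\} \times 2^{\{1, \dots, r\}}$. Thus, with a UB-expression $\alpha$ over $(X, \Theta)$ we associate an ordinary regular expression over the (finite) alphabet $X \cup \Theta \cup 2^X$, denoted $\overline{\alpha}$, that is defined by induction as follows.

If $\alpha \in \{\emptyset, \varepsilon\} \cup X \cup \Theta$, then $\overline{\alpha}$ is $\alpha$.

$\overline{(\alpha + \beta)}$ is $\overline{\alpha} + \overline{\beta}$.

$\overline{(\alpha \cdot_S \beta)}$ is $\overline{\alpha}\, S\, \overline{\beta}$. Finally, $\overline{(\alpha_S)^*}$ is $(\overline{\alpha}\, S)^*$.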
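The inductive definition of $\overline{\alpha}$ is a straightforward recursion on the syntax tree. In the sketch below, which uses our own hypothetical class names, a UB-expression is a small immutable tree, a reset set $S \subseteq X$ is a `frozenset` printed as a single letter such as `{x,y}`, and $\overline{\alpha}$ is returned as a string. The translation of $\alpha \cdot_\emptyset \beta$ is $\overline{\alpha}\,\emptyset\,\overline{\beta}$, where the middle $\emptyset$ is the letter $\emptyset \in 2^X$, printed `{}`. It should not be confused with the expression $\emptyset$ for the empty language.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class EmptySet:
    """The UB-expression denoting the empty language."""


@dataclass(frozen=True)
class Eps:
    """The UB-expression denoting the empty word."""


@dataclass(frozen=True)
class Sym:
    name: str  # a variable from X or a constant from Theta


@dataclass(frozen=True)
class Plus:
    left: "UB"
    right: "UB"


@dataclass(frozen=True)
class Cat:
    left: "UB"  # alpha ._S beta
    resets: FrozenSet[str]
    right: "UB"


@dataclass(frozen=True)
class Star:
    body: "UB"  # (alpha_S)^*
    resets: FrozenSet[str]


UB = Union[EmptySet, Eps, Sym, Plus, Cat, Star]


def reset_letter(resets: FrozenSet[str]) -> str:
    """Print a letter S of 2^X; the empty reset set is printed as {}."""
    return "{" + ",".join(sorted(resets)) + "}"


def overline(e: UB) -> str:
    """The ordinary regular expression over X, Theta and 2^X associated with e."""
    if isinstance(e, EmptySet):
        return "∅"
    if isinstance(e, Eps):
        return "ε"
    if isinstance(e, Sym):
        return e.name
    if isinstance(e, Plus):
        return f"({overline(e.left)} + {overline(e.right)})"
    if isinstance(e, Cat):
        return f"{overline(e.left)} {reset_letter(e.resets)} {overline(e.right)}"
    if isinstance(e, Star):
        return f"({overline(e.body)} {reset_letter(e.resets)})*"
    raise TypeError(f"not a UB-expression: {e!r}")


print(overline(Star(Cat(Sym("x"), frozenset(), Sym("y")), frozenset({"y"}))))
# (x {} y {y})*
```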

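Let $w = w_1 \cdots w_n \in (X \cup \Theta \cup 2^X)^*$. With the $i$th symbol $w_i$ of $w$, $i = 1, \dots, n$, we associate a word $\hat{w}_i \in \Sigma \cup \{\varepsilon\}$ as described below, cf. [7, Definition 3].

If $w_i \in \Theta$, then $\hat{w}_i = w_i$. If $w_i = S \subseteq X$, then $\hat{w}_i = \varepsilon$.

If $w_i = x \in X$, then $\hat{w}_i$ satisfies the following (global) conditions.

– If for each $i' < i$ such that $w_{i'} = x$ there exists $i''$, $i' < i'' < i$, such that $x \in w_{i''}$,⁷ then $\hat{w}_i$ can be any element of $\Sigma \setminus \Theta$.

– Otherwise, let $i'$ be the maximal integer less than $i$ such that $w_{i'} = x$ and no symbol $S \subseteq X$ that appears between the $i'$th and the $i$th positions of $w$ contains $x$. Then $\hat{w}_i = \hat{w}_{i'}$.

The word $\hat{w} = \hat{w}_1 \cdots \hat{w}_n$, where $\hat{w}_i$ is as defined above, $i = 1, \dots, n$, is called an instance of $w$. The set of all instances of $w$ is denoted by $\mathrm{inst}(w)$.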
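To decide whether a given word over $\Sigma$ is an instance of $w$, one left-to-right pass suffices. The pass remembers, for each variable, its value since the most recent reset that contains it. The sketch below is our own illustration with hypothetical names, and it encodes the letters of $2^X$ as `frozenset`s. It assumes that a fresh occurrence of a variable may take any value outside $\Theta$, mirroring the FSUBA rule that read-only symbols cannot be copied into empty registers.

```python
from typing import FrozenSet, Sequence, Set, Union

Letter = Union[str, FrozenSet[str]]  # a variable, a constant, or a reset set S


def is_instance(w: Sequence[Letter], v: Sequence[str],
                variables: Set[str], theta: Set[str]) -> bool:
    """Decide whether the word v over Sigma belongs to inst(w)."""
    binding = {}  # value of each variable since the last reset containing it
    pos = 0       # next position of v to be matched
    for letter in w:
        if isinstance(letter, frozenset):
            # S contributes the empty word and makes its variables fresh again.
            for x in letter:
                binding.pop(x, None)
            continue
        if pos >= len(v):
            return False
        sigma = v[pos]
        if letter in theta:             # a constant stands for itself
            if sigma != letter:
                return False
        elif letter in variables:
            if letter in binding:       # repeat the value of the last unreset occurrence
                if sigma != binding[letter]:
                    return False
            elif sigma in theta:        # a fresh variable takes a value outside Theta
                return False
            else:
                binding[letter] = sigma
        else:
            raise ValueError(f"unknown letter {letter!r}")
        pos += 1
    return pos == len(v)


X = {"x", "y"}
print(is_instance(["x", "y", "x"], "aba", X, set()))              # True
print(is_instance(["x", frozenset({"x"}), "x"], "ab", X, set()))  # True
print(is_instance(["x", "x"], "ab", X, set()))                    # False
```

Different variables may be bound to the same symbol, as in `is_instance(["x", "y"], "aa", X, set())`. This matches the FSUBA convention that a register may hold a symbol stored in another register.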

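Example 4.1. Let $w$ = [formula lost in extraction] $\in (X \cup \Theta \cup 2^X)^*$. Then $\mathrm{inst}(w)$ = [formula lost in extraction].⁸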

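Next, for a language $L \subseteq (X \cup \Theta \cup 2^X)^*$, we denote by $\mathrm{inst}(L)$ the set of all instances of all elements of $L$. That is, $\mathrm{inst}(L) = \bigcup_{w \in L} \mathrm{inst}(w)$.

Finally, for a UB-expression $\alpha$ we define the language $L(\alpha)$ (over $\Sigma$) as the set of all instances of the elements of $L(\overline{\alpha})$: $L(\alpha) = \mathrm{inst}(L(\overline{\alpha}))$.⁹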

⁷Of course, in such case, $w_{i''}$ must be of the form $S \subseteq X$.

⁸Note that [formula lost in extraction].

⁹Recall that $L(\overline{\alpha})$ is a language over $X \cup \Theta \cup 2^X$.

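Example 4.2. It can be readily seen that $\cdot_\emptyset$ and $(\,\cdot\,)^*_\emptyset$ behave like the ordinary concatenation and Kleene star, respectively. In addition, for a non-empty $S$, [formula lost in extraction] is redundant, because [formula lost in extraction].¹⁰

Example 4.3. The language from Example 3.2 is $L$([formula lost in extraction]). Similarly, for the UB-expression $\alpha$ = [formula lost in extraction] over [formula lost in extraction], $L(\alpha)$ consists of all words over $\Sigma$ having the same first and last symbols. Thus, $L$([formula lost in extraction]) is the language from Example 3.5.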
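Membership in $L(\alpha)$ can be decided directly from the syntax of $\alpha$. The idea is to track pairs (position in the input, current variable binding), which amounts to building the corresponding automaton on the fly. The sketch below is our own illustration with hypothetical names. The expressions `SAME_ENDS`, i.e. $x \cdot_\emptyset (y_{\{y\}})^* \cdot_\emptyset x$, and `REPEAT` are our own choices and are not necessarily the expressions of Example 4.3. `SAME_ENDS` describes the words of length at least 2 whose first and last symbols coincide. `REPEAT` describes the words in which some symbol occurs twice. Both use $\Theta = \emptyset$.

```python
from dataclasses import dataclass
from itertools import product
from typing import FrozenSet, Set, Tuple, Union


@dataclass(frozen=True)
class Eps:
    pass


@dataclass(frozen=True)
class Sym:
    name: str  # a variable from X or a constant from Theta


@dataclass(frozen=True)
class Plus:
    left: "UB"
    right: "UB"


@dataclass(frozen=True)
class Cat:
    left: "UB"
    resets: FrozenSet[str]
    right: "UB"


@dataclass(frozen=True)
class Star:
    body: "UB"
    resets: FrozenSet[str]


UB = Union[Eps, Sym, Plus, Cat, Star]
Binding = FrozenSet[Tuple[str, str]]  # variable -> value since its last reset


def forget(b: Binding, resets: FrozenSet[str]) -> Binding:
    return frozenset((x, a) for x, a in b if x not in resets)


def ends(e: UB, v: Tuple[str, ...], i: int, b: Binding,
         theta: FrozenSet[str]) -> Set[Tuple[int, Binding]]:
    """All (j, b2) such that e can produce v[i:j] starting from binding b."""
    if isinstance(e, Eps):
        return {(i, b)}
    if isinstance(e, Sym):
        if i >= len(v):
            return set()
        if e.name in theta:                  # constant
            return {(i + 1, b)} if v[i] == e.name else set()
        bound = dict(b)
        if e.name in bound:                  # variable still holds a value
            return {(i + 1, b)} if v[i] == bound[e.name] else set()
        if v[i] in theta:                    # fresh variable: value outside Theta
            return set()
        return {(i + 1, b | {(e.name, v[i])})}
    if isinstance(e, Plus):
        return ends(e.left, v, i, b, theta) | ends(e.right, v, i, b, theta)
    if isinstance(e, Cat):                   # alpha, then reset S, then beta
        out: Set[Tuple[int, Binding]] = set()
        for j, b1 in ends(e.left, v, i, b, theta):
            out |= ends(e.right, v, j, forget(b1, e.resets), theta)
        return out
    if isinstance(e, Star):                  # (alpha S)^*, computed as a fixpoint
        reached = {(i, b)}
        todo = [(i, b)]
        while todo:
            j, b1 = todo.pop()
            for k, b2 in ends(e.body, v, j, b1, theta):
                nxt = (k, forget(b2, e.resets))
                if nxt not in reached:
                    reached.add(nxt)
                    todo.append(nxt)
        return reached
    raise TypeError(f"not a UB-expression: {e!r}")


def member(e: UB, word, theta: FrozenSet[str] = frozenset()) -> bool:
    v = tuple(word)
    return any(j == len(v) for j, _ in ends(e, v, 0, frozenset(), theta))


NO_RESET = frozenset()
ANY = Star(Sym("y"), frozenset({"y"}))  # (y_{y})^*: every word over Sigma
SAME_ENDS = Cat(Cat(Sym("x"), NO_RESET, ANY), NO_RESET, Sym("x"))
REPEAT = Cat(Cat(ANY, NO_RESET, SAME_ENDS), NO_RESET, ANY)

print(member(SAME_ENDS, "abca"), member(REPEAT, "abc"))  # True False
```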

Example 4.4. Consider a subclass of UB-expressions, called FSDA-expressions, that is defined below.

$\emptyset$, $\varepsilon$, and UB-expressions of the form [formula lost in extraction], where $R \in \Gamma$ and $x_1, x_2 \in X$, are FSDA-expressions.

If $\alpha$ and $\beta$ are FSDA-expressions, then so are $\alpha + \beta$, $\alpha \cdot_S \beta$, and $(\alpha_S)^*$.

It easily follows from Example 3.3 and the constructions in Sections 5 and 6 that FSDA-languages are defined by FSDA-expressions and, vice versa, each FSDA-expression defines an FSDA-language.

Theorem 4.1. A language is defined by a UB-expression if and only if it is accepted by an FSUBA.

The proof of the “if” part of Theorem 4.1 is based on a tight relationship between FSUBA and the ordinary finite automata. It is presented in the next section. The proof of the “only if” part of the theorem is based on the relevant closure properties of FSUBA-languages and is quite standard. For the sake of completeness, we present it in Section 6.

We conclude this section with one more closure property of FSUBA-languages that is an immediate corollary of Theorem 4.1.

Corollary 4.1. FSUBA-languages are closed under reversal.¹¹

Proof:

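It can be easily verified that, for a UB-expression $\alpha$, $L(\alpha)^R = L(\alpha^R)$, where $\alpha^R$ is defined by the following induction.

If $\alpha \in \{\emptyset, \varepsilon\} \cup X \cup \Theta$, then $\alpha^R$ is $\alpha$.

$(\alpha + \beta)^R$ is $\alpha^R + \beta^R$.

$(\alpha \cdot_S \beta)^R$ is $\beta^R \cdot_S \alpha^R$.

$\left((\alpha_S)^*\right)^R$ is $\left((\alpha^R)_S\right)^*$. ∎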
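The reversal $\alpha^R$ from the proof is again a recursion on the syntax tree. The sketch below is our own illustration with hypothetical names. It implements $\alpha^R$ and spot-checks $L(\alpha)^R = L(\alpha^R)$ on all short words for two sample expressions of our choosing. The check uses a membership test that tracks, for each variable, its value since the last reset, and it assumes $\Theta = \emptyset$. The Kleene star case works because a reset at the very beginning or end of a label word does not change its instances.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class EmptySet:
    pass


@dataclass(frozen=True)
class Eps:
    pass


@dataclass(frozen=True)
class Sym:
    name: str


@dataclass(frozen=True)
class Plus:
    left: object
    right: object


@dataclass(frozen=True)
class Cat:
    left: object
    resets: frozenset
    right: object


@dataclass(frozen=True)
class Star:
    body: object
    resets: frozenset


def reverse(e):
    """The induction defining alpha^R in the proof of Corollary 4.1."""
    if isinstance(e, (EmptySet, Eps, Sym)):
        return e
    if isinstance(e, Plus):
        return Plus(reverse(e.left), reverse(e.right))
    if isinstance(e, Cat):
        return Cat(reverse(e.right), e.resets, reverse(e.left))
    if isinstance(e, Star):
        return Star(reverse(e.body), e.resets)
    raise TypeError(f"not a UB-expression: {e!r}")


def forget(b, resets):
    return frozenset((x, a) for x, a in b if x not in resets)


def ends(e, v, i, b):
    """All (j, b2) such that e can produce v[i:j] from binding b (Theta empty)."""
    if isinstance(e, EmptySet):
        return set()
    if isinstance(e, Eps):
        return {(i, b)}
    if isinstance(e, Sym):
        if i >= len(v):
            return set()
        bound = dict(b)
        if e.name in bound:
            return {(i + 1, b)} if bound[e.name] == v[i] else set()
        return {(i + 1, b | {(e.name, v[i])})}
    if isinstance(e, Plus):
        return ends(e.left, v, i, b) | ends(e.right, v, i, b)
    if isinstance(e, Cat):
        return {r for j, b1 in ends(e.left, v, i, b)
                for r in ends(e.right, v, j, forget(b1, e.resets))}
    if not isinstance(e, Star):
        raise TypeError(f"not a UB-expression: {e!r}")
    reached, todo = {(i, b)}, [(i, b)]  # fixpoint for (alpha S)^*
    while todo:
        j, b1 = todo.pop()
        for k, b2 in ends(e.body, v, j, b1):
            nxt = (k, forget(b2, e.resets))
            if nxt not in reached:
                reached.add(nxt)
                todo.append(nxt)
    return reached


def member(e, word):
    v = tuple(word)
    return any(j == len(v) for j, _ in ends(e, v, 0, frozenset()))


NO = frozenset()
X_ONLY = frozenset({"x"})
# x ._{x} x y x describes the words s t r t.
E1 = Cat(Sym("x"), X_ONLY, Cat(Sym("x"), NO, Cat(Sym("y"), NO, Sym("x"))))
# (x y)_{x}^* describes s1 t s2 t ... in which every second symbol is the same.
E2 = Star(Cat(Sym("x"), NO, Sym("y")), X_ONLY)


def reversal_holds(e, alphabet="ab", max_len=6):
    return all(member(e, w) == member(reverse(e), w[::-1])
               for n in range(max_len + 1) for w in product(alphabet, repeat=n))


print(reversal_holds(E1), reversal_holds(E2))  # True True
```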



Remark 4.1. Using an alternative, equivalent model of computation that is similar to the M-automata introduced in [8], one can show that FSUBA-languages are also closed under intersection.¹²

¹⁰Of course, [formula lost in extraction] is redundant as well (but, still, very useful), because [formula lost in extraction].

¹¹It should be pointed out that FMA-languages are not closed under reversal, see [8, Example 8]. Therefore, it is unlikely that there is a kind of regular expressions for FMA-languages.

¹²It follows from Example 3.2 that FSUBA-languages are not closed under complementation.
