• 沒有找到結果。

Regular expressions for data words

N/A
N/A
Protected

Academic year: 2022

Share "Regular expressions for data words"

Copied!
20
0
0

加載中.... (立即查看全文)

全文

(1)

Contents lists available atScienceDirect

Journal of Computer and System Sciences

www.elsevier.com/locate/jcss

Regular expressions for data words

Leonid Libkin

a

, Tony Tan

b

, Domagoj Vrgoˇc

a

,

c

,∗

aUniversityofEdinburgh,UnitedKingdom

bHasseltUniversityandTransnationalUniversityofLimburg,Belgium cPUCChileandCenterforSemanticWebResearch,Chile

a r t i c l e i n f o a b s t r a c t

Articlehistory:

Received2May2014

Receivedinrevisedform 27October2014 Accepted3March2015

Availableonline24March2015

Keywords:

Datawords Registerautomata Regularexpressions

Inthis paper we define and study regular expressions for data words. We first define regularexpressions withmemory (REM), whichextend standard regular expressions with limitedmemoryand showthattheycapture theclassofdata wordsdefinedbyregister automata.WealsostudythecomplexityofthestandarddecisionproblemsforREM,which turn outto bethe same as for registerautomata. Inorder tolower the complexity of mainreasoningtasks,wethenlookattwonaturalsubclassesofREMthatcandefinemany propertiesof interest inthe applications of data words: regularexpressions withbinding (REWB)andregularexpressions withequality (REWE).Westudytheirexpressivepowerand analysethecomplexityoftheirstandarddecisionproblems.Wealsoestablishthefollowing stricthierarchyofexpressivepower:REMisstrictlystrongerthanREWB,andinturnREWB isstrictlystrongerthanREWE.

©2015ElsevierInc.All rights reserved.

1. Introduction

Data words are words that, in addition to a letter froma finite alphabet,have a datavalue from an infinite domain associatedwitheachposition.Forexample,



a

1



b

2



b

1



isadatawordoverthealphabet

 = {

a

,

b

}

and

N

asthedomainof values.Itcanbeviewedastheordinarywordabb inwhichthefirstandthethirdpositionsareequippedwithvalue1,and thesecondpositionwithvalue2.

They were introduced by Kaminski andFrancez in[14] who also proposed a naturalextension of finiteautomata for them, calledregisterautomata. They have become an active subject of research lately due to their applications in XML, in particular in staticanalysis oflogic and automata-based XML specifications,as well asin query evaluationtasks. For example,whenreasoningaboutpathsinXMLtrees,wemaywanttoreason,notonlyaboutthelabelsofthetrees,i.e.the XMLtags,butalsothevaluesofattributeswhichcancomefromaninfinitedomain.

While the applications of logic and automata models for the structural part of XML (without data values) are well understoodbynow[17,21,25],takingintoaccountthedatavaluespresentsentirelynewchallenges[23,26].Mosteffortsin thisdirectionconcentrateonfinding“wellbehaved”logics andtheirassociatedautomata[6,5,4,10],usually withthefocus on finding logics withdecidable satisfiability problem. In[23] Nevenet al. studiedthe expressive power ofvarious data word automata modelsin comparisonwithsome fragmentsof FOandMSO.A well-known resultofBojanczyk et al.[4]

showsthat FO2,thetwo-variablefragment offirst-orderlogicextended byequalitytest fordatavalues,isdecidableover

*

Correspondingauthor.

E-mailaddress:domagojvrgoc@gmail.com(D. Vrgoˇc).

http://dx.doi.org/10.1016/j.jcss.2015.03.005 0022-0000/©2015ElsevierInc.All rights reserved.

(2)

datawords. Recently,Ley et al. showedin[3,8] that theguarded fragment ofMSOdefinesdata wordlanguagesthat are recognizedbynon-deterministicregisterautomata.

Data wordsappearinthearea ofverificationaswell.Inseveralapplications,one wouldliketodeal withconciseand easy-to-understandrepresentationsoflanguagesofdatawords.Forexample,inmodellinginfinite-statesystemswithfinite control[9,12],aconciserepresentationofsystempropertiesismuchpreferredtolongandunintuitivespecificationsgiven by,say,automata.

Theneedforagoodrepresentationmechanismfordatawordlanguagesisparticularlyapparentingraphdatabases[1]– adatamodelthatisincreasinglycommoninapplicationsincludingsocialnetworks,biology,semanticWeb,andRDF.Many propertiesofinterestinsuchdatabasescanbeexpressedbyregularpathqueries[22],i.e.queriesthataskfortheexistence ofapathconformingtoagivenregularexpression[7,2].Typicalqueriesarespecifiedbytheclosureofatomicformulaeof theformx

L y underthe“and”operation and theexistential quantifier

.Intuitively,theatomicformulax

L y asksfor allpairs

(

x

,

y

)

ofnodessuchthatthereisapathfromx to y whoselabelisintheregularlanguageL[7].

Typically,such logicallanguageshavebeenstudiedwithout takingdatavaluesintoaccount.Recently, however,logical languagesthat extendregular conditionsfromwordstodatawords appeared[20]; forsuchlanguageswe needaconcise wayofrepresentingregular languages,whichismostcommonlydone byregularexpressions (asautomatatendtobetoo cumbersometobeusedinaquerylanguage).

ThemostnaturalextensionoftheusualNFAs todatawordsisregisterautomata (RA),firstintroducedbyKaminski and Francezin[14]andstudied,forexample,in[9,24].Theseareinessencefinitestateautomataequippedwithasetofregisters thatallowthemtostoredatavaluesandmakeadecisionabouttheirnextstepbased,notonlyonthecurrentstateandthe letterinthecurrentposition,butalsobycomparingthecurrentdatavaluewiththeonespreviouslystoredinregisters.

Theywereoriginally introduced asa mechanismtoreasonaboutwordsoveran infinitealphabet(thatis,withoutthe finitepart),butthey easilyextendtodescribe datawordlanguages.Notethat avarietyofother automataformalismsfor datawordsexist,forexample,pebbleautomata[23,28],dataautomata[4],andclassautomata[5].Inthispaperweconcen- trateonlanguagesspecifiedbyregisterautomata,sincetheyarethemostnaturalgeneralizationoffinitestateautomatato languagesoverdatawords.

Asmentionedearlier,ifwethinkofaspecificationofadatawordlanguage,registerautomataarenotthemostnatural wayofprovidingthem.Infact,evenovertheusualwords,regularlanguagesareeasiertodescribebyregular expressions thanbyNFAs.Forexample,inXMLandgraphdatabaseapplications,specifyingpathsviaregularexpressionsiscompletely standard.InmanyXMLspecifications(e.g.,XPath),datavaluecomparisonsarefairlylimited:forinstance,onechecksiftwo pathsendwiththesamevalue.Ontheotherhand,ingraphdatabases,oneoftenneedstospecifyapathusingbothlabels anddatavaluesthatoccurinit.Forthosepurposes,weneedalanguagefordescribingregularlanguagesofdatawords,i.e., languagesacceptedbyregisterautomata.In[20]westartedlookingatsuch expressions,butinacontextslightlydifferent fromdatawords.Ourgoalnowistopresentacleanaccountofregularexpressionsfordatawordsthatwould:

1. capture the power ofregister automata overdata words, justasthe usual regular expressions capturethe power of regularlanguages;

2. havegoodalgorithmicproperties,atleastmatchingthoseofregisterautomata;and 3. admitexpressivesubclasseswithverygood(efficient)algorithmicproperties.

Forthis,we define threeclassesofregularexpressions (inthe orderofdecreasingexpressive power):regular expressions withmemory(REM),regularexpressionswithbinding(REWB),andregularexpressionswithequality(REWE).

Intuitively,REMarestandardregularexpressionsextendedwithafinitenumberofvariables,whichcanbeusedtobind andcomparedatavalues.ItturnsoutthatREMhavethesameexpressivepowerasregisterautomata.Notethatanattempt tofindsuchregularexpressionshasbeenmadebyKaminskiandTanin[15],butitfellshortofeventhefirstgoal.Infact, theexpressions of[15] are not veryintuitive, andthey failto capturesome verysimple languageslike,forexample,the language

{ 

a

d



a

d

 |

d

=

d

}

.In our formalism thislanguage will be described by a regular expression

(

a

x

) · (

a

[

x=

])

. This expressionsays:bindx tobethedatavalueseenwhilereadinga,movetothenextposition,andcheckthatthesymbolis a andthat thedatavaluediffersfromtheonein x.The ideaofbindingis,ofcourse,commoninformal languagetheory, butherewe do not binda letteror asubword (as,forexample, inregular expressions withbackreferencing) butrather valuesfroman infinite alphabet.Thisis alsoreminiscentoffreezequantifiers used inconnection withthestudyofdata wordlanguages[9].Itisworthwhile notingthatREM werealsousedin[20] and[16] todefinea classofgraphdatabase queriescalledregularquerieswithmemory(RQM).

However, one may argue that the binding rule in REM may not be intuitive. Consider the following expression: a

x

(

a

[

x=

]

a

x

)

a

[

x=

]

.Herethelastcomparisonx=isnotdonewithavaluestoredinthefirstbinding,asonewouldexpect, but withthe value stored inside the scope of another binding (the one underthe Kleene star). That is, the expression re-bindsvariablex insidethescopeofanotherbinding,andthencrucially,whenthishappens,theoriginal bindingofx is lost.Suchexpressionsreallymimicthebehaviorofregisterautomata,whichmakesthemmoreproceduralthandeclarative.

(Theaboveexpressiondefinesdatawordsoftheform



a

d1



a

d1

 · · · 

a

dn



a

dn



.)

Losingthe originalbindingofa variablewhenreusingitinsideits scopegoescompletelyagainst theusualpracticeof writinglogicalexpressions,programs,etc., thathaveboundvariables.Nevertheless,asweshow inthispaper,thisfeature wasessentialforcapturingregisterautomata.Anaturalquestionthenarisesaboutexpressionsusingproperscopingrules,

(3)

whichwecallregularexpressionswithbinding(REWB).WeshowthattheyarestrictlyweakerthanREM.Thenonemptiness problemforREWBdropsto NP-complete,whilethecomplexityofthemembershipproblemremainsthesameasforREM.

Finally, we introduce and study regular expressions with equality (REWE). Intuitively, REWE are regular expressions extended with operators

(

e

)

= and

(

e

)

=, which denotesthe language of datawords which conform to e,where the first andthe last datavaluesare thesame anddifferent, respectively. We showthat REWE isstrictly weaker thanREWB,but its membershipandnonemptinessproblemsbecome tractable.Similar toREM, theseexpressions were usedin[20,16] to defineaclassofgraphqueriescalledregularquerieswithdatatests(RQD).

We would like to note that this paper combines and extends results from [19] and [18] where they appeared in a preliminaryformandoftenwithoutcompleteproofs.Herewepresentfullproofsofalltheresults.

Organization In Section2we reviewthebasicnotationsandregisterautomata. Werecallafew factsonRA inSection 3.

Forthe sakeofcompleteness, wereprove someofthem. InSections4,5and6we introduceandstudyREM,REWBand REWE,respectively.FinallyweendwithsomeconcludingremarksinSection7.

2. Notationsanddefinitions

Werecallthedefinitionofdatawordsandformallydefinethestandarddecisionproblemsandclosureproperties.

Datawords Adatawordover



isafinitesequenceof

 × D

,where



isafinitesetoflettersand

D

aninfinitesetof datavalues.Thatis,indatawordeachpositioncarriesaletterfrom



andadatavaluefrom

D

.Wewillwritedatawords as



a1

d1

 . . . 

an

dn



,wherethe labelandthe datavalueinposition i areai anddi,respectively. Theset ofalldatawords over thealphabet



andthesetofdatavalues

D

isdenotedby

( × D)

.Adatawordlanguageisasubset L

⊆ ( × D)

.We denotebyProj

(

w

)

theworda1

· · ·

an,thatis,theprojectionofw toits



component.

Registerautomata Register automata are an analogue of NFAs for data words.They move from one state to another by readingtheletterfromthecurrentpositionandcomparingthedatavaluetotheonespreviouslystoredintheregisters.In thispaperweuseaslightlymodifiedversionofRA.Thedifferencesbetweenthisversionandtheotheronespresentinthe literature[14,9,26]willbediscussedinSection3.

To test (in)equalities of datavalues we use the notion ofconditions which are boolean combinationsof atomic

=, =

comparisons. Formally, conditions are defined asfollows. Let x1

, . . . ,

xk be variables, denoting the registers. The class of conditions

C

kisdefinedbythegrammar:

ϕ := tt | ff |

x=i

|

x=i

| ϕϕ | ϕϕ

wherexi

∈ {

x1

, . . . ,

xk

}

.Intuitively,theconditionx=i (x=i)meansthatthecurrentlyreaddatavalueisthesame(different)as thecontentofregisterxi.

Avaluationonthevariablesx1

, . . . ,

xkisapartialfunction

ν : {

x1

, . . . ,

xk

} → D

.Wedenoteby

F(

x1

, . . . ,

xk

)

thesetofall valuationsonx1

, . . . ,

xk.Foravaluation

ν

,wewrite

ν [

xi

d

]

todenotethevaluation

ν

obtainedbyfixing

ν



(

xi

) =

d and

ν



(

x

) = ν (

x

)

forallother x

=

xi.Avaluation

ν

iscompatiblewitha condition

ϕC

k,ifforevery variablexi that appears in

ϕ

,

ν (

xi

)

isdefined.

Let

νF(

x1

, . . . ,

xk

)

andd

D

.Thesatisfactionofacondition

ϕ

by

(

d

, ν )

isdefinedinductivelyasfollows.

d

, ν | tt

andd

, ν | ff

.

d

, ν |

x=i ifandonlyif

ν (

xi

)

isdefinedand

ν (

xi

) =

d.

d

, ν |

x=i ifandonlyif

ν (

xi

)

isdefinedand

ν (

xi

) =

d.

d

, ν | ϕ

1

ϕ

2 ifandonlyifd

, ν | ϕ

1 andd

, ν | ϕ

2.

d

, ν | ϕ

1

ϕ

2 ifandonlyifd

, ν | ϕ

1 ord

, ν | ϕ

2.

Definition2.1(Registerdatawordautomata). Let



be afinitealphabetandk anaturalnumber.Aregisterautomaton(RA) withk registersx1

, . . . ,

xkisatuple

A = (

Q

,

q0

,

F

,

T

)

,where:

Q isafinitesetofstates;

q0

Q istheinitialstate;

F

Q isthesetoffinalstates;

T isafinitesetoftransitionsoftheform:

q

,

a

[ ϕ ]↓

xi

q or q

,

a

[ ϕ ] →

q

,

whereq

,

qarestates,a

∈ 

,xi

∈ {

x1

, . . . ,

xk

}

,and

ϕ

isaconditionin

C

k.

(4)

Intuitivelyonreadingtheinput



a

d



,iftheautomatonisinstateq andthereisatransitionq

,

a

[ ϕ ]

xi

q

T suchthat d

, ν | ϕ

,where

ν

indicatesthecurrentcontentoftheregisters,thenitmovestothestate q whilechangingthecontent ofregister i tod.The transitionsq

,

a

[ ϕ ]

q are processedsimilarly, exceptthat they donot changethe contentofthe registers.

ThetransitionsinRAcanbegraphicallyrepresentedasfollows:

q a

[ ϕ ]↓

xi q or q a

[ ϕ ]

q

Let

A

beak-registerautomaton.Aconfiguration of

A

on w isapair

(

q

, ν )

Q

× F(

x1

, . . . ,

xk

)

.Theinitialconfiguration is

(

q0

, ν

0

)

,where

ν

0

= ∅

.Aconfiguration

(

q

, ν )

withq

F isafinalconfiguration.

Aconfiguration

(

q

, ν )

yieldsaconfiguration

(

q

, ν



)

by



a

d



,denotedby

(

q

, ν ) 

a,d

(

q

, ν



)

,ifeither

thereisatransitionq

,

a

[ ϕ ] ↓

xi

qof

A

suchthatd

, ν | ϕ

and

ν



= ν [

xi

d

]

,or

thereisatransitionq

,

a

[ ϕ ]

qof

A

suchthatd

, ν | ϕ

and

ν



= ν

. Let w

= 

a1

d1

 . . . 

an

dn



. A run of

A

on w is a sequence of configurations

(

q0

, ν

0

), . . . , (

qn

, ν

n

)

such that

(

q0

, ν

0

)

is the initial configuration and

(

qi1

, ν

i1

) 

ai,di

(

qi

, ν

i

)

,for each i

=

1

, . . . ,

n. It is called an accepting run if

(

qn

, ν

n

)

is a final configuration.Wesaythat

A

acceptsw,denotedbyw

L

( A)

,ifthereisanacceptingrunof

A

onw.

Fora valuation

ν

, wedefine theautomaton

A( ν )

that behavesjustlike theautomaton

A

,butweinsist that anyrun startswiththeconfiguration

(

q0

, ν )

,thatis,withthecontentoftheregisters

ν

,insteadoftheemptyvaluation.

Wenowpresentafewexamplesofregisterautomata.

Example2.2.Inthefollowinglet

 = {

a

}

.Toeasethenotationintransitionsthathave

ϕ = tt

wewillomit

ϕ

andsimply writea

xi,ora onthearrowsofourautomata.

The1-registerautomaton

A

1 representedbelowacceptsall datawordsinwhichthedatavalueinthefirstpositionis differentfromalltheotherdatavalue.

start q a

x q

a

[

x=

]

The1-registerautomaton

A

2 representedbelowacceptsalldatawordsinwhichtherearetwopositionswiththesame datavalue.

start q a

x q q

a a a

a

[

x=

]

3. Someusefulfactsaboutregisterautomata

Inthis sectionwe recall some basiclanguage theoretic propertiesof RA, whichwill beusefullater on. Mostofthem already followfrom [14]. However, we would like to note first that there are two versions of RA present in the litera- ture.Althoughthey areequivalentinexpressivepower,their differencesubtlyeffectsthecomplexityofdecisionproblems discussedinthispaper.

Wewillbrieflydiscussbothversions,andshowhowtheyrelatetothemodelpresentedinthispaper

Inthefirstversiondifferentregistersalwayscontaindifferentdatavalues,asdefinedin[14].Inshort,transitionsinthis versionare oftheform

(

q

,

i

,

q

)

,whichintuitivelymeansthat theautomatoncanmovefromstate q toq,ifthe data valuecurrentlyreadisthesameastheoneinregisteri.

Inthesecondversiondifferentregistersmaycontainthesamedatavalue,asdefinedin[14,9,26].Inshort,transitions inthisversion areoftheform

(

q

,

I

,

q

)

,where I isa setofregisters,whichintuitively meansthattheautomatoncan movefromstateq toq,ifthedatavaluecurrentlyreadcanbefoundinregistersinI.

For more details, we refer the reader to [14,9,26]. Let’s call the former and the latter versions the S-version and the M-version,respectively. S standsforsingleregisterand M formultipleregisters.Itwas shownin[14] thatboth versions havethesameexpressivepower.

(5)

Notethattransitionsin S- and M-versionscanbe writtenintermsofconditions.Fortransitionsin S-version,wehave conditionsoftheform

x=i

∧ 

j=i

x=j

,

FortransitionsinM-version,wehaveconditionsoftheform



hI

x=h

∧ 

h/I x=h

.

Conversely, conditions can be easily translated to transitions in M-version. A condition is simply a class of subsets of registers that contain the currentlyread datavalue. Hence, all the models: the S-version, the M-version andthe model presentedinthispaperareequivalentinexpressivepower.

The readermayaskthough whywechoose themodelthat allowsrepetition ofdatavalues.Thereare severalreasons forthis. Firstandforemost,RAdefinedinthispaperprovidegreater flexibilityinpresentingourideasthan thosestudied so far, andthus, make the presentationless technical. Second, this model hasby now became moremainstream when discussing registerautomata [9,26].Finally, theoriginal motivationfor studyingexpressions overdata wordscomes from theareaofgraphdatabases[20],whereitisnaturaltoassumethatsomedatavaluecanberepeatedandreusedafterwards fordifferentpurposes.

We willnow listsomegeneralpropertiesofregisterautomata, whichwillbe usefullateron.We startwitha folklore whichstatesthatRAcanonlykeeptrackofasmanydatavaluesascanbestoredintheirregisters.Formally,wehave:

Lemma3.1.(See[14,Proposition 4].) Let

A

beak-registerdatawordautomaton.If

A

acceptssomewordoflengthn,thenitaccepts awordoflengthn inwhichthereareatmostk

+

1 differentdatavalues.

Theproofofthelemma in[14]assumesthatineachassignmentallthedatavaluesaredifferent.Itisstraightforwardto seethattheproofalsoworkswhenweallowrepeatedoccurrenceofdatavalues.

Next,notethatwhenrestrictedonlytoafinitesetD ofdatavalues,registerautomatacanbeviewedasordinaryNFAs.

Toseethisobservethatthereareonly

|

D

| +

1 numberofpossiblecontentsofeachregister:eithertheregisterisemptyor itcontainsadatavaluefromD.Hence,allpossiblecontentsoftheregisterscanbesimplyencodedinsidethestatesofthe NFA.Weformallystatethisinthelemmabelow.

Lemma3.2.(See[14,Proposition 1].) LetD beafinitesetofdatavaluesand

A

ak-registerRAover



.Thenthereexistsafinitestate automaton

A

Doverthealphabet

 ×

D suchthatw

L

(A

D

)

ifandonlyifw

L

(A)

,foreveryw withdatavaluesfromD.Moreover, thenumberofstatesin

A

Dis

|

Q

| · (|

D

| +

1

)

k,whereQ isthesetofstatesin

A

.

Membershipandnonemptinessproblemsaresomeofthemostimportantdecisionproblemsrelatedtoformallanguages.

Tobeprecise,wepresentthedefinitionsoftheseproblemshere.Thenonemptinessproblem asksustocheck,givenasinput an RA

A

,whether L

( A) = ∅

.The membershipproblem takesasinput an RA

A

andadataword w,andrequirestocheck whether w

L

(A)

.

Wenowrecalltheexactcomplexityoftheseproblemsforregisterautomata.

Fact3.3.

ThenonemptinessproblemforRAis PSpace-complete[9].

ThemembershipproblemforRAis NP-complete[24].

The hardnessin[9]isforthe M-version,whiletheonein[24]isforthe S-version.However, notethattranslating the transitionsin S- and M-versions toconditionsincursonlylinearblow-up(for S-version)andnoblow-up (for M-version).

Hence, the NP-hardness andthe PSpace-hardnessfor decisionproblemsfor S- and M-versions will alsohold fortheRA definedinthispaper.TheupperboundfollowsfromLemmas 3.1 and3.2andFact 3.4.

Sinceregisterautomatacloselyresembleclassicalfinitestateautomata,itisnotsurprisingthatsome(althoughnotall) constructions validfor NFAs can be carried overto register automata. We nowrecall results aboutclosure propertiesof registerautomatamentionedin[14],ofwhichtheproofscanbeeasilymodifiedtosuittheRAmodelproposedhere.

Fact3.4.(See[14].)

1. Thesetoflanguagesacceptedbyregisterautomataisclosedunderunion,intersection,concatenationandKleenestar.

2. Languagesacceptedbyregisterautomataarenotclosedundercomplement.

3. Languagesacceptedbyregisterautomataareclosedunderautomorphismson

D

:thatis,if f

: D → D

isabijectionand w isacceptedby

A

,thenthedataword f

(

w

)

inwhicheverydatavalued isreplacedby f

(

d

)

isalsoacceptedby

A

.

(6)

4. Regularexpressionswithmemory

Inthissection we define ourfirst classofregular expressions fordata words,calledregular expressionwithmemory (REM), andwe show that theyare equivalent to RA interms ofexpressive power. The ideabehind themfollows closely theequivalencebetweenthestandardregularexpressionandfinitestateautomata.NoticethatRAcanbepicturedasfinite state automatawhosetransitionsbetweenstateshavelabelsoftheforma

[ ϕ ]

x ora

[ ϕ ]

.Likewise, thebuildingblocksfor REMareexpressionsoftheforma

[ ϕ ] ↓

x anda

[ ϕ ]

.

Definition4.1 (Regularexpressionswithmemory(REM)). Let



be afinite alphabetandx1

, . . . ,

xk be variables. Regularex- pressionswithmemory(REM)over

[

x1

, . . . ,

xk

]

aredefinedinductivelyasfollows:

ε

and

areREM;

a

[ ϕ ] ↓

xianda

[ ϕ ]

areREM,wherea

∈ 

,

ϕ

isaconditionin

C

k,andxi

∈ {

x1

, . . . ,

xk

}

;

Ife

,

e1

,

e2 areREM,thensoaree1

+

e2,e1

·

e2,ande.

Forconvenience,when

ϕ = tt

,wewillwritea anda

x,insteadofa

[tt]

anda

[tt] ↓

x.

Semantics TodefinethelanguageexpressedbyanREMe,weneedthefollowingnotation.Lete beanREMover

[

x1

, . . .

xk

]

and

ν , ν



F(

x1

, . . . ,

xk

)

.Letw beadataword.Wedefinearelation

(

e

,

w

, ν )  ν

inductivelyasfollows.

• ( ε ,

w

, ν )  ν

ifandonlyifw

= ε

and

ν



= ν

.

• (

a

[ ϕ ]

xi

,

w

, ν )  ν

ifandonlyifw

= 

a

d



and

ν ,

d

| ϕ

and

ν



= ν [

xi

d

]

.

• (

a

[ ϕ ],

w

, ν )  ν

ifandonlyifw

= 

a

d



and

ν ,

d

| ϕ

and

ν



= ν

.

• (

e1

·

e2

,

w

, ν )  ν

 if and only if there exist w1

,

w2 and

ν

 such that w

=

w1

·

w2 and

(

e1

,

w1

, ν )  ν

 and

(

e2

,

w2

, ν



)  ν

.

• (

e1

+

e2

,

w

, ν )  ν

ifandonlyif

(

e1

,

w

, ν )  ν

or

(

e2

,

w

, ν )  ν

.

• (

e

,

w

, ν )  ν

ifandonlyif 1. w

= ε

and

ν = ν

,or

2. thereexist w1

,

w2and

ν

 suchthat w

=

w1

·

w2and

(

e

,

w1

, ν )  ν

 and

(

e

,

w2

, ν



)  ν

.

Wesaythat

(

e

,

w

, ν )

infers

ν

,if

(

e

,

w

, ν )  ν

.If

(

e

,

w

, ∅)  ν

,then wesaythat e induces

ν

ondataword w.Wedefine L

(

e

)

asfollows.

L

(

e

) = {

w

|

e induces

ν

on w for some

ν }

Example4.2.ThefollowingtwoREMse1ande2:

e1

= (

a

x

) · (

a

[

x=

])

e2

=

a

· (

a

x

) ·

a

· (

a

[

x=

]) ·

a

capturespreciselythelanguagesL

(A

1

)

andL

(A

2

)

inExample 2.2,respectively.

Example4.3.Let

 = {

a

,

b1

,

b2

, . . . ,

bl

}

.ConsiderthefollowingREMe over

 [

x

,

y

]

:

e

= 

· (

a

x

) · 

· (

a

y

) · 

· ([

x=

] + [

y=

]) · ([

x=

] + [

y=

])

where

[

x=

]

standsfor

(

a

[

x=

] +

b1

[

x=

] + · · · +

bl

[

x=

])

.Thelanguage L

(

e

)

consistsofdatawordsinwhichthelasttwodata valuesoccurelsewhereinthewordwithlabela.

ThefollowingtheoremstatesthatREMandRAareequivalentinexpressivepower.

Theorem4.4.REMandRAhavethesameexpressivepowerinthefollowingsense.

ForeveryREMe over

 [

x1

, . . . ,

xk

]

,thereexistsak-registerRA

A

e suchthatL

(

e

) =

L

( A

e

)

.Moreover,theRA

A

ecanbecon- structedinpolynomialtime.

Foreveryk-registerRA

A

,thereexistsanREMeAover

[

x1

, . . . ,

xk

]

suchthatL

(

eA

) =

L

(A)

.Moreover,theREMeAcanbe constructedinexponentialtime.

Proof. Inwhatfollowswewillneedthefollowingnotation. Foran REMe over

[

x1

, . . . ,

xk

]

and

ν , ν



F(

x1

, . . . ,

xk

)

,let L

(

e

, ν , ν



)

bethesetofalldatawords w suchthat

(

e

,

w

, ν )  ν

.ForanRA

A

withk registers,letL

( A, ν , ν



)

bethesetof alldatawords w suchthatthereisanacceptingrun

(

q0

, ν

0

), . . . , (

qn

, ν

n

)

of

A

onw and

ν

0

= ν

and

ν

n

= ν

.

WearegoingtoprovethefollowinglemmawhichimmediatelyimpliesTheorem 4.4.

(7)

Lemma4.5.

1. For every REMe over

 [

x1

, . . . ,

xk

]

, thereexists a k-registerRA

A

e suchthat L

(

e

, ν , ν



) =

L

( A

e

, ν , ν



)

forevery

ν , ν



F(

x1

, . . . ,

xk

)

.Moreover,theRA

A

ecanbeconstructedinpolynomialtime.

2. Foreveryk-registerRA

A

,thereexistsanREMeAover

[

x1

, . . . ,

xk

]

suchthatL

(

eA

, ν , ν



) =

L

(A, ν , ν



)

forevery

ν , ν



F(

x1

, . . . ,

xk

)

.Moreover,theREMeAcanbeconstructedinexponentialtime.

The restofthissubsectionisdevotedtotheproof ofLemma 4.5.Thestructureoftheproof followsthestandard NFA- regular expressions equivalence, cf. [27], withall the necessaryadjustments to handletransitions induced by a

[ ϕ ]

x or a

[ ϕ ]

.

Weprovethefirstitembyinductiononthestructureofe.

If e

= ∅

, then

A

e

= (

Q

,

q0

,

F

,

T

)

, where Q

= {

q0

}

isthe setof states,q0 isthe initialstate, F

= ∅

isthe setoffinal statesandT

= ∅

.

If e

= ε

,then

A

e

= (

Q

,

q0

,

F

,

T

)

,where Q

= {

q0

}

istheset ofstates,q0 is theinitial state, F

= {

q0

}

the setoffinal statesandT

= ∅

.

Ife

=

a

[ ϕ ] ↓

xi,then

A

e

= (

Q

,

q0

,

F

, ,

T

)

,where Q

= {

q0

,

q1

}

isthesetofstates,q0 istheinitialstate, F

= {

q1

}

theset offinalstatesandT

= {

q0

,

a

[ ϕ ]

xi

q1

}

.

Ife

=

a

[ ϕ ]

,then

A

e

= (

Q

,

q0

,

F

, ,

T

)

,where Q

= {

q0

,

q1

}

isthesetofstates,q0 istheinitialstate, F

= {

q1

}

thesetof finalstatesandT

= {

q0

,

a

[ ϕ ]

q1

}

.

Thecasewhene

=

e1

+

e2,ore

=

e1

·

e2,ore

=

e1 followfromtheinductivehypothesisandthefactthatRAlanguages areclosedunderunion,concatenationandKleenestar(Fact 3.4).

Inallcasesitisstraightforwardtocheckthattheconstructedautomatonhasthedesiredproperty.Thepolynomialtime boundfollowsimmediatelyfromtheconstruction.

Next we move onto the second claim of the theorem. To prove this, we will have to introduce generalized reg- ister automata (GRA for short) over data words. The difference from usual register automata will be that we allow arrows to be labelled by arbitrary regular expressions with memory, i.e., our arrows are now not labelled only by la- bels a

[ ϕ ]

xi,ora

[ ϕ ]

,butalsoby anyregular expressionwithmemory. Thetransitionrelation iscalled

δ

anddefinedas

δ

Q

×

REM

( [

x1

, . . . ,

xk

]) ×

Q ,where REM

( [

x1

, . . . ,

xk

])

denotes the setofall regular expressions withmemory over

[

x1

, . . . ,

xk

]

.Inadditiontothat, wealsospecifythat wehaveasingleinitialstate withnoincomingarrowsandasingle finalstatewithnooutgoingarrows.Notethatwealsoallow

ε

-transitions.Theonlydifferenceishowwedefineacceptance.

A GRA

A

acceptsdataword w,if w

=

w1

·

w2

· . . . ·

wk (whereeach wi is adataword) andthereexists a sequence c0

= (

q0

, τ

0

), . . . ,

ck

= (

qk

, τ

k

)

ofconfigurationsof

A

on w suchthat:

1. q0 istheinitialstate, 2. qkisafinalstate,

3. for each i we have

(

ei

,

wi

, τ

i

)  τ

i+1 (i.e. wi

L

(

ei

, τ

i

, τ

i+1

)

), for some ei such that

(

qi

,

ei

,

qi+1

)

is inthe transition relationfor

A

.

Wecannowprovetheequivalenceofregisterautomataandregularexpressionswithmemorybymimickingtheconstruc- tionusedtoproveequivalencebetweenordinaryfinitestateautomataandregular expressions(overstrings).Sinceweuse thesameconstructionwewillgetanexponentialblow-up,justlikeforfinitestateautomata.

Asinthefinitestatecase,wefirstconvert

A

intoaGRAbyaddinganewinitialstate(connectedtotheoldinitialstate by an

ε

-arrow)anda newfinal state (connectedto theold endstates byincoming

ε

-arrows).We alsoassume that this automaton hasonly a single arrowbetweenevery two states(we achieve thisby replacingmultiple arrowsby union of expressions).ItisclearthatthisGRArecognizesthesamelanguageofdatawordsas

A

.

Next weshow how toconvertthisautomatoninto an equivalent REM.We willusethe followingrecursive procedure which ripsout onestate atatime fromthe automatonandstopswhenweendwithonlytwo states.Notethatthisisa standard procedureusedtoshow theequivalencebetweenNFAsandregular expressions(see [27]). Forcompleteness,we repeatithereandshowhowitcanalsobeusedwhendatavaluesaretakenintoaccount.

1. CONVERT(

A

)

2. Letn bethenumberofstatesof

A

.

3. Ifn

=

2,then

A

contains onlyastartstate andan endstate witha singlearrowconnectingthem.Thisarrowhasan expression R writtenonit.ReturnR.

4. Ifn

>

2,selectanystateqrip,differentfromqstart andqend andmodify

A

inthefollowingmannerto obtain

A

 with one lessstate. The newset ofstates is Q

=

Q

− {

qrip

}

andfor anyqi

Q

− {

qaccept

}

andany qj

Q

− {

qstart

}

we define

δ



(

qi

,

qj

) = (

R1

)(

R2

)

(

R3

) +

R4, where R1

= δ(

qi

,

qrip

),

R2

= δ(

qrip

,

qrip

),

R3

= δ(

qrip

,

qj

)

and R4

= δ(

qi

,

qj

)

. The initialandfinalstateremainthesame.

5. ReturnCONVERT(

A

).

(8)

Wenow provethatforanyGRA

A

,theexpression CONVERT(

A

)andGRA

A

recognize thesamelanguageofdatawords.

We dosoby inductiononthenumbern of statesofourGRA

A

.Ifn

=

2 then

A

hasonly asingle arrowfrominitialto finalstateandbydefinitionofacceptanceforGRA,theexpressiononthisarrowrecognizesthesamelanguageas

A

.

Assume now that the claim is truefor all automatons withn

1 states. Let

A

be an automaton withn states. We provethat

A

isequivalenttoautomaton

A

obtainedinthestep3ofourCONVERTalgorithm.Notethatthiscompletesthe induction.

To see this assume first that w

L

( A, σ , σ



)

, i.e.

A

with initial assignment

σ

has an accepting run on w end- ing with

σ

 in the registers. This means that there exists a sequence of configurations c0

= (

q0

, τ

0

), . . . ,

ck

= (

qk

, τ

k

)

such that w

=

w1w2

. . .

wk,where each wi is a data word(with possibly more than one symbol),

τ

0

= σ , τ

k

= σ

 and

(δ(

qi1

,

qi

),

wi

, τ

i1

)  τ

i,fori

=

1

, . . . ,

k.(Hereweusetheassumptionthatweonlyhaveasinglearrowbetweenanytwo states).

Ifnoneofthestatesinthisrunareqrip,thenitisalsoan acceptingrunin

A

,so w

L

(A



, σ , σ



)

,sinceallthearrows presentherearealsoin

A

.

Ifqripdoesappear,wehavethefollowinginourrun:

ci

= (

qi

, τ

i

),

crip

= (

qrip

, τ

i+1

), . . . ,

crip

= (

qrip

, τ

j1

),

cj

= (

qj

, τ

j

).

Ifweshowhowtounfoldthistoarunin

A

,wearedone(ifthisappearsmorethanonceweapplythesameprocedure).

Since this is the case, we know (by the definition of accepting run) that

(

R1

,

wi+1

, τ

i

)  τ

i+1

, (

R2

,

wi+2

, τ

i+1

)  τ

i+2

, (

R2

,

wi+3

, τ

i+2

)  τ

i+3

, . . . , (

R2

,

wj1

, τ

j2

)  τ

j1 and

(

R3

,

wj

, τ

j1

)  τ

j, where R1

= δ(

qi

,

qrip

),

R2

= δ(

qrip

,

qrip

),

R3

= δ(

qrip

,

qj

)

.Note that thissimplymeans

((

R1

)(

R2

)

(

R3

),

wiwi+1

. . .

wj

, σ )  σ

,so

A

 can jumpfrom ci to cj using onlyonetransition.

Conversely,supposethatw

L

(A



, σ , σ



)

.Thismeansthatthereisacomputationof

A

,startingwiththeinitialcontent ofregisters

σ

andendingwith

σ

.Weknowthateach arrowin

A

 fromqi toqj goeseitherdirectly(in whichcaseitis alreadyin

A

)orthroughqrip(inwhichcaseweusethedefinitionofacceptancebyregularexpressionstounravelthisword intopartrecognizedby

A

).Ineithercasewegetanacceptingrunof

A

on w.

Toseethatthisgivesthedesiredresult,observethatwecanalwaysconvertregisterautomatonintoanequivalent GRA anduseCONVERTtoobtainaregularexpressionwithmemoryrecognizingthesamelanguage.Thiscompletesourproofof Theorem 4.4.

2

Applying Theorem 4.4 and Fact 3.4, we immediately obtain that languages defined by REM are closed under union, intersection,concatenationandKleenestar,butnot undercomplement.

The next theorem states that the nonemptiness and the membership problems for REM are PSpace-complete and NP-complete,respectively.Note that thehardness resultsdo not followfromTheorems 3.3 and 4.4,since theconversion fromRAtoREMinTheorem 4.4takesexponentialtime.

Theorem4.6.

ThenonemptinessproblemforREMis PSpace-complete.

ThemembershipproblemforREMis NP-complete.

Proof. Webeginbyproving theboundfornonemptiness.ByTheorem 4.4,we canconvertREMtoRAinpolynomialtime.

ByTheorem 3.3,thenonemptinessproblemforRAis PSpace.Hence,the PSpace upperboundfollows.

Nextwearegoingtoestablishthe PSpace-hardness,whichisestablishedviaareductionfromthenonuniversalityprob- lemoffinitestate automata.Thenonuniversalityproblemasks, givena finitestate automaton

A

asinput,decidewhether L

(A) = 

.

Assumewearegivenaregularautomaton

A = (

Q

, , δ,

q1

,

F

)

,whereQ

= {

q1

, . . . ,

qn

}

andF

= {

qi1

, . . . ,

qik

}

.

Sincewearetryingtodemonstratenonuniversalityoftheautomaton

A

,wesimulatereachabilitycheckinginthepow- ersetautomatonfor

A

.

Thatis,wearetryingtosimulateasequenceS0

,

S1

, . . . ,

Sm ofsubsetsof Q ,where:

(P1) S0

= {

q1

}

andSm

F

= ∅

;

(P2) foreachi

=

1

, . . . ,

m,thereisb

∈ 

suchthat Si

= {

q

| ∃

p

Si1 such that q

∈ δ(

p

,

b

)}

.

Todo sowe designatetwo distinct datavalues,t and f ,andencode each subset S

Q asan n-bit sequenceoft

/

f values, wherethe ith bit of the sequence is setto t, if the state qi is includedin the subset S.Since we are checking reachability,wewillneedonlytorememberthecurrentsetSjandthenextset Sj+1.Inwhatfollowswewillencodethose twostatesusingvariables s1

, . . . ,

sn andt1

, . . . ,

tn andrefertothemasthecurrentstate tapeandthenextstatetape.Our expressione willencodedatawordsthatdescribeasequenceS0

, . . . ,

Smthatsatisfies(P1)and(P2)abovebydemonstrating howonecanmovefromoneset Sj tothenextset Sj+1 (aswitnessedbytheir codesincurrentstatetapeandnextstate tape),startingwiththeinitialsetS0

= {

q1

}

andendinginsetSm,whereSm

F

= ∅

.

參考文獻

相關文件

In this section we define a general model that will encompass both register and variable automata and study its query evaluation problem over graphs. The model is essentially a

• The memory storage unit holds instructions and data for a running program.. • A bus is a group of wires that transfer data from one part to another (data,

For obvious reasons, the model we leverage here is the best model we have for first posts spam detection, that is, SVM with RBF kernel trained with dimension-reduced

sketch with weak labels first, refine with limited labeled data later—or maybe learn from many weak labels only?.. Learning with Limited

• Extension risk is due to the slowdown of prepayments when interest rates climb, making the investor earn the security’s lower coupon rate rather than the market’s... Prepayment

• As all the principal cash flows go to the PAC bond in the early years, the principal payments on the support bond are deferred and the support bond extends... PAC

Like regular full-time teachers, regular part-time teachers within the approved teaching establishment are subject to the provisions under the Code of Aid for Aided Schools,

We will give a quasi-spectral characterization of a connected bipartite weighted 2-punctually distance-regular graph whose halved graphs are distance-regular.. In the case the