• 沒有找到結果。

Imperfect Information

N/A
N/A
Protected

Academic year: 2022

Share "Imperfect Information"

Copied!
16
0
0

加載中.... (立即查看全文)

全文

(1)

Imperfect Information



Martin De Wulf, Laurent Doyen, and Jean-Franc¸ois Raskin D´epartement d’Informatique,

Universit´e Libre de Bruxelles

Abstract. In this paper, we propose a fixed point theory to solve games of imper- fect information. The fixed point theory is defined on the lattice of antichains of sets of states. Contrary to the classical solution proposed by Reif [Rei84], our new solution does not involve determinization. As a consequence, it is readily applica- ble to classes of systems that do not admit determinization. Notable examples of such systems are timed and hybrid automata. As an application, we show that the discrete control problem for games of imperfect information defined by rectan- gular automata is decidable. This result extends a result by Henzinger and Kopke in [HK99].

1 Introduction

Timed and hybrid systems are dynamical systems with both discrete and continuous components. A paradigmatic example of a hybrid system is a digital control program for an analog plant environment, like a furnace or an airplane: the controller state moves discretely between control modes, and in each control mode, the plant state evolves continuously according to physical laws. A natural model for hybrid systems is the hybrid automaton, which represents discrete components using finite-state machines and continuous components using real-numbered variables whose evolution is governed by differential equations or differential inclusions [ACH+95].

The distinction between continuous evolutions of the plant state (which is given by the real-numbered variables of a hybrid automaton) and discrete switches of the con- troller state (which is given by the location, or control mode, of the hybrid automaton) permits a natural formulation of the safety control problem: given an unsafe set U of plant states, is there a strategy to switch the controller state in real time so that the plant can be prevented from entering U ? In other words, the hybrid automaton specifies a set of possible control modes, together with the plant behavior resulting from each mode, and the control problem asks for deriving a switching strategy between control modes that keeps the plant out of trouble.

In the literature, there are algorithms or semi-algorithms (termination is not al- ways guaranteed) to derive such switching strategy. Those semi-algorithms usually comes in the form of symbolic fixed point computations that manipulate sets of

Supported by the FRFC project “Centre F´ed´er´e en V´erification” funded by the Belgian National Science Fundation (FNRS) under grant nr 2.4530.02.

Research fellow supported by the Belgian National Science Foundation (FNRS).

J Hespanha and A. Tiwari (Eds.): HSCC 2006, LNCS 3927, pp. 153–168, 2006.

 Springer-Verlag Berlin Heidelberg 2006c.

(2)

states using a well-suited monotonic function like the controllable predecessor oper- ator [AHK02, MPS95]. Those algorithms make a strong hypothesis: they consider that the controller that executes the switching strategy has a perfect information about the state of the controlled system. Unfortunately, this is usually an unreasonable hypothe- sis. Indeed, when the switching strategy has to be implemented by a real hardware, the controller typically acquires information about the state of the system by reading values on sensors. Those sensors have finite precision, and so the information about the state in which the system lies is imperfect. Let us illustrate this. Consider a controller that monitors the temperature of a tank, and has to maintain the temperature between given bounds by switching on and off a gas burner. The temperature of the tank is the state of the continuous system to control. Assume that the temperature is sensed through a ther- mometer that returns an integer number and ensures a deviation bounded by one degree Celsius. So, when the sensor returns the temperature c, the controller only knows that the temperature lies in the interval (c− 1, c + 1) degrees. We say that the sensor reading is an observation of the system. This observation gives an imperfect information about the state of the system.

Now, if we fix a set of possible observations of the system to control, the control problem that we want to solve is the safety control problem with imperfect information:

“given an unsafe set U of plant states, a set of observations, is there an observation based strategy to switch the controller state in real time so that the plant can be prevented from entering U ?”. While it is well-known that safety games of perfect information can be won using memoryless strategies, it is not the case for games of imperfect informa- tion [Rei84]. In that paper, Reif studies games of incomplete information which are a subclass of safety games of imperfect information where the set of observations is a partition of the state space. Notice that this is not the case of our tank example since when the temperature of the water is d, the thermometer may return eitherd or d. To win such games, memory is sometimes necessary: the controller has to remember (part of) the history of observations that it has made so far. In the finite state case, games of incomplete information can be solved algorithmically. Reif proposes an algorithm that first transforms the game of incomplete information into a game of perfect information using a kind of determinization procedure.

In this paper, we propose an alternative method to solve games of imperfect (and in- complete) information. Our method comes in the form of a fixed point (semi-)algorithm that iterates a monotone operator on the lattice of antichains of sets of states. The great- est fixed point of this operator contains exactly the information needed to determine the states from which an observation based control strategy exists and to synthesize such a strategy. We prove that our algorithm has an optimal complexity for finite state games and we identify a class of infinite state games for which the greatest fixed point of the operator is computable. Using this class of games and results from [HK99], we show that the discrete-time control problem with imperfect information is decidable for the class of rectangular automata. Strategies that win those games are robust as they can be implemented using hardware that senses its environment with finite precision.

Our fixed point method has several advantages over the algorithmic method proposed by Reif. First, as it does not require determinization, our (semi-)algorithm is readily ap- plicable to classes of systems for which determinization is not effective: timed and

(3)

hybrid automata are notable examples [AD94]. Second, we show that there are families of games on which the Reif’s algorithm needs exponential time when our algorithm only needs polynomial time. Third, as our method is based on a lattice theory, abstract interpretation methods can be used to derive in a systematic way approximation algo- rithms [CC77].

Our paper is structured as follows. In Section 2, we recall the definition of the lattice of antichains. In Section 3, we show how to use this lattice to solve games of imperfect information. In Section 4, we give a fixed point algorithm that is EXPTIME for finite state games and we compare with the technique of Reif. Finally, in Section 5, we solve games of imperfect information for rectangular automata. Due to lack of space, the proofs of most of the theorems have been omitted and can be found in [DDR06].

2 The Lattice of Antichains of Sets of States

First we recall the notion of antichain. An antichain on a partially ordered setX, ≤

is a set X ⊆ X such that for any x1, x2∈ Xwith x1= x2we have neither x1≤ x2

nor x2 ≤ x1, that is X is a set of incomparable elements of X. We define similarly a chain to be a set of comparable elements of X.

Let q, q ∈ 22S and define q q if and only if ∀s ∈ q : ∃s ∈ q : s ⊆ s. This relation is a preorder but is not antisymmetric. Since we need a partial order, we construct the set L ⊆ 22S for which is antisymmetric on L. The set L is the set of antichains on2S,⊆.

We say that a set s⊆ S is dominated in q if and only if ∃s∈ q : s ⊂ s. The set of dominated elements of q is denoted Dom(q). The reduced form of q isq = q\Dom(q) and dually the expanded form of q is q = q ∪ Dom(q). The set q is an antichain of

2S,⊆. Observe that Dom(q) = ∅, that is ∀s, s ∈ q : if s1 ⊆ s2then s1 = s2. The relation has the useful following properties:

Lemma 1. Let q, q∈ 22S. If q⊆ qthen q q. Lemma 2. ∀q, q ∈ 22S,∀q1, q2

q,q, q

,∀q1, q2 

q,q, q

: q1 q2

is equivalent to q1 q2.

We can now define formally L as the set{q | q ∈ 22S}.

Lemma 3. The relation ⊆ L × L is a partial ordering and L,  is a partially ordered set.

Lemma 4. For q, q∈ L, the greatest lower bound of q and qis q

q={s∩s| s ∈ q∧ s ∈ q} and the least upper bound of q and qis q

q={s | s ∈ q ∨ s ∈ q}.

For Q⊆ L, we have

Q ={

q∈Qsq | sq ∈ q} and

Q ={s | ∃q ∈ Q : s ∈ q}.

The least element of L is⊥ =

L =∅ and the greatest element of L is  = L = {S}.

Lemma 5. L, , ,

,⊥ ,  is a complete lattice.

This lattice is the lattice of antichains of sets of states.

(4)

3 Games of Imperfect Information

3.1 Definitions

Notations. Given a finite sequence a = a0, a1, . . . , an, we denote by|a| = n + 1 the length of a, by ak= a0, . . . , akthe sequence of the first k + 1 elements of a (and a−1 is the empty sequence) and by last(a) = anthe last element of a.

Definition 6 [Two-player games]. A two-player game is a tupleS, S0, Σc, Σu,→

where S is a (non-empty) set of states, S0⊆ S is the set of initial states, Σc(resp. Σu) is a finite alphabet of controllable (resp. uncontrollable) actions, and→⊆ S × (Σc Σu)× S is a transition relation.

The game is turn-based and played by a controller against an environment. To initialize the game, the environment chooses a state x∈ S0and the controller takes the first turn.

A turn of the controller consists of choosing a controllable action σ that is enabled in the current state x. If no such action exists, the controllers loses. A turn of the envi- ronment then consists of determining a state y such that x −→ y and of choosing anσ uncontrollable action u and a state z such that y−→ z. If no enabled action u exists theu environment loses. If the game continues forever, the controller wins.

For σ∈ Σc∪ Σu, let Enabled(σ) ={x ∈ S | ∃x ∈ S : (x, σ, x)∈→} be the set of states in which the action σ is enabled, and for s⊆ S let Postσ(s) ={x∈ S | ∃x ∈ s : (x, σ, x) ∈→} be the set of successor states of s by the action σ. Furthermore, given a set Σ⊆ Σc∪ Σu, we define the notation PostΣ(s)to mean

σ∈ΣPostσ(s).

The controller has an imperfect view of the game state space in that his/her choices are based on imprecise observations of the states.

Definition 7 [Observation set]. An observation set of the state space S is a cou- ple (Obs, γ) where γ : Obs → 2S is such that for all x ∈ S, there exists obs ∈ Obs such that x∈ γ(obs).

An observation obs is compatible with a state x if x ∈ γ(obs). When the controller observes the current state x of the game, he/she receives one observation compatible with x. The observation is non-deterministically chosen by the environment.

Definition 8 [Imperfect information]. A two-player game S, S0, Σc, Σu,→

equipped with an observation set (Obs, γ) of its state space defines a game of imperfect informationS, S0, Σc, Σu,→, Obs, γ. The size of the game is the sum of the sizes of the transition relation→ and the set Obs.

Let G =S, S0, Σc, Σu,→, Obs, γ be a game of imperfect information. We say that Gis a game of incomplete information if for any obs1, obs2∈ Obs, if obs1= obs2then γ(obs1)∩ γ(obs2) =∅, that is the observations are disjoint, thus partitioning the state space. We say that G is a game of perfect information if Obs = S and γ is the identity function.

The drawback of games of incomplete information is that they are not suited for a robust modelization of sensors. Indeed, real sensors are imprecise and may return different observations for a given state.

(5)

An observation based strategy for a game of imperfect information G =

S, S0, Σc, Σu,→, Obs, γ is a function λ : Obs+ → Σc. The outcome of λ on G is the set Outcomeλ(G)of couples (x, obs)∈ S+× Obs+ such that (i)|x| = |obs|, (ii) x0 ∈ S0, (iii) for all 0≤ i ≤ |x|, xi ∈ γ(obsi), and (iv) for all 1≤ i ≤ |x|, there exists u∈ Σusuch that xi∈ Postu(Postλ(obs

i−1)({xi−1})) .

Definition 9 [Winning strategy]. We say that an observation based strategy λ for a game G of imperfect information is winning if for every (x, obs)∈ Outcomeλ(G), we have last(x)∈ Enabled(λ(obs)).

Let us call an history a couple (obsk, σk−1)∈ Obs+× Σc+such that∃x ∈ S+: x0 S0 and for all 0 ≤ i ≤ k we have xi ∈ γ(obsi))and for all 0 ≤ i < k we have xi+1 ∈ PostΣu(Postσi(xi)). Let us call knowledge after an history (obsk, σk−1)the function K : Obs+× ΣC +→ 2S defined inductively as follows.

 K(obs0, σ−1) = γ(obs0)∩ S0

K(obsk, σk−1) = γ(obsk)∩ PostΣu(Postσk−1(K(obsk−1, σk−2))) for k > 0 Thus, the knowledge after an history (obsk, σk−1)is the set of states the player can be sure the game is in after this history.

The imperfect information control problem for a classC of games of imperfect in- formation is defined as follows: given a game G∈ C, determine whether there exists a winning observation based strategy for G. We define similarly the incomplete informa- tion control problem and the perfect information control problem.

Safety games. We can encode the classical safety games using our winning condition.

To show that, we first need some definitions. Given a game of imperfect information G we say that a set of state Sbis final if∀σ ∈ Σc∪ Σu: Postσ(Sb)⊆ Sb.

We say that a strategy λ is safe on a game of imperfect information G w.r.t. a final set of bad states Sb⊆ S if for every (x, obs) ∈ Outcomeλ(G)we have last(x) /∈ Sb.

The imperfect information safety control problem for a classC of games of imperfect information is defined as follows: given a two-player game G ∈ C and a final set of states Sbof G, determine whether there exists an observation based strategy λ which is safe w.r.t Sb.

Theorem 10. The imperfect information safety control problem can be reduced to the imperfect information control problem.

3.2 Using the Lattice of Antichains

We show how the lattice of antichains that we have introduced in Section 2 can be used to solve games of imperfect information by iterating a predecessor operator.

Controllable predecessors. For q∈ L, define the set of controllable predecessors of q as follows:

CPre(q) ={s ⊆ S | ∃σ ∈ Σc· ∀obs ∈ Obs · ∃s∈ q :

s⊆ Enabled(σ) ∧ PostΣu(Postσ(s))∩ γ(obs) ⊆ s}

(6)

Let us consider an antichain q ={s0, s1, . . .}. A set s belongs to CPre(q) iff (i) there is a controllable action σ that is enabled in each state of s, (ii) when the controller plays σ, any observation compatible with the next state reached by the game (after the environment has played) suffices to determine in which set siof q that next state lies1, and (iii) s is maximal .

Lemma 11. The operator CPre : L→ L is monotone for the partial ordering . Remark. The controllable predecessor operator is also monotone w.r.t. the set of ob- servations in the following sense: given a two-player game G, let CPre1(resp. CPre2) be the operator defined on the set of observations (Obs1, γ1) (resp. (Obs2, γ2)). If 2(obs) | obs ∈ Obs2} {γ1(obs) | obs ∈ Obs1}, then for any q ∈ L we have CPre1(q) CPre2(q). That corresponds to the informal statement that it is easier to control a system with more precise observations.

Theorem 12. Let G =S, S0, Σc, Σu,→, Obs, γ be a game of imperfect information.

There exists an observation based strategy winning on G if and only if {S0∩ γ(obs) | obs ∈ Obs} 

{q | q = CPre(q)}. (1)

Before proving this theorem, we give some intuition. We denote by Win the set

{q | q = CPre(q)} which is the greatest fixed point of CPre. Condition (1) states that any observation of the initial state x0 suffices to determine in which set s of Win the game has been started. Since Win is a fixed point of the controllable predecessor operator, we know that in each set s of Win we have a controllable action that can be played by the controller in every state x∈ s such that (i) the state z reached after the move of the environment lies in one of the sets s of Win whatever the environment does and, such that (ii) the set scan be determined using any observation compatible with z. Following this, there exists a winning strategy if Condition (1) holds. The other direction of the theorem is a direct consequence of Tarski’s Theorem.

Proof of Theorem 12. First, we give an effective construction of a winning strategy for G, in the form of a finite automaton. For q∈ L and σ ∈ Σc, let φ(q, σ) ={s ∈ S | s⊆ Enabled(σ) and ∀obs ∈ Obs, ∃s∈ q : PostΣu(Postσ(s))∩ γ(obs) ⊆ s} be the set of controllable predecessors of q for the action σ. From the greatest fixed point Win of CPre, we define the finite state automaton A =Q, q0,L, δ where

– Q = Win∪ {q0} where q0∈ Win,/ – q0is the initial state,

L : Q\{q0} → Σcis a labeling of the states. For each s∈ Win, we choose σ ∈ Σc such that s∈ φ(Win, σ) and we fix L(s) = σ (such a σ exists since Win is a fixed point of CPre).

– δ : Q× Obs → Q is a transition function.

1The quantification over obs is universal since for observations that are incompatible with the new state, the condition holds trivially.

(7)

• For each obs ∈ Obs, choose s ∈ Win such that S0 ∩ γ(obs) ⊆ s and fix δ(q0, obs) = s;

• For each s ∈ Win and obs ∈ Obs, choose s ∈ Win such that PostΣu(Postσ(s))∩ γ(obs) ⊆ swhere σ =L(s) and fix δ(s, obs) = s. Such sets s, sexist by condition (1).

In this automaton, states are labelled with actions and transitions are labelled with observations. Intuitively, a state s of A corresponds to the minimal knowledge that is sufficient to control the system and the labelL(s) is a winning move the controller can play having this knowledge. The next state sis determined by the observation obs according to the transition relation.

Let ˆδ : Q×Obs+→ Q be an extension of the transition function δ on words defined recursively by ˆδ(s, obs) = δ(s, obs)and ˆδ(s, obs.obs) = δ(ˆδ(s, obs), obs).

The strategy defined by A is λ : Obs+ → Σc such that λ(obs) = L(s) if ˆδ(q0, obs) = s. If for some obs there is no s such that ˆδ(q0, obs) = s, then the sequence of observations obs is impossible. In this case, we can set λ(obs) to any value.

Now we proceed with the proof of the theorem.

– If (1) holds. We show that the strategy λ defined by A is such that for any (x, obs) ∈ Outcomeλ(G), we have (i) last(x) ∈ ˆδ(q0, obs) and (ii) last(x) Enabled(λ(obs))(thus λ is winning) . We show this by induction on the length of x and obs.

1. |x| = 1. We have x = x0and obs = obs0with x0 ∈ S0and x0 ∈ γ(obs0).

Let s = ˆδ(q0, obs0)and σ =L(s) = λ(obs0). By construction of A, we have S0∩ γ(obs0)⊆ s and s ∈ Win.

As x0∈ s and Win is a fixed point of CPre, we have (i) last(x) ∈ ˆδ(q0, obs0) and (ii) x0∈ Enabled(λ(obs0)).

2. |x| > 0. We have x = x0, x1, . . . , xk and obs = obs0, obs1, . . . , obsk

with xk ∈ γ(obsk). Let sk−1 = ˆδ(q0, obsk−1) and σ = L(sk−1) = λ(obsk−1).

By the induction hypothesis, we have xk−1 ∈ sk−1. For obs = obsk, let sk = δ(sk−1, obs). By construction of A, we have sk ∈ Win and PostΣu(Postσ(sk−1))∩ γ(obs) ⊆ sk. Therefore, we have xk ∈ sk and by definition ofL, we have sk ⊆ Enabled(σ)where σ = L(sk) = λ(obsk).

This yields (i) last(x)∈ ˆδ(q0, obs)and (ii) xk ∈ Enabled(λ(obsk)).

– If λ is an observation based strategy that is winning on G. We must show that (1) holds. Let Vλ⊆ 2S× Obs+be the smallest set (w.r.t. to⊆) such that:

• (S0∩ γ(obs), obs) ∈ Vλfor every obs∈ Obs, and

• if (s, obs) ∈ Vλ then (PostΣu(Postλ(obs)(s))∩ γ(obs), obs.obs) ∈ Vλ for every obs∈ Obs.

Let Wλ ={s | (s, obs) ∈ Vλ}. Let us show that Wλ CPre(Wλ). By Lemma 1, it suffices to show that Wλ ⊆ CPre(Wλ) . Let (s, obs) ∈ Vλwith obs = obs0, obs1, . . . , obskand let us show that s∈ CPre(Wλ).

By definition of Vλ, there exist s0, s1, . . . , sk such that s0 = S0∩ γ(obs0), sk = s, and for each 1 ≤ i ≤ k: si = PostΣu(Postσi(si−1))∩ γ(obsi)with σi = λ(obs0obs1. . . obsi−1). For any sequence of states x = x0, x1, . . . , xk

(8)

with xi ∈ si and (xk, obsk) ∈ Outcomeλ(G), since λ is winning on G, we have xk ∈ Enabled(λ(obs)) and thus s ⊆ Enabled(λ(obs)). Also we have PostΣu(Postλ(obs)(s))∩ γ(obs) ∈ Wλfor every obs∈ Obs by construction of Vλ. This entails that s∈ CPre(Wλ) , showing that Wλ CPre(Wλ), that is CPre is extensive at Wλand by the Tarski’s fixed point Theorem Wλ Win. The conclu- sion follows since{S0∩ γ(obs) | obs ∈ Obs} ⊆ Wλ. 

4 Games with Finite State Space

In this section we show that computing the greatest fixed point of CPre for finite state games can be done in EXPTIME. We also compare our algorithm based on the lattice of antichains with the classical technique of [Rei84].

4.1 Fixed Point Algorithm

To compute the greatest fixed point of CPre, we iterate CPre from S using Algorithm 1.

This algorithm constructs systematically subsets of S and checks at line l whether they belong to CPre(q). This is done by treating all subsets of size i before the subsets of size i− 1, so we avoid to treat the subsets of the already included subsets and the result is in reduced form. Therefore, Algorithm 1 uses the following operator Children(s) = {s\{x} | x ∈ s} which returns the subsets of s of cardinality |s| − 1.

Lemma 13. Algorithm 1 computes CPre in EXPTIME in the size of the game.

Lemma 14. An ascending (or descending) chain in L, , ,

,⊥ ,  has at most 2n+ 1 elements where n =|S|.

Algorithm 1. Algorithm forCPre .

Data : A game of imperfect informationG = S, S0, Σc, Σu, →, Obs, γ and a set q ∈ L.

Result : The setZ = CPre(q).

begin

1 Z ← ∅ ;

2 Wait ← {S} ;

3 whileWait = ∅ do

4 Picks ∈ Wait of maximal cardinality ;

5 Wait ← Wait\{s} ;

6 if for someσ ∈ Σcwe have : (1)s ⊆ Enabled(σ) and

(2) for all obs ∈ Obs, there exists s ∈ q such that PostΣu(Postσ(s)) ∩ γ(obs) ⊆ s

then

7 Z ← Z ∪ {s} ;

else

8 Wait ← Wait ∪ {s| s∈ Children(s) ∧ ∀s∈ Z ∪ Wait : s⊆ s} ;

9 returnZ;

end

(9)

Theorem 15. The imperfect information control problem is EXPTIME-complete.

Proof. We first prove the upper bound. From Lemma 14 and since CPre is monotone, we reach the greatest fixed point Win after at most O(2n)iterations of CPre. From Lemma 13 computing CPre can be done in EXPTIME. The conclusion follows. For the lower bound, since we solve a more general problem than Reif [Rei84], we have the

EXPTIME-hardness. 

4.2 Example

Consider the two-player game G1 on Fig. 1 with state space S = {1, 1, 2, 2, 3, 3, Bad}, initial state S0 = {2, 3}, actions Σc = {a, b} and Σu = {u}. The obser- vation set is Obs = {obs1, obs2} with γ(obs1) = {1, 1, 2, 2, Bad} and γ(obs2) = {1, 1, 3, 3}.

For the controller, the goal is to avoid state Bad in which there is no controllable action. So the controller must play an a in state 1 and 3 and a b in state 2. However the controller cannot distinguish 1 from 2 using only the current observation. Thus, to dis- criminate those states, the controller has to rely on its memory of the past observations.

We show below the iterations of the fixed point algorithm and the construction of the strategy. The fixed point computation starts from = {S}. Each set is paired with an action that can be played in all the states of that set:

S1= CPre({S}) = {{1, 2, 3}a} S2= CPre(S1) ={{2}b,{1, 3}a} S3= CPre(S2) ={{1}a,{2}b,{3}a} S4= CPre(S3) = S3

1 1

2 2

3

3 a Bad

b u

u

a b u

a b

u

u obs1

obs2

Fig. 1. A two-player game G1with observation set{obs1, obs2}

(10)

Since S4 = S3, we have Win = S3 = {{1}, {2}, {3}}. The existence of a winning strategy is established by condition (1) of Theorem 12 since the sets S0∩γ(obs1) ={2}

and S0∩ γ(obs2) ={3} are dominated in Win.

From the fixed point, using the construction given in the proof of Theorem 12, we construct the automaton of Fig. 2 which encodes a winning strategy. Indeed, when the game starts the control is either in state 2 if the given observation is obs1or in state 3 if the given observation is obs2. In the first case, the controller plays b and in the second case, it plays a. Then the game lies in state 1. According to the strategy automaton, the controller plays an a and receives a new observation that allows it to determine if the game lies now in state 2 (obs1) or in state 3 (obs2). From there, the controller can clearly iterate this strategy.

q0

b {2}

a {1}

a {3}

obs1 obs2

obs1

obs2

obs1

obs2

obs1

obs2

Fig. 2. A finite state automaton A defining a winning strategy for G1

4.3 Comparison with the Classical Technique of [Rei84]

In [Rei84] the author gives an algorithm to transform a game of incomplete information Ginto a game Gof perfect information on the histories of G.

The idea can be expressed as follows : given a game of incomplete information G =

S, S0, Σc, Σu,→1, Obs, γ define a two-player game G =S, S0, Σc,{ε}, →2 as follows: Sis the set of knowledges K(obsk, σk−1)such that (obsk, σk−1)is an history of G. S0 is the set of knowledges{K(obs0)|γ(obs0)∩ S0= ∅}. Finally the transition relation2 is defined as follows: K(obsk, σk−1) −→σk 2 K(obsk+1, σk)and s −→ε2 s for all s ∈ S. To obtain the final game of perfect information G, equip Gwith the set of observation (S, γI)where γIis the identity function. Solving the resulting game of perfect information Grequires linear time in the size of S but there exist games of incomplete information G requiring the construction of a game of perfect information of size exponentially larger than the size of G.

As our algorithm does not require this determinization, it is easy to find families of games where our method is exponentially faster than Reif’s algorithm. This is formal- ized in the next theorem.

Theorem 16. There exist finite state games of incomplete information for which the al- gorithm of [Rei84] requires an exponential time where our algorithm needs only poly- nomial time.

(11)

5 Control with Imperfect Information of Rectangular Automata

In this section, we introduce the notion of infinite games with finite stable quotient.

We use this notion to show that the discrete control problem for games of imperfect in- formation defined by rectangular automata is decidable. This result extends the results in [HK99].

5.1 Games with Finite R-Stable Quotient

Here we drop the assumption that S is finite and we consider the case where there exists a finite quotient of S over which the game is stable. We obtain a general decidability result for games of imperfect information with finite stable quotients.

Let R = {r1, r2, . . . , rl} be a finite partition of S. A set s ⊆ S is R-definable if s =

r∈Zrfor some Z ⊆ R. An antichain q ∈ L is R-definable if for every s ∈ q, sis R-definable.

Definition 17 [R-stable]. A game of imperfect informationS, S0, Σc, Σu,→, Obs, γ

is R-stable if for every σ∈ Σcthe following conditions hold:

(i) Enabled(σ)is R-definable;

(ii) for every r∈ R, PostΣu(Postσ(r))is R-definable;

(iii) for any r, r ∈ R, if for some x ∈ r and u ∈ Σu, Postu(Postσ)({x}) ∩ r = ∅ then for any x∈ r, there exists u ∈ Σusuch that PostΣu(Postσ)({x})∩r= ∅;

(iv) furthermore, for every obs∈ Obs, γ(obs) is R-definable.

The next lemma states properties of R-stable games of imperfect information. They are useful for the proof of the next theorem.

Lemma 18. Let G =S, S0, Σc, Σu,→, Obs, γ be a R-stable game of imperfect in- formation. Let s, s, s ⊆ S and r ∈ R such that (i) s and s are R-definable and (ii) s∩ r = ∅. If there exists σ ∈ Σc such that (iii) s ⊆ Enabled(σ) and (iv) PostΣu(Postσ(s))∩ s ⊆ sthen (v) r ⊆ Enabled(σ) and (vi) PostΣu(Postσ(s∪ r))∩ s ⊆ s.

Theorem 19. Let G = S, S0, Σc, Σu,→, Obs, γ be a R-stable game of imperfect information. The greatest fixed point of CPre is a R-definable antichain and is com- putable.

Proof. We show that for any R-definable antichain q ∈ L, the antichain CPre(q) is also R-definable. Let s ∈ CPre(q). For any r ∈ R such that s ∩ r = ∅, we have by Lemma 18 that s∪ r ∈ CPre(q). Since s ⊆ s ∪ r, we must have s = s ∪ r. This shows that s is R-definable. The number of R-definable antichains is finite, and so, using Tarski’s theorem, we can compute the greatest fixed point of CPre in a finite number of

iterations. 

(12)

5.2 Rectangular Automata

We first recall the definition of rectangular automata and we define their associated game semantics. We recall a result of [HK99] that establishes the existence of a finite bisimulation quotient for this game semantics.

Let X = {x1, . . . , xn} be a set of real-valued variables. A rectangular inequality over X is a formula of the form xi∼ c, where c is an integer constant, and ∼ is one of the following: <,≤, >, ≥. A rectangular predicate over X is a conjunction of rectan- gular inequalities. The set of all rectangular predicates over X is denoted Rect(X). The rectangular predicate φ defines the set of vectors [[φ]]= {y ∈ Rn|φ[X := y] is true}.

For 1 ≤ i ≤ n, let [[φ]]i be the projection on variable xi of the set [[φ]]. A set of the form [[φ]], where φ is a rectangular predicate, is called a rectangle. Given a nonnegative integer m ∈ N, the rectangular predicate φ and the rectangle [[φ]] are m-bounded if

|c| ≤ m for every conjunct xi∼ c of φ. Let us denote Rectm(X)the set of m-bounded rectangular predicate on X.

Definition 20 [Rectangular automaton]. A rectangular automaton H is a tuple

Loc, Lab, Edg, X, Init, Inv, Flow, Jump where:

– Loc ={1, . . . , m} is a finite set of locations;

– Lab is a finite set of labels;

– Edg⊆ Loc × Lab × Loc is a finite set of edges;

– X ={x1, . . . , xn} is a finite set of variables;

– Init : Loc→ Rect(X) gives the initial condition Init() of location . The automa- ton can start in  with an initial valuation v lying in [[Init()]];

– Inv : Loc→ Rect(X) gives the invariant condition Inv() of location . The au- tomaton can stay in  as long as the values of its variables lie in [[Inv()]];

– Flow : Loc→ Rect( ˙X)governs the evolution of the variables in each location.

– Jump maps each edge e ∈ Edg to a predicate Jump(e) of the form φ ∧ φ

i /∈Update(e)(xi= xi), where φ∈ Rect(X) and φ ∈ Rect(X)and Update(e)⊆ {1, . . . , n}. The variables in Xrefer to the updated values of the variables after the edge has been traversed. Each variable xiwith i∈ Update(e) is updated nondeter- ministically to an arbitrary new value in the interval [[φ]]i.

A rectangular automaton is m-bounded if all its rectangular constraints are m-bounded.

Definition 21 [Nondecreasing and bounded variables]. Let H be a rectangular automa- ton, and let i∈ {1, . . . , n}. The variable xiof H is nondecreasing if for every control mode  ∈ Loc, the invariant interval [[Inv()]]i and the flow interval [[Flow()]]i are subsets of the nonnegative reals. The variable xiis bounded if for every control mode

∈ Loc, the invariant interval [[Inv()]]iis a bounded set. The automaton H has nonde- creasing (resp. bounded; nondecreasing or bounded) variables if all n variables of H are nondecreasing (resp. bounded; either nondecreasing or bounded).

In the sequel, all the rectangular automata that we consider are assumed to be with nondecreasing or bounded variables.

We now associate a game semantics to each rectangular automaton.

(13)

Definition 22 [Discrete game semantics of rectangular automata]. The game semantics of a rectangular automaton H =Loc, Lab, Edg, X, Init, Inv, Flow, Jump is the game [[H]]= S, S0, Σc, Σu,→ where S = Loc × Rn is the state space (with n = |X|), S0 = {(, v) ∈ S | v ∈ [[Init()]]} is the initial space, Σc = Lab, Σu ={1} and → contains all the tuples ((, v), σ, (, v))such that:

– either there exists e = (, σ, )∈ Edg such that (v, v)∈ [[Jump(e)]],

– or  = and σ = 1 and there exists a continuously differentiable function f : [0, 1] →[[Inv()]] such that f(0) = v, f(1) = v and for all t ∈ (0, 1): ˙f(t) ∈ [[Flow()]].

Games constructed from rectangular automata are played as follows. The game is started in a location  with a valuation v for the continuous variables such that v ∈[[Init()]]. At each round, the controller decides to take one of the enabled edges if one exists. Then the environment updates the continuous variables by letting time elapse for 1 time unit as specified by the (nondeterministic) flow predicates. A new round is started from there. As for the games that we have considered previously, the goal of the controller is to avoid to reach states where he does not have an enabled transition to propose.

The next definition recalls the notion of bisimulation.

Definition 23 [Bisimulation]. A simulation on the game G =S, S0, Σc, Σu,→ is a binary relation∼ on the state set S such that s1 ∼ s2implies that∀σ ∈ Σc∪ Σu, if s1

−→ sσ 1then there exists s2such that s2

−→ sσ 2and s1 ∼ s2. Such a relation is called a bisimulation if it is symmetric.

We consider the following equivalence relation between states of rectangular automata.

Definition 24 . Given the game semantics [[H]]=S, S0, Lab,{1}, → of a m-bounded rectangular automaton H, define the equivalence relation≈mon S by (, v)≈m(, v) iff  = and for all 1≤ i ≤ n either vi = vi and vi = vi or both viand viare greater than m. Let us call Rmthe set of equivalence classes ofmon S.

The next lemma states that the number of equivalence classes for this relation is finite for any rectangular automata.

Lemma 25. [HK99] Let H be a m-bounded rectangular automaton. The equivalence relation≈mis the largest bisimulation of the game semantics [[H]].

5.3 Control of Rectangular Automata with Imperfect Information

We are now in position to extend the result of [HK99] to the case of imperfect information.

Given H =Loc, Lab, Edg, X, Init, Inv, Flow, Jump, a m-bounded rectangular au- tomaton, we say that the observation set (Obs, γ) is m-bounded if for each obs∈ Obs, γ(obs)is definable as a finite union of sets of the form{(l, v) | v ∈ g} where g is m-bounded rectangle.

(14)

Theorem 26. For any m-bounded rectangular automaton H with game semantics [[H]]=S, S0, Σc, Σu,→, for any m-bounded observation set (Obs, γ), the game of imperfect informationS, S0, Σc, Σu,→, Obs, γ is Rm-stable.

As corollary of Theorem 19 and Theorem 26, we have that:

Corollary 1. The discrete control problem for games of imperfect information defined by m bounded rectangular automata and m-bounded observation sets is decidable (in 2EXPTIME).

So far, we do not have a hardness result but we conjecture that the problem is 2EXPTIME-complete. Now, let us illustrate the discrete control problem for games of imperfect information defined by rectangular automata on an example.

Example. We have implemented our fixed point algorithm using HYTECHand its script language [HHWT95]. We illustrate the use of the algorithm on a simple example. Fig. 3 shows a rectangular automaton with four locations and one continuous variable x.

In this example, the game models a cooling system that controls the temperature x. When requested to start, the system begins to cool down. There are two modes of cooling, either fast or slow, among which the environment chooses. The controller can only observe the system through two observations: H with γ(H) ={(, x) | x ≥ 280}

and L with γ(L) ={(, x) | x ≤ 285}. Thus, only the continuous variable x can be observed imperfectly, not the modes. Depending on the mode however, the timing and action to stop the system are different. In the slow mode, the controller has to issue an action a when the temperature is below 280. In the fast mode, the controller has to issue an action b when the temperature is below 270.

The controller must use its memory of the past observations to make the correct action in time. If the first two observations are H, H then the controller knows that the

Start

˙ x = 0 x = 300 x = 300

Slow

˙

x∈ [−10, −9]

x∈ [250, 300]

Fast

˙

x∈ [−30, −25]

x∈ [210, 300]

Stop

˙ x = 0 a

a

x≥ 260

x≥ 260

a x≤ 280

b x≤ 270

Fig. 3. A rectangular automaton modeling a cooling system

(15)

a

b

a

q0 0

1 3

4 5

6

7 H

H

L H

L L

L L H

L H

L H

Fig. 4. A finite state automaton defining a winning strategy for the cooling system

mode is Slow. If the first two observations are H, L then the controller knows that the mode is Fast.

The greatest fixed point, given below, allows the computation of the deterministic strategy depicted in Fig. 4. The whole process has been automated in HYTECH. The correspondence between state numbers in the figure and states of the fixed point is the following:

– State 0≡ (Stop, x = 0), (Slow, 295 < x ≤ 300) – State 1≡ (Slow, 270 ≤ x ≤ 300)

– (Not depicted) State 2≡ (Slow, 295 < x ≤ 300), (Fast, 290 ≤ x ≤ 300) – State 3≡ (Slow, 260 ≤ x ≤ 289), (Slow, 295 < x ≤ 300)

– State 4≡ (Slow, 295 < x ≤ 300), (Fast, 260 ≤ x ≤ 295) – State 5≡ (Start, x = 300)

– State 6≡ (Slow, 250 ≤ x ≤ 280) – State 7≡ (Fast, 210 ≤ x ≤ 270)

As before, the strategy associates an action to each set of the fixed point and the observations give the next state of the strategy.

References

[ACH+95] R. Alur, C. Courcoubetis, N. Halbwachs, T.A. Henzinger, P.-H. Ho, X. Nicollin, A. Olivero, J. Sifakis, and S. Yovine. The algorithmic analysis of hybrid systems.

Theoretical Computer Science, 138:3–34, 1995.

[AD94] Rajeev Alur and David L. Dill. A theory of timed automata. Theoretical Computer Science, 126(2):183–235, 1994.

[AHK02] R. Alur, T.A. Henzinger, and O. Kupferman. Alternating-time temporal logic.

Journal of the ACM, 49:672–713, 2002.

[CC77] Patrick Cousot and Radhia Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In POPL, pages 238–252, 1977.

[DDR06] M. De Wulf, L. Doyen, and J.-F. Raskin. A lattice theory for solving games of imperfect information (extended version). Technical Report 58, U.L.B. – Federated Center in Verification, 2006. http://www.ulb.ac.be/di/ssd/cfv/publications.html.

(16)

[HHWT95] T.A. Henzinger, P.-H. Ho, and H. Wong-Toi. A user guide to HYTECH. In TACAS 95: Tools and Algorithms for the Construction and Analysis of Systems, Lecture Notes in Computer Science 1019, pages 41–71. Springer-Verlag, 1995.

[HK99] T.A. Henzinger and P.W. Kopke. Discrete-time control for rectangular hybrid au- tomata. Theoretical Computer Science, 221:369–392, 1999.

[MPS95] O. Maler, A. Pnueli, and J. Sifakis. On the synthesis of discrete controllers for timed systems. In STACS’95, volume 900 of Lecture Notes in Computer Science, pages 229–242. Springer, 1995.

[Rei84] John H. Reif. The complexity of two-player games of incomplete information.

Journal of Computer and System Sciences, 29(2):274–301, 1984.

參考文獻

相關文件

In an oilre nery a storage tank contains 2000 gallons of gasoline that initially has 100lb of an additive dissolved in it. In preparation for winter weather, gasoline containing 2lb

You are given the wavelength and total energy of a light pulse and asked to find the number of photons it

39) The osmotic pressure of a solution containing 22.7 mg of an unknown protein in 50.0 mL of solution is 2.88 mmHg at 25 °C. Determine the molar mass of the protein.. Use 100°C as

The molal-freezing-point-depression constant (Kf) for ethanol is 1.99 °C/m. The density of the resulting solution is 0.974 g/mL.. 21) Which one of the following graphs shows the

The difference in heights of the liquid in the two sides of the manometer is 43.4 cm when the atmospheric pressure is 755 mm Hg.. 11) Based on molecular mass and dipole moment of

Dublin born Oscar Wilde (1854-1900) established himself as one of the leading lights of the London stage at the end of the nineteenth century?. A poet and prose writer as well, it

We show next that quantum trajectory theory or conditional, stochastic density matrix con- tains the most (all) information as far as the system evolution is concerned, and the

volume suppressed mass: (TeV) 2 /M P ∼ 10 −4 eV → mm range can be experimentally tested for any number of extra dimensions - Light U(1) gauge bosons: no derivative couplings. =&gt;