Computational Solution of DPFEs

In this section, we elaborate on how to solve a DPFE. One way in which a DPFE can be solved is by using a “conventional” procedural programming language such as Java. In Sect. 1.2.1, a Java program to solve Example 1.1 is given as an illustration.

1.2.1 Solution by Conventional Programming

A simple Java program to solve Example 1.1 is given here. This program was intentionally written as quickly as possible rather than with great care, to reflect what a nonprofessional programmer might produce. A central theme of this book is to show how DP problems can be solved with a minimum amount of programming knowledge or effort. The program as written first solves the DPFE (1.23) [Method S] recursively. This is followed by an iterative procedure to reconstruct the optimal policy. It should be emphasized that this program does not generalize easily to other DPFEs, especially when states are sets rather than integers.

    class dpfe {

        // branch distance array; 999. marks "no branch"
        public static double[][] b = {
            { 999., .2, .5, .3, 999., 999., 999., 999.},
            { 999., 999., 999., 999., 1., .6, 999., 999.},
            { 999., 999., 999., 999., .4, 999., .6, 999.},
            { 999., 999., 999., 999., 999., .4, 1., 999.},
            { 999., 999., 999., 999., 999., 999., 999., .9},
            { 999., 999., 999., 999., 999., 999., 999., 1.5},
            { 999., 999., 999., 999., 999., 999., 999., .6},
            { 999., 999., 999., 999., 999., 999., 999., 999.}
        };

        public static int N = b.length;          // number of nodes
        public static int[] ptr = new int[N];    // optimal decisions

        public static double fct(int s) {
            double value = 999.;
            ptr[s] = -1;
            if (s == N-1) { value = 0.0; }                 // target state
            else
                for (int d = s+1; d < N; d++)              // for d>s
                    if (b[s][d] < 999.)                    // if d=succ(s)
                        if (value > b[s][d] + fct(d)) {    // if new min
                            value = b[s][d] + fct(d);      // reset
                            ptr[s] = d;
                        }
            return value;
        } // end fct

        public static void main(String[] args) {
            System.out.println("min=" + fct(0));    // compute goal
            int i = 0;
            System.out.print("path:" + i);
            while (ptr[i] > 0) {                    // reconstruction
                System.out.print("->" + ptr[i]);
                i = ptr[i];
            }
        } // end main
    } // end dpfe

A recursive solution of the DPFE was chosen because the DPFE itself is a recursive equation, and transforming it to obtain an iterative solution is not a natural process. Such a transformation would generally take a significant amount of effort, especially for nonserial problems. On the other hand, a major disadvantage of recursion is the inefficiency associated with recalculating f(S) for states S that are next-states of many other states. This is analogous to the reason it is preferable to solve the Fibonacci recurrence relation iteratively rather than recursively. Although finding an iterative solution for the Fibonacci problem is easy, and it also happens to be easy for the linear search problem, in general we cannot expect this to be the case for DP problems.
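One common way to keep the recursive form while avoiding the recalculation of f(S) is to cache each value the first time it is computed (memoization). The following is a minimal sketch of that idea applied to the same branch distance array; it is not the program given in this section, and the class name MemoDpfe and the arrays memo and done are introduced here purely for illustration.

```java
class MemoDpfe {
    static final double INF = 999.;   // same "no branch" marker as in the program above

    static final double[][] b = {
        { INF, .2, .5, .3, INF, INF, INF, INF },
        { INF, INF, INF, INF, 1., .6, INF, INF },
        { INF, INF, INF, INF, .4, INF, .6, INF },
        { INF, INF, INF, INF, INF, .4, 1., INF },
        { INF, INF, INF, INF, INF, INF, INF, .9 },
        { INF, INF, INF, INF, INF, INF, INF, 1.5 },
        { INF, INF, INF, INF, INF, INF, INF, .6 },
        { INF, INF, INF, INF, INF, INF, INF, INF }
    };

    static final int N = b.length;
    static final int[] ptr = new int[N];           // optimal decisions
    static final double[] memo = new double[N];    // cached f(s) (hypothetical helper)
    static final boolean[] done = new boolean[N];  // true once f(s) has been cached

    static double fct(int s) {
        if (done[s]) return memo[s];               // reuse an earlier result
        double value = INF;
        ptr[s] = -1;
        if (s == N - 1) {
            value = 0.0;                           // target state
        } else {
            for (int d = s + 1; d < N; d++) {
                if (b[s][d] < INF) {               // d is a successor of s
                    double candidate = b[s][d] + fct(d);
                    if (candidate < value) { value = candidate; ptr[s] = d; }
                }
            }
        }
        memo[s] = value;
        done[s] = true;
        return value;
    }

    // Follows ptr from state 0, as the reconstruction loop in main does.
    static String path() {
        fct(0);
        StringBuilder sb = new StringBuilder("0");
        for (int i = 0; ptr[i] > 0; i = ptr[i]) sb.append("->").append(ptr[i]);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("min=" + fct(0));
        System.out.println("path:" + path());
    }
}
```

With the cache in place, each state is evaluated once, so the work grows with the number of branches rather than with the number of paths through the graph.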

1.2.2 The State-Decision-Reward-Transformation Table

This book will describe an alternative to the conventional programming illustrated above, based on the ability to automatically generate the state space for a given DPFE. Recall that, for Example 1.1, the state space is the set

{{a, b, c}, {b, c}, {a, c}, {a, b}, {c}, {b}, {a}, ∅}

or

{{0, 1, 2}, {1, 2}, {0, 2}, {0, 1}, {2}, {1}, {0}, ∅}

if we give the decisions a, b, and c the numerical labels 0, 1, 2 instead.

The state space for a given DPFE can be generated in the process of producing the State-Decision-Reward-Transformation (SDRT) table. The SDRT table gives, for each state S and for each decision d in the decision space D(S), the reward function R(S, d) and the transformation function(s) T(S, d) for each pair (S, d), starting from the goal state S*. T(S, d) allows us to generate next-states. For Example 1.1, the SDRT table is given in Table 1.1.

As each next-state S is generated, if it is not already in the table, it is added to the table and additional rows are added for each of the decisions in D(S). If a base-state is generated, which has no associated decision, no additional rows are added to the table.
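The generation procedure just described can be sketched as a worklist loop. The sketch below makes several assumptions that are not stated in this section: states are encoded as bit masks over the decisions 0, 1, 2; the probabilities p = (0.2, 0.5, 0.3) and the reward R(S, d) = (N + 1 - |S|) * p[d] are inferred from the reward values in Table 1.1; and the class and method names are hypothetical. The empty set is printed as {}.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Queue;
import java.util.Set;

class SdrtSketch {
    // Assumed data, inferred from the rewards in Table 1.1.
    static final double[] p = {0.2, 0.5, 0.3};
    static final int N = p.length;

    // Assumed reward function: R(S, d) = (N + 1 - |S|) * p[d].
    static double reward(int state, int d) {
        return (N + 1 - Integer.bitCount(state)) * p[d];
    }

    // Transformation: T(S, d) = S with decision d removed.
    static int transform(int state, int d) {
        return state & ~(1 << d);
    }

    // Renders a bit mask as a set of decision labels.
    static String show(int state) {
        StringBuilder sb = new StringBuilder("{");
        for (int d = 0; d < N; d++) {
            if ((state & (1 << d)) != 0) {
                if (sb.length() > 1) sb.append(", ");
                sb.append(d);
            }
        }
        return sb.append("}").toString();
    }

    // Generates the SDRT rows, starting from the goal state {0, 1, 2}.
    static List<String> generate() {
        List<String> rows = new ArrayList<>();
        int goal = (1 << N) - 1;
        Set<Integer> seen = new LinkedHashSet<>();
        Queue<Integer> pending = new ArrayDeque<>();
        seen.add(goal);
        pending.add(goal);
        while (!pending.isEmpty()) {
            int s = pending.remove();
            for (int d = 0; d < N; d++) {
                if ((s & (1 << d)) == 0) continue;       // d is not in D(S)
                int next = transform(s, d);
                rows.add(String.format(Locale.ROOT, "%s d = %d %.1f (%s)",
                        show(s), d, reward(s, d), show(next)));
                if (seen.add(next)) pending.add(next);   // new state: expand it later
            }
        }
        return rows;   // the base state {} has no decisions, so it adds no rows
    }

    public static void main(String[] args) {
        generate().forEach(System.out::println);
    }
}
```

Under these assumptions the loop emits twelve rows, one per (state, decision) pair, in the same order as Table 1.1.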

Table 1.1. SDRT Table for Linear Search Example

    state       decision   reward   next-states
    {0, 1, 2}   d = 0      0.2      ({1, 2})
    {0, 1, 2}   d = 1      0.5      ({0, 2})
    {0, 1, 2}   d = 2      0.3      ({0, 1})
    {1, 2}      d = 1      1.0      ({2})
    {1, 2}      d = 2      0.6      ({1})
    {0, 2}      d = 0      0.4      ({2})
    {0, 2}      d = 2      0.6      ({0})
    {0, 1}      d = 0      0.4      ({1})
    {0, 1}      d = 1      1.0      ({0})
    {2}         d = 2      0.9      (∅)
    {1}         d = 1      1.5      (∅)
    {0}         d = 0      0.6      (∅)

Given the SDRT table, for a serial DPFE, we can easily construct a state transition system model whose nodes are the states. For Example 1.1, the (Boolean) adjacency matrix for this state transition model is as follows:

        1  2  3  4  5  6  7  8
    1   0  1  1  1  0  0  0  0
    2   0  0  0  0  1  1  0  0
    3   0  0  0  0  1  0  1  0
    4   0  0  0  0  0  1  1  0
    5   0  0  0  0  0  0  0  1
    6   0  0  0  0  0  0  0  1
    7   0  0  0  0  0  0  0  1
    8   0  0  0  0  0  0  0  0

The weighted adjacency matrix whose nonzero elements are branch labels is

        1    2    3    4    5    6    7    8
    1   0    0.2  0.5  0.3  0    0    0    0
    2   0    0    0    0    1.0  0.6  0    0
    3   0    0    0    0    0.4  0    0.6  0
    4   0    0    0    0    0    0.4  1.0  0
    5   0    0    0    0    0    0    0    0.9
    6   0    0    0    0    0    0    0    1.5
    7   0    0    0    0    0    0    0    0.6
    8   0    0    0    0    0    0    0    0

The row and column numbers or indices shown (1, . . . , 8) are not part of the matrix itself; in programming languages, such as Java, it is common to start indexing from zero (0, . . . , 7) instead of one.
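As an illustration of how such a matrix might be derived from the SDRT table, the following sketch numbers the states in order of first appearance and fills in one entry per table row. The class name, the string encoding of the rows, and the use of {} for the empty set are assumptions made for this example only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class AdjacencySketch {
    // Rows of Table 1.1 as {state, next-state, reward}; "{}" stands for the empty set.
    static final String[][] SDRT = {
        {"{0,1,2}", "{1,2}", "0.2"}, {"{0,1,2}", "{0,2}", "0.5"}, {"{0,1,2}", "{0,1}", "0.3"},
        {"{1,2}", "{2}", "1.0"}, {"{1,2}", "{1}", "0.6"},
        {"{0,2}", "{2}", "0.4"}, {"{0,2}", "{0}", "0.6"},
        {"{0,1}", "{1}", "0.4"}, {"{0,1}", "{0}", "1.0"},
        {"{2}", "{}", "0.9"}, {"{1}", "{}", "1.5"}, {"{0}", "{}", "0.6"}
    };

    // Numbers the states 0, 1, 2, ... in order of first appearance in the table.
    static Map<String, Integer> indexStates() {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String[] row : SDRT) {
            index.putIfAbsent(row[0], index.size());
            index.putIfAbsent(row[1], index.size());
        }
        return index;
    }

    // Weighted adjacency matrix; an entry of 0 means "no branch".
    static double[][] weighted() {
        Map<String, Integer> index = indexStates();
        double[][] c = new double[index.size()][index.size()];
        for (String[] row : SDRT) {
            c[index.get(row[0])][index.get(row[1])] = Double.parseDouble(row[2]);
        }
        return c;
    }

    public static void main(String[] args) {
        for (double[] row : weighted()) System.out.println(Arrays.toString(row));
    }
}
```

The Boolean matrix follows by testing each entry against zero. Indices here run from 0 to 7, so row 1 of the printed matrices corresponds to index 0.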

Later in this book we show that nonserial DPFEs can be modeled in a similar fashion using a generalization of state transition systems called Petri nets.

1.2.3 Code Generation

The adjacency matrix obtained from the SDRT table associated with a DPFE, as described in Sect. 1.2.2, provides the basis for a DP program generator, i.e., a software tool that automatically generates “solver code”, specifically, a sequence of assignment statements for solving a DPFE using a conventional programming language such as Java. We illustrate this solver code generation process in this section.

Given a weighted adjacency matrix, for example, the one given above, we can obtain the numerical solution of the DPFE by defining an assignment statement for each row of the matrix which sets a variable a_i for row i equal to the minimum of terms of the form c_i,j + a_j, where j is a successor of i.

a1=min{.2+a2,.5+a3,.3+a4}

a2=min{1.+a5,.6+a6}

a3=min{.4+a5,.6+a7}

a4=min{.4+a6,1.+a7}

a5=min{.9+a8}

a6=min{1.5+a8}

a7=min{.6+a8}

a8=0

These assignment statements can be used in a conventional nonrecursive computer program (in any procedural programming language) to calculate the values a_i. The statements should be compared with the equations of Example 1.1 [Method S]. As in that earlier example, evaluating the values a_i yields the following results: a8 = 0, a7 = 0.6, a6 = 1.5, a5 = 0.9, a4 = min(1.9, 1.6) = 1.6, a3 = min(1.3, 1.2) = 1.2, a2 = min(1.9, 2.1) = 1.9, a1 = min(2.1, 1.7, 1.9) = 1.7; note that a1 = 1.7 is the goal. These assignment statements must of course be “topologically” reordered, from last to first, before they are executed.
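A minimal sketch of such a nonrecursive program is shown below. It relies on the observation that the weighted adjacency matrix of Sect. 1.2.2 is strictly upper triangular, so processing the rows from last to first is a valid topological order. The class name SolverSketch and the convention that a zero entry means "no branch" are assumptions of this sketch.

```java
class SolverSketch {
    // Weighted adjacency matrix from Sect. 1.2.2; 0 means "no branch".
    static final double[][] c = {
        {0, 0.2, 0.5, 0.3, 0,   0,   0,   0  },
        {0, 0,   0,   0,   1.0, 0.6, 0,   0  },
        {0, 0,   0,   0,   0.4, 0,   0.6, 0  },
        {0, 0,   0,   0,   0,   0.4, 1.0, 0  },
        {0, 0,   0,   0,   0,   0,   0,   0.9},
        {0, 0,   0,   0,   0,   0,   0,   1.5},
        {0, 0,   0,   0,   0,   0,   0,   0.6},
        {0, 0,   0,   0,   0,   0,   0,   0  }
    };

    // Evaluates a_i = min over successors j of (c[i][j] + a_j), last row first.
    static double[] solve() {
        int n = c.length;
        double[] a = new double[n];
        a[n - 1] = 0.0;                           // base state (a8 in the text)
        for (int i = n - 2; i >= 0; i--) {
            double best = Double.POSITIVE_INFINITY;
            for (int j = i + 1; j < n; j++) {
                if (c[i][j] != 0) best = Math.min(best, c[i][j] + a[j]);
            }
            a[i] = best;
        }
        return a;
    }

    public static void main(String[] args) {
        double[] a = solve();
        for (int i = 0; i < a.length; i++) {
            System.out.println("a" + (i + 1) + "=" + a[i]);
        }
    }
}
```

The array is indexed from zero, so a[0] holds the goal value a1 and a[7] holds a8.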

1.2.4 Spreadsheet Solutions

Above, we showed the basis for a DP program generator that automatically generates a sequence of assignment statements for solving a DPFE using a conventional programming language. We show in this section how a spreadsheet that solves a DPFE can be automatically generated.

The assignment statements given in Sect. 1.2.3 for the linear search problem can also be rewritten in the form

=min(.2+A2,.5+A3,.3+A4)

=min(1.+A5,.6+A6)

=min(.4+A5,.6+A7)

=min(.4+A6,1.+A7)

=min(.9+A8)

=min(1.5+A8)

=min(.6+A8)

0

which when imported into the first column of a spreadsheet will yield the same results as before; cell A1 of the spreadsheet will have 1.7 as its computed answer. One advantage of this spreadsheet solution is that “topological” sorting is unnecessary.
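Formulas of this kind could be emitted directly from the weighted adjacency matrix, one formula per row. The following sketch shows one way to do so; the class name is hypothetical, a zero entry is again taken to mean "no branch", and constants are printed in Java's default form (0.2 rather than .2).

```java
import java.util.ArrayList;
import java.util.List;

class SpreadsheetSketch {
    // Weighted adjacency matrix from Sect. 1.2.2; 0 means "no branch".
    static final double[][] c = {
        {0, 0.2, 0.5, 0.3, 0,   0,   0,   0  },
        {0, 0,   0,   0,   1.0, 0.6, 0,   0  },
        {0, 0,   0,   0,   0.4, 0,   0.6, 0  },
        {0, 0,   0,   0,   0,   0.4, 1.0, 0  },
        {0, 0,   0,   0,   0,   0,   0,   0.9},
        {0, 0,   0,   0,   0,   0,   0,   1.5},
        {0, 0,   0,   0,   0,   0,   0,   0.6},
        {0, 0,   0,   0,   0,   0,   0,   0  }
    };

    // Returns the contents of column A, top to bottom.
    static List<String> formulas() {
        List<String> column = new ArrayList<>();
        for (int i = 0; i < c.length; i++) {
            List<String> terms = new ArrayList<>();
            for (int j = 0; j < c.length; j++) {
                if (c[i][j] != 0) terms.add(c[i][j] + "+A" + (j + 1));   // cells are 1-based
            }
            // A row without successors is the base state, whose value is 0.
            column.add(terms.isEmpty() ? "0" : "=min(" + String.join(",", terms) + ")");
        }
        return column;
    }

    public static void main(String[] args) {
        formulas().forEach(System.out::println);
    }
}
```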

In this spreadsheet program, only the lengths of the shortest paths are calculated. To reconstruct the optimal policies, i.e., the sequences of decisions that yield the shortest paths, more work must be done. We will not address this reconstruction task further in this chapter.

The foregoing spreadsheet has formulas that involve both the minimization and addition operations. A simpler “basic” spreadsheet would permit formulas to have only one operation. Suppose we define an intermediary variable a_k for each of the terms c_i,j + a_j. Then we may rewrite the original sequence of statements as follows:

a1=min(a9,a10,a11)

a2=min(a12,a13)

a3=min(a14,a15)

a4=min(a16,a17)

a5=min(a18)

a6=min(a19)

a7=min(a20)

a8=0

a9=.2+a2

a10=.5+a3

a11=.3+a4

a12=1.+a5

a13=.6+a6

a14=.4+a5

a15=.6+a7

a16=.4+a6

a17=1.+a7

a18=.9+a8

a19=1.5+a8

a20=.6+a8

As above, we may also rewrite this in spreadsheet form:

=min(A9,A10,A11)

=min(A12,A13)

=min(A14,A15)

=min(A16,A17)

=min(A18)

=min(A19)

=min(A20)

0

=.2+A2

=.5+A3

=.3+A4

=1.+A5

=.6+A6

=.4+A5

=.6+A7

=.4+A6

=1.+A7

=.9+A8

=1.5+A8

=.6+A8

This basic spreadsheet is a tabular representation of the original DPFE, and is at the heart of the software system we describe in this book. This software automatically generates the following equivalent spreadsheet from the given DPFE:

0

=B1+0.9

=MIN(B2)

=B3+1

=B3+0.4

=B1+1.5

=MIN(B6)

=B7+0.6

=B7+0.4

=MIN(B4,B8)

=B10+0.2

=B1+0.6

=MIN(B12)

=B13+0.6

=B13+1

=MIN(B5,B14)

=B16+0.5

=MIN(B9,B15)

=B18+0.3

=MIN(B11,B17,B19)

(Only Column B is shown here.) The different ordering is a consequence of our implementation decisions, but does not affect the results.
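The single-operation rewriting used earlier in this section for Column A (minimization cells A1 through A8, followed by addition cells A9 through A20) can be sketched as follows. This is only an illustration of that rewriting, not the software system's own generator, whose ordering differs as noted; the class name and the zero-means-no-branch convention are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class BasicSpreadsheetSketch {
    // Weighted adjacency matrix from Sect. 1.2.2; 0 means "no branch".
    static final double[][] c = {
        {0, 0.2, 0.5, 0.3, 0,   0,   0,   0  },
        {0, 0,   0,   0,   1.0, 0.6, 0,   0  },
        {0, 0,   0,   0,   0.4, 0,   0.6, 0  },
        {0, 0,   0,   0,   0,   0.4, 1.0, 0  },
        {0, 0,   0,   0,   0,   0,   0,   0.9},
        {0, 0,   0,   0,   0,   0,   0,   1.5},
        {0, 0,   0,   0,   0,   0,   0,   0.6},
        {0, 0,   0,   0,   0,   0,   0,   0  }
    };

    // Returns column A: n minimization cells, then one addition cell per branch.
    static List<String> formulas() {
        int n = c.length;
        List<String> mins = new ArrayList<>();
        List<String> sums = new ArrayList<>();
        int next = n + 1;                         // row of the next intermediate cell
        for (int i = 0; i < n; i++) {
            List<String> cells = new ArrayList<>();
            for (int j = 0; j < n; j++) {
                if (c[i][j] != 0) {
                    cells.add("A" + next++);      // this term gets its own cell
                    sums.add("=" + c[i][j] + "+A" + (j + 1));
                }
            }
            mins.add(cells.isEmpty() ? "0" : "=min(" + String.join(",", cells) + ")");
        }
        mins.addAll(sums);
        return mins;
    }

    public static void main(String[] args) {
        formulas().forEach(System.out::println);
    }
}
```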

1.2.5 Example: SPA

As another illustration, which we will use later in this book since it is a smaller example that can be more easily examined in detail, we consider the shortest path in an acyclic graph (SPA) problem, introduced as Example 1.6 in Sect. 1.1.10. The SDRT table is as follows:

    StateDecisionRewardTransformationTable
    (0) [d=1] 3.0 ((1)) ()
    (0) [d=2] 5.0 ((2)) ()
    (1) [d=2] 1.0 ((2)) ()
    (1) [d=3] 8.0 ((3)) ()
    (2) [d=3] 5.0 ((3)) ()

From this table, we can generate solver code as a sequence of assignment statements as follows:

A1=min(A2+3.0,A3+5.0)

A2=min(A3+1.0,A4+8.0)

A3=min(A4+5.0)

A4=0.0

Simplifying the formulas, so that each has only a single (minimization or addition) operation, we may rewrite the foregoing as follows:

A1=min(A5,A6)

A2=min(A7,A8)

A3=min(A9)

A4=0.0

A5=A2+3.0

A6=A3+5.0

A7=A3+1.0

A8=A4+8.0

A9=A4+5.0

As in the case of the preceding linear search example, these assignment statements must be topologically sorted if they are to be executed as a conventional sequential program. (This sorting is unnecessary if they are imported into Column A of a spreadsheet.) Rearranging the variables (letting B9=A1, B7=A2, B4=A3, etc.), we have:

B1=0.0

B2=B1+8.0

B3=B1+5.0

B4=min(B3)

B5=B4+5.0

B6=B4+1.0

B7=min(B6,B2)

B8=B7+3.0

B9=min(B8,B5)

These assignment statements can be executed as a conventional sequential program. Alternatively, importing them into Column B, we arrive at the following spreadsheet solver code:

=0.0

=B1+8.0

=B1+5.0

=min(B3)

=B4+5.0

=B4+1.0

=min(B6,B2)

=B7+3.0

=min(B8,B5)
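The topological sorting step for the SPA example can be sketched with a depth-first traversal over the nine single-operation statements A1 through A9. The encoding below (arrays DEPS, IS_MIN, and CONSTANT, and the class name) is hypothetical, and the sketch assumes the dependencies are acyclic, so it performs no cycle detection. It yields one valid execution order, which need not coincide with the B1 through B9 ordering shown above.

```java
import java.util.ArrayList;
import java.util.List;

class SpaTopoSketch {
    // DEPS[k] lists the cells read by statement A(k+1), as 0-based indices.
    static final int[][] DEPS = {
        {4, 5},   // A1=min(A5,A6)
        {6, 7},   // A2=min(A7,A8)
        {8},      // A3=min(A9)
        {},       // A4=0.0
        {1},      // A5=A2+3.0
        {2},      // A6=A3+5.0
        {2},      // A7=A3+1.0
        {3},      // A8=A4+8.0
        {3}       // A9=A4+5.0
    };
    static final boolean[] IS_MIN = {true, true, true, false, false, false, false, false, false};
    static final double[] CONSTANT = {0, 0, 0, 0.0, 3.0, 5.0, 1.0, 8.0, 5.0};

    // Depth-first visit: a statement is appended only after everything it reads.
    static void visit(int k, boolean[] seen, List<Integer> order) {
        if (seen[k]) return;
        seen[k] = true;
        for (int j : DEPS[k]) visit(j, seen, order);
        order.add(k);
    }

    static List<Integer> topologicalOrder() {
        boolean[] seen = new boolean[DEPS.length];
        List<Integer> order = new ArrayList<>();
        for (int k = 0; k < DEPS.length; k++) visit(k, seen, order);
        return order;
    }

    // Executes the statements sequentially in the sorted order.
    static double[] evaluate() {
        double[] v = new double[DEPS.length];
        for (int k : topologicalOrder()) {
            if (IS_MIN[k]) {
                double best = Double.POSITIVE_INFINITY;
                for (int j : DEPS[k]) best = Math.min(best, v[j]);
                v[k] = best;
            } else {
                // Addition statement, or the constant A4 when nothing is read.
                v[k] = CONSTANT[k] + (DEPS[k].length == 0 ? 0.0 : v[DEPS[k][0]]);
            }
        }
        return v;
    }

    public static void main(String[] args) {
        System.out.println("order=" + topologicalOrder());
        System.out.println("A1=" + evaluate()[0]);
    }
}
```

Here evaluate()[0] holds the value of A1, the length of the shortest path from node 0 to node 3, which is the same quantity that cell B9 computes in the spreadsheet.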

1.2.6 Concluding Remarks

It is not easy to modify the above Java or spreadsheet “solver code” to solve DP problems that are dissimilar to linear search or shortest paths. Conventional programming and hand-coding spreadsheets, especially for problems of larger dimension, are error-prone tasks. The desirability of a software tool that automatically generates solver code from a DPFE is clear. That is the focus of this book.
