
### Art Lew

### Holger Mauch

### Dynamic Programming

### A Computational Tool

### With 55 Figures and 5 Tables

ISSN electronic edition: 1860-9503

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable to prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media springer.com

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Cover design: deblik, Berlin

ISSN print edition: 1860-949X

Typesetting by the authors and SPi

Library of Congress Control Number: 2006930743

ISBN-10 3-540-37013-7 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-37013-0 Springer Berlin Heidelberg New York

Printed on acid-free paper SPIN: 11550860

Art Lew
Department of Information and Computer Sciences
University of Hawaii at Manoa
1680 East-West Road
Honolulu, HI 96822
USA
E-mail: artlew@hawaii.edu

Holger Mauch
Department of Computer Science
Natural Sciences Collegium
Eckerd College
4200 54th Ave. S.
Saint Petersburg, FL 33711
USA
E-mail: mauchh@eckerd.edu

© Springer-Verlag Berlin Heidelberg 2007

### To my family. H.M.

Dynamic programming has long been applied to numerous areas in mathematics, science, engineering, business, medicine, information systems, biomathematics, artificial intelligence, among others. Applications of dynamic programming have increased as recent advances have been made in areas such as neural networks, data mining, soft computing, and other areas of computational intelligence. The value of dynamic programming formulations and means to obtain their computational solutions has never been greater.

This book describes the use of dynamic programming as a computational tool to solve discrete optimization problems.

(1) We first formulate large classes of discrete optimization problems in dynamic programming terms, specifically by deriving the dynamic programming functional equations (DPFEs) that solve these problems. A text-based language, gDPS, for expressing these DPFEs is introduced. gDPS may be regarded as a high-level specification language, not a conventional procedural computer programming language, but one which can be used to obtain numerical solutions.

(2) We then deﬁne and examine properties of Bellman nets, a class of Petri nets that serves both as a formal theoretical model of dynamic programming problems, and as an internal computer data structure representation of the DPFEs that solve these problems.

(3) We also describe the design, implementation, and use of a software tool, called DP2PN2Solver, for solving DPFEs. DP2PN2Solver may be regarded as a program generator, whose input is a DPFE, expressed in the input specification language gDPS and internally represented as a Bellman net, and whose output is its numerical solution that is produced indirectly by the generation of “solver” code, which when executed yields the desired solution.

This book should be of value to different classes of readers: students, instructors, practitioners, and researchers. We first provide a tutorial introduction to dynamic programming and to Petri nets. For those interested in dynamic programming, we provide a useful software tool that allows them to obtain numerical solutions. For researchers having an interest in the fields of

dynamic programming and Petri nets, unlike most past work which applies dynamic programming to solve Petri net problems, we suggest ways to apply Petri nets to solve dynamic programming problems.

For students and instructors of courses in which dynamic programming is taught, usually as one of many other problem-solving methods, this book provides a wealth of examples that show how discrete optimization problems can be formulated in dynamic programming terms. Dynamic programming has been and continues to be taught as an “art”, where how to use it must be learned by example, there being no mechanical way to apply knowledge of the general principles (e.g., the principle of optimality) to new unfamiliar problems. Experience has shown that the greater the number and variety of problems presented, the easier it is for students to apply general concepts.

Thus, one objective of this book is to include examples that are both more numerous and more diverse.

A further distinguishing feature of this book is that, for all of these examples, we not only formulate the DP equations but also show their computational solutions, exhibiting computer programs (in our speciﬁcation language) as well as providing as output numerical answers (as produced by the automatically generated solver code).

In addition, we provide students and instructors with a software tool (DP2PN2Solver) that enables them to obtain numerical solutions of dynamic programming problems without requiring them to have much computer programming knowledge and experience. This software tool can be downloaded from either of the following websites:

http://natsci.eckerd.edu/~mauchh/Research/DP2PN2Solver
http://www2.hawaii.edu/~icl/DP2PN2Solver

Further information is given in Appendix B. Having such software support allows them to focus on dynamic programming rather than on computer programming. Since many problems can be solved by different dynamic programming formulations, the availability of such a computational tool, which makes it easier for readers to experiment with their own formulations, is a useful aid to learning.

The DP2PN2Solver tool also enables practitioners to obtain numerical solutions of dynamic programming problems of interest to them without requiring them to write conventional computer programs. Their time, of course, is better spent on problem formulation and analysis than on program design and debugging. This tool allows them to verify that their formulations are correct, and to revise them as may be necessary in their problem solving eﬀorts. The main limitation of this (and any) dynamic programming tool for many practical problems is the size of the state space. Even in this event, the tool may prove useful in the formulation stage to initially test ideas on simpliﬁed scaled-down problems.

As a program generator, DP2PN2Solver is flexible, permitting alternate front-ends and back-ends. Inputs other than in the gDPS language are possible. Alternative DPFE specifications can be translated into gDPS or directly into Bellman nets. Output solver code (i.e., the program that numerically solves a given DPFE) may be in alternative languages. The solver code emphasized in this book is Java code, largely because it is universally and freely available on practically every platform. We also discuss solver codes for spreadsheet systems and Petri net simulators. By default, the automatically generated solver code is hidden from the average user, but it can be inspected and modified directly by users if they wish.

Furthermore, this book describes research into connections between dynamic programming and Petri nets. It was our early research into such connections that ultimately led to the concept of Bellman nets, upon which the development of our DP2PN2Solver tool is based. We explain here the underlying ideas associated with Bellman nets. Researchers interested in dynamic programming or Petri nets will find many open questions related to this work that suggest avenues of future research. For example, additional research might very likely result in improvements in the DP2PN2Solver tool, such as to address the state-space size issue or to increase its diagnostic capabilities.

Every other aspect of this work may beneﬁt from additional research.

Thus, we expect the DP2PN2Solver tool described in this book to undergo revisions from time to time. In fact, the tool was designed modularly to make it relatively easy to modify. As one example, changes to the gDPS specification language syntax can be made by simply revising its BNF definition, since we use a compiler-compiler rather than a compiler to process it.

Furthermore, alternate input languages (other than gDPS) and solver codes (other than Java) can be added as optional modules, without changing the existing modules. We welcome suggestions from readers on how the tool (or its description) can be improved. We may be contacted at artlew@hawaii.edu or mauchh@eckerd.edu. Updates to the software and to this book, including errata, will be placed on the aforementioned websites.

*Acknowledgements.* The authors wish to thank Janusz Kacprzyk for including this monograph in his fine series of books. His encouragement has been very much appreciated.

Honolulu, June 2006, *Art Lew*

St. Petersburg, June 2006, *Holger Mauch*

**Part I Dynamic Programming**

**1** **Introduction to Dynamic Programming . . . .** 3

1.1 Principles of Dynamic Programming . . . 5

1.1.1 Sequential Decision Processes . . . 6

1.1.2 Dynamic Programming Functional Equations . . . 9

1.1.3 The Elements of Dynamic Programming . . . 11

1.1.4 Application: Linear Search . . . 12

1.1.5 Problem Formulation and Solution . . . 14

1.1.6 State Transition Graph Model . . . 17

1.1.7 Staged Decisions . . . 19

1.1.8 Path-States . . . 21

1.1.9 Relaxation . . . 22

1.1.10 Shortest Path Problems . . . 23

1.1.11 All-Pairs Shortest Paths . . . 29

1.1.12 State Space Generation . . . 30

1.1.13 Complexity . . . 31

1.1.14 Greedy Algorithms . . . 32

1.1.15 Probabilistic DP . . . 32

1.1.16 Nonoptimization Problems . . . 33

1.1.17 Concluding Remarks . . . 34

1.2 Computational Solution of DPFEs . . . 34

1.2.1 Solution by Conventional Programming . . . 35

1.2.2 The State-Decision-Reward-Transformation Table . . . . 36

1.2.3 Code Generation . . . 38

1.2.4 Spreadsheet Solutions . . . 38

1.2.5 Example: SPA . . . 40

1.2.6 Concluding Remarks . . . 42

1.3 Overview of Book . . . 42

**2** **Applications of Dynamic Programming . . . 45**

2.1 Optimal Allotment Problem (ALLOT) . . . 49

2.2 All-Pairs Shortest Paths Problem (APSP) . . . 50

2.3 Optimal Alphabetic Radix-Code Tree Problem (ARC) . . . 51

2.4 Assembly Line Balancing (ASMBAL) . . . 52

2.5 Optimal Assignment Problem (ASSIGN) . . . 54

2.6 Optimal Binary Search Tree Problem (BST) . . . 55

2.7 Optimal Covering Problem (COV) . . . 57

2.8 Deadline Scheduling Problem (DEADLINE) . . . 57

2.9 Discounted Proﬁts Problem (DPP) . . . 58

2.10 Edit Distance Problem (EDP) . . . 59

2.11 Fibonacci Recurrence Relation (FIB) . . . 60

2.12 Flowshop Problem (FLOWSHOP) . . . 61

2.13 Tower of Hanoi Problem (HANOI) . . . 62

2.14 Integer Linear Programming (ILP) . . . 63

2.15 Integer Knapsack as ILP Problem (ILPKNAP) . . . 64

2.16 Interval Scheduling Problem (INTVL) . . . 64

2.17 Inventory Problem (INVENT) . . . 66

2.18 Optimal Investment Problem (INVEST) . . . 67

2.19 Investment: Winning in Las Vegas Problem (INVESTWLV) . . 68

2.20 0/1 Knapsack Problem (KS01) . . . 69

2.21 COV as KSINT Problem (KSCOV) . . . 70

2.22 Integer Knapsack Problem (KSINT) . . . 70

2.23 Longest Common Subsequence (LCS) . . . 71

2.24 Optimal Linear Search Problem (LINSRC) . . . 73

2.25 Lot Size Problem (LOT) . . . 73

2.26 Longest Simple Path Problem (LSP) . . . 74

2.27 Matrix Chain Multiplication Problem (MCM) . . . 75

2.28 Minimum Maximum Problem (MINMAX) . . . 75

2.29 Minimum Weight Spanning Tree Problem (MWST) . . . 77

2.30 The Game of NIM (NIM) . . . 78

2.31 Optimal Distribution Problem (ODP) . . . 80

2.32 Optimal Permutation Problem (PERM) . . . 81

2.33 Jug-Pouring Problem (POUR) . . . 82

2.34 Optimal Production Problem (PROD) . . . 83

2.35 Production: Reject Allowances Problem (PRODRAP) . . . 84

2.36 Reliability Design Problem (RDP) . . . 84

2.37 Replacement Problem (REPLACE) . . . 85

2.38 Stagecoach Problem (SCP) . . . 86

2.39 Seek Disk Scheduling Problem (SEEK) . . . 87

2.40 Segmented Curve Fitting Problem (SEGLINE) . . . 88

2.41 Program Segmentation Problem (SEGPAGE) . . . 91

2.42 Optimal Selection Problem (SELECT) . . . 94

2.43 Shortest Path in an Acyclic Graph (SPA) . . . 95

2.44 Shortest Path in a Cyclic Graph (SPC) . . . 95

2.45 Process Scheduling Problem (SPT) . . . 97

2.46 Transportation Problem (TRANSPO) . . . 98

2.47 Traveling Salesman Problem (TSP) . . . 99

**Part II Modeling of DP Problems**

**3** **The DP Speciﬁcation Language gDPS . . . 103**

3.1 Introduction to gDPS . . . 103

3.2 Design Principles of gDPS . . . 105

3.3 Detailed Description of the gDPS Sections . . . 106

3.3.1 Name Section . . . 106

3.3.2 General Variables Section . . . 106

3.3.3 Set Variables Section . . . 108

3.3.4 General Functions Section . . . 109

3.3.5 State Type Section . . . 110

3.3.6 Decision Variable Section . . . 110

3.3.7 Decision Space Section . . . 111

3.3.8 Goal Section . . . 111

3.3.9 DPFE Base Section . . . 112

3.3.10 DPFE Section . . . 113

3.3.11 Cost/Reward Function Section . . . 115

3.3.12 Transformation Function Section . . . 115

3.3.13 Transition Weight Section . . . 116

3.4 BNF Grammar of the gDPS language . . . 117

**4** **DP Problem Speciﬁcations in gDPS . . . 125**

4.1 gDPS source for ALLOT . . . 125

4.2 gDPS source for APSP . . . 128

4.3 gDPS source for ARC . . . 131

4.4 gDPS source for ASMBAL . . . 132

4.5 gDPS source for ASSIGN . . . 135

4.6 gDPS source for BST . . . 136

4.7 gDPS source for COV . . . 138

4.8 gDPS source for DEADLINE . . . 139

4.9 gDPS source for DPP . . . 140

4.10 gDPS source for EDP . . . 141

4.11 gDPS source for FIB . . . 144

4.12 gDPS source for FLOWSHOP . . . 144

4.13 gDPS source for HANOI . . . 145

4.14 gDPS source for ILP . . . 146

4.15 gDPS source for ILPKNAP . . . 148

4.16 gDPS source for INTVL . . . 150

4.17 gDPS source for INVENT . . . 154

4.18 gDPS source for INVEST . . . 156

4.19 gDPS source for INVESTWLV . . . 157

4.20 gDPS source for KS01 . . . 158

4.21 gDPS source for KSCOV . . . 159

4.22 gDPS source for KSINT . . . 160

4.23 gDPS source for LCS . . . 161

4.24 gDPS source for LINSRC . . . 165

4.25 gDPS source for LOT . . . 167

4.26 gDPS source for LSP . . . 168

4.27 gDPS source for MCM . . . 170

4.28 gDPS source for MINMAX . . . 171

4.29 gDPS source for MWST . . . 173

4.30 gDPS source for NIM . . . 176

4.31 gDPS source for ODP . . . 176

4.32 gDPS source for PERM . . . 178

4.33 gDPS source for POUR . . . 179

4.34 gDPS source for PROD . . . 181

4.35 gDPS source for PRODRAP . . . 182

4.36 gDPS source for RDP . . . 184

4.37 gDPS source for REPLACE . . . 186

4.38 gDPS source for SCP . . . 187

4.39 gDPS source for SEEK . . . 189

4.40 gDPS source for SEGLINE . . . 190

4.41 gDPS source for SEGPAGE . . . 192

4.42 gDPS source for SELECT . . . 193

4.43 gDPS source for SPA . . . 194

4.44 gDPS source for SPC . . . 196

4.45 gDPS source for SPT . . . 199

4.46 gDPS source for TRANSPO . . . 200

4.47 gDPS source for TSP . . . 201

**5** **Bellman Nets: A Class of Petri Nets . . . 205**

5.1 Petri Net Introduction . . . 205

5.1.1 Place/Transition Nets . . . 205

5.1.2 High-level Petri Nets . . . 207

5.1.3 Colored Petri Nets . . . 208

5.1.4 Petri Net Properties . . . 209

5.1.5 Petri Net Software . . . 210

5.2 Petri Net Models of Dynamic Programming . . . 210

5.3 The Low-Level Bellman Net Model . . . 212

5.3.1 Construction of the Low-Level Bellman Net Model . . . 212

5.3.2 The Role of Transitions in the Low-Level Bellman Net Model . . . 213

5.3.3 The Role of Places in the Low-Level Bellman Net Model . . . 213

5.3.4 The Role of Markings in the Low-Level Bellman Net Model . . . 214

5.3.5 Advantages of the Low-Level Bellman Net Model . . . . 214

5.4 Low-Level Bellman Net Properties . . . 214

5.5 The High-Level Bellman Net Model . . . 215

5.6 High-Level Bellman Net Properties . . . 219

**6** **Bellman Net Representations of DP Problems . . . 221**

6.1 Graphical Representation of Low-Level Bellman Net Examples . . . 222

6.1.1 Low-Level Bellman Net for BST . . . 222

6.1.2 Low-Level Bellman Net for LINSRC . . . 222

6.1.3 Low-Level Bellman Net for MCM . . . 224

6.1.4 Low-Level Bellman Net for ODP . . . 224

6.1.5 Low-Level Bellman Net for PERM . . . 227

6.1.6 Low-Level Bellman Net for SPA . . . 228

6.2 Graphical Representation of High-Level Bellman Net Examples . . . 228

6.2.1 High-Level Bellman Net for EDP . . . 230

6.2.2 High-Level Bellman Net for ILP . . . 230

6.2.3 High-Level Bellman Net for KS01 . . . 231

6.2.4 High-Level Bellman Net for LCS . . . 231

6.2.5 High-Level Bellman Net for LINSRC . . . 234

6.2.6 High-Level Bellman Net for LSP . . . 235

6.2.7 High-Level Bellman Net for MCM . . . 236

6.2.8 High-Level Bellman Net for RDP . . . 238

6.2.9 High-Level Bellman Net for SCP . . . 238

6.2.10 High-Level Bellman Net for SPA . . . 240

6.2.11 High-Level Bellman Net for SPC . . . 242

**Part III Design and Implementation of DP Tool**

**7** **DP2PN2Solver Tool . . . 247**

7.1 Overview . . . 247

7.2 Internal Representation of Bellman Nets . . . 251

7.3 Compiling and Executing DP Programs . . . 252

7.4 The ILP2gDPS Preprocessor Module . . . 255

**8** **DP2PN Parser and Builder . . . 259**

8.1 Design of the DP2PN modules . . . 259

8.2 Implementation of the DP2PN modules . . . 260

8.3 The Module LINSRCSMain . . . 263

8.4 Error Detection in DP2PN . . . 268

**9** **The PN2Solver Modules . . . 271**

9.1 The Solver Code Generation Process . . . 271

9.2 The PN2Java Module . . . 273

9.2.1 Java Solver Code Calculation Objects . . . 274

9.2.2 Java Solver Code for LINSRCS . . . 276

9.2.3 Java Solver Code for LSP . . . 278

9.2.4 Java Solver Code for MCM . . . 278

9.2.5 Java Solver Code for SPA . . . 280

9.3 The PN2Spreadsheet Module . . . 281

9.3.1 PN2Spreadsheet Solver Code for LINSRCS . . . 282

9.3.2 PN2Spreadsheet Solver Code for Other Examples . . . . 284

9.4 The PN2XML Module . . . 284

9.4.1 Petri Net Solver Code for LINSRCS . . . 285

9.4.2 Petri Net Solver Code for SPA . . . 288

9.5 Conclusion . . . 289

**Part IV Computational Results**

**10 Java Solver Results of DP Problems . . . 293**

10.1 ALLOT Java Solver Output . . . 293

10.2 APSP Java Solver Output . . . 294

10.3 ARC Java Solver Output . . . 296

10.4 ASMBAL Java Solver Output . . . 296

10.5 ASSIGN Java Solver Output . . . 297

10.6 BST Java Solver Output . . . 297

10.7 COV Java Solver Output . . . 298

10.8 DEADLINE Java Solver Output . . . 298

10.9 DPP Java Solver Output . . . 299

10.10 EDP Java Solver Output . . . 299

10.11 FIB Java Solver Output . . . 299

10.12 FLOWSHOP Java Solver Output . . . 300

10.13 HANOI Java Solver Output . . . 300

10.14 ILP Java Solver Output . . . 301

10.15 ILPKNAP Java Solver Output . . . 301

10.16 INTVL Java Solver Output . . . 302

10.17 INVENT Java Solver Output . . . 303

10.18 INVEST Java Solver Output . . . 304

10.19 INVESTWLV Java Solver Output . . . 304

10.20 KS01 Java Solver Output . . . 305

10.21 KSCOV Java Solver Output . . . 306

10.22 KSINT Java Solver Output . . . 306

10.23 LCS Java Solver Output . . . 306

10.24 LINSRC Java Solver Output . . . 307

10.25 LOT Java Solver Output . . . 308

10.26 LSP Java Solver Output . . . 308

10.27 MCM Java Solver Output . . . 308

10.28 MINMAX Java Solver Output . . . 309

10.29 MWST Java Solver Output . . . 309

10.30 NIM Java Solver Output . . . 309

10.31 ODP Java Solver Output . . . 312

10.32 PERM Java Solver Output . . . 312

10.33 POUR Java Solver Output . . . 312

10.34 PROD Java Solver Output . . . 313

10.35 PRODRAP Java Solver Output . . . 314

10.36 RDP Java Solver Output . . . 314

10.37 REPLACE Java Solver Output . . . 315

10.38 SCP Java Solver Output . . . 315

10.39 SEEK Java Solver Output . . . 315

10.40 SEGLINE Java Solver Output . . . 316

10.41 SEGPAGE Java Solver Output . . . 316

10.42 SELECT Java Solver Output . . . 317

10.43 SPA Java Solver Output . . . 317

10.44 SPC Java Solver Output . . . 318

10.45 SPT Java Solver Output . . . 318

10.46 TRANSPO Java Solver Output . . . 319

10.47 TSP Java Solver Output . . . 319

**11 Other Solver Results . . . 321**

11.1 PN2Spreadsheet Solver Code Output . . . 321

11.1.1 PN2Spreadsheet Solver Code for LINSRCS . . . 321

11.1.2 PN2Spreadsheet Solver Code for LSP . . . 322

11.1.3 PN2Spreadsheet Solver Code for MCM . . . 322

11.1.4 PN2Spreadsheet Solver Code for SPA . . . 323

11.1.5 Spreadsheet Output . . . 323

11.2 PN2XML Solver Code Output . . . 324

11.2.1 PN2XML Simulation Output for LINSRCS . . . 325

**12 Conclusions . . . 329**

12.1 Applicability of DP and DP2PN2Solver . . . 329

12.2 The DP2PN2Solver Tool . . . 330

12.3 Research Directions . . . 332

12.3.1 User Functionality . . . 333

12.3.2 Reduction of Dimensionality . . . 334

12.3.3 Petri Net Modeling . . . 335

12.4 Summary . . . 336

**A Supplementary Material . . . 339**

A.1 Pseudocode of the DP2PN Module . . . 339

A.1.1 Main Class for LINSRCS . . . 339

A.1.2 State Class for LINSRCS . . . 342

A.1.3 Decision Class . . . 343

A.1.4 DPInstanceTableEntry Class . . . 344

A.1.5 DPInstance Class . . . 344

A.1.6 BellmanNet Class . . . 349

A.2 DP2PN System Files . . . 353

A.3 Output from PN2XML . . . 356

A.3.1 High-Level Bellman Net XML ﬁle for SPA1 . . . 356

**B** **User Guide for DP2PN2Solver . . . 359**

B.1 System Requirements for DP2PN2Solver . . . 359

B.1.1 Java Environment . . . 359

B.2 Obtaining DP2PN2Solver . . . 360

B.3 Installation of DP2PN2Solver . . . 360

B.3.1 Deployment of the Files . . . 360

B.4 Running DP2PN2Solver . . . 361

B.4.1 The DP2PN Module . . . 361

B.4.2 The PN2Solver Module . . . 363

B.5 Creation of the gDPS Source File . . . 365

B.6 Debugging gDPS Code . . . 366

B.6.1 Omission of Base Cases . . . 366

B.6.2 Common Mistakes . . . 367

B.7 Error Messages of DP2PN2Solver . . . 368

**References . . . 371**

**Index . . . 375**

**Introduction to Dynamic Programming**

This book concerns the use of a method known as *dynamic programming* (DP) to solve large classes of optimization problems. We will focus on discrete optimization problems for which a set or sequence of decisions must be made to optimize (minimize or maximize) some function of the decisions. There are of course numerous methods to solve discrete optimization problems, many of which are collectively known as mathematical programming methods. Our objective here is not to compare these other mathematical programming methods with dynamic programming. Each has advantages and disadvantages, as discussed in many other places. However, we will note that the most prominent of these other methods is *linear programming*. As its name suggests, it has limitations associated with its linearity assumptions, whereas many problems are nonlinear. Nevertheless, linear programming and its variants and extensions (some that allow nonlinearities) have been used to solve many real-world problems, in part because very early in its development software tools (based on the simplex method) were made available to solve linear programming problems. On the other hand, no such tools have been available for the much more general method of dynamic programming, largely due to its very generality. One of the objectives of this book is to describe a software tool for solving dynamic programming problems that is general, practical, and easy to use, certainly relative to any of the other tools that have appeared from time to time.

One reason that simplex-based tools for solving linear programming problems have been successful is that, by the nature of linear programming, problem specification is relatively easy. A basic LP problem can be specified essentially as a system or matrix of equations with a finite set of numerical variables as unknowns. That is, the input to an LP software tool can be provided in a tabular form, known as a *tableau*. This also makes it easy to formulate LP problems as spreadsheets, which led spreadsheet system providers to include an LP solver in their products, as is the case with Excel.

*A. Lew and H. Mauch: Introduction to Dynamic Programming*, Studies in Computational Intelligence (SCI) 38, 3–43 (2007). www.springerlink.com © Springer-Verlag Berlin Heidelberg 2007

A software tool for solving dynamic programming problems is much more difficult to design, in part because the problem specification task in itself presents difficulties. A DP problem specification is usually in the form of
a complex (nonlinear) recursive equation, called the *dynamic programming functional equation* (DPFE), where the DPFE often involves nonnumerical variables that may include sets or strings. Thus, the input to a DP tool must necessarily be general enough to allow for complex DPFEs, at the expense therefore of the simplicity of a simple table. The DP tool described in this book assumes that the input DPFE is provided in a text-based specification language that does not rely on mathematical symbols. This decision conforms to that made for other mathematical programming languages, such as AMPL and LINGO.

In this introductory chapter, we first discuss the basic principles underlying the use of dynamic programming to solve discrete optimization problems.

The key task is to formulate the problem in terms of an equation, the DPFE, such that the solution of the DPFE is the solution of the given optimization problem. We then illustrate the computational solution of the DPFE for a specific problem (for linear search), either by use of a computer program written in a conventional programming language, or by use of a spreadsheet system.

It is not easy to generalize these examples to solve DP problems that do not resemble linear search. Thus, for numerous dissimilar DP problems, a significant amount of additional effort is required to obtain their computational solutions. One of the purposes of this book is to reduce this effort.
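To make the idea of a computational DPFE solution concrete, the following is a minimal sketch of how a DPFE can be solved by memoized recursion in a conventional language such as Java. It is our own illustration, not code produced by the DP2PN2Solver tool; the class name and the small instance data are hypothetical. It solves the shortest-path-in-an-acyclic-graph (SPA) DPFE f(s) = min_d {c(s, d) + f(d)}, with base case f(t) = 0 at the target node t:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: memoized recursive solution of the SPA DPFE
//   f(s) = min_d { c(s,d) + f(d) },  f(t) = 0.
// All names and the instance data are hypothetical.
class SpaSolver {
    private final Map<Integer, Map<Integer, Double>> cost; // s -> (d -> c(s,d))
    private final Map<Integer, Double> memo = new HashMap<>();
    private final int target;

    SpaSolver(Map<Integer, Map<Integer, Double>> cost, int target) {
        this.cost = cost;
        this.target = target;
    }

    double f(int s) {
        if (s == target) return 0.0;            // base case of the DPFE
        Double cached = memo.get(s);
        if (cached != null) return cached;      // subproblem already solved
        double best = Double.POSITIVE_INFINITY;
        for (Map.Entry<Integer, Double> e : cost.getOrDefault(s, Map.of()).entrySet())
            best = Math.min(best, e.getValue() + f(e.getKey()));
        memo.put(s, best);
        return best;
    }

    public static void main(String[] args) {
        // toy acyclic instance: 0 -> {1: 3, 2: 5}, 1 -> {3: 8}, 2 -> {3: 5}
        Map<Integer, Map<Integer, Double>> c = Map.of(
                0, Map.of(1, 3.0, 2, 5.0),
                1, Map.of(3, 8.0),
                2, Map.of(3, 5.0));
        System.out.println(new SpaSolver(c, 3).f(0)); // cost of shortest 0-to-3 path
    }
}
```

Even this small sketch shows why hand-coding each DPFE is laborious: the state type, the decision set, and the base cases must all be re-encoded for every new problem, which is precisely the effort the tool described in this book aims to eliminate.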

In Chap. 2, we show by example numerous types of optimization problems that can be solved using DP. These examples are given, ﬁrst to demonstrate the general utility of DP as a problem solving methodology. Other books are more specialized in the kinds of applications discussed, often focusing on applications of interest mainly to operations research or to computer science.

Our coverage is much more comprehensive. Another important reason for providing numerous examples is that it is often diﬃcult for new students of the ﬁeld to see from a relatively small sample of problems how DP can be applied to other problems. How to apply DP to new problems is often learned by example; the more examples learned, the easier it is to generalize. Each of the sample problems presented in Chap. 2 was computationally solved using our DP tool. This demonstrates the generality, ﬂexibility, and practicality of the tool.

In Part II of this book, we show how each of the DPFEs given in Chap. 2 can be expressed in a text-based speciﬁcation language, and then show how these DPFEs can be formally modeled by a class of Petri nets, called Bellman nets. Bellman nets serve as the theoretical underpinnings for the DP tool we later describe, and we describe our research into this subject area.

In Part III of this book, we describe the design and implementation of our DP tool. This tool inputs DPFEs, as given in Part II, and produces numerical solutions, as given in Part IV.

In Part IV of this book, we present computational results. Speciﬁcally, we give the numerical solutions to each of the problems discussed in Chap. 2, as provided by our DP tool.

Appendix A of this book provides program listings for key portions of our DP tool. Appendix B of this book is a User/Reference Manual for our DP tool.

This book serves several purposes.

1. It provides a practical introduction to how to solve problems using DP.

From the numerous and varied examples we present in Chap. 2, we expect readers to more easily be able to solve new problems by DP. Many other books provide far fewer or less diverse examples, hoping that readers can generalize from their small sample. The larger sample provided here should assist the reader in this process.

2. It provides a software tool that can be and has been used to solve all of the Chap. 2 problems. This tool can be used by readers in practice, certainly to solve academic problems if this book is used in coursework, and to solve many real-world problems, especially those of limited size (where the state space is not excessive).

3. This book is also a research monograph that describes an important application of Petri net theory. More research into Petri nets may well result in improvements in our tool.

**1.1 Principles of Dynamic Programming**

*Dynamic programming* is a method that in general solves optimization problems that involve making a sequence of decisions by determining, for each decision, subproblems that can be solved in like fashion, such that an optimal solution of the original problem can be found from optimal solutions of subproblems. This method is based on Bellman's Principle of Optimality, which he phrased as follows [1, p. 83].

An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the ﬁrst decision.

More succinctly, this principle asserts that “optimal policies have optimal subpolicies.” That the principle is valid follows from the observation that, if a policy has a subpolicy that is not optimal, then replacement of the subpolicy by an optimal subpolicy would improve the original policy. The principle of optimality is also known as the “optimal substructure” property in the literature. In this book, we are primarily concerned with the computational solution of problems for which the principle of optimality is given to hold.

For DP to be computationally efficient (especially relative to evaluating all possible sequences of decisions), there should be common subproblems such that subproblems of one are subproblems of another. In this event, a solution to a subproblem need only be found once and reused as often as necessary; however, we do not incorporate this requirement as part of our definition of DP.
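The value of reusing common subproblems can be seen by counting recursive invocations for the Fibonacci recurrence (the FIB problem of Chap. 2), with and without a memo table. The following Java sketch is our own illustration, not part of any tool described in this book; the class and variable names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: counting recursive invocations for the
// Fibonacci recurrence, naive versus memoized. Names are our own.
class SubproblemReuse {
    static long naiveCalls = 0;
    static long memoCalls = 0;
    static final Map<Integer, Long> memo = new HashMap<>();

    // naive recursion: common subproblems are recomputed many times
    static long fibNaive(int n) {
        naiveCalls++;
        return n < 2 ? n : fibNaive(n - 1) + fibNaive(n - 2);
    }

    // memoized recursion: each subproblem is solved once and then reused
    static long fibMemo(int n) {
        memoCalls++;
        if (n < 2) return n;
        Long cached = memo.get(n);
        if (cached != null) return cached;
        long result = fibMemo(n - 1) + fibMemo(n - 2);
        memo.put(n, result);
        return result;
    }

    public static void main(String[] args) {
        fibNaive(20);
        fibMemo(20);
        System.out.println(naiveCalls + " naive calls vs " + memoCalls + " memoized calls");
    }
}
```

The naive version makes a number of calls that grows exponentially in n, while the memoized version makes only a linear number, which is the efficiency gain alluded to above.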

In this section, we will first elaborate on the nature of sequential decision processes and on the importance of being able to separate the costs for each of the individual decisions. This will lead to the development of a general equation, the *dynamic programming functional equation* (DPFE), that formalizes the principle of optimality. The methodology of dynamic programming requires deriving a special case of this general DPFE for each specific optimization problem we wish to solve. Numerous examples of such derivations will be presented in this book. We will then focus on how to numerically solve DPFEs, and will later describe a software tool we have developed for this purpose.

**1.1.1 Sequential Decision Processes**

For an optimization problem of the form opt_{d∈∆} {H(d)}, d is called the decision, which is chosen from a set of eligible decisions ∆, the optimand H is called the objective function, and H* = H(d*) is called the optimum, where d* is that value of d ∈ ∆ for which H(d) has the optimal (minimum or maximum) value. We also say that d* optimizes H, and write d* = arg opt_d {H(d)}. Many optimization problems consist of finding a set of decisions {d_1, d_2, ..., d_n} that taken together yield the optimum H* of an objective function h(d_1, d_2, ..., d_n). Solution of such problems by enumeration, i.e., by evaluating h(d_1, d_2, ..., d_n) concurrently, for all possible combinations of values of its decision arguments, is called the "brute force" approach; this approach is manifestly inefficient. Rather than making decisions concurrently, we assume the decisions may be made in some specified sequence, say (d_1, d_2, ..., d_n), i.e., such that

H* = opt_{(d_1,d_2,...,d_n)∈∆} {h(d_1, d_2, ..., d_n)}
   = opt_{d_1∈D_1} {opt_{d_2∈D_2} {... {opt_{d_n∈D_n} {h(d_1, d_2, ..., d_n)}} ...}},   (1.1)

in what are known as sequential decision processes, where the ordered set (d_1, d_2, ..., d_n) belongs to some decision space ∆ = D_1 × D_2 × ... × D_n, for d_i ∈ D_i. Examples of decision spaces include: ∆ = B^n, the special case of Boolean decisions, where each decision set D_i equals B = {0, 1}; and ∆ = Π(D), a permutation of a set of eligible decisions D. The latter illustrates the common situation where decisions d_i are interrelated, e.g., where they satisfy constraints such as d_i ≠ d_j or d_i + d_j ≤ M. In general, each decision set D_i depends on the decisions (d_1, d_2, ..., d_{i−1}) that are earlier in the specified sequence, i.e., d_i ∈ D_i(d_1, d_2, ..., d_{i−1}). Thus, to show this dependence explicitly, we rewrite (1.1) in the form

H* = opt_{(d_1,d_2,...,d_n)∈∆} {h(d_1, d_2, ..., d_n)}
   = opt_{d_1∈D_1} {opt_{d_2∈D_2(d_1)} {... {opt_{d_n∈D_n(d_1,...,d_{n−1})} {h(d_1, ..., d_n)}} ...}}.   (1.2)

This nested set of optimization operations is to be performed from inside-out (right-to-left), the innermost optimization yielding the optimal choice for d_n as a function of the possible choices for d_1, ..., d_{n−1}, denoted d*_n(d_1, ..., d_{n−1}), and the outermost optimization opt_{d_1∈D_1} {h(d_1, d*_2, ..., d*_n)} yielding the optimal choice for d_1, denoted d*_1. Note that while the initial or "first" decision d_1 in the specified sequence is the outermost, the optimizations are performed inside-out, each depending upon outer decisions. Furthermore, while the optimal solution may be the same for any sequencing of decisions, e.g.,

opt_{d_1∈D_1} {opt_{d_2∈D_2(d_1)} {... {opt_{d_n∈D_n(d_1,...,d_{n−1})} {h(d_1, ..., d_n)}} ...}}
= opt_{d_n∈D_n} {opt_{d_{n−1}∈D_{n−1}(d_n)} {... {opt_{d_1∈D_1(d_2,...,d_n)} {h(d_1, ..., d_n)}} ...}},   (1.3)

the decision sets D_i may differ since they depend on different outer decisions. Thus, efficiency may depend upon the order in which decisions are made.
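The equivalence of the joint formulation and the nested formulation in (1.1) can be checked directly on a small instance. The following sketch uses a hypothetical objective function h and hypothetical decision sets D_1, D_2, D_3, chosen only for illustration:

```python
from itertools import product

# Hypothetical decision sets (for illustration only).
D1, D2, D3 = [0, 1], [0, 1, 2], [1, 3]

def h(d1, d2, d3):
    # Any real-valued objective function works here.
    return d1 * 2 + (d2 - 1) ** 2 + d3

# Brute force: evaluate h over the whole decision space Delta = D1 x D2 x D3.
brute = min(h(*d) for d in product(D1, D2, D3))

# Nested optimization: opt over d1 of opt over d2 of opt over d3, as in (1.1).
nested = min(min(min(h(d1, d2, d3) for d3 in D3) for d2 in D2) for d1 in D1)

assert brute == nested
print(brute)
```

Both evaluations examine the same |D_1|·|D_2|·|D_3| combinations here; the computational savings of DP come later, when common subproblems are solved once and reused.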

Referring to the foregoing equation, for a given sequencing of decisions,
if the outermost decision is “tentatively” made initially, whether or not it is
optimal depends upon the ultimate choices d*_i that are made for subsequent decisions d_i; i.e.,

H* = opt_{d_1∈D_1} {opt_{d_2∈D_2(d_1)} {... {opt_{d_n∈D_n(d_1,...,d_{n−1})} {h(d_1, ..., d_n)}} ...}}
   = opt_{d_1∈D_1} {h(d_1, d*_2(d_1), ..., d*_n(d_1))}   (1.4)
where each of the choices d*_i(d_1) for i = 2, ..., n is constrained by — i.e., is a function of — the choice for d_1. Note that determining the optimal choice d*_1 = arg opt_{d_1∈D_1} {h(d_1, d*_2(d_1), ..., d*_n(d_1))} requires evaluating h for all possible choices of d_1, unless there is some reason that certain choices can be excluded from consideration based upon a priori (given or derivable) knowledge that they cannot be optimal. One such class of algorithms would choose d_1 ∈ D_1 independently of (but still constraining) the choices for d_2, ..., d_n, i.e., by finding the solution of a problem of the form opt_{d_1∈D_1} {H'(d_1)} for a function H' of d_1 that is myopic in the sense that it does not depend on the other choices d_i. Such an algorithm is optimal if the locally optimal solution of opt_{d_1} {H'(d_1)} yields the globally optimal solution H*.

Suppose that the objective function h is (strongly) separable in the sense that

h(d_1, ..., d_n) = C_1(d_1) ◦ C_2(d_2) ◦ ... ◦ C_n(d_n)   (1.5)

where the decision-cost functions C_i represent the costs (or profits) associated with the individual decisions d_i, and where ◦ is an associative binary operation, usually addition or multiplication, for which opt_d {a ◦ C(d)} = a ◦ opt_d {C(d)} for any a that does not depend upon d. In the context of sequential decision processes, the cost C_n of making decision d_n may be a function not only of the decision itself, but also of the state (d_1, d_2, ..., d_{n−1}) in which the decision is made. To emphasize this, we will rewrite (1.5) as

h(d_1, ..., d_n) = C_1(d_1|∅) ◦ C_2(d_2|d_1) ◦ ... ◦ C_n(d_n|d_1, ..., d_{n−1}).   (1.6)

We now define h as (weakly) separable if

h(d_1, ..., d_n) = C_1(d_1) ◦ C_2(d_1, d_2) ◦ ... ◦ C_n(d_1, ..., d_n).   (1.7)

(Strong separability is, of course, a special case of weak separability.) If h is (weakly) separable, we then have

opt_{d_1∈D_1} {opt_{d_2∈D_2(d_1)} {... {opt_{d_n∈D_n(d_1,...,d_{n−1})} {h(d_1, ..., d_n)}} ...}}
= opt_{d_1∈D_1} {opt_{d_2∈D_2(d_1)} {... {opt_{d_n∈D_n(d_1,...,d_{n−1})} {C_1(d_1|∅) ◦ C_2(d_2|d_1) ◦ ... ◦ C_n(d_n|d_1, ..., d_{n−1})}} ...}}
= opt_{d_1∈D_1} {C_1(d_1|∅) ◦ opt_{d_2∈D_2(d_1)} {C_2(d_2|d_1) ◦ ...
  ... ◦ opt_{d_n∈D_n(d_1,...,d_{n−1})} {C_n(d_n|d_1, ..., d_{n−1})} ...}}.   (1.8)

Let the function f(d_1, ..., d_{i−1}) be defined as the optimal solution of the sequential decision process where the decisions d_1, ..., d_{i−1} have been made and the decisions d_i, ..., d_n remain to be made; i.e.,

f(d_1, ..., d_{i−1}) = opt_{d_i} {opt_{d_{i+1}} {... {opt_{d_n} {C_i(d_i|d_1, ..., d_{i−1}) ◦ C_{i+1}(d_{i+1}|d_1, ..., d_i) ◦ ... ◦ C_n(d_n|d_1, ..., d_{n−1})}} ...}}.   (1.9)

Explicit mentions of the decision sets D_i are omitted here for convenience.

We have then

f(∅) = opt_{d_1} {opt_{d_2} {... {opt_{d_n} {C_1(d_1|∅) ◦ C_2(d_2|d_1) ◦ ... ◦ C_n(d_n|d_1, ..., d_{n−1})}} ...}}
     = opt_{d_1} {C_1(d_1|∅) ◦ opt_{d_2} {C_2(d_2|d_1) ◦ ... ◦ opt_{d_n} {C_n(d_n|d_1, ..., d_{n−1})} ...}}
     = opt_{d_1} {C_1(d_1|∅) ◦ f(d_1)}.   (1.10)
Generalizing, we conclude that

f(d_1, ..., d_{i−1}) = opt_{d_i∈D_i(d_1,...,d_{i−1})} {C_i(d_i|d_1, ..., d_{i−1}) ◦ f(d_1, ..., d_i)}.   (1.11)
*Equation (1.11) is a recursive functional equation; we call it a functional*
*equation since the unknown in the equation is a function f , and it is recursive*
*since f is deﬁned in terms of f (but having diﬀerent arguments). It is the*
*dynamic programming functional equation (DPFE) for the given optimization*
problem. In this book, we assume that we are given DPFEs that are properly
formulated, i.e., that their solutions exist; we address only issues of how to
obtain these solutions.
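Equation (1.11) can be evaluated by straightforward recursion, caching f so that each prefix of decisions is solved only once. A minimal sketch with ◦ = +, using a hypothetical instance (not from the text): a permutation problem in which decision d_i made at stage i incurs cost i·d_i:

```python
from functools import lru_cache

N = 3  # number of decisions

def D(prefix):
    # Eligible decisions depend on the earlier ones: each number is used once.
    return [d for d in range(1, N + 1) if d not in prefix]

def C(d, prefix):
    # Hypothetical separable cost C_i(d_i | d_1,...,d_{i-1}): stage i times d.
    return (len(prefix) + 1) * d

@lru_cache(maxsize=None)
def f(prefix):
    if len(prefix) == N:
        return 0  # base case: all decisions made, nothing remains to optimize
    # DPFE (1.11): f(d_1,...,d_{i-1}) = min over d_i of C_i(d_i|...) + f(d_1,...,d_i)
    return min(C(d, prefix) + f(prefix + (d,)) for d in D(prefix))

print(f(()))  # f(emptyset): optimal value of the whole sequential decision process
```

For these costs the optimum assigns the largest numbers to the earliest stages (by the rearrangement inequality), so f(∅) = 1·3 + 2·2 + 3·1 = 10.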

**1.1.2 Dynamic Programming Functional Equations**

The problem of solving the DPFE for f(d_1, ..., d_{i−1}) depends upon the subproblem of solving for f(d_1, ..., d_i). If we define the state S = (d_1, ..., d_{i−1}) as the sequence of the first i−1 decisions, where i = |S|+1 = |{d_1, ..., d_{i−1}}|+1, we may rewrite the DPFE in the form

f(S) = opt_{d_i∈D_i(S)} {C_i(d_i|S) ◦ f(S′)},   (1.12)

where S is a state in a set 𝒮 of possible states, S′ = (d_1, ..., d_i) is a next-state, and ∅ is the initial state. Since the DPFE is recursive, to terminate the recursion, its solution requires base cases (or "boundary" conditions), such as f(S_0) = b when S_0 ∈ 𝒮_base, where 𝒮_base ⊂ 𝒮. For a base (or terminal) state S_0, f(S_0) is not evaluated using the DPFE, but instead has a given numerical constant b as its value; this value b may depend upon the base state S_0.

It should be noted that the sequence of decisions need not be limited to
*a ﬁxed length n, but may be of indeﬁnite length, terminating when a base*
case is reached. Diﬀerent classes of DP problems may be characterized by how
the states S, and hence the next-states S′, are defined. It is often convenient to define the state S, not as the sequence of decisions made so far, with the next decision d chosen from D(S), but rather as the set from which the next decision can be chosen, so that D(S) = S, i.e., d ∈ S. We then have a DPFE of the form

f(S) = opt_{d∈S} {C(d|S) ◦ f(S′)}.   (1.13)
We shall later show that, for some problems, there may be multiple next-
states, so that the DPFE has the form

f(S) = opt_{d∈S} {C(d|S) ◦ f(S′) ◦ f(S′′)},   (1.14)

where S′ and S′′ are both next-states. A DPFE is said to be r-th order (or nonserial if r > 1) if there may be r next-states.

Simple serial DP formulations can be modeled by a state transition system
*or directed graph, where a state S corresponds to a node (or vertex) and a*
*decision d that leads from state S to next-state S** ^{}* is represented by a branch

(or arc or edge) with label C(d|S). D(S) is the set of possible decisions when in state S, hence is associated with the successors of node S. More complex DP formulations require a more general graph model, such as that of a Petri net, which we discuss in Chap. 5.

Consider the directed graph whose nodes represent the states of the DPFE
and whose branches represent possible transitions from states to next-states,
*each such transition reﬂecting a decision. The label of each branch, from S to*
*S*^{}*, denoted b(S, S*^{}*), is the cost C(d|S) of the decision d, where S*^{}*= T (S, d),*
*where T :* *S × D → S is a next-state transition or transformation function.*

The DPFE can then be rewritten in the form

f(S) = opt_{S′} {b(S, S′) + f(S′)},   (1.15)

where f(S) is the length of the shortest path from S to a terminal or target state S_0, and where each decision is to choose S′ from among all (eligible) successors of S. (Different problems may have different eligibility constraints.) The base case is f(S_0) = 0.
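The target-state DPFE (1.15) can be sketched directly as a memoized recursion. The graph below is a hypothetical instance invented for illustration, with target state 't' as the base case:

```python
from functools import lru_cache

# Hypothetical edge costs b(S, S') on a small acyclic graph.
b = {
    ('s', 'a'): 1, ('s', 'b'): 4,
    ('a', 'b'): 2, ('a', 't'): 6,
    ('b', 't'): 1,
}

def successors(S):
    return [v for (u, v) in b if u == S]

@lru_cache(maxsize=None)
def f(S):
    if S == 't':
        return 0  # base case: f(target) = 0
    # DPFE (1.15): f(S) = min over successors S' of b(S, S') + f(S')
    return min(b[(S, Sp)] + f(Sp) for Sp in successors(S))

print(f('s'))
```

Here f('s') = 4, via the path s, a, b, t; memoization guarantees each state's subproblem is solved once even when it is reachable along several paths.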

For some problems, it is more convenient to use a DPFE of the “reverse”

form

f′(S) = opt_{S′} {f′(S′) + b(S′, S)},   (1.16)

where f′(S) is the length of the shortest path from a designated state S_0 to S, and S′ is a predecessor of S; S_0 is also known as the source state, and f′(S_0) = 0 serves as the base case that terminates the recursion for this
*alternative DPFE. We call these target-state and designated-source DPFEs,*
respectively. We also say that, in the former case, we go "backward" from the target to the source, whereas, in the latter case, we go "forward" from the source to the target.

Diﬀerent classes of DP formulations are distinguished by the nature of the
decisions. Suppose each decision is a number chosen from a set*{1, 2, . . . , N},*
*and that each number must be chosen once and only once (so there are N*
decisions). Then if states correspond to possible permutations of the numbers,
*there are O(N !) such states. Here we use the “big-O” notation ([10, 53]): we*
*say f (N ) is O(g(N )) if, for a suﬃciently large N , f (N ) is bounded by a*
*constant multiple of g(N ). As another example, suppose each decision is a*
number chosen from a set *{1, 2, . . . , N}, but that not all numbers must be*
*chosen (so there may be less than N decisions). Then if states correspond to*
*subsets of the numbers, there are O(2** ^{N}*) such states. Fortuitously, there are
many practical problems where a reduction in the number of relevant states is

possible, such as when only the final decision d_{i−1} in a sequence (d_1, ..., d_{i−1}), together with the time or stage i at which the decision is made, is significant, so that there are O(N²) such states. We give numerous examples of the different classes in Chap. 2.

The solution of a DP problem generally involves more than only computing
*the value of f (S) for the goal state S** ^{∗}*. We may also wish to determine the
initial optimal decision, the optimal second decision that should be made in
the next-state that results from the ﬁrst decision, and so forth; that is, we may
wish to determine the optimal sequence of decisions, also known as the optimal

"policy", by what is known as a reconstruction process. To reconstruct these optimal decisions, when evaluating f(S) = opt_{d∈D(S)} {C(d|S) ◦ f(S′)} we may save the value of d, denoted d*, that yields the optimal value of f(S) at the time we compute this value, say, tabularly, by entering the value d*(S) in a table for each S. The main alternative to using such a policy table is to reevaluate f(S) as needed, as the sequence of next-states is determined; this is an example of a space versus time tradeoff.
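A sketch of the policy-table idea on a hypothetical shortest-path instance (the states and costs are invented for illustration): as each f(S) is computed, the optimizing decision d*(S) is entered in a table, and the optimal policy is then read off by following the table from the initial state:

```python
# Hypothetical shortest-path instance; policy[S] records the optimal decision d*(S).
b = {('s', 'a'): 1, ('s', 'b'): 4, ('a', 'b'): 2, ('a', 't'): 6, ('b', 't'): 1}
policy = {}
memo = {}

def f(S):
    if S == 't':
        return 0  # base case: target state
    if S not in memo:
        succ = [v for (u, v) in b if u == S]
        # Save the arg-opt alongside the opt: this is the policy table entry d*(S).
        best = min(succ, key=lambda Sp: b[(S, Sp)] + f(Sp))
        policy[S] = best
        memo[S] = b[(S, best)] + f(best)
    return memo[S]

f('s')
# Reconstruct the optimal sequence of decisions by following the policy table,
# rather than reevaluating f for each next-state (the space-for-time choice).
path, S = ['s'], 's'
while S != 't':
    S = policy[S]
    path.append(S)
print(path)
```

For this instance the reconstructed policy is the state sequence s, a, b, t with total cost 4.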

**1.1.3 The Elements of Dynamic Programming**

The basic form of a dynamic programming functional equation is

f(S) = opt_{d∈D(S)} {R(S, d) ◦ f(T(S, d))},   (1.17)
*where S is a state in some state space* *S, d is a decision chosen from a decision*
*space D(S), R(S, d) is a reward function (or decision cost, denoted C(d|S)*
*above), T (S, d) is a next-state transformation (or transition) function, and*

*◦ is a binary operator. We will restrict ourselves to discrete DP, where the*
state space and decision space are both discrete sets. (Some problems with
continuous states or decisions can be handled by discretization procedures, but
we will not consider such problems in this book.) The elements of a DPFE
have the following characteristics.

* State The state S, in general, incorporates information about the sequence*
of decisions made so far. In some cases, the state may be the complete
sequence, but in other cases only partial information is suﬃcient; for ex-
ample, if the set of all states can be partitioned into equivalence classes,
each represented by the last decision. In some simpler problems, the length
of the sequence, also called the stage at which the next decision is to be
made, suffices. The initial state, which reflects the situation in which no decision has yet been made, will be called the goal state and denoted S*.

* Decision Space The decision space D(S) is the set of possible or "eligible" choices for the next decision d. It is a function of the state S in which
*the decision d is to be made. Constraints on possible next-state transfor-*
*mations from a state S can be imposed by suitably restricting D(S). If*
D(S) = ∅, so that there are no eligible decisions in state S, then S is a
terminal state.

* Objective Function The objective function f , a function of S, is the op-*
timal proﬁt or cost resulting from making a sequence of decisions when

*in state S, i.e., after making the sequence of decisions associated with S.*

The goal of a DP problem is to find f(S) for the goal state S*.

* Reward Function The reward function R, a function of S and d, is the profit or cost that can be attributed to the next decision d made in state
*S. The reward R(S, d) must be separable from the proﬁts or costs that are*
attributed to all other decisions. The value of the objective function for
*the goal state, f (S** ^{∗}*), is the combination of the rewards for the complete
optimal sequence of decisions starting from the goal state.

* Transformation Function(s) The transformation (or transition) function
*T , a function of S and d, speciﬁes the next-state that results from making*
*a decision d in state S. As we shall later see, for nonserial DP problems,*
there may be more than one transformation function.

* Operator The operator ◦ is a binary operation, usually addition or multiplica-
tion or minimization/maximization, that allows us to combine the returns
of separate decisions. This operation must be associative if the returns of
decisions are to be independent of the order in which they are made.

* Base Condition Since the DPFE is recursive, base conditions must be specified to terminate the recursion. Thus, the DPFE applies for S in a state space 𝒮, but

*f (S*_{0}*) = b,*

for S_0 in a set of base-states not in 𝒮. Base-values b are frequently zero
or inﬁnity, the latter to reﬂect constraints. For some problems, setting
*f (S*_{0}) =*±∞ is equivalent to imposing a constraint on decisions so as to*
*disallow transitions to state S*0*, or to indicate that S*0 *∈ S is a state in*
which no decision is eligible.

To solve a problem using DP, we must deﬁne the foregoing elements to reﬂect the nature of the problem at hand. We give several examples below.

We note ﬁrst that some problems require certain generalizations. For example, some problems require a second-order DPFE having the form

f(S) = opt_{d∈D(S)} {R(S, d) ◦ f(T_1(S, d)) ◦ f(T_2(S, d))},   (1.18)

where T_1 and T_2 are both transformation functions to account for the situation
in which more than one next-state can be entered, or

f(S) = opt_{d∈D(S)} {R(S, d) ◦ p_1·f(T_1(S, d)) ◦ p_2·f(T_2(S, d))},   (1.19)

where T_1 and T_2 are both transformation functions and p_1 and p_2 are multi-
plicative weights. In probabilistic DP problems, these weights are probabilities
that reﬂect the probabilities associated with their respective state-transitions,
only one of which can actually occur. In deterministic DP problems, these
weights can serve other purposes, such as “discount factors” to reﬂect the
time value of money.
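A small sketch of the probabilistic form (1.19), with ◦ = + and wholly invented numbers: the state is the number of periods remaining; the decision to "risk it" leads to one of two distinct next-states, normal operation (with probability p_1 = 0.7) or a terminal 'broken' state (with probability p_2 = 0.3, at an expected immediate repair cost of 3), while "insure" costs a flat 4 per period:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def f(S):
    # Base cases: no periods remain, or the terminal 'broken' state is reached.
    if S == 0 or S == 'broken':
        return 0.0
    # Decision 1: insure, at a hypothetical fixed cost of 4 for the period.
    insure = 4.0 + f(S - 1)
    # Decision 2: risk it. R(S,d) = 0.3 * 10 = 3.0 expected repair cost, then
    # two weighted next-state terms as in (1.19): p1*f(T1) + p2*f(T2),
    # with T1 = S - 1 and T2 = 'broken'.
    risk = 3.0 + 0.7 * f(S - 1) + 0.3 * f('broken')
    return min(insure, risk)

print(round(f(3), 2))
```

Only one of the two transitions actually occurs on any realization; the weighted sum makes f(S) the minimal *expected* total cost, which is the usual objective in probabilistic DP.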

**1.1.4 Application: Linear Search**

To illustrate the key concepts associated with DP that will prove useful in
our later discussions, we examine a concrete example, the optimal “linear
search” problem. This is the problem of permuting the data elements of an
*array A of size N , whose element x has probability p** _{x}*, so as to optimize the
linear search process by minimizing the “cost” of a permutation, deﬁned as

the expected number of comparisons required. For example, let A = {a, b, c} and p_a = 0.2, p_b = 0.5, and p_c = 0.3. There are six permutations, namely, abc, acb, bac, bca, cab, cba; the cost of the fourth permutation bca is 1.7, which can be calculated in several ways, such as

1·p_b + 2·p_c + 3·p_a   [using Method S]

and

(p_a + p_b + p_c) + (p_a + p_c) + (p_a)   [using Method W].
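Both calculation methods can be checked in a few lines, using the probabilities of the example; a brute-force scan of all six permutations (feasible here, though exactly the inefficient enumeration DP is meant to avoid) shows that bca in fact minimizes the expected cost for these probabilities:

```python
from itertools import permutations

p = {'a': 0.2, 'b': 0.5, 'c': 0.3}

def cost_S(perm):
    # Method S: the element in position i (1-based) costs i comparisons,
    # weighted by its access probability.
    return sum(i * p[x] for i, x in enumerate(perm, start=1))

def cost_W(perm):
    # Method W: each position contributes the total probability of all
    # elements at that position or later.
    return sum(sum(p[x] for x in perm[i:]) for i in range(len(perm)))

assert abs(cost_S(('b', 'c', 'a')) - 1.7) < 1e-9   # Method S on bca
assert abs(cost_W(('b', 'c', 'a')) - 1.7) < 1e-9   # Method W on bca

best = min(permutations('abc'), key=cost_S)
print(''.join(best))  # prints bca
```

The optimal arrangement places elements in order of decreasing probability (b, c, a here), which the DP formulations developed in this book will recover without enumerating all N! permutations.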