
Art Lew · Holger Mauch

Dynamic Programming

A Computational Tool

With 55 Figures and 5 Tables


ISSN print edition: 1860-949X
ISSN electronic edition: 1860-9503
ISBN-10 3-540-37013-7 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-37013-0 Springer Berlin Heidelberg New York
Library of Congress Control Number: 2006930743

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable to prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media
springer.com

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Cover design: deblik, Berlin
Typesetting by the authors and SPi
Printed on acid-free paper   SPIN: 11550860   89/SPi   5 4 3 2 1 0

Art Lew
Department of Information and Computer Sciences
University of Hawaii at Manoa
1680 East-West Road
Honolulu, HI 96822
USA
E-mail: artlew@hawaii.edu

Holger Mauch
Department of Computer Science
Natural Sciences Collegium
Eckerd College
4200 54th Ave. S.
Saint Petersburg, FL 33711
USA
E-mail: mauchh@eckerd.edu

© Springer-Verlag Berlin Heidelberg 2007


To my family. H.M.


Dynamic programming has long been applied to numerous areas in mathematics, science, engineering, business, medicine, information systems, biomathematics, artificial intelligence, among others. Applications of dynamic programming have increased as recent advances have been made in areas such as neural networks, data mining, soft computing, and other areas of computational intelligence. The value of dynamic programming formulations and means to obtain their computational solutions has never been greater.

This book describes the use of dynamic programming as a computational tool to solve discrete optimization problems.

(1) We first formulate large classes of discrete optimization problems in dynamic programming terms, specifically by deriving the dynamic programming functional equations (DPFEs) that solve these problems. A text-based language, gDPS, for expressing these DPFEs is introduced. gDPS may be regarded as a high-level specification language rather than a conventional procedural computer programming language, but one that can nevertheless be used to obtain numerical solutions.

(2) We then define and examine properties of Bellman nets, a class of Petri nets that serves both as a formal theoretical model of dynamic programming problems, and as an internal computer data structure representation of the DPFEs that solve these problems.

(3) We also describe the design, implementation, and use of a software tool, called DP2PN2Solver, for solving DPFEs. DP2PN2Solver may be regarded as a program generator, whose input is a DPFE, expressed in the input specification language gDPS and internally represented as a Bellman net, and whose output is its numerical solution, produced indirectly by the generation of “solver” code which, when executed, yields the desired solution.

This book should be of value to different classes of readers: students, instructors, practitioners, and researchers. We first provide a tutorial introduction to dynamic programming and to Petri nets. For those interested in dynamic programming, we provide a useful software tool that allows them to obtain numerical solutions. For researchers having an interest in the fields of dynamic programming and Petri nets, we suggest ways to apply Petri nets to solve dynamic programming problems, unlike most past work, which applies dynamic programming to solve Petri net problems.

For students and instructors of courses in which dynamic programming is taught, usually as one of many other problem-solving methods, this book provides a wealth of examples that show how discrete optimization problems can be formulated in dynamic programming terms. Dynamic programming has been and continues to be taught as an “art”, where how to use it must be learned by example, there being no mechanical way to apply knowledge of the general principles (e.g., the principle of optimality) to new unfamiliar problems. Experience has shown that the greater the number and variety of problems presented, the easier it is for students to apply general concepts.

Thus, one objective of this book is to include a larger number and greater variety of examples.

A further distinguishing feature of this book is that, for all of these examples, we not only formulate the DP equations but also show their computational solutions, exhibiting computer programs (in our specification language) as well as the numerical answers produced as output by the automatically generated solver code.

In addition, we provide students and instructors with a software tool (DP2PN2Solver) that enables them to obtain numerical solutions of dynamic programming problems without requiring them to have much computer programming knowledge and experience. This software tool can be downloaded from either of the following websites:

http://natsci.eckerd.edu/~mauchh/Research/DP2PN2Solver
http://www2.hawaii.edu/~icl/DP2PN2Solver

Further information is given in Appendix B. Having such software support allows them to focus on dynamic programming rather than on computer programming. Since many problems can be solved by different dynamic programming formulations, the availability of a computational tool that makes it easier for readers to experiment with their own formulations is a useful aid to learning.

The DP2PN2Solver tool also enables practitioners to obtain numerical solutions of dynamic programming problems of interest to them without requiring them to write conventional computer programs. Their time, of course, is better spent on problem formulation and analysis than on program design and debugging. This tool allows them to verify that their formulations are correct, and to revise them as may be necessary in their problem solving efforts. The main limitation of this (and any) dynamic programming tool for many practical problems is the size of the state space. Even then, the tool may prove useful in the formulation stage to initially test ideas on simplified scaled-down problems.

As a program generator, DP2PN2Solver is flexible, permitting alternate front-ends and back-ends. Inputs other than in the gDPS language are possible. Alternative DPFE specifications can be translated into gDPS or directly into Bellman nets. Output solver code (i.e., the program that numerically solves a given DPFE) may be in alternative languages. The solver code emphasized in this book is Java code, largely because it is universally and freely available on practically every platform. We also discuss solver codes for spreadsheet systems and Petri net simulators. By default, the automatically generated solver code is hidden from the average user, but it can be inspected and modified directly by users if they wish.

Furthermore, this book describes research into connections between dynamic programming and Petri nets. It was our early research into such connections that ultimately led to the concept of Bellman nets, upon which the development of our DP2PN2Solver tool is based. We explain here the underlying ideas associated with Bellman nets. Researchers interested in dynamic programming or Petri nets will find many open questions related to this work that suggest avenues of future research. For example, additional research might very likely result in improvements in the DP2PN2Solver tool, such as to address the state-space size issue or to increase its diagnostic capabilities.

Every other aspect of this work may benefit from additional research.

Thus, we expect the DP2PN2Solver tool described in this book to undergo revisions from time to time. In fact, the tool was designed modularly to make it relatively easy to modify. As one example, changes to the gDPS specification language syntax can be made by simply revising its BNF definition, since we use a compiler-compiler rather than a compiler to process it.

Furthermore, alternate input languages (other than gDPS) and solver codes (other than Java) can be added as optional modules, without changing the existing modules. We welcome suggestions from readers on how the tool (or its description) can be improved. We may be contacted at artlew@hawaii.edu or mauchh@eckerd.edu. Updates to the software and to this book, including errata, will be placed on the aforementioned websites.

Acknowledgements. The authors wish to thank Janusz Kacprzyk for including this monograph in his fine series of books. His encouragement has been very much appreciated.

Honolulu, June 2006, Art Lew

St. Petersburg, June 2006, Holger Mauch


Part I Dynamic Programming

1 Introduction to Dynamic Programming . . . . 3

1.1 Principles of Dynamic Programming . . . 5

1.1.1 Sequential Decision Processes . . . 6

1.1.2 Dynamic Programming Functional Equations . . . 9

1.1.3 The Elements of Dynamic Programming . . . 11

1.1.4 Application: Linear Search . . . 12

1.1.5 Problem Formulation and Solution . . . 14

1.1.6 State Transition Graph Model . . . 17

1.1.7 Staged Decisions . . . 19

1.1.8 Path-States . . . 21

1.1.9 Relaxation . . . 22

1.1.10 Shortest Path Problems . . . 23

1.1.11 All-Pairs Shortest Paths . . . 29

1.1.12 State Space Generation . . . 30

1.1.13 Complexity . . . 31

1.1.14 Greedy Algorithms . . . 32

1.1.15 Probabilistic DP . . . 32

1.1.16 Nonoptimization Problems . . . 33

1.1.17 Concluding Remarks . . . 34

1.2 Computational Solution of DPFEs . . . 34

1.2.1 Solution by Conventional Programming . . . 35

1.2.2 The State-Decision-Reward-Transformation Table . . . . 36

1.2.3 Code Generation . . . 38

1.2.4 Spreadsheet Solutions . . . 38

1.2.5 Example: SPA . . . 40

1.2.6 Concluding Remarks . . . 42

1.3 Overview of Book . . . 42


2 Applications of Dynamic Programming . . . 45

2.1 Optimal Allotment Problem (ALLOT) . . . 49

2.2 All-Pairs Shortest Paths Problem (APSP) . . . 50

2.3 Optimal Alphabetic Radix-Code Tree Problem (ARC) . . . 51

2.4 Assembly Line Balancing (ASMBAL) . . . 52

2.5 Optimal Assignment Problem (ASSIGN) . . . 54

2.6 Optimal Binary Search Tree Problem (BST) . . . 55

2.7 Optimal Covering Problem (COV) . . . 57

2.8 Deadline Scheduling Problem (DEADLINE) . . . 57

2.9 Discounted Profits Problem (DPP) . . . 58

2.10 Edit Distance Problem (EDP) . . . 59

2.11 Fibonacci Recurrence Relation (FIB) . . . 60

2.12 Flowshop Problem (FLOWSHOP) . . . 61

2.13 Tower of Hanoi Problem (HANOI) . . . 62

2.14 Integer Linear Programming (ILP) . . . 63

2.15 Integer Knapsack as ILP Problem (ILPKNAP) . . . 64

2.16 Interval Scheduling Problem (INTVL) . . . 64

2.17 Inventory Problem (INVENT) . . . 66

2.18 Optimal Investment Problem (INVEST) . . . 67

2.19 Investment: Winning in Las Vegas Problem (INVESTWLV) . . 68

2.20 0/1 Knapsack Problem (KS01) . . . 69

2.21 COV as KSINT Problem (KSCOV) . . . 70

2.22 Integer Knapsack Problem (KSINT) . . . 70

2.23 Longest Common Subsequence (LCS) . . . 71

2.24 Optimal Linear Search Problem (LINSRC) . . . 73

2.25 Lot Size Problem (LOT) . . . 73

2.26 Longest Simple Path Problem (LSP) . . . 74

2.27 Matrix Chain Multiplication Problem (MCM) . . . 75

2.28 Minimum Maximum Problem (MINMAX) . . . 75

2.29 Minimum Weight Spanning Tree Problem (MWST) . . . 77

2.30 The Game of NIM (NIM) . . . 78

2.31 Optimal Distribution Problem (ODP) . . . 80

2.32 Optimal Permutation Problem (PERM) . . . 81

2.33 Jug-Pouring Problem (POUR) . . . 82

2.34 Optimal Production Problem (PROD) . . . 83

2.35 Production: Reject Allowances Problem (PRODRAP) . . . 84

2.36 Reliability Design Problem (RDP) . . . 84

2.37 Replacement Problem (REPLACE) . . . 85

2.38 Stagecoach Problem (SCP) . . . 86

2.39 Seek Disk Scheduling Problem (SEEK) . . . 87

2.40 Segmented Curve Fitting Problem (SEGLINE) . . . 88

2.41 Program Segmentation Problem (SEGPAGE) . . . 91

2.42 Optimal Selection Problem (SELECT) . . . 94

2.43 Shortest Path in an Acyclic Graph (SPA) . . . 95

2.44 Shortest Path in a Cyclic Graph (SPC) . . . 95


2.45 Process Scheduling Problem (SPT) . . . 97

2.46 Transportation Problem (TRANSPO) . . . 98

2.47 Traveling Salesman Problem (TSP) . . . 99

Part II Modeling of DP Problems

3 The DP Specification Language gDPS . . . 103

3.1 Introduction to gDPS . . . 103

3.2 Design Principles of gDPS . . . 105

3.3 Detailed Description of the gDPS Sections . . . 106

3.3.1 Name Section . . . 106

3.3.2 General Variables Section . . . 106

3.3.3 Set Variables Section . . . 108

3.3.4 General Functions Section . . . 109

3.3.5 State Type Section . . . 110

3.3.6 Decision Variable Section . . . 110

3.3.7 Decision Space Section . . . 111

3.3.8 Goal Section . . . 111

3.3.9 DPFE Base Section . . . 112

3.3.10 DPFE Section . . . 113

3.3.11 Cost/Reward Function Section . . . 115

3.3.12 Transformation Function Section . . . 115

3.3.13 Transition Weight Section . . . 116

3.4 BNF Grammar of the gDPS language . . . 117

4 DP Problem Specifications in gDPS . . . 125

4.1 gDPS source for ALLOT . . . 125

4.2 gDPS source for APSP . . . 128

4.3 gDPS source for ARC . . . 131

4.4 gDPS source for ASMBAL . . . 132

4.5 gDPS source for ASSIGN . . . 135

4.6 gDPS source for BST . . . 136

4.7 gDPS source for COV . . . 138

4.8 gDPS source for DEADLINE . . . 139

4.9 gDPS source for DPP . . . 140

4.10 gDPS source for EDP . . . 141

4.11 gDPS source for FIB . . . 144

4.12 gDPS source for FLOWSHOP . . . 144

4.13 gDPS source for HANOI . . . 145

4.14 gDPS source for ILP . . . 146

4.15 gDPS source for ILPKNAP . . . 148

4.16 gDPS source for INTVL . . . 150

4.17 gDPS source for INVENT . . . 154

4.18 gDPS source for INVEST . . . 156


4.19 gDPS source for INVESTWLV . . . 157

4.20 gDPS source for KS01 . . . 158

4.21 gDPS source for KSCOV . . . 159

4.22 gDPS source for KSINT . . . 160

4.23 gDPS source for LCS . . . 161

4.24 gDPS source for LINSRC . . . 165

4.25 gDPS source for LOT . . . 167

4.26 gDPS source for LSP . . . 168

4.27 gDPS source for MCM . . . 170

4.28 gDPS source for MINMAX . . . 171

4.29 gDPS source for MWST . . . 173

4.30 gDPS source for NIM . . . 176

4.31 gDPS source for ODP . . . 176

4.32 gDPS source for PERM . . . 178

4.33 gDPS source for POUR . . . 179

4.34 gDPS source for PROD . . . 181

4.35 gDPS source for PRODRAP . . . 182

4.36 gDPS source for RDP . . . 184

4.37 gDPS source for REPLACE . . . 186

4.38 gDPS source for SCP . . . 187

4.39 gDPS source for SEEK . . . 189

4.40 gDPS source for SEGLINE . . . 190

4.41 gDPS source for SEGPAGE . . . 192

4.42 gDPS source for SELECT . . . 193

4.43 gDPS source for SPA . . . 194

4.44 gDPS source for SPC . . . 196

4.45 gDPS source for SPT . . . 199

4.46 gDPS source for TRANSPO . . . 200

4.47 gDPS source for TSP . . . 201

5 Bellman Nets: A Class of Petri Nets . . . 205

5.1 Petri Net Introduction . . . 205

5.1.1 Place/Transition Nets . . . 205

5.1.2 High-level Petri Nets . . . 207

5.1.3 Colored Petri Nets . . . 208

5.1.4 Petri Net Properties . . . 209

5.1.5 Petri Net Software . . . 210

5.2 Petri Net Models of Dynamic Programming . . . 210

5.3 The Low-Level Bellman Net Model . . . 212

5.3.1 Construction of the Low-Level Bellman Net Model . . . 212

5.3.2 The Role of Transitions in the Low-Level Bellman Net Model . . . 213

5.3.3 The Role of Places in the Low-Level Bellman Net Model . . . 213


5.3.4 The Role of Markings in the Low-Level Bellman Net Model . . . 214

5.3.5 Advantages of the Low-Level Bellman Net Model . . . . 214

5.4 Low-Level Bellman Net Properties . . . 214

5.5 The High-Level Bellman Net Model . . . 215

5.6 High-Level Bellman Net Properties . . . 219

6 Bellman Net Representations of DP Problems . . . 221

6.1 Graphical Representation of Low-Level Bellman Net Examples . . . 222

6.1.1 Low-Level Bellman Net for BST . . . 222

6.1.2 Low-Level Bellman Net for LINSRC . . . 222

6.1.3 Low-Level Bellman Net for MCM . . . 224

6.1.4 Low-Level Bellman Net for ODP . . . 224

6.1.5 Low-Level Bellman Net for PERM . . . 227

6.1.6 Low-Level Bellman Net for SPA . . . 228

6.2 Graphical Representation of High-Level Bellman Net Examples . . . 228

6.2.1 High-Level Bellman Net for EDP . . . 230

6.2.2 High-Level Bellman Net for ILP . . . 230

6.2.3 High-Level Bellman Net for KS01 . . . 231

6.2.4 High-Level Bellman Net for LCS . . . 231

6.2.5 High-Level Bellman Net for LINSRC . . . 234

6.2.6 High-Level Bellman Net for LSP . . . 235

6.2.7 High-Level Bellman Net for MCM . . . 236

6.2.8 High-Level Bellman Net for RDP . . . 238

6.2.9 High-Level Bellman Net for SCP . . . 238

6.2.10 High-Level Bellman Net for SPA . . . 240

6.2.11 High-Level Bellman Net for SPC . . . 242

Part III Design and Implementation of DP Tool

7 DP2PN2Solver Tool . . . 247

7.1 Overview . . . 247

7.2 Internal Representation of Bellman Nets . . . 251

7.3 Compiling and Executing DP Programs . . . 252

7.4 The ILP2gDPS Preprocessor Module . . . 255

8 DP2PN Parser and Builder . . . 259

8.1 Design of the DP2PN modules . . . 259

8.2 Implementation of the DP2PN modules . . . 260

8.3 The Module LINSRCSMain . . . 263

8.4 Error Detection in DP2PN . . . 268


9 The PN2Solver Modules . . . 271

9.1 The Solver Code Generation Process . . . 271

9.2 The PN2Java Module . . . 273

9.2.1 Java Solver Code Calculation Objects . . . 274

9.2.2 Java Solver Code for LINSRCS . . . 276

9.2.3 Java Solver Code for LSP . . . 278

9.2.4 Java Solver Code for MCM . . . 278

9.2.5 Java Solver Code for SPA . . . 280

9.3 The PN2Spreadsheet Module . . . 281

9.3.1 PN2Spreadsheet Solver Code for LINSRCS . . . 282

9.3.2 PN2Spreadsheet Solver Code for Other Examples . . . . 284

9.4 The PN2XML Module . . . 284

9.4.1 Petri Net Solver Code for LINSRCS . . . 285

9.4.2 Petri Net Solver Code for SPA . . . 288

9.5 Conclusion . . . 289

Part IV Computational Results

10 Java Solver Results of DP Problems . . . 293

10.1 ALLOT Java Solver Output . . . 293

10.2 APSP Java Solver Output . . . 294

10.3 ARC Java Solver Output . . . 296

10.4 ASMBAL Java Solver Output . . . 296

10.5 ASSIGN Java Solver Output . . . 297

10.6 BST Java Solver Output . . . 297

10.7 COV Java Solver Output . . . 298

10.8 DEADLINE Java Solver Output . . . 298

10.9 DPP Java Solver Output . . . 299

10.10 EDP Java Solver Output . . . 299

10.11 FIB Java Solver Output . . . 299

10.12 FLOWSHOP Java Solver Output . . . 300

10.13 HANOI Java Solver Output . . . 300

10.14 ILP Java Solver Output . . . 301

10.15 ILPKNAP Java Solver Output . . . 301

10.16 INTVL Java Solver Output . . . 302

10.17 INVENT Java Solver Output . . . 303

10.18 INVEST Java Solver Output . . . 304

10.19 INVESTWLV Java Solver Output . . . 304

10.20 KS01 Java Solver Output . . . 305

10.21 KSCOV Java Solver Output . . . 306

10.22 KSINT Java Solver Output . . . 306

10.23 LCS Java Solver Output . . . 306

10.24 LINSRC Java Solver Output . . . 307

10.25 LOT Java Solver Output . . . 308


10.26 LSP Java Solver Output . . . 308

10.27 MCM Java Solver Output . . . 308

10.28 MINMAX Java Solver Output . . . 309

10.29 MWST Java Solver Output . . . 309

10.30 NIM Java Solver Output . . . 309

10.31 ODP Java Solver Output . . . 312

10.32 PERM Java Solver Output . . . 312

10.33 POUR Java Solver Output . . . 312

10.34 PROD Java Solver Output . . . 313

10.35 PRODRAP Java Solver Output . . . 314

10.36 RDP Java Solver Output . . . 314

10.37 REPLACE Java Solver Output . . . 315

10.38 SCP Java Solver Output . . . 315

10.39 SEEK Java Solver Output . . . 315

10.40 SEGLINE Java Solver Output . . . 316

10.41 SEGPAGE Java Solver Output . . . 316

10.42 SELECT Java Solver Output . . . 317

10.43 SPA Java Solver Output . . . 317

10.44 SPC Java Solver Output . . . 318

10.45 SPT Java Solver Output . . . 318

10.46 TRANSPO Java Solver Output . . . 319

10.47 TSP Java Solver Output . . . 319

11 Other Solver Results . . . 321

11.1 PN2Spreadsheet Solver Code Output . . . 321

11.1.1 PN2Spreadsheet Solver Code for LINSRCS . . . 321

11.1.2 PN2Spreadsheet Solver Code for LSP . . . 322

11.1.3 PN2Spreadsheet Solver Code for MCM . . . 322

11.1.4 PN2Spreadsheet Solver Code for SPA . . . 323

11.1.5 Spreadsheet Output . . . 323

11.2 PN2XML Solver Code Output . . . 324

11.2.1 PN2XML Simulation Output for LINSRCS . . . 325

12 Conclusions . . . 329

12.1 Applicability of DP and DP2PN2Solver . . . 329

12.2 The DP2PN2Solver Tool . . . 330

12.3 Research Directions . . . 332

12.3.1 User Functionality . . . 333

12.3.2 Reduction of Dimensionality . . . 334

12.3.3 Petri Net Modeling . . . 335

12.4 Summary . . . 336


A Supplementary Material . . . 339

A.1 Pseudocode of the DP2PN Module . . . 339

A.1.1 Main Class for LINSRCS . . . 339

A.1.2 State Class for LINSRCS . . . 342

A.1.3 Decision Class . . . 343

A.1.4 DPInstanceTableEntry Class . . . 344

A.1.5 DPInstance Class . . . 344

A.1.6 BellmanNet Class . . . 349

A.2 DP2PN System Files . . . 353

A.3 Output from PN2XML . . . 356

A.3.1 High-Level Bellman Net XML file for SPA1 . . . 356

B User Guide for DP2PN2Solver . . . 359

B.1 System Requirements for DP2PN2Solver . . . 359

B.1.1 Java Environment . . . 359

B.2 Obtaining DP2PN2Solver . . . 360

B.3 Installation of DP2PN2Solver . . . 360

B.3.1 Deployment of the Files . . . 360

B.4 Running DP2PN2Solver . . . 361

B.4.1 The DP2PN Module . . . 361

B.4.2 The PN2Solver Module . . . 363

B.5 Creation of the gDPS Source File . . . 365

B.6 Debugging gDPS Code . . . 366

B.6.1 Omission of Base Cases . . . 366

B.6.2 Common Mistakes . . . 367

B.7 Error Messages of DP2PN2Solver . . . 368

References . . . 371

Index . . . 375


Introduction to Dynamic Programming

This book concerns the use of a method known as dynamic programming (DP) to solve large classes of optimization problems. We will focus on discrete optimization problems for which a set or sequence of decisions must be made to optimize (minimize or maximize) some function of the decisions. There are of course numerous methods to solve discrete optimization problems, many of which are collectively known as mathematical programming methods. Our objective here is not to compare these other mathematical programming methods with dynamic programming. Each has advantages and disadvantages, as discussed in many other places. However, we will note that the most prominent of these other methods is linear programming. As its name suggests, it has limitations associated with its linearity assumptions, whereas many problems are nonlinear. Nevertheless, linear programming and its variants and extensions (some that allow nonlinearities) have been used to solve many real world problems, in part because very early in its development software tools (based on the simplex method) were made available to solve linear programming problems. On the other hand, no such tools have been available for the much more general method of dynamic programming, largely due to its very generality. One of the objectives of this book is to describe a software tool for solving dynamic programming problems that is general, practical, and easy to use, certainly relative to any of the other tools that have appeared from time to time.

One reason that simplex-based tools for solving linear programming problems have been successful is that, by the nature of linear programming, problem specification is relatively easy. A basic LP problem can be specified essentially as a system or matrix of equations with a finite set of numerical variables as unknowns. That is, the input to an LP software tool can be provided in a tabular form, known as a tableau. This also makes it easy to formulate LP problems as a spreadsheet, which led spreadsheet system providers to include an LP solver in their products, as is the case with Excel.

A. Lew and H. Mauch: Introduction to Dynamic Programming, Studies in Computational Intelligence (SCI) 38, 3–43 (2007)
www.springerlink.com © Springer-Verlag Berlin Heidelberg 2007

A software tool for solving dynamic programming problems is much more difficult to design, in part because the problem specification task in itself presents difficulties. A DP problem specification is usually in the form of a complex (nonlinear) recursive equation, called the dynamic programming functional equation (DPFE), where the DPFE often involves nonnumerical variables that may include sets or strings. Thus, the input to a DP tool must necessarily be general enough to allow for complex DPFEs, at the expense therefore of the simplicity of a simple table. The DP tool described in this book assumes that the input DPFE is provided in a text-based specification language that does not rely on mathematical symbols. This decision conforms to that made for other mathematical programming languages, such as AMPL and LINGO.

In this introductory chapter, we first discuss the basic principles underlying the use of dynamic programming to solve discrete optimization problems.

The key task is to formulate the problem in terms of an equation, the DPFE, such that the solution of the DPFE is the solution of the given optimization problem. We then illustrate the computational solution of the DPFE for a specific problem (for linear search), either by use of a computer program written in a conventional programming language, or by use of a spreadsheet system.

It is not easy to generalize these examples to solve DP problems that do not resemble linear search. Thus, for numerous dissimilar DP problems, a significant amount of additional effort is required to obtain their computational solutions. One of the purposes of this book is to reduce this effort.
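As a preview of what such a conventional program looks like, the sketch below (our own Python illustration, not the book's gDPS input or its generated solver code) solves one common DPFE for the linear search problem by direct memoized recursion: f(S) = min over d in S of (n − |S| + 1)·p_d + f(S − {d}), with f(∅) = 0, where p_d is the access probability of item d. The probabilities and the exact form of the DPFE are assumptions made for illustration.

```python
from functools import lru_cache

def linear_search(p):
    """Minimize the expected number of comparisons when items with
    access probabilities p are arranged in a linear list.
    State: the set S of items not yet placed; the item chosen from S
    is placed in position n - |S| + 1."""
    n = len(p)

    @lru_cache(maxsize=None)
    def f(S):
        if not S:
            return 0.0
        pos = n - len(S) + 1          # position currently being filled
        return min(p[d] * pos + f(S - {d}) for d in S)

    return f(frozenset(range(n)))

# For probabilities 0.2, 0.5, 0.3 the optimal list orders items by
# decreasing probability; the minimal expected cost is
# 0.5*1 + 0.3*2 + 0.2*3 = 1.7.
print(linear_search((0.2, 0.5, 0.3)))
```

The same pattern — a recursive function over a nonnumerical (set-valued) state plus a memo table — recurs throughout the computational solutions discussed later.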

In Chap. 2, we show by example numerous types of optimization problems that can be solved using DP. These examples are given, first to demonstrate the general utility of DP as a problem solving methodology. Other books are more specialized in the kinds of applications discussed, often focusing on applications of interest mainly to operations research or to computer science.

Our coverage is much more comprehensive. Another important reason for providing numerous examples is that it is often difficult for new students of the field to see from a relatively small sample of problems how DP can be applied to other problems. How to apply DP to new problems is often learned by example; the more examples learned, the easier it is to generalize. Each of the sample problems presented in Chap. 2 was computationally solved using our DP tool. This demonstrates the generality, flexibility, and practicality of the tool.

In Part II of this book, we show how each of the DPFEs given in Chap. 2 can be expressed in a text-based specification language, and then show how these DPFEs can be formally modeled by a class of Petri nets, called Bellman nets. Bellman nets serve as the theoretical underpinnings for the DP tool we later describe, and we present our research into this subject area.

In Part III of this book, we describe the design and implementation of our DP tool. This tool inputs DPFEs, as given in Part II, and produces numerical solutions, as given in Part IV.

In Part IV of this book, we present computational results. Specifically, we give the numerical solutions to each of the problems discussed in Chap. 2, as provided by our DP tool.


Appendix A of this book provides program listings for key portions of our DP tool. Appendix B of this book is a User/Reference Manual for our DP tool.

This book serves several purposes.

1. It provides a practical introduction to how to solve problems using DP.

From the numerous and varied examples we present in Chap. 2, we expect readers to more easily be able to solve new problems by DP. Many other books provide far fewer or less diverse examples, hoping that readers can generalize from their small sample. The larger sample provided here should assist the reader in this process.

2. It provides a software tool that can be and has been used to solve all of the Chap. 2 problems. This tool can be used by readers in practice, certainly to solve academic problems if this book is used in coursework, and to solve many real-world problems, especially those of limited size (where the state space is not excessive).

3. This book is also a research monograph that describes an important application of Petri net theory. More research into Petri nets may well result in improvements in our tool.

1.1 Principles of Dynamic Programming

Dynamic programming is a method that in general solves optimization problems that involve making a sequence of decisions by determining, for each decision, subproblems that can be solved in like fashion, such that an optimal solution of the original problem can be found from optimal solutions of subproblems. This method is based on Bellman’s Principle of Optimality, which he phrased as follows [1, p. 83].

An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

More succinctly, this principle asserts that “optimal policies have optimal subpolicies.” That the principle is valid follows from the observation that, if a policy has a subpolicy that is not optimal, then replacement of the subpolicy by an optimal subpolicy would improve the original policy. The principle of optimality is also known as the “optimal substructure” property in the literature. In this book, we are primarily concerned with the computational solution of problems for which the principle of optimality is given to hold.

For DP to be computationally efficient (especially relative to evaluating all possible sequences of decisions), there should be common subproblems such that subproblems of one are subproblems of another. In this event, a solution to a subproblem need only be found once and reused as often as necessary;

however, we do not incorporate this requirement as part of our definition of DP.
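The payoff from reusing common subproblems can be seen on the Fibonacci recurrence (treated in Chap. 2 as the FIB example). The comparison below is our own illustration, not from the book: the naive recursion revisits subproblems exponentially often, while the memoized version solves each of the 21 distinct subproblems of fib(20) exactly once and reuses the results.

```python
from functools import lru_cache

calls_naive = 0

def fib_naive(n):
    """Recompute every subproblem: exponentially many calls."""
    global calls_naive
    calls_naive += 1
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    """Each distinct subproblem is solved once and then reused."""
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

print(fib_naive(20), calls_naive)                   # 6765 after 21891 calls
print(fib_memo(20), fib_memo.cache_info().misses)   # 6765 after 21 evaluations
```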


In this section, we will first elaborate on the nature of sequential decision processes and on the importance of being able to separate the costs for each of the individual decisions. This will lead to the development of a general equation, the dynamic programming functional equation (DPFE), that formalizes the principle of optimality. The methodology of dynamic programming requires deriving a special case of this general DPFE for each specific optimization problem we wish to solve. Numerous examples of such derivations will be presented in this book. We will then focus on how to numerically solve DPFEs, and will later describe a software tool we have developed for this purpose.

1.1.1 Sequential Decision Processes

For an optimization problem of the form opt_{d ∈ Δ}{H(d)}, d is called the decision, which is chosen from a set of eligible decisions Δ, the optimand H is called the objective function, and H* = H(d*) is called the optimum, where d* is that value of d ∈ Δ for which H(d) has the optimal (minimum or maximum) value. We also say that d* optimizes H, and write d* = arg opt_d{H(d)}. Many optimization problems consist of finding a set of decisions {d_1, d_2, ..., d_n} that, taken together, yield the optimum H* of an objective function h(d_1, d_2, ..., d_n). Solution of such problems by enumeration, i.e., by evaluating h(d_1, d_2, ..., d_n) concurrently for all possible combinations of values of its decision arguments, is called the "brute force" approach; this approach is manifestly inefficient. Rather than making decisions concurrently, we assume the decisions may be made in some specified sequence, say (d_1, d_2, ..., d_n), i.e., such that

H* = opt_{(d_1,d_2,...,d_n) ∈ Δ} {h(d_1, d_2, ..., d_n)}
   = opt_{d_1 ∈ D_1} {opt_{d_2 ∈ D_2} {... {opt_{d_n ∈ D_n} {h(d_1, d_2, ..., d_n)}} ...}},   (1.1)

in what are known as sequential decision processes, where the ordered set (d_1, d_2, ..., d_n) belongs to some decision space Δ = D_1 × D_2 × ... × D_n, for d_i ∈ D_i. Examples of decision spaces include: Δ = B^n, the special case of Boolean decisions, where each decision set D_i equals B = {0, 1}; and Δ = Π(D), a permutation of a set of eligible decisions D. The latter illustrates the common situation where decisions d_i are interrelated, e.g., where they satisfy constraints such as d_i ≠ d_j or d_i + d_j ≤ M. In general, each decision set D_i depends on the decisions (d_1, d_2, ..., d_{i−1}) that are earlier in the specified sequence, i.e., d_i ∈ D_i(d_1, d_2, ..., d_{i−1}). Thus, to show this dependence explicitly, we rewrite (1.1) in the form

H* = opt_{(d_1,d_2,...,d_n) ∈ Δ} {h(d_1, d_2, ..., d_n)}
   = opt_{d_1 ∈ D_1} {opt_{d_2 ∈ D_2(d_1)} {... {opt_{d_n ∈ D_n(d_1,...,d_{n−1})} {h(d_1, ..., d_n)}} ...}}.   (1.2)


This nested set of optimization operations is to be performed from inside-out (right-to-left), the innermost optimization yielding the optimal choice for d_n as a function of the possible choices for d_1, ..., d_{n−1}, denoted d*_n(d_1, ..., d_{n−1}), and the outermost optimization opt_{d_1 ∈ D_1}{h(d_1, d_2, ..., d_n)} yielding the optimal choice for d_1, denoted d*_1. Note that while the initial or "first" decision d_1 in the specified sequence is the outermost, the optimizations are performed inside-out, each depending upon outer decisions. Furthermore, while the optimal solution may be the same for any sequencing of decisions, e.g.,

opt_{d_1 ∈ D_1} {opt_{d_2 ∈ D_2(d_1)} {... {opt_{d_n ∈ D_n(d_1,...,d_{n−1})} {h(d_1, ..., d_n)}} ...}}
= opt_{d_n ∈ D_n} {opt_{d_{n−1} ∈ D_{n−1}(d_n)} {... {opt_{d_1 ∈ D_1(d_2,...,d_n)} {h(d_1, ..., d_n)}} ...}},   (1.3)

the decision sets D_i may differ since they depend on different outer decisions.

Thus, efficiency may depend upon the order in which decisions are made.

Referring to the foregoing equation, for a given sequencing of decisions, if the outermost decision is "tentatively" made initially, whether or not it is optimal depends upon the ultimate choices d*_i that are made for subsequent decisions d_i; i.e.,

H* = opt_{d_1 ∈ D_1} {opt_{d_2 ∈ D_2(d_1)} {... {opt_{d_n ∈ D_n(d_1,...,d_{n−1})} {h(d_1, ..., d_n)}} ...}}
   = opt_{d_1 ∈ D_1} {h(d_1, d*_2(d_1), ..., d*_n(d_1))},   (1.4)

where each of the choices d*_i(d_1) for i = 2, ..., n is constrained by — i.e., is a function of — the choice for d_1. Note that determining the optimal choice d*_1 = arg opt_{d_1 ∈ D_1}{h(d_1, d*_2(d_1), ..., d*_n(d_1))} requires evaluating h for all possible choices of d_1 unless there is some reason that certain choices can be excluded from consideration based upon a priori (given or derivable) knowledge that they cannot be optimal. One such class of algorithms would choose d_1 ∈ D_1 independently of (but still constrain) the choices for d_2, ..., d_n, i.e., by finding the solution of a problem of the form opt_{d_1 ∈ D_1}{H(d_1)} for a function H of d_1 that is myopic in the sense that it does not depend on other choices d_i. Such an algorithm is optimal if the locally optimal solution of opt_{d_1}{H(d_1)} yields the globally optimal solution H*.

Suppose that the objective function h is (strongly) separable in the sense that

h(d_1, ..., d_n) = C_1(d_1) ◦ C_2(d_2) ◦ ... ◦ C_n(d_n),   (1.5)

where the decision-cost functions C_i represent the costs (or profits) associated with the individual decisions d_i, and where ◦ is an associative binary operation, usually addition or multiplication, for which opt_d{a ◦ C(d)} = a ◦ opt_d{C(d)} for any a that does not depend upon d. In the context of sequential decision processes, the cost C_n of making decision d_n may be a function not only of the decision itself, but also of the state (d_1, d_2, ..., d_{n−1}) in which the decision is made. To emphasize this, we will rewrite (1.5) as


h(d_1, ..., d_n) = C_1(d_1|∅) ◦ C_2(d_2|d_1) ◦ ... ◦ C_n(d_n|d_1, ..., d_{n−1}).   (1.6)

We now define h as (weakly) separable if

h(d_1, ..., d_n) = C_1(d_1) ◦ C_2(d_1, d_2) ◦ ... ◦ C_n(d_1, ..., d_n).   (1.7)

(Strong separability is, of course, a special case of weak separability.) If h is (weakly) separable, we then have

opt_{d_1 ∈ D_1} {opt_{d_2 ∈ D_2(d_1)} {... {opt_{d_n ∈ D_n(d_1,...,d_{n−1})} {h(d_1, ..., d_n)}} ...}}
= opt_{d_1 ∈ D_1} {opt_{d_2 ∈ D_2(d_1)} {... {opt_{d_n ∈ D_n(d_1,...,d_{n−1})} {C_1(d_1|∅) ◦ C_2(d_2|d_1) ◦ ... ◦ C_n(d_n|d_1, ..., d_{n−1})}} ...}}
= opt_{d_1 ∈ D_1} {C_1(d_1|∅) ◦ opt_{d_2 ∈ D_2(d_1)} {C_2(d_2|d_1) ◦ ...
  ... ◦ opt_{d_n ∈ D_n(d_1,...,d_{n−1})} {C_n(d_n|d_1, ..., d_{n−1})} ...}}.   (1.8)

Let the function f(d_1, ..., d_{i−1}) be defined as the optimal solution of the sequential decision process where the decisions d_1, ..., d_{i−1} have been made and the decisions d_i, ..., d_n remain to be made; i.e.,

f(d_1, ..., d_{i−1}) = opt_{d_i} {opt_{d_{i+1}} {... {opt_{d_n} {C_i(d_i|d_1, ..., d_{i−1}) ◦ C_{i+1}(d_{i+1}|d_1, ..., d_i) ◦ ... ◦ C_n(d_n|d_1, ..., d_{n−1})}} ...}}.   (1.9)

Explicit mentions of the decision sets D_i are omitted here for convenience.

We have then

f(∅) = opt_{d_1} {opt_{d_2} {... {opt_{d_n} {C_1(d_1|∅) ◦ C_2(d_2|d_1) ◦ ... ◦ C_n(d_n|d_1, ..., d_{n−1})}} ...}}
     = opt_{d_1} {C_1(d_1|∅) ◦ opt_{d_2} {C_2(d_2|d_1) ◦ ... ◦ opt_{d_n} {C_n(d_n|d_1, ..., d_{n−1})} ...}}
     = opt_{d_1} {C_1(d_1|∅) ◦ f(d_1)}.   (1.10)

Generalizing, we conclude that

f(d_1, ..., d_{i−1}) = opt_{d_i ∈ D_i(d_1,...,d_{i−1})} {C_i(d_i|d_1, ..., d_{i−1}) ◦ f(d_1, ..., d_i)}.   (1.11)

Equation (1.11) is a recursive functional equation; we call it a functional equation since the unknown in the equation is a function f, and it is recursive since f is defined in terms of f (but having different arguments). It is the dynamic programming functional equation (DPFE) for the given optimization problem. In this book, we assume that we are given DPFEs that are properly formulated, i.e., that their solutions exist; we address only issues of how to obtain these solutions.
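The recursion (1.11) lends itself directly to top-down ("memoized") evaluation. The following sketch, in Python for concreteness, solves a small invented instance in which the cost of a decision grows with the stage at which it is made; the functions `eligible` and `cost` are illustrative stand-ins for D_i(d_1, ..., d_{i−1}) and C_i(d_i|d_1, ..., d_{i−1}), not part of the formalism. Here the state is the full sequence of decisions made so far, exactly as in (1.11); the next section discusses reducing such states.

```python
from functools import lru_cache

# Illustrative instance of DPFE (1.11): states are tuples of decisions
# made so far; each decision is an index drawn from D(state); "o" is +.
N = 3
ITEMS = (4.0, 1.0, 3.0)  # hypothetical per-decision base costs

def eligible(state):
    # D(state): decisions not yet made (indices 0..N-1 not in state)
    return [d for d in range(N) if d not in state]

def cost(d, state):
    # C(d | state): base cost weighted by the stage of the decision
    return (len(state) + 1) * ITEMS[d]

@lru_cache(maxsize=None)
def f(state):
    # f(state) from (1.11); the empty tuple is the initial state.
    choices = eligible(state)
    if not choices:           # base case: no eligible decisions remain
        return 0.0
    return min(cost(d, state) + f(state + (d,)) for d in choices)

print(f(()))  # optimal total cost starting from the empty state
```

Memoization ensures each state is evaluated at most once, which is the source of DP's efficiency whenever subproblems are shared.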


1.1.2 Dynamic Programming Functional Equations

The problem of solving the DPFE for f(d_1, ..., d_{i−1}) depends upon the subproblem of solving for f(d_1, ..., d_i). If we define the state S = (d_1, ..., d_{i−1}) as the sequence of the first i−1 decisions, where i = |S| + 1 = |{d_1, ..., d_{i−1}}| + 1, we may rewrite the DPFE in the form

f(S) = opt_{d_i ∈ D_i(S)} {C_i(d_i|S) ◦ f(S′)},   (1.12)

where S is a state in a set 𝒮 of possible states, S′ = (d_1, ..., d_i) is a next-state, and ∅ is the initial state. Since the DPFE is recursive, to terminate the recursion, its solution requires base cases (or "boundary" conditions), such as f(S_0) = b when S_0 ∈ 𝒮_base, where 𝒮_base ⊂ 𝒮. For a base (or terminal) state S_0, f(S_0) is not evaluated using the DPFE, but instead has a given numerical constant b as its value; this value b may depend upon the base state S_0.

It should be noted that the sequence of decisions need not be limited to a fixed length n, but may be of indefinite length, terminating when a base case is reached. Different classes of DP problems may be characterized by how the states S, and hence the next-states S′, are defined. It is often convenient to define the state S, not as the sequence of decisions made so far, with the next decision d chosen from D(S), but rather as the set from which the next decision can be chosen, so that D(S) = S, i.e., d ∈ S. We then have a DPFE of the form

f(S) = opt_{d ∈ S} {C(d|S) ◦ f(S′)}.   (1.13)

We shall later show that, for some problems, there may be multiple next-states, so that the DPFE has the form

f(S) = opt_{d ∈ S} {C(d|S) ◦ f(S′) ◦ f(S′′)},   (1.14)

where S′ and S′′ are both next-states. A DPFE is said to be r-th order (or nonserial if r > 1) if there may be r next-states.
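A classic problem that fits the second-order form (1.14) is matrix chain multiplication, where a state (i, j) denotes a subchain of matrices and a decision d splits it into the two next-states (i, d) and (d, j). A minimal sketch with invented matrix dimensions:

```python
from functools import lru_cache

# Matrix chain multiplication as a second-order DPFE of form (1.14):
# state S = (i, j) is a subchain; decision d is the split point;
# the two next-states are (i, d) and (d, j).
dims = [10, 30, 5, 60]  # hypothetical: A1 is 10x30, A2 is 30x5, A3 is 5x60

@lru_cache(maxsize=None)
def f(i, j):
    if j - i == 1:        # base case: a single matrix costs nothing
        return 0
    return min(
        dims[i] * dims[d] * dims[j]   # C(d|S): scalar multiplications
        + f(i, d) + f(d, j)           # the two next-states
        for d in range(i + 1, j)
    )

print(f(0, len(dims) - 1))
```

Each evaluation of f(i, j) combines the returns of two subchains, which is exactly what makes this formulation nonserial.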

Simple serial DP formulations can be modeled by a state transition system or directed graph, where a state S corresponds to a node (or vertex) and a decision d that leads from state S to next-state S′ is represented by a branch (or arc or edge) with label C(d|S). D(S) is the set of possible decisions when in state S, hence is associated with the successors of node S. More complex DP formulations require a more general graph model, such as that of a Petri net, which we discuss in Chap. 5.

Consider the directed graph whose nodes represent the states of the DPFE and whose branches represent possible transitions from states to next-states, each such transition reflecting a decision. The label of each branch, from S to S′, denoted b(S, S′), is the cost C(d|S) of the decision d, where S′ = T(S, d), and where T : 𝒮 × D → 𝒮 is a next-state transition or transformation function.

The DPFE can then be rewritten in the form


f(S) = opt_{S′} {b(S, S′) + f(S′)},   (1.15)

where f(S) is the length of the shortest path from S to a terminal or target state S_0, and where each decision is to choose S′ from among all (eligible) successors of S. (Different problems may have different eligibility constraints.) The base case is f(S_0) = 0.

For some problems, it is more convenient to use a DPFE of the "reverse" form

f′(S) = opt_{S′} {f′(S′) + b(S′, S)},   (1.16)

where f′(S) is the length of the shortest path from a designated state S_0 to S, and S′ is a predecessor of S; S_0 is also known as the source state, and f′(S_0) = 0 serves as the base case that terminates the recursion for this alternative DPFE. We call these target-state and designated-source DPFEs, respectively. We also say that, in the former case, we go "backward" from the target to the source, whereas, in the latter case, we go "forward" from the source to the target.
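Both forms compute shortest paths. A minimal sketch of the target-state form (1.15), on a small acyclic graph whose nodes and branch labels are invented for illustration:

```python
from functools import lru_cache

# Target-state DPFE (1.15): f(S) is the length of the shortest path
# from node S to the target D; graph and weights are hypothetical.
succ = {                      # b(S, S') for each successor S' of S
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 6},
    "C": {"D": 3},
    "D": {},                  # target: no successors
}

@lru_cache(maxsize=None)
def f(S):
    if not succ[S]:           # base case f(S0) = 0 at the target
        return 0
    return min(b + f(Sp) for Sp, b in succ[S].items())

print(f("A"))
```

The designated-source form (1.16) would instead recurse over predecessors, with f′("A") = 0 as the base case; the two recursions traverse the same graph in opposite directions.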

Different classes of DP formulations are distinguished by the nature of the decisions. Suppose each decision is a number chosen from a set {1, 2, ..., N}, and that each number must be chosen once and only once (so there are N decisions). Then if states correspond to possible permutations of the numbers, there are O(N!) such states. Here we use the "big-O" notation ([10, 53]): we say f(N) is O(g(N)) if, for sufficiently large N, f(N) is bounded by a constant multiple of g(N). As another example, suppose each decision is a number chosen from a set {1, 2, ..., N}, but that not all numbers must be chosen (so there may be fewer than N decisions). Then if states correspond to subsets of the numbers, there are O(2^N) such states. Fortuitously, there are many practical problems where a reduction in the number of relevant states is possible, such as when only the final decision d_{i−1} in a sequence (d_1, ..., d_{i−1}), together with the time or stage i at which the decision is made, is significant, so that there are O(N^2) such states. We give numerous examples of the different classes in Chap. 2.
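Subset states of the O(2^N) kind are conveniently encoded as bitmasks. A sketch on an invented assignment-type instance, in which the stage (the number of decisions already made) is recoverable from the state itself:

```python
from functools import lru_cache

# Subset states encoded as bitmasks: each number (task) chosen at most
# once. Hypothetical instance: person i = popcount(mask) is assigned
# the next task d, minimizing total cost; there are O(2^N) states.
COST = [[9, 2, 7],
        [6, 4, 3],
        [5, 8, 1]]           # COST[i][d]: cost of person i doing task d
N = 3

@lru_cache(maxsize=None)
def f(mask):
    i = bin(mask).count("1") # stage: number of decisions already made
    if i == N:               # base case: every task assigned
        return 0
    return min(COST[i][d] + f(mask | (1 << d))
               for d in range(N) if not mask & (1 << d))

print(f(0))
```

Note that 2^N states is far fewer than the N! permutations of decisions that reach them, which is precisely the state reduction mentioned above.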

The solution of a DP problem generally involves more than only computing the value of f(S*) for the goal state S*. We may also wish to determine the initial optimal decision, the optimal second decision that should be made in the next-state that results from the first decision, and so forth; that is, we may wish to determine the optimal sequence of decisions, also known as the optimal "policy", by what is known as a reconstruction process. To reconstruct these optimal decisions, when evaluating f(S) = opt_{d ∈ D(S)}{C(d|S) ◦ f(S′)} we may save the value of d, denoted d*, that yields the optimal value of f(S) at the time we compute this value, say, tabularly, by entering the value d*(S) in a table for each S. The main alternative to using such a policy table is to reevaluate f(S) as needed, as the sequence of next-states is determined; this is an example of a space versus time tradeoff.
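A sketch of the policy-table approach, using the target-state form (1.15) on a hypothetical graph: alongside each value f(S) we record the decision d* that attained it, then follow the table from the goal state to reconstruct the optimal policy.

```python
# Policy reconstruction: store d*(S) in a table while computing f(S).
# The successor graph and branch labels below are invented.
succ = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 6},
    "C": {"D": 3},
    "D": {},
}
memo, policy = {}, {}

def f(S):
    if S in memo:
        return memo[S]
    if not succ[S]:                    # base case at the target
        memo[S] = 0
        return 0
    best_d, best_b = min(succ[S].items(),
                         key=lambda kv: kv[1] + f(kv[0]))
    memo[S] = best_b + f(best_d)
    policy[S] = best_d                 # save d*(S) in the policy table
    return memo[S]

f("A")
# Follow the policy table from the goal state to a base state.
path, S = ["A"], "A"
while succ[S]:
    S = policy[S]
    path.append(S)
print(path)   # the optimal sequence of states
```

The alternative, reevaluating f at each next-state instead of keeping `policy`, saves the table's space at the cost of repeated lookups of `memo`.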


1.1.3 The Elements of Dynamic Programming

The basic form of a dynamic programming functional equation is

f(S) = opt_{d ∈ D(S)} {R(S, d) ◦ f(T(S, d))},   (1.17)

where S is a state in some state space 𝒮, d is a decision chosen from a decision space D(S), R(S, d) is a reward function (or decision cost, denoted C(d|S) above), T(S, d) is a next-state transformation (or transition) function, and ◦ is a binary operator. We will restrict ourselves to discrete DP, where the state space and decision space are both discrete sets. (Some problems with continuous states or decisions can be handled by discretization procedures, but we will not consider such problems in this book.) The elements of a DPFE have the following characteristics.

State The state S, in general, incorporates information about the sequence of decisions made so far. In some cases, the state may be the complete sequence, but in other cases only partial information is sufficient; for example, if the set of all states can be partitioned into equivalence classes, each represented by the last decision. In some simpler problems, the length of the sequence, also called the stage at which the next decision is to be made, suffices. The initial state, which reflects the situation in which no decision has yet been made, will be called the goal state and denoted S*.

Decision Space The decision space D(S) is the set of possible or "eligible" choices for the next decision d. It is a function of the state S in which the decision d is to be made. Constraints on possible next-state transformations from a state S can be imposed by suitably restricting D(S). If D(S) = ∅, so that there are no eligible decisions in state S, then S is a terminal state.

Objective Function The objective function f, a function of S, is the optimal profit or cost resulting from making a sequence of decisions when in state S, i.e., after making the sequence of decisions associated with S. The goal of a DP problem is to find f(S*) for the goal state S*.

Reward Function The reward function R, a function of S and d, is the profit or cost that can be attributed to the next decision d made in state S. The reward R(S, d) must be separable from the profits or costs that are attributed to all other decisions. The value of the objective function for the goal state, f(S*), is the combination of the rewards for the complete optimal sequence of decisions starting from the goal state.

Transformation Function(s) The transformation (or transition) function T , a function of S and d, specifies the next-state that results from making a decision d in state S. As we shall later see, for nonserial DP problems, there may be more than one transformation function.

Operator The operator is a binary operation, usually addition or multiplica- tion or minimization/maximization, that allows us to combine the returns of separate decisions. This operation must be associative if the returns of decisions are to be independent of the order in which they are made.


Base Condition Since the DPFE is recursive, base conditions must be specified to terminate the recursion. Thus, the DPFE applies for S in a state space 𝒮, but

f(S_0) = b,

for S_0 in a set of base-states not in 𝒮. Base-values b are frequently zero or infinity, the latter to reflect constraints. For some problems, setting f(S_0) = ±∞ is equivalent to imposing a constraint on decisions so as to disallow transitions to state S_0, or to indicate that S_0 ∈ 𝒮 is a state in which no decision is eligible.

To solve a problem using DP, we must define the foregoing elements to reflect the nature of the problem at hand. We give several examples below.

We note first that some problems require certain generalizations. For example, some problems require a second-order DPFE having the form

f(S) = opt_{d ∈ D(S)} {R(S, d) ◦ f(T_1(S, d)) ◦ f(T_2(S, d))},   (1.18)

where T_1 and T_2 are both transformation functions, to account for the situation in which more than one next-state can be entered, or

f(S) = opt_{d ∈ D(S)} {R(S, d) ◦ p_1 · f(T_1(S, d)) ◦ p_2 · f(T_2(S, d))},   (1.19)

where T_1 and T_2 are both transformation functions and p_1 and p_2 are multiplicative weights. In probabilistic DP problems, these weights are probabilities that reflect the probabilities associated with their respective state-transitions, only one of which can actually occur. In deterministic DP problems, these weights can serve other purposes, such as "discount factors" to reflect the time value of money.
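A sketch of the weighted DPFE (1.19) on a staged process with two possible outcomes per decision; the decisions, rewards, probabilities, and horizon below are all invented for illustration, and the state is simply the stage t.

```python
from functools import lru_cache

# Weighted second-order DPFE (1.19): after each decision the process
# moves to an "up" or "down" outcome with probabilities p1 = 1 - p2.
# Hypothetical instance: stages 0..T, decisions "safe" or "risky".
T = 3
REWARD = {"safe": 1.0, "risky": 0.0}   # immediate reward R(S, d)
P_UP = {"safe": 0.0, "risky": 0.6}     # probability of the "up" bonus
BONUS = 3.0                            # payoff if the "up" outcome occurs

@lru_cache(maxsize=None)
def f(t):
    if t == T:                         # base case at the final stage
        return 0.0
    best = float("-inf")
    for d in ("safe", "risky"):
        p1 = P_UP[d]
        up = BONUS + f(t + 1)          # value under the "up" outcome
        down = f(t + 1)                # value under the "down" outcome
        best = max(best, REWARD[d] + p1 * up + (1 - p1) * down)
    return best

print(f(0))
```

Only one of the two next-states actually occurs at run time; the DPFE combines them by expectation, which is what distinguishes this probabilistic form from the deterministic second-order form (1.18).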

1.1.4 Application: Linear Search

To illustrate the key concepts associated with DP that will prove useful in our later discussions, we examine a concrete example, the optimal "linear search" problem. This is the problem of permuting the data elements of an array A of size N, whose element x has probability p_x, so as to optimize the linear search process by minimizing the "cost" of a permutation, defined as the expected number of comparisons required. For example, let A = {a, b, c} and p_a = 0.2, p_b = 0.5, and p_c = 0.3. There are six permutations, namely, abc, acb, bac, bca, cab, cba; the cost of the fourth permutation bca is 1.7, which can be calculated in several ways, such as

1·p_b + 2·p_c + 3·p_a   [using Method S]

and

(p_a + p_b + p_c) + (p_a + p_c) + (p_a)   [using Method W].
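Both costing methods are easy to check numerically; the sketch below also confirms by enumerating all six permutations that bca is optimal for these probabilities.

```python
from itertools import permutations

p = {"a": 0.2, "b": 0.5, "c": 0.3}

def cost_S(perm):
    # Method S: the element in position i contributes i * p[x]
    return sum(i * p[x] for i, x in enumerate(perm, start=1))

def cost_W(perm):
    # Method W: one term per position, summing the probabilities of the
    # element there and of all elements placed after it
    return sum(sum(p[x] for x in perm[i:]) for i in range(len(perm)))

print(cost_S("bca"), cost_W("bca"))   # both ~1.7

# Brute-force enumeration over all 3! orderings
best = min(permutations("abc"), key=cost_S)
print("".join(best), cost_S(best))
```

This brute-force enumeration evaluates every permutation; the DP formulations developed in this book avoid that by sharing subproblem solutions across permutations.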
