Reinforcement Learning and Optimal Control

by Dimitri P. Bertsekas

ISBN: 978-1-886529-39-7
Publication: 2019, 388 pages, hardcover
Price: $89.00
AVAILABLE

EBOOK at Google Play

Preview at Google Books

Contents, Preface, Selected Sections

Video Course from ASU, and other Related Material

Errata

Ordering, Home

This book considers large and challenging multistage decision problems, which can be solved in principle by dynamic programming (DP), but their exact solution is computationally intractable. We discuss solution methods that rely on approximations to produce suboptimal policies with adequate performance. These methods are collectively known by several essentially equivalent names: reinforcement learning, approximate dynamic programming, and neuro-dynamic programming. They have been at the forefront of research for the last 25 years, and they underlie, among others, the recent impressive successes of self-learning in the context of games such as chess and Go.

Our subject has benefited greatly from the interplay of ideas from optimal control and from artificial intelligence, as it relates to reinforcement learning and simulation-based neural network methods. One of the aims of the book is to explore the common boundary between these two fields and to form a bridge that is accessible by workers with background in either field. Another aim is to organize coherently the broad mosaic of methods that have proved successful in practice while having a solid theoretical and/or logical foundation. This may help researchers and practitioners to find their way through the maze of competing ideas that constitute the current state of the art.

This book relates to several of our other books: Neuro-Dynamic Programming (Athena Scientific, 1996), Dynamic Programming and Optimal Control (4th edition, Athena Scientific, 2017), Abstract Dynamic Programming (2nd edition, Athena Scientific, 2018), and Nonlinear Programming (3rd edition, Athena Scientific, 2016).

However, the mathematical style of this book is somewhat different. While we provide a rigorous, albeit short, mathematical account of the theory of finite and infinite horizon dynamic programming, and some fundamental approximation methods, we rely more on intuitive explanations and less on proof-based insights. Moreover, our mathematical requirements are quite modest: calculus, a minimal use of matrix-vector algebra, and elementary probability (mathematically complicated arguments involving laws of large numbers and stochastic convergence are bypassed in favor of intuitive explanations).

The book illustrates the methodology with many examples and illustrations, and uses a gradual expository approach, which proceeds along four directions:

From exact DP to approximate DP: We first discuss exact DP algorithms, explain why they may be difficult to implement, and then use them as the basis for approximations.
From finite horizon to infinite horizon problems: We first discuss finite horizon exact and approximate DP methodologies, which are intuitive and mathematically simple, and then progress to infinite horizon problems.
From deterministic to stochastic models: We often discuss separately deterministic and stochastic problems, since deterministic problems are simpler and offer special advantages for some of our methods.
From model-based to model-free implementations: We first discuss model-based implementations, and then we identify schemes that can be appropriately modified to work with a simulator.

Dimitri P. Bertsekas is Fulton Professor of Computational Decision Making at the Arizona State University, McAfee Professor of Engineering at the Massachusetts Institute of Technology, and a member of the prestigious United States National Academy of Engineering. He is the recipient of the 2001 A. R. Raggazini ACC education award and the 2009 INFORMS expository writing award. He has also received 2014 ACC Richard E. Bellman Control Heritage Award for "contributions to the foundations of deterministic and stochastic optimization-based methods in systems and control," the 2014 Khachiyan Prize for Life-Time Accomplishments in Optimization, the SIAM/MOS 2015 George B. Dantzig Prize, and the 2022 IEEE Control Systems Award. Together with his coauthor John Tsitsiklis, he was awarded the 2018 INFORMS John von Neumann Theory Prize, for the contributions of the research monographs "Parallel and Distributed Computation" and "Neuro-Dynamic Programming".

The following papers and reports have a strong connection to material in the book, and amplify on its analysis and its range of applications.

Bertsekas, D., "Multiagent Reinforcement Learning: Rollout and Policy Iteration," ASU Report Oct. 2020; to be published in IEEE/CAA Journal of Automatica Sinica.

Bertsekas, D., "Multiagent Value Iteration Algorithms in Dynamic Programming and Reinforcement Learning," ASU Report, April 2020.

D. P. Bertsekas, "Multiagent Rollout Algorithms and Reinforcement Learning," arXiv preprint arXiv:1910.00120, September 2019.

Bhattacharya, S., Sahil Badyal, S., Wheeler, W., Gil, S., Bertsekas, D.,"Reinforcement Learning for POMDP: Partitioned Rollout and Policy Iteration with Application to Autonomous Sequential Repair Problems," IEEE Robotics and Automation Letters, to appear, 2020.

D. P. Bertsekas, "Biased Aggregation, Rollout, and Enhanced Policy Improvement for Reinforcement Learning," Lab. for Information and Decision Systems Report, MIT, October 2018; a shorter version appears as arXiv preprint arXiv:1910.02426, Oct. 2019.

D. P. Bertsekas, "Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations," Lab. for Information and Decision Systems Report, MIT, April 2018 (revised August 2018); arXiv preprint arXiv:1804.04577; a version published in IEEE/CAA Journal of Automatica Sinica. (Lecture Slides). (Related Video Lecture).

[Return to Athena Scientific Homepage]