
Dynamic programming

Dynamic programming solves complex problems by breaking them down into sub-problems and solving each of them; once the sub-problems are solved, it combines those solutions to solve the original complex problem. In the reinforcement learning world, dynamic programming is a solution methodology for computing optimal policies, given a perfect model of the environment as a Markov Decision Process (MDP).
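To make the idea concrete, here is a minimal, RL-agnostic sketch in Python: computing Fibonacci numbers with memoization. Each sub-problem is solved once, cached, and reused when assembling the answer to the original problem (the example itself is purely illustrative, not from this chapter):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    """Return the n-th Fibonacci number, caching every sub-problem."""
    if n < 2:
        return n  # base cases: fib(0) = 0, fib(1) = 1
    # The original problem decomposes into two smaller sub-problems,
    # whose cached solutions are combined into the final answer.
    return fib(n - 1) + fib(n - 2)

print(fib(50))  # 12586269025
```

Without the cache, the same sub-problems would be recomputed an exponential number of times; with it, each of the roughly n sub-problems is solved exactly once.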

Dynamic programming works well for problems that have the following two properties. MDPs, in fact, satisfy both, which makes DP a good fit for solving them via the Bellman equations:

  • Optimal substructure
    • Principle of optimality applies
    • Optimal solution can be decomposed into sub-problems
  • Overlapping sub-problems
    • Sub-problems recur many times
    • Solutions can be cached and reused
  • Luckily, MDPs satisfy both properties
    • The Bellman equations give a recursive decomposition of state values
    • The value function stores and reuses sub-problem solutions (see the sketch after this list)
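As a concrete illustration of both points, the sketch below performs iterative policy evaluation: it repeatedly applies the Bellman expectation backup, V(s) = Σ_a π(a|s) [R(s, a) + γ Σ_s' P(s'|s, a) V(s')], storing the state values in an array that is reused on every sweep. The two-state MDP here (transition tensor P, reward matrix R, policy pi, and discount gamma) is entirely made up for illustration:

```python
import numpy as np

# A hypothetical 2-state, 2-action MDP (all numbers invented):
# P[s, a, s'] = transition probability, R[s, a] = expected reward.
P = np.array([[[0.8, 0.2],    # state 0, action 0
               [0.1, 0.9]],   # state 0, action 1
              [[0.5, 0.5],    # state 1, action 0
               [0.0, 1.0]]])  # state 1, action 1
R = np.array([[1.0, 0.0],
              [2.0, 0.5]])
pi = np.array([[0.5, 0.5],    # pi[s, a] = probability of action a in state s
               [0.5, 0.5]])
gamma = 0.9

# Iterative policy evaluation: sweep the Bellman expectation backup
# until the value function stops changing.
V = np.zeros(2)
while True:
    # V_new(s) = sum_a pi(a|s) * (R(s, a) + gamma * sum_s' P(s'|s, a) * V(s'))
    V_new = np.sum(pi * (R + gamma * (P @ V)), axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

print(V)  # state values of the random policy on this toy MDP
```

Because γ < 1 makes the backup a contraction, the sweep converges to the unique fixed point of the Bellman expectation equation; the value array V is exactly the cache of sub-problem solutions that DP reuses on every iteration.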

Classical DP algorithms are of limited utility in reinforcement learning, both because they assume a perfect model and because of their high computational expense. They remain important, however, because they provide an essential foundation for understanding all of the methods in the RL domain.