
Dynamic programming

Dynamic programming solves complex problems by breaking them down into sub-problems and solving each of them; once the sub-problems are solved, it combines those solutions to solve the original complex problem. In the reinforcement learning world, dynamic programming is a solution methodology for computing optimal policies, given a perfect model of the environment as a Markov Decision Process (MDP).
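To make the idea concrete, here is a minimal, RL-agnostic sketch in Python: computing Fibonacci numbers with memoization. Each sub-problem is solved once, cached, and reused when assembling the answer to the original problem (the example itself is purely illustrative, not from this chapter):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    """Return the n-th Fibonacci number, caching every sub-problem."""
    if n < 2:
        return n  # base cases: fib(0) = 0, fib(1) = 1
    # The original problem decomposes into two smaller sub-problems,
    # whose cached solutions are combined into the final answer.
    return fib(n - 1) + fib(n - 2)

print(fib(50))  # 12586269025
```

Without the cache, the same sub-problems would be recomputed an exponential number of times; with it, each of the roughly n sub-problems is solved exactly once.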

Dynamic programming works well for problems that have the following two properties. MDPs, in fact, satisfy both, which makes DP a good fit for solving them via the Bellman equations:

  • Optimal substructure
    • Principle of optimality applies
    • Optimal solution can be decomposed into sub-problems
  • Overlapping sub-problems
    • Sub-problems recur many times
    • Solutions can be cached and reused
  • Luckily, MDPs satisfy both properties
    • The Bellman equations give a recursive decomposition of state values
    • The value function stores and reuses sub-problem solutions (see the sketch after this list)
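As a concrete illustration of both points, the sketch below performs iterative policy evaluation: it repeatedly applies the Bellman expectation backup, V(s) = Σ_a π(a|s) [R(s, a) + γ Σ_s' P(s'|s, a) V(s')], storing the state values in an array that is reused on every sweep. The two-state MDP here (transition tensor P, reward matrix R, policy pi, and discount gamma) is entirely made up for illustration:

```python
import numpy as np

# A hypothetical 2-state, 2-action MDP (all numbers invented):
# P[s, a, s'] = transition probability, R[s, a] = expected reward.
P = np.array([[[0.8, 0.2],    # state 0, action 0
               [0.1, 0.9]],   # state 0, action 1
              [[0.5, 0.5],    # state 1, action 0
               [0.0, 1.0]]])  # state 1, action 1
R = np.array([[1.0, 0.0],
              [2.0, 0.5]])
pi = np.array([[0.5, 0.5],    # pi[s, a] = probability of action a in state s
               [0.5, 0.5]])
gamma = 0.9

# Iterative policy evaluation: sweep the Bellman expectation backup
# until the value function stops changing.
V = np.zeros(2)
while True:
    # V_new(s) = sum_a pi(a|s) * (R(s, a) + gamma * sum_s' P(s'|s, a) * V(s'))
    V_new = np.sum(pi * (R + gamma * (P @ V)), axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

print(V)  # state values of the random policy on this toy MDP
```

Because γ < 1 makes the backup a contraction, the sweep converges to the unique fixed point of the Bellman expectation equation; the value array V is exactly the cache of sub-problem solutions that DP reuses on every iteration.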

Classical DP algorithms are of limited utility in reinforcement learning, both because they assume a perfect model and because of their high computational expense. They remain important, however, because they provide an essential foundation for understanding all of the methods in the RL domain.