Dynamic programming is a branch of operations research and a mathematical method for optimizing multi-stage decision processes. In the early 1950s, the American mathematician R. E. Bellman and his colleagues, while studying the optimization of multi-step decision processes, proposed the famous principle of optimality. By exploiting the relationships between stages, they transformed a multi-step process into a series of single-stage problems that could be solved one by one, thereby creating a new method for this class of optimization problems: dynamic programming. In 1957 Bellman published his masterwork, Dynamic Programming, the first book in the field. Since its advent, dynamic programming has been widely applied in economic management, production scheduling, engineering, and optimal control. Problems such as shortest paths, inventory management, resource allocation, equipment replacement, sequencing, and loading are often solved more conveniently by dynamic programming than by other methods. Although dynamic programming is mainly used to optimize dynamic processes that unfold over time, some static programming problems unrelated to time (such as linear and nonlinear programming) can also be solved conveniently by dynamic programming, provided a time factor is artificially introduced so that the problem is treated as a multi-stage decision process. Dynamic programming is an approach to solving optimization problems, not a specific algorithm: unlike the search and numerical-calculation methods introduced above, it has no standard mathematical formulation and no single, clear-cut solution procedure. Dynamic programming is usually aimed at optimization problems, and because the nature of each problem differs, the conditions that determine an optimal solution differ as well.
Consequently, the design of a dynamic programming solution has its own characteristics for each problem; there is no universal dynamic programming algorithm that solves every optimization problem. In studying this method, readers should therefore not only grasp the basic concepts and techniques correctly, but also analyze each specific problem on its own terms, building models imaginatively and solving them creatively. By analyzing and discussing dynamic programming solutions to several typical problems, one can learn and master this design method step by step.
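As a first taste of the method, here is a minimal sketch in Python of the 0/1 knapsack problem, one of the "loading" problems mentioned above. The item weights, values, and capacity below are hypothetical, chosen only for illustration:

```python
def knapsack(weights, values, capacity):
    """0/1 knapsack: best[c] holds the best value achievable with capacity c."""
    best = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        # Iterate capacity downward so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]

# Hypothetical data: three items with weights 2, 3, 4 and values 3, 4, 6.
print(knapsack([2, 3, 4], [3, 4, 6], 6))  # -> 9 (take the items of weight 2 and 4)
```

Each capacity value `c` plays the role of a state, and "take this item or not" is the decision; the table `best` records the optimal value of every sub-problem, which is the pattern the rest of this article develops.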
Multi-stage decision problems
If an activity process can be divided into several interrelated stages, each requiring a decision (a course of action), and if the decision made at one stage affects the decision at the next, thereby completely determining the route the process takes, the problem is called a multi-stage decision problem. The decisions of all stages form a decision sequence, called a strategy. Since several decisions are available at each stage, there are many strategies to choose from. Each strategy determines a corresponding effect of the activity, which can be measured quantitatively, and different strategies yield different effects. A multi-stage decision problem asks us to choose, from among the available strategies, an optimal strategy that achieves the best effect under a predetermined criterion.
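A classic illustration is the staged shortest-path problem: at each stage we decide which node to move to next, and the chosen node becomes the starting point of the following stage. A minimal Python sketch, working backward from the goal (the graph and its costs are hypothetical):

```python
# Hypothetical staged graph: stages[k] maps each node in stage k to its
# outgoing edges {next_node: edge_cost} into stage k+1; "D" is the goal.
stages = [
    {"A": {"B1": 2, "B2": 4}},
    {"B1": {"C1": 3, "C2": 1}, "B2": {"C1": 1, "C2": 5}},
    {"C1": {"D": 4}, "C2": {"D": 6}},
]

def min_costs(stages, goal):
    """Work backward: cost[node] = least cost from node to the goal."""
    cost = {goal: 0}
    for layer in reversed(stages):
        for node, edges in layer.items():
            cost[node] = min(c + cost[nxt] for nxt, c in edges.items())
    return cost

print(min_costs(stages, "D")["A"])  # -> 9
```

Solving stage by stage from the goal backward, each node's best cost is computed once, whereas enumerating whole routes would examine every strategy separately.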
Related concepts
Stage: the process of solving a given problem is divided into several interrelated stages, and the number of stages may differ from process to process. A variable describing the stage is called a stage variable; in most cases stage variables are discrete and denoted by k. Stage variables can also be continuous: if a decision may be made at any moment and infinitely many decisions are allowed between any two distinct moments, the stage variable is continuous.

State: the state describes the natural or objective conditions at the beginning of each stage; it is not subject to anyone's will and is also called an uncontrollable factor. In the example above, the state is the starting position of a stage, which is both the starting point of a road in that stage and the end point of a branch in the previous stage. The state of a process can usually be described by a number or a set of numbers, called state variables. In general states are discrete, though for convenience they are sometimes treated as continuous; indeed, in practice all states are discrete because of the limits of representation, but from an analytical standpoint it is sometimes useful to regard them as continuous. A state may also have several components (the multidimensional case) and is then represented by a vector, and the state dimension may differ from stage to stage. As the process develops along its possible routes, the state variable of each stage ranges over a certain set of values; the set of values a state variable can take is called a state set.

No aftereffect: we require the state to have the following property: given the state of a certain stage, the development of the process from that stage on is unaffected by the states of earlier stages, and once the states of all stages are determined, the whole process is determined.
In other words, each realization of the process can be represented by a sequence of states. In the previous example, the state of each stage is the starting point of a route segment; once the sequence of these points is determined, the entire route is completely determined. For the portion of the route beginning at a given stage, once its starting point is given, it is unaffected by the earlier route (the points already passed). This property of the state means that the history of a process can influence its future development only through the current state; it is called the no-aftereffect property.

Decision: given the state of a stage, the choice (action) by which the process evolves from that state to a state of the next stage is called a decision; in optimal control it is also called a control. In many problems a decision can be naturally expressed as a number or a set of numbers, with different decisions corresponding to different values; a variable describing a decision is called a decision variable. Because the state has no aftereffect, only the current state needs to be considered when making a decision at each stage; the history of the process can be ignored. The range of a decision variable is called the allowable decision set.

Strategy: the sequence of decisions made at the successive stages is called a strategy. For any practical multi-stage decision process, the available strategies are subject to certain restrictions; the set of permitted strategies is called the allowable strategy set, and the strategy in this set that achieves the best effect is called the optimal strategy.

State transition equation: given the value of the state variable x(k) at stage k, once the decision variable u(k) of that stage is determined, the state variable x(k+1) at stage k+1 is completely determined. That is, x(k+1) varies with x(k) and u(k); this law of transition from stage k to stage k+1 is called the state transition equation.
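These concepts can be made concrete with a small resource-allocation sketch in Python: the state x(k) is the amount of resource left at stage k, the decision u(k) is the amount spent, and the state transition equation is x(k+1) = x(k) - u(k). The reward table below is hypothetical:

```python
from functools import lru_cache

# Hypothetical stage rewards: REWARD[k][u] = gain from spending u units at stage k.
REWARD = [
    {0: 0, 1: 3, 2: 5},
    {0: 0, 1: 4, 2: 6},
    {0: 0, 1: 2, 2: 7},
]

@lru_cache(maxsize=None)
def best(k, x):
    """Maximum total reward from stage k onward, with x units remaining (state)."""
    if k == len(REWARD):
        return 0
    # Allowable decision set: spend u in {0, 1, 2}, but no more than x.
    return max(REWARD[k][u] + best(k + 1, x - u)  # transition: x(k+1) = x(k) - u(k)
               for u in range(min(x, 2) + 1))

print(best(0, 3))  # -> 11 (spend 0, 1, 2 units at stages 0, 1, 2)
```

Note that `best` depends only on the current stage and state, never on how the process arrived there; that is the no-aftereffect property, and it is what makes the memoization (`lru_cache`) valid.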
Principle of optimality: an optimal strategy for the whole process has the property that, whatever the state produced by the earlier decisions, the remaining decisions must constitute an optimal sub-strategy with respect to that state. A problem satisfying the principle of optimality is also said to have the optimal substructure property. In essence, the principle states that every sub-strategy of an optimal strategy is itself an optimal strategy for the corresponding sub-problem.
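The principle of optimality is what licenses the stage-by-stage recursions sketched above: if f(k, x) denotes the best value obtainable from stage k onward in state x, the principle yields the recursion f(k, x) = opt over u of [ r(k, x, u) + f(k+1, T(x, u)) ]. The short Python sketch below checks the principle directly on a hypothetical staged graph, verifying that every tail of an optimal route is itself optimal:

```python
# Hypothetical staged graph as {node: {successor: edge_cost}}; "D" is the goal.
EDGES = {
    "A": {"B1": 2, "B2": 4},
    "B1": {"C1": 3, "C2": 1},
    "B2": {"C1": 1, "C2": 5},
    "C1": {"D": 4},
    "C2": {"D": 6},
    "D": {},
}

def best_route(start):
    """Return (cost, route) of a cheapest path from start to 'D'."""
    if start == "D":
        return 0, ["D"]
    options = []
    for nxt, c in EDGES[start].items():
        sub_cost, sub_route = best_route(nxt)
        options.append((c + sub_cost, [start] + sub_route))
    return min(options)

def route_cost(route):
    return sum(EDGES[a][b] for a, b in zip(route, route[1:]))

cost, route = best_route("A")
# Principle of optimality: every tail of the optimal route is itself an
# optimal (minimum-cost) route from its own starting state to the goal.
for i in range(len(route)):
    assert route_cost(route[i:]) == best_route(route[i])[0]
print(cost)  # -> 9
```

Because sub-solutions of an optimal solution are optimal, each sub-problem needs to be solved only once, which is exactly what the tabulation and memoization in the earlier sketches exploit.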