Bellman equation

A Bellman equation, also called an optimality equation or a dynamic programming equation, arises in dynamic programming, an approach developed by Richard Bellman. It expresses the value of a decision problem in one state in terms of the value of the states that can follow it.

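In reinforcement learning, a Bellman equation is a recursion for expected rewards. Let R(s) be the reward for being in state s, γ the discount factor, and P(s' | s, a) the probability of moving to state s' after taking action a in state s. The expected reward for being in a particular state s and following some fixed policy π then has the Bellman equation: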

V^{\pi}(s) = R(s) + \gamma \sum_{s'} P(s' | s, \pi(s)) V^{\pi}(s')

while the equation for the optimal policy is referred to as the Bellman optimality equation:

V^{*}(s) = R(s) + \max_{a} \gamma \sum_{s'} P(s' | s, a) V^{*}(s')

the difference being that rather than taking the action prescribed by some policy π, we take the action that gives the best expected return.
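As a minimal sketch of the two recursions, the Python below applies each equation repeatedly until the values stop changing. The two-state, two-action problem (the tables R and P, the discount GAMMA, and the function names) is a hypothetical example invented for illustration, and repeated substitution is only one of several ways to solve these equations.

```python
# Hypothetical two-state, two-action problem, used only for illustration.
STATES = [0, 1]
ACTIONS = [0, 1]
GAMMA = 0.9        # discount factor (gamma)
R = [0.0, 1.0]     # R[s]: reward for being in state s

# P[s][a][s2]: probability of moving from s to s2 when taking action a
P = [
    [[0.8, 0.2], [0.2, 0.8]],  # transitions out of state 0
    [[0.6, 0.4], [0.1, 0.9]],  # transitions out of state 1
]


def backup(V, s, a):
    """R(s) + gamma * sum over s' of P(s' | s, a) * V(s')."""
    return R[s] + GAMMA * sum(P[s][a][s2] * V[s2] for s2 in STATES)


def evaluate_policy(policy, tol=1e-10):
    """Iterate the Bellman equation for a fixed policy (policy[s] is an action)."""
    V = [0.0 for _ in STATES]
    while True:
        new_V = [backup(V, s, policy[s]) for s in STATES]
        if max(abs(n - o) for n, o in zip(new_V, V)) < tol:
            return new_V
        V = new_V


def value_iteration(tol=1e-10):
    """Iterate the Bellman optimality equation; also return a greedy policy."""
    V = [0.0 for _ in STATES]
    while True:
        new_V = [max(backup(V, s, a) for a in ACTIONS) for s in STATES]
        if max(abs(n - o) for n, o in zip(new_V, V)) < tol:
            greedy = [max(ACTIONS, key=lambda a: backup(new_V, s, a))
                      for s in STATES]
            return new_V, greedy
        V = new_V


V_pi = evaluate_policy([0, 0])      # value of always taking action 0
V_star, greedy = value_iteration()  # optimal value and a greedy policy
```

The only difference between the two functions is the max over actions inside the update, which mirrors the difference between the two equations.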

Example

The Bellman equation recasts a problem stated over an entire sequence of choices as a recursion over states. The dynamic programming problem

\max_{\{x_{t+1}\}_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t F(x_t, x_{t+1})

such that

\begin{matrix} x_{t+1} \in \Gamma (x_t), & t = 0, 1, 2, \ldots \\ x_0 \in X, & \text{given} \end{matrix}

can be written as:

V(x) = \max_{y \in \Gamma (x) } [F(x,y) + \beta V(y)], \forall x \in X.

Here the next state

y \in \Gamma (x)

is chosen from a feasible set that depends on the current state x, and the maximizing choice

y(x)

is the policy function.
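The recursion for V can be solved numerically by starting from a guess and applying the right-hand side repeatedly. The sketch below does this on a small integer grid. The state space GRID, the feasible set feasible(x) = {0, ..., x}, the payoff F(x, y) = sqrt(x - y), and the value of BETA are all assumptions chosen for illustration and are not part of the problem stated above.

```python
import math

BETA = 0.9           # discount factor (beta), assumed value
GRID = range(0, 21)  # hypothetical state space X = {0, 1, ..., 20}


def feasible(x):
    """Gamma(x): hypothetical feasible set, any y with 0 <= y <= x."""
    return range(0, x + 1)


def payoff(x, y):
    """F(x, y): hypothetical one-period payoff from moving from x to y."""
    return math.sqrt(x - y)


def solve_bellman(tol=1e-10):
    """Iterate V(x) = max over y in Gamma(x) of F(x, y) + beta * V(y)."""
    V = {x: 0.0 for x in GRID}
    while True:
        new_V = {
            x: max(payoff(x, y) + BETA * V[y] for y in feasible(x))
            for x in GRID
        }
        if max(abs(new_V[x] - V[x]) for x in GRID) < tol:
            break
        V = new_V
    # Policy function y(x): a maximizing choice for each state x.
    policy = {
        x: max(feasible(x), key=lambda y: payoff(x, y) + BETA * new_V[y])
        for x in GRID
    }
    return new_V, policy


V, policy = solve_bellman()
```

Because the discount factor is below one, each pass shrinks the gap between successive guesses, so the loop terminates; policy[x] then plays the role of y(x) for this toy problem.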

Last updated: 10-15-2005 16:54:23
The contents of this article are licensed from Wikipedia.org under the GNU Free Documentation License.