Fully Observable Markov Decision Process
A Fully Observable Markov Decision Process is a Markov decision process in which the process state is fully observable (i.e., the agent always knows the exact state of the system when choosing an action).
- Context:
- …
- Counter-Example(s):
- a Partially Observable Markov Decision Process (POMDP).
- See: Finite Discrete-Time Fully Observable MDP.
References
2012
- (Mausam & Kolobov, 2012) ⇒ Mausam, and Andrey Kolobov. (2012). “Planning with Markov Decision Processes: An AI Perspective.” Morgan & Claypool Publishers. ISBN: 1608458865, 9781608458868.
- QUOTE: A finite discrete-time fully observable MDP is a tuple [math]\displaystyle{ (S,A,D,T,R) }[/math], where:
- S is the finite set of all possible states of the system, also called the state space;
- A is the finite set of all actions an agent can take;
- D is a finite or infinite sequence of natural numbers of the form [math]\displaystyle{ (1, 2, 3, \ldots, T_{max}) }[/math] or [math]\displaystyle{ (1, 2, 3, \ldots) }[/math], respectively, denoting the decision epochs, also called time steps, at which actions need to be taken;
- [math]\displaystyle{ T : S \times A \times S \times D \to [0, 1] }[/math] is a transition function, a mapping specifying the probability [math]\displaystyle{ T(s_1, a, s_2, t) }[/math] of going to state [math]\displaystyle{ s_2 }[/math] if action [math]\displaystyle{ a }[/math] is executed when the agent is in state [math]\displaystyle{ s_1 }[/math] at time step [math]\displaystyle{ t }[/math];
- [math]\displaystyle{ R : S \times A \times S \times D \to \mathbb{R} }[/math] is a reward function that gives a finite numeric reward value [math]\displaystyle{ R(s_1, a, s_2, t) }[/math] obtained when the system goes from state [math]\displaystyle{ s_1 }[/math] to state [math]\displaystyle{ s_2 }[/math] as a result of executing action [math]\displaystyle{ a }[/math] at time step [math]\displaystyle{ t }[/math].
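The definition above is essentially a typed tuple, which can be made concrete in code. The following is a minimal Python sketch (not from the referenced book); the FiniteMDP class, its step method, and the toy two-state example are assumptions made purely for illustration.

```python
# A minimal sketch of the finite discrete-time fully observable MDP tuple
# (S, A, D, T, R) defined above. Illustrative only; the class name and the
# toy example are assumptions, not code from Mausam & Kolobov (2012).
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = str
Action = str

@dataclass
class FiniteMDP:
    S: List[State]                                   # finite state space
    A: List[Action]                                  # finite action set
    D: range                                         # decision epochs (1, 2, ..., Tmax)
    T: Callable[[State, Action, State, int], float]  # transition probability T(s1, a, s2, t)
    R: Callable[[State, Action, State, int], float]  # reward R(s1, a, s2, t)

    def step(self, s1: State, a: Action, t: int) -> Tuple[State, float]:
        """Sample a successor state from T(s1, a, ., t); return it with its reward."""
        weights = [self.T(s1, a, s2, t) for s2 in self.S]
        s2 = random.choices(self.S, weights=weights, k=1)[0]
        return s2, self.R(s1, a, s2, t)

# Toy two-state example: "go" moves s0 -> s1 with probability 0.9;
# every other (state, action) pair leaves the state unchanged.
def T(s1: State, a: Action, s2: State, t: int) -> float:
    if s1 == "s0" and a == "go":
        return 0.9 if s2 == "s1" else 0.1
    return 1.0 if s2 == s1 else 0.0

def R(s1: State, a: Action, s2: State, t: int) -> float:
    return 1.0 if s2 == "s1" else 0.0  # unit reward for ending in s1

mdp = FiniteMDP(S=["s0", "s1"], A=["go", "stay"], D=range(1, 11), T=T, R=R)
print(mdp.step("s0", "go", t=1))  # e.g. ('s1', 1.0)
```

Because the process is fully observable, the state returned by step is exactly what the agent observes; a partially observable variant would instead return an observation drawn from a separate observation function.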