Fully Observable Markov Decision Process
A Fully Observable Markov Decision Process is a Markov decision process in which the process state is fully observable (i.e., the agent always knows the exact state of the system when choosing an action).
- Context:
- …
- Counter-Example(s):
- a Partially Observable Markov Decision Process (POMDP).
- See: Finite Discrete-Time Fully Observable MDP.
References
2012
- (Mausam & Kolobov, 2012) ⇒ Mausam, and Andrey Kolobov. (2012). “Planning with Markov Decision Processes: An AI Perspective.” Morgan & Claypool Publishers. ISBN: 1608458865, 9781608458868.
- QUOTE: A finite discrete-time fully observable MDP is a tuple [math]\displaystyle{ (S,A,D,T,R) }[/math], where:
- S is the finite set of all possible states of the system, also called the state space;
- A is the finite set of all actions an agent can take;
- D is a finite or infinite sequence of natural numbers of the form [math]\displaystyle{ (1, 2, 3, \ldots, T_{max}) }[/math] or [math]\displaystyle{ (1, 2, 3, \ldots) }[/math], respectively, denoting the decision epochs, also called time steps, at which actions need to be taken;
- [math]\displaystyle{ T : S \times A \times S \times D \to [0, 1] }[/math] is a transition function, a mapping specifying the probability [math]\displaystyle{ T(s_1, a, s_2, t) }[/math] of going to state [math]\displaystyle{ s_2 }[/math] if action [math]\displaystyle{ a }[/math] is executed when the agent is in state [math]\displaystyle{ s_1 }[/math] at time step [math]\displaystyle{ t }[/math];
- [math]\displaystyle{ R : S \times A \times S \times D \to \mathbb{R} }[/math] is a reward function that gives a finite numeric reward value [math]\displaystyle{ R(s_1, a, s_2, t) }[/math] obtained when the system goes from state [math]\displaystyle{ s_1 }[/math] to state [math]\displaystyle{ s_2 }[/math] as a result of executing action [math]\displaystyle{ a }[/math] at time step [math]\displaystyle{ t }[/math].
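The definition above is essentially a typed tuple, which can be made concrete in code. The following is a minimal Python sketch (not from the referenced book); the FiniteMDP class, its step method, and the toy two-state example are assumptions made purely for illustration.

```python
# A minimal sketch of the finite discrete-time fully observable MDP tuple
# (S, A, D, T, R) defined above. Illustrative only; the class name and the
# toy example are assumptions, not code from Mausam & Kolobov (2012).
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = str
Action = str

@dataclass
class FiniteMDP:
    S: List[State]                                   # finite state space
    A: List[Action]                                  # finite action set
    D: range                                         # decision epochs (1, 2, ..., Tmax)
    T: Callable[[State, Action, State, int], float]  # transition probability T(s1, a, s2, t)
    R: Callable[[State, Action, State, int], float]  # reward R(s1, a, s2, t)

    def step(self, s1: State, a: Action, t: int) -> Tuple[State, float]:
        """Sample a successor state from T(s1, a, ., t); return it with its reward."""
        weights = [self.T(s1, a, s2, t) for s2 in self.S]
        s2 = random.choices(self.S, weights=weights, k=1)[0]
        return s2, self.R(s1, a, s2, t)

# Toy two-state example: "go" moves s0 -> s1 with probability 0.9;
# every other (state, action) pair leaves the state unchanged.
def T(s1: State, a: Action, s2: State, t: int) -> float:
    if s1 == "s0" and a == "go":
        return 0.9 if s2 == "s1" else 0.1
    return 1.0 if s2 == s1 else 0.0

def R(s1: State, a: Action, s2: State, t: int) -> float:
    return 1.0 if s2 == "s1" else 0.0  # unit reward for ending in s1

mdp = FiniteMDP(S=["s0", "s1"], A=["go", "stay"], D=range(1, 11), T=T, R=R)
print(mdp.step("s0", "go", t=1))  # e.g. ('s1', 1.0)
```

Because the process is fully observable, the state returned by step is exactly what the agent observes; a partially observable variant would instead return an observation drawn from a separate observation function.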