article

A Partially Observable Markov Decision Process (POMDP) is an extension of a Markov Decision Process. POMDPs are used for choosing actions when the entire world, or state space, is not always directly observable. In other words, you cannot always immediately know where you are in the world and what is going on. For instance, chess is a directly observable game: you can always see the positions of all the pieces. Poker, on the other hand, is partially observable: you can narrow down which cards are in your opponents' hands from your cards, their bids, and the cards on the table. However, you cannot directly observe which cards are still in their hands (without cheating!).

Currently, most POMDPs are computationally intractable to solve exactly for optimal behavior, so computer scientists have developed methods that approximate solutions for POMDPs.

POMDPs can be used in solving simple path planning problems for mobile robots. This application is sometimes called the "Kidnapped Robot Problem," framed by imagining a robot was moved to an unknown location in a known environment and now must figure out where it is and find its way home. An exact solution to the POMDP will generate the series of actions that is most likely to get it home with the least cost.

Definition


A Partially Observable Markov Decision Process (POMDP) is a tuple (S,A,O,P,R), where

  • S is the State space,
  • A is the action space,
  • O is the observation space.
  • P(s,a,o,s') = \Pr(s_{t+1}=s',\, o_t=o | s_t = s,\, a_t=a) is the probability that action a in state s at time t will give observation o and lead to state s' at time t+1,
  • R(s,a,s') is the immediate reward that is issued for the transition from s to s' under action a.

The goal is to maximize some cumulative function of the rewards, typically the discounted sum under a discounting factor γ (usually just under 1).

Markov decision processes are a special case of POMDPs in which the observations always uniquely identify the true state (i.e., the states are directly observed or can be directly deduced from the observation).

External links


 

This article is licensed under the GNU Free Documentation License. It uses material from the "Partially observable Markov decision process".

Home Pageartsbusinesscomputersgameshealthhospitalshomekids & teensnewsphysiciansrecreationreferenceregionalscienceshoppingsocietysportsworld