Strong uniform value in gambling houses and partially observable Markov decision processes

Abstract

In several standard models of dynamic programming (gambling houses, Markov decision processes (MDPs), and partially observable MDPs (POMDPs)), we prove the existence of a robust notion of value for the infinitely repeated problem, namely the strong uniform value. This solves two open problems. First, it shows that for any $\epsilon>0$, the decision-maker has a pure strategy $\sigma$ which is $\epsilon$-optimal in any $n$-stage problem, provided that $n$ is large enough (this result was previously known only for behavior strategies, that is, strategies which use randomization). Second, for any $\epsilon>0$, the decision-maker can guarantee the limit of the $n$-stage value minus $\epsilon$ in the infinite problem, where the payoff is the expectation of the inferior limit of the time-average payoff.
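The two guarantees described above can be sketched formally as follows. This is our own notational rendering, not the paper's: $v_n$ denotes the $n$-stage value, $v^* = \lim_n v_n$ its limit, $\gamma_n(\sigma)$ the expected average payoff of strategy $\sigma$ over $n$ stages, and $g_t$ the stage-$t$ payoff.

```latex
% Strong uniform value: a single pure strategy achieves both guarantees.
% For every epsilon > 0 there is a pure strategy sigma and a horizon N with
\[
\forall \epsilon > 0,\ \exists \sigma \text{ pure},\ \exists N,\ \forall n \geq N:
\quad \gamma_n(\sigma) \;\geq\; v^* - \epsilon,
\]
% and the same strategy guarantees the liminf time-average payoff:
\[
\mathbb{E}_\sigma\!\left[\, \liminf_{n \to \infty} \frac{1}{n} \sum_{t=1}^{n} g_t \,\right]
\;\geq\; v^* - \epsilon.
\]
```

The first display corresponds to $\epsilon$-optimality in every sufficiently long $n$-stage problem; the second to guaranteeing $\lim v_n - \epsilon$ under the inferior-limit payoff criterion.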
