Abstract:
This paper explores a variation of the multi-armed bandit problem and proposes a new strategy, the Pure Unknown strategy, to maximize payoffs. In this game, the player chooses between two arms: one with a known payoff distribution and one with an unknown payoff distribution. She does not realize the payoff of her chosen arm until the next time period; the payoff probabilities in each period are state dependent, and those states evolve according to a stochastic process. Additionally, the model incorporates elements of ambiguity aversion to reflect individuals' preference for choices with known probabilities over those with unknown probabilities. Four strategies play this game, the Pure Unknown strategy and three others drawn from the prior multi-armed bandit literature, and their average payoffs are compared. Results show that the Pure Unknown strategy performs best when the Normality assumption, which ultimately encodes ambiguity-averse preferences, is absent.