Exploration-exploitation models and the evolution of animal signals
Many problems encountered in everyday life involve a period of exploration (information gain) followed by a period of exploitation (information use). For example, if you are visiting Ottawa, do you go to a different restaurant every night (explore) or do you keep returning to the first satisfactory restaurant you find (exploit)? Clearly, the degree to which you should be prepared to invest in new information depends in part on how many opportunities you will have to use that knowledge in the future. Payoff-maximising solutions to one class of exploration-exploitation (“bandit”) problems have been identified analytically, but dynamic programming can numerically identify the optimal decision for a much broader range of problems. These exploration-exploitation algorithms have long been employed to help resolve a range of human dilemmas, including the point at which clinicians should consider switching patients from a control to the treatment drug once evidence accumulates that the treatment is effective. Here I argue that non-human animals are also often faced with exploration-exploitation dilemmas and that natural selection has helped generate satisfactory solutions. In so doing, I show that bandit models can readily account for a broad array of empirically observed phenomena. For example, I show that they can explain: (i) why predators attack more unfamiliar unpalatable prey before rejecting them the more common those prey are, (ii) why unpalatable species tend to evolve a similar appearance, (iii) why younger individuals are typically more prepared to approach novel objects than older individuals, and (iv) why superstitions develop not just in humans, but also in non-human animals.
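As a rough illustration of the dynamic programming approach mentioned above, the following minimal sketch solves a toy two-option problem by backward induction. It is not the model used in the work described here. The set-up is an assumption made for illustration: one familiar option pays off with a known probability (p_known), one unfamiliar option has an unknown payoff probability described by a Beta prior, and the function name should_explore and all parameter values are hypothetical.

```python
from functools import lru_cache


def should_explore(horizon, p_known=0.6, prior_successes=1, prior_failures=1):
    """Return True if trying the unfamiliar option first maximises expected payoff.

    Illustrative assumptions (not taken from the abstract):
    - the familiar option pays 1 with known probability p_known;
    - the unfamiliar option pays 1 with an unknown probability whose
      uncertainty is a Beta(prior_successes, prior_failures) distribution;
    - horizon is the number of choices remaining.
    """

    @lru_cache(maxsize=None)
    def value(a, b, t):
        # Best achievable expected payoff with t choices left, given
        # a - prior_successes observed successes and b - prior_failures
        # observed failures on the unfamiliar option.
        if t == 0:
            return 0.0
        return max(exploit_value(a, b, t), explore_value(a, b, t))

    def exploit_value(a, b, t):
        # Choosing the familiar option yields no new information.
        return p_known + value(a, b, t - 1)

    def explore_value(a, b, t):
        # Choosing the unfamiliar option updates the belief either way.
        mean = a / (a + b)
        return (mean * (1.0 + value(a + 1, b, t - 1))
                + (1.0 - mean) * value(a, b + 1, t - 1))

    a0, b0 = prior_successes, prior_failures
    return explore_value(a0, b0, horizon) > exploit_value(a0, b0, horizon)


if __name__ == "__main__":
    for horizon in (1, 2, 3, 4, 10):
        print(horizon, should_explore(horizon))
```

Under these assumed values, the unfamiliar option looks worse on average (expected payoff 0.5 against 0.6), so sampling it only pays when enough future choices remain to profit from what is learned, which is the dependence on future opportunities described above.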