Reinforcement learning (RL) struggles when task contingencies change. Standard RL, for example, has no built-in mechanism for flexibly switching between exploitation and exploration, a capacity that is critical for tracking rewards in a variable environment. Here, Dudman and Li show that mice, like humans, represent rewards as highly flexible probability distributions.
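To make the exploration problem concrete, here is a minimal sketch of one standard way a distributional belief about reward can regulate exploration: a Thompson-sampling bandit with a forgetting factor. This is an illustrative textbook technique, not the authors' task or model; the function name, parameters, and the `decay` heuristic are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_bandit(true_probs=(0.8, 0.2), n_trials=1_000, switch_at=500, decay=0.99):
    """Two-armed Bernoulli bandit whose reward contingencies reverse mid-session.

    Each arm's reward rate is tracked as a Beta distribution. Acting on a
    sample from that distribution (Thompson sampling) makes exploration
    self-regulating: the distribution broadens after the switch, so the
    agent explores more, then narrows again as evidence accumulates.
    """
    probs = np.array(true_probs, dtype=float)
    alpha = np.ones(len(probs))  # Beta "success" counts per arm
    beta = np.ones(len(probs))   # Beta "failure" counts per arm
    total = 0.0
    for t in range(n_trials):
        if t == switch_at:
            probs = probs[::-1]                      # environment changes
        arm = int(np.argmax(rng.beta(alpha, beta)))  # sample beliefs, act on best
        r = float(rng.random() < probs[arm])
        alpha *= decay                               # forget old evidence so the
        beta *= decay                                # distribution stays flexible
        alpha[arm] += r
        beta[arm] += 1.0 - r
        total += r
    return total / n_trials

print(f"mean reward: {run_bandit():.3f}")
```

A point-estimate learner with a fixed exploration rate has no analogous signal that the world has changed; maintaining a full distribution is what lets uncertainty, and hence exploration, rise and fall with the environment.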
Why do our decisions contain errors? Does noise corrupt an otherwise perfect signal, or is the decision machinery itself inherently probabilistic? This paper argues the latter. Taking the perspective of optimal coding, where "optimal" means metabolically efficient, it offers a useful lens for viewing modern findings in neuroscience and behavioral research.
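The distinction between the two accounts can be illustrated with a toy contrast between two readout rules applied to the very same noiseless belief. This sketch is an assumption-laden illustration, not the paper's model; the function name, the fixed posterior value, and the two readout rules are chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def choice_accuracy(p_correct=0.7, n_trials=10_000):
    """Compare two decision rules given the same noise-free posterior.

    A deterministic (max-a-posteriori) readout never errs when the posterior
    favors the correct option, while a sampling readout errs at a rate set
    by the posterior itself: the variability lives in the decision rule,
    not in the signal.
    """
    posterior = np.full(n_trials, p_correct)        # belief is exact, no noise
    map_choice = posterior > 0.5                    # deterministic readout
    sampled = rng.random(n_trials) < posterior      # probabilistic readout
    return map_choice.mean(), sampled.mean()

map_acc, samp_acc = choice_accuracy()
print(f"MAP accuracy: {map_acc:.3f}, sampling accuracy: {samp_acc:.3f}")
# MAP accuracy: 1.000, sampling accuracy: ~0.700
```

Under the paper's framing, such intrinsic stochasticity is not a defect: a probabilistic readout can be the cheaper code, so the "errors" it produces are the price of metabolic efficiency rather than corruption of a perfect signal.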