Mice infer probabilistic models for timing

Reinforcement learning (RL) struggles when task contingencies change. For example, standard RL has no built-in mechanism for switching between exploitation and exploration, which is critical for tracking rewards in a variable environment. Here, Dudman and Li show that mice, like humans, represent reward timing with highly flexible probability distributions.
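To make the contrast concrete, here is a toy sketch (not the authors' model; all function names and parameters are illustrative assumptions): a standard epsilon-greedy rule handles exploration with a fixed coin flip, whereas a distribution-based account summarizes observed reward delays as a belief about timing that can flexibly track a changing environment.

```python
import random
import statistics

def epsilon_greedy(values, epsilon, rng):
    """Classic RL exploration/exploitation rule: with probability
    epsilon pick a random option (explore), otherwise pick the
    option with the highest estimated value (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=values.__getitem__)

def update_timing_belief(delays):
    """Toy stand-in for a probabilistic timing model: summarize
    observed reward delays as a Gaussian belief (mean, std)."""
    mean = statistics.fmean(delays)
    std = statistics.stdev(delays) if len(delays) > 1 else 0.0
    return mean, std

rng = random.Random(0)
choice = epsilon_greedy([0.2, 0.8], epsilon=0.1, rng=rng)
mean, std = update_timing_belief([1.9, 2.1, 2.0, 2.2])
print(choice, round(mean, 2))
```

The epsilon-greedy rule never represents *when* reward arrives, only how much; the timing belief, by contrast, can be re-estimated as delays drift, which is the kind of flexibility the mouse behavior suggests.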