Reinforcement learning (RL) suffers when tasks change. For example, RL has no built-in mechanism for switching between exploitation and exploration, a capacity critical for tracking rewards in a variable environment. Here, Dudman and Li show that mice—like humans—model rewards on highly flexible probability distributions.
Why are there errors in our decisions? Does noise corrupt a perfect signal, or is the decision machinery itself inherently probabilistic? This paper argues the latter. From the perspective of optimal coding—where optimal means metabolically efficient—it offers a useful lens through which to view modern findings in neuroscience and behavioral research.
Mello et al. find that the majority of time-encoding neurons in the striatum, a central basal ganglia structure, contract and dilate their firing patterns with the task interval. These results highlight how neural activity is translated into optimally timed motor actions.
A look at how D1 dopamine receptors might contribute to temporal processing of performed actions. Interestingly, 2 Hz (delta-band) stimulation improves performance.
This paper is one among many that contrast the presumed (and tested) roles of the basal ganglia and cerebellum in performing actions that are either rhythmic or based on single intervals.