New psychology study challenges the belief that humans learn by getting rewards

January 12, 2026

For decades, scientists believed that humans learn mainly by getting rewards — when a choice leads to a good outcome, we stick with it; when it doesn’t, we try something new.

But a new UC Berkeley study challenges that long-standing assumption, suggesting that the brain’s learning system is far more complex and far less dependent on rewards than previously thought.

Conducted by Psychology Professor Anne Collins and titled “A habit and working memory model as an alternative account of human reward-based learning,” the study argues that humans learn through the interaction of two very different mental processes: working memory and habit.

Working memory allows us to hold information in mind for short periods, like remembering a PIN long enough to type it. Habits, by contrast, form more slowly, operate more automatically and strengthen whatever behaviors we repeat, whether or not those behaviors lead to good outcomes.

“Neither process on its own is really a good learner,” Collins said. “Working memory is too capacity limited, and habit doesn’t care about rewards. But we show that together, they end up allowing us to learn efficiently, and mimicking more complex reward-based learning abilities.”

The study found that when people had to learn just one or two things, they acted as reward-based learning would predict — they stopped repeating choices that didn’t work. But when the task involved more items — four, five or six — the pattern reversed. People actually became more likely to repeat their mistakes, something standard reward-based learning can’t easily explain.

Collins said this pattern is exactly what we’d expect if working memory and habit — not reward-based learning — are doing most of the work in this learning experiment.

“Working memory guides choices enough that you make good ones,” she explained. “In parallel, the habit process makes you want to repeat your choices — enough of which are guided by working memory that in the end it picks up the rewarding ones.”
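The interplay Collins describes can be illustrated with a toy simulation. The sketch below is not the model from the study. It is a minimal illustration under assumed settings: six items, three possible responses, a working memory that holds the last rewarded response for at most three items, and a habit counter that grows every time a response is chosen, rewarded or not. All names and parameter values (`simulate`, `capacity`, `n_blocks` and so on) are hypothetical.

```python
import random
from collections import OrderedDict


def simulate(use_wm, n_stimuli=6, n_actions=3, capacity=3,
             n_blocks=40, seed=0):
    """Return accuracy over the second half of training for one toy learner."""
    rng = random.Random(seed)
    # Each stimulus has exactly one rewarded action (assumed task structure).
    correct = [rng.randrange(n_actions) for _ in range(n_stimuli)]
    # Habit strengths: one counter per (stimulus, action) pair.
    habit = [[0.0] * n_actions for _ in range(n_stimuli)]
    # Working memory: stimulus -> last rewarded action, at most `capacity` items.
    wm = OrderedDict()
    hits = trials = 0

    for block in range(n_blocks):
        order = list(range(n_stimuli))
        rng.shuffle(order)  # every stimulus appears once per block
        for s in order:
            if use_wm and s in wm:
                # Working memory still holds a rewarded answer: reuse it.
                action = wm[s]
            else:
                # Otherwise fall back on habit: sample an action with
                # probability proportional to how often it was chosen before.
                weights = [1.0 + h for h in habit[s]]
                action = rng.choices(range(n_actions), weights=weights)[0]

            rewarded = action == correct[s]

            # The habit update ignores the outcome: it only tracks repetition.
            habit[s][action] += 1.0

            if use_wm and rewarded:
                # Store or refresh the rewarded answer, then evict the oldest
                # entry if working memory is over capacity.
                wm[s] = action
                wm.move_to_end(s)
                if len(wm) > capacity:
                    wm.popitem(last=False)

            if block >= n_blocks // 2:
                trials += 1
                hits += rewarded

    return hits / trials


def mean_accuracy(use_wm, n_runs=30):
    """Average late-training accuracy over several random seeds."""
    return sum(simulate(use_wm, seed=i) for i in range(n_runs)) / n_runs


habit_only = mean_accuracy(use_wm=False)
combined = mean_accuracy(use_wm=True)
print(f"habit only:             {habit_only:.2f}")
print(f"working memory + habit: {combined:.2f}")
```

In this sketch the habit-only learner should hover around chance (one in three), because its counters never see the reward. The combined learner should do better: working memory steers enough choices toward the rewarded response that the reward-blind habit counter ends up favoring it too. The habit counter also strengthens wrong responses when they are chosen, which is the kind of mistake repetition the study reports for larger sets of items.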

Collins said the findings could reshape how scientists understand human intelligence and how learning actually works. They may also influence how clinicians diagnose and treat learning differences or cognitive disorders.

“Understanding how the processes come together to enable fast learning has important future implications, including helping better target where learning dysfunction arises in clinical populations,” Collins said. 

The study also raises concerns about how broadly scientists use the term “reinforcement learning” to describe learning from rewards. Used as an umbrella term, Collins said, it becomes ambiguous and can lead researchers to draw incorrect or incomplete conclusions.

She noted that “the role of habits and working memory is currently understudied in the context of flexible decision making, because on their own, neither is a flexible learning process.” But when combined, she said, these two processes produce “efficient, low cost learning.”

“It is important for cognitive scientists to consider how different processes work together,” she said. “As shown here, interactions between them tend to lead to more than the sum of their individual components.”

Psychology Professor Anne Collins