The Off-Policy Theory of Happiness

Why philosophers agree on what it takes to be happy.

I was a sophomore in college when I first realized that my parents had never told me, "Son, we just want you to be happy." It seemed like everyone else's parents had told them that whatever they did, it was okay as long as it made them happy. Why, I wondered, had my parents never said this to me?

I understood when I came across a passage from the autobiography of John Stuart Mill.

Mill was an interesting guy. He had one of the highest IQs in human history (they didn't have intelligence tests at the time, but psychological historians have attempted to reconstruct his IQ from other evidence). His father, the venerable historian James Mill, began teaching him Ancient Greek at the age of three. By eight, he had read the whole of Herodotus's histories in the original. So I thought his life story might make an engaging read. But it doesn't. His autobiography is a total snooze-fest. As I recall it, the work is an exhaustive compilation of the least interesting things that Mill ever read, saw, or contemplated. A representative passage: "When we had enough of political economy, we took up the syllogistic logic in the same manner, Grote now joining us. Our first text-book was Aldrich, but being disgusted with its superficiality, we reprinted one of the most finished among the many manuals of the school logic, which my father, a great collector of such books, possessed, the Manuductio ad Logicam of the Jesuit Du Trieu. After finishing this, we took up Whately's Logic, then first republished from the Encyclopedia Metropolitana, and finally the Computatio sive Logica of Hobbes." For the love of God, John. Who cares?

Though I'm not exactly sure why, I trudged through it. And I'm glad I did.

But in order to understand what Mill says about happiness, you first need to understand a concept from artificial intelligence. It’s called reinforcement learning.

The basic idea of reinforcement learning is simple. It is a method for designing an agent (be it a person, a robot, or a computer program) to behave intelligently. The definition of intelligence here is what computer scientists call “reward maximization.” Simply put, there is something that you want, and intelligent behavior consists in getting as much of it as possible. For example, if your agent is a robot that plays basketball, then its reward comes in the form of points. The more baskets the robot makes, the more points she gets and the more intelligently she has behaved. Reinforcement learning is a mathematical account of how the robot learns to acquire more and more points.

At the heart of reinforcement learning is what’s known as a “policy.” It’s the robot’s playbook. A policy says, in mathematical abstraction, “This is where I am right now. This is what I have to do next to maximize my reward.” In basketball, a good policy might be to get the ball, dribble it toward the basket, and toss in a lay-up. Each time the robot does this, she looks at how effective she was in getting points, and adjusts her behavior to do better next time. The robot might start off bad, but using reinforcement learning she could become better over time. That’s what intelligence means here: over time you get better and better at achieving your goal.
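
For readers who like to see the playbook written down, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the two shots, their success probabilities, and the learning rule (an epsilon-greedy agent that keeps a running average of the points each shot has earned). It is one simple way to get a policy that improves with experience, not the way any particular basketball robot is built.

```python
import random


def train_policy(success_prob, episodes=5000, epsilon=0.1, seed=0):
    """Learn the average points earned by each shot through trial and error.

    success_prob maps a shot name to its (hypothetical) chance of going in.
    Returns the robot's learned estimate of points per attempt for each shot.
    """
    rng = random.Random(seed)
    value = {shot: 0.0 for shot in success_prob}  # estimated points per attempt
    count = {shot: 0 for shot in success_prob}    # attempts made so far

    for _ in range(episodes):
        if rng.random() < epsilon:
            # Occasionally try a random shot, so no option goes untested.
            shot = rng.choice(sorted(success_prob))
        else:
            # Otherwise follow the playbook: take the shot that looks best so far.
            shot = max(sorted(value), key=value.get)

        # A made basket is worth two points, a miss is worth nothing.
        reward = 2.0 if rng.random() < success_prob[shot] else 0.0

        # Nudge the running average for this shot toward what just happened.
        count[shot] += 1
        value[shot] += (reward - value[shot]) / count[shot]

    return value


# Hypothetical numbers: lay-ups go in 60% of the time, jumpers 40%.
SHOTS = {"layup": 0.6, "jumper": 0.4}
learned = train_policy(SHOTS)
best_shot = max(learned, key=learned.get)
print(best_shot, learned)
```

The robot starts out knowing nothing about either shot. After a few thousand attempts her estimates settle near the true averages, and her policy comes to favor the shot that actually scores more.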

The idea might be simple, but all of the nuance in reinforcement learning comes from precisely how you learn that policy. For example, is the best policy to drive toward the basket? Or should you sit back and shoot jumpers? How do you know which is going to work out better next time around? Will the same policy work against a different opponent?

There are two general strategies for how to learn a policy. The first is called on-policy. It's the more straightforward of the two strategies. On-policy means that the robot evaluates and improves the same policy she uses to make decisions. If her policy says to drive toward the basket and that results in a lot of points, then she will be more likely to keep going with that same policy in the future. The second strategy is called off-policy. This means that the robot acts according to one policy while learning about a different one. She could, for instance, make decisions aimed at keeping possession of the ball, then look back at the games she played that way and work out which choices actually increased her number of baskets in the end.
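
In the reinforcement learning literature, the distinction is usually drawn in terms of which policy gets evaluated. The sketch below contrasts the two textbook update rules, SARSA (on-policy) and Q-learning (off-policy), on a single play. The court positions, action names, and numbers are hypothetical and chosen only to make the difference visible. On this play the robot drives into the paint and then, on an exploratory whim, passes the ball back out instead of shooting.

```python
import copy


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.5, gamma=0.9):
    """On-policy: judge the move by the action the robot actually took next."""
    target = reward + gamma * q[next_state][next_action]
    q[state][action] += alpha * (target - q[state][action])


def q_learning_update(q, state, action, reward, next_state,
                      alpha=0.5, gamma=0.9):
    """Off-policy: judge the move by the best action available next,
    whether or not the robot actually took it."""
    target = reward + gamma * max(q[next_state].values())
    q[state][action] += alpha * (target - q[state][action])


# Hypothetical value table: estimated future points for each action in each spot.
start = {
    "midcourt": {"drive": 0.0, "hold": 0.0},
    "paint": {"layup": 2.0, "pass_out": 0.0},
}

on_q = copy.deepcopy(start)
off_q = copy.deepcopy(start)

# The same experience is fed to both learners: drive from midcourt into the
# paint (no points yet), then pass out instead of taking the lay-up.
sarsa_update(on_q, "midcourt", "drive", 0.0, "paint", "pass_out")
q_learning_update(off_q, "midcourt", "drive", 0.0, "paint")

print(on_q["midcourt"]["drive"])   # the drive gets no credit
print(off_q["midcourt"]["drive"])  # the drive gets credit for the lay-up it set up
```

Both learners saw exactly the same play. The on-policy learner ties the value of the drive to what the robot actually did afterward, so the wasted pass drags it down. The off-policy learner separates what the robot did from what it is learning about, and credits the drive with the lay-up it made possible.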

At first, it might seem like the better strategy is always going to be on-policy. How could you score more points by focusing on something totally irrelevant? But that's not necessarily true. The empirical fact in artificial intelligence research is that some problems are better solved by off-policy methods. Sometimes the best way to attain a goal is indirectly.

This is precisely what Mill argues about happiness. The way to maximize your happiness, so to speak, is to aim at something else. Dedicate yourself to something larger than your own happiness. Work hard at that. Then you’ll look back and realize that you’ve been accruing happiness the whole time. Mill writes,

“The enjoyments of life are sufficient to make it a pleasant thing when they are taken en passant without being made a principal objective. Once you make them so, you will immediately feel them to be insufficient. They will not bear a scrutinizing examination. Ask yourself whether you are happy, and you cease to be so. The only chance is for you to have as your purpose in life not happiness but something external to it. Let your self-consciousness, your scrutiny, your self-interrogation, exhaust themselves on that; and if you are otherwise fortunately circumstanced you will inhale happiness with the air you breathe, without dwelling on it or thinking about it, forestalling it in imagination, or putting it to flight by fatal questioning.”

In other words, the on-policy strategy doesn’t work for happiness. If you try to maximize it directly, then you’re going to be worse off than if you had taken a different approach. Happiness is one of those problems that work better with the off-policy strategy. There has to be a separation between action and evaluation. If you’re using your own happiness as a metric by which to evaluate your next decision, the scope of your concern cannot extend past your own feelings. Instead, argues Mill, focus on something larger than yourself and you’ll wake up one day to realize that you inhale happiness with the air you breathe.

The reason, then, that my parents never told me to pursue happiness directly was that they, like Mill, believe in an off-policy approach to happiness. When someone tells you that you should "do what makes you happy," they're advocating for an on-policy approach to happiness: making decisions and evaluating them by the same metric. That's exactly what my parents didn't want me to do. And while my parents didn’t learn this from reading Mill, the surprising thing about this position on happiness is that it is shared, in some version or another, by practically every other philosopher who has weighed in on the matter.

One of my favorites among these accounts belongs to Bertrand Russell. He more or less says the same thing as Mill, but with a certain flair of nonchalance in contrast to Mill’s solemn weightiness. Russell writes in The Conquest of Happiness, "Fundamental happiness depends more than anything else upon what may be called a friendly interest in persons and things." He continues, "let your interests be as wide as possible, and let your reactions to the things and persons that interest you be as far as possible friendly rather than hostile."

Happiness, in other words, is the natural result of the observation that there are a great many persons and things in the world worth taking a friendly interest in, and only one of them is yourself. It is with this idea in mind that I want to write this blog. 


Originally posted on Psychology Today.
