
"The good life is one inspired by love and guided by knowledge."

-Bertrand Russell


I am broadly interested in social cognition, especially from computational and neural perspectives. I outline a number of specific research interests below, but most of them come down to a computational account of how people make decisions in social contexts. I'm also interested in language, especially (and this might come as a surprise) its computational basis and social function. You can read about my previous projects below.

How does theory of mind work (or fail to work) in an intergroup context? 

We know generally that one's social group has a significant impact on how one views reality. For instance, a Martian visitor to Earth might be surprised to learn that American Republicans and Democrats have access to a shared physical reality. In other words, members of the same social group have models of reality that resemble one another more than they resemble the models of people outside that group. Consequently, it should be more difficult to make inferences about another person's world model if that person belongs to a different social group, simply because their model will differ more from one's own. Furthermore, because of the curse of knowledge, we tend to assume that other people's mental maps look more like our own than they actually do.


One potentially interesting way to study this goes back to Stanley Milgram's (1974) study of psychological maps (employing a decidedly more benign methodology than some of his more celebrated work). Essentially, Milgram had participants draw a map of an environment with which they were familiar--in this case, Parisians drawing Paris. While the methodological flaws of this approach were many, it yielded a lot of interesting data. One unexplored extension is the theory of mind problem inherent in the idea of psychological maps: how do we recreate someone else's map when we don't have direct access to it? This is essentially what we're doing when we make inferences about someone else's model of the world, yet the way we go about it is quite mysterious.

I have several hypotheses about using Milgram's psychological map approach across different social groups:

(1) Our maps are more similar to those of ingroup members than to those of outgroup members.
(2) We are better able to recreate the maps of ingroup members than those of outgroup members.
(3) People are more likely to judge someone favorably if that person's map looks similar to their own.
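A minimal sketch of how hypothesis (1) might be tested, assuming each participant's map is recorded as the (x, y) positions of the same set of landmarks. It uses 2D Procrustes distance (remove position, size, and rotation, then measure the leftover mismatch) as the similarity measure. That choice, the function names, and the toy maps are hypothetical illustrations, not part of Milgram's method.

```python
import math
from itertools import combinations


def normalize(points):
    """Center a landmark configuration and scale it to unit size."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    size = math.sqrt(sum(x * x + y * y for x, y in centered))
    return [(x / size, y / size) for x, y in centered]


def procrustes_distance(a, b):
    """Squared residual after optimally rotating map b onto map a (2D, no reflection).

    Both maps must list the same landmarks in the same order.
    """
    a, b = normalize(a), normalize(b)
    c = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b))
    s = sum(ay * bx - ax * by for (ax, ay), (bx, by) in zip(a, b))
    # For unit-size configurations, the best rotation leaves a residual of 2 - 2*sqrt(c^2 + s^2).
    return max(0.0, 2.0 - 2.0 * math.hypot(c, s))


def mean_distance(pairs):
    pairs = list(pairs)
    return sum(procrustes_distance(a, b) for a, b in pairs) / len(pairs)


def within_vs_between(group_a, group_b):
    """Hypothesis (1) predicts the first number is smaller than the second."""
    within = mean_distance(list(combinations(group_a, 2)) + list(combinations(group_b, 2)))
    between = mean_distance((a, b) for a in group_a for b in group_b)
    return within, between


# Hypothetical sketch maps: (x, y) positions of the same four landmarks.
group_a = [
    [(0, 0), (1, 0), (1, 1), (0, 1)],
    [(5, 5), (5, 8), (2, 8), (2, 5)],  # same shape, rotated, scaled, and shifted
    [(0, 0), (1, 0.1), (1, 1), (0, 1)],
]
group_b = [
    [(0, 0), (3, 0), (3, 1), (0, 1)],
    [(0, 0), (3, 0.1), (3, 1), (0, 1)],
]
within, between = within_vs_between(group_a, group_b)
```

Procrustes distance ignores where on the page and how large someone drew their map, so only the relative layout of landmarks counts toward similarity.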

Social Categories as Dimensionality Reduction

Statisticians use dimensionality reduction to project high-dimensional data onto a lower-dimensional subspace in which they can visualize it. Humans are the ultimate high-dimensional data, and social categorization reduces that complex whole to a dimensionality at which it can be easily understood (usually just one or two axes). This is one way to think of Tajfel's distinction between intergroup and interpersonal strategies for understanding other people. With the interpersonal strategy you consider the person in all their high-dimensional, idiosyncratic nuance. With the intergroup strategy you consider them along a single axis of their identity. Shifting from the interpersonal to the intergroup strategy can therefore be thought of as a form of dimensionality reduction. If so, the myriad algorithms and approaches developed in machine learning for just this problem might offer some interesting ways of modeling social categorization.
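As one concrete instance of this idea, here is a sketch using principal component analysis (PCA), the most common dimensionality-reduction method. Each person is a hypothetical vector of trait ratings, and the "single axis" is the first principal component, found here by power iteration in plain Python. The trait data and function names are made up for illustration; nothing here claims PCA is the right model of Tajfel's intergroup strategy.

```python
import math
import random


def first_principal_axis(rows, iters=200, seed=0):
    """Approximate the top principal component of `rows` by power iteration.

    Returns the column means and a unit vector; the vector's sign is arbitrary.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in range(d)]  # random start avoids an unlucky orthogonal guess
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def project(rows, means, axis):
    """Collapse each person to a single score along `axis`."""
    return [sum((r[j] - m) * a for j, (m, a) in enumerate(zip(means, axis))) for r in rows]


# Hypothetical people described by three trait ratings; most variation is on the first trait.
people = [[1.0, 0.1, 0.2], [5.0, 0.2, 0.1], [9.0, 0.1, 0.2], [3.0, 0.2, 0.1]]
means, axis = first_principal_axis(people)
scores = project(people, means, axis)
```

The projection keeps the direction along which people differ most and discards the rest, which is the trade the intergroup strategy makes: easier comparison at the cost of individual nuance.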

The Influence of Minority Behavior on the Cultural Norms of the Majority

Here’s an example of a cultural norm that most people in Massachusetts adhere to without knowing it: drinking Kosher. Almost all beverages sold in New England are certified Kosher, even though the Kosher population is a tiny minority of the whole population. Another, more recent example is that many restaurants in Massachusetts decline to provide disposable straws to their customers, on the grounds that tiny plastic straws are infiltrating oceanic ecosystems. In both cases, preferences that initially belong to a minority influence the behavior of the majority. A canonical example is gender-neutral bathrooms, where minority preferences shape majority norms. The behavior of a minority can be a crucial part of how social norms work. However, most theories of cultural norms take the behavior of the majority as their starting point.

This is especially true of accounts that attempt to provide a computational basis for how individuals make judgments about norms. Most of these accounts (such as Malle et al., 2017) are premised on some sort of "threshold": if a critical mass of people agree to the norm, it becomes widely enacted. I think there are important instances where the threshold assumption doesn't hold (as noted above), and that further theoretical models should be based on other premises. In particular, I'm interested in developing a probabilistic account of norms (rather than one grounded in formal logic) that can explain cases in which a minority influences the majority's norms.
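To make the contrast concrete, here is a toy comparison between a generic threshold rule and one possible probabilistic alternative. The probabilistic rule (a logistic choice that weighs how much each side loses) is a hypothetical formalization written for illustration; it is not Malle et al.'s model, and its costs and temperature are made-up parameters.

```python
import math


def threshold_rule(support_share, threshold=0.5):
    """Threshold view: the norm is enacted only once a critical mass supports it."""
    return support_share >= threshold


def probabilistic_rule(support_share, minority_cost, majority_cost, temperature=1.0):
    """Hypothetical probabilistic view: the chance the norm is enacted.

    support_share: fraction of people who prefer the norm.
    minority_cost: cost to each supporter if the norm is NOT enacted.
    majority_cost: cost to each non-supporter if the norm IS enacted.
    """
    cost_if_enacted = (1 - support_share) * majority_cost
    cost_if_not = support_share * minority_cost
    # Logistic choice: whichever option is cheaper for the population as a whole is more likely.
    return 1 / (1 + math.exp((cost_if_enacted - cost_if_not) / temperature))


# A Kosher-like case: a tiny minority cares a lot, and the majority barely cares.
print(threshold_rule(0.02))                    # False
print(probabilistic_rule(0.02, 100.0, 0.01))   # about 0.88
# Same minority, but both sides care equally.
print(probabilistic_rule(0.02, 1.0, 1.0))      # about 0.28
```

Under the threshold rule a 2% minority never moves the norm; under this probabilistic rule it can, provided the asymmetry in costs is large enough, which is the pattern in the Kosher and straw examples.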

Curation as an Optimal Solution to the Exploration-Exploitation Dilemma

A classic problem in decision science is the trade-off between exploration and exploitation: When should I choose the option I know to be best, and when should I try a new option in the hope that it might prove even better? Most solutions take some form of "explore, then exploit." However, another potential solution is social: find someone with a value function similar to your own, and then let them create a choice set of high-value options for you. This is what the department store Nordstrom does: it curates a collection of clothes for a person with a certain value function. It's also what a Spotify playlist does: instead of having to go through every song ever recorded, you let Spotify curate a subset you'll probably like. It's also how we choose restaurants in a foreign city: we ask friends who live there where they eat. In short, people use this strategy a lot. Yet, as far as I know, no one has formally studied how it works.
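A small simulation sketch of curation as an exploration strategy, assuming option values are drawn uniformly at random and that a curator ranks options using a noisy copy of your own value function. Both the solo chooser and the curated chooser use the same explore-then-exploit rule; the only difference is the choice set. All names and parameters are hypothetical.

```python
import random


def explore_then_exploit(values, n_explore, n_trials, rng, noise=0.1):
    """Try n_explore random options once each, then repeat the best-looking one.

    Returns the average reward per trial; each reward is the option's value plus noise.
    """
    tried = rng.sample(range(len(values)), min(n_explore, len(values)))
    observed = {i: values[i] + rng.gauss(0, noise) for i in tried}
    total = sum(observed.values())
    best = max(observed, key=observed.get)
    for _ in range(n_trials - len(tried)):
        total += values[best] + rng.gauss(0, noise)
    return total / n_trials


def curate(values, k, rng, taste_noise=0.1):
    """Hypothetical curator: ranks options by a noisy copy of your values, keeps the top k."""
    curator_view = [v + rng.gauss(0, taste_noise) for v in values]
    top = sorted(range(len(values)), key=lambda i: curator_view[i], reverse=True)[:k]
    return [values[i] for i in top]


def compare(n_options=500, k=20, n_explore=10, n_trials=50, n_runs=200, seed=0):
    """Average reward per trial when choosing alone vs. from a curated subset."""
    rng = random.Random(seed)
    solo = curated = 0.0
    for _ in range(n_runs):
        values = [rng.random() for _ in range(n_options)]
        solo += explore_then_exploit(values, n_explore, n_trials, rng)
        curated += explore_then_exploit(curate(values, k, rng), n_explore, n_trials, rng)
    return solo / n_runs, curated / n_runs


solo, curated = compare()
```

With these settings the curated chooser should come out ahead, because even its exploratory picks are drawn from options the curator already rated highly. Raising `taste_noise` (a curator whose tastes differ from yours) erodes that advantage.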

Redundant Word Choice Reflects Frequency of Obscure Words

There's an old adage about writing, drawn from the pages of Strunk & White's The Elements of Style: "Omit needless words." Among other things, Strunk & White's directive suggests dropping the "old" from "old adage" because, well, if you look up the definition of adage, it already includes the notion that it's old. It's redundant. Here's a similar example: I recently read a paper by Robin Dunbar on the "Functional Benefits of (Modest) Alcohol Consumption." When he uses the word "anxiolytic," he pairs it with the phrase "reduction of anxiety or stress." That phrase is redundant because the word anxiolytic already conveys that notion. However, when you look at these kinds of redundancies in aggregate, you begin to see that, contrary to Strunk & White, redundancy has a communicative benefit. It hedges your bets against the other person not knowing the more obscure word (e.g., adage, anxiolytic) by padding it with a modest definitional hint. In short, the fewer people who know a word, the more likely a speaker is to sprinkle in some redundancy. And redundancy is, of course, exactly what information theory says a message needs in order to get through a noisy channel. I think an information-theoretic account of this effect could be pretty interesting.
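One simple way to make the information-theoretic intuition concrete: if a listener knows a word with probability p and can independently decode the gloss with probability q, the gloss raises the chance of being understood by (1 - p) * q, which is largest for obscure words. The sketch below treats glossing as a cost-benefit decision; the familiarity rates, costs, function names, and independence assumption are all hypothetical.

```python
def success_probability(p_known, p_gloss=None):
    """Chance the listener recovers the intended meaning."""
    if p_gloss is None:
        return p_known
    # Assumes the listener fails only if they miss both the word and its gloss, independently.
    return 1 - (1 - p_known) * (1 - p_gloss)


def should_gloss(p_known, p_gloss=0.95, value=1.0, cost_per_word=0.05, gloss_words=4):
    """Add a redundant gloss when the expected gain in understanding outweighs its length."""
    gain = success_probability(p_known, p_gloss) - success_probability(p_known)
    # gain equals (1 - p_known) * p_gloss, so it shrinks as the word becomes more familiar.
    return gain * value > cost_per_word * gloss_words


# Hypothetical familiarity rates (share of listeners who know each word).
for word, p in [("water", 0.99), ("adage", 0.7), ("anxiolytic", 0.2)]:
    print(word, should_gloss(p))
# water False, adage True, anxiolytic True
```

The familiarity rate p plays the role of word frequency here: the rarer the word, the more often a listener needs the backup channel the gloss provides.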

Previous Projects:

Imaginative Reinforcement Learning: Computational and Neural Perspectives 

One thing that humans can do that machines can't is imagine how the future is going to play out. This is a hallmark of human intelligence, and we'd expect that it's something artificially intelligent agents will need to be able to do in the future. This project represents an initial step toward trying to imbue reinforcement learning agents with imagination. 

In this project, we constructed a reinforcement learning model that made predictions about human behavior and brain activity in an experimental task. In the task, participants made predictions about how many points they would earn with a specific action. We showed that our model predicted both behavior and brain activity, as measured by functional MRI. 

Gershman, S.J., Zhou, J., & Kommers, C. (2017). Imaginative reinforcement learning: computational principles and neural mechanisms. Journal of Cognitive Neuroscience.

Computational Models of Jazz Improvisation Inspired by Language

Jazz musicians have been saying it for a century; cognitive scientists picked up on it and started saying it a couple of decades ago; even neuroscientists have recently shown evidence that it's true in the brain: jazz improvisation is a language, and speaking it is like having a conversation.


If this is true in more than a metaphorical sense, then we'd expect the same computational frameworks that work for language to work for jazz as well. This project explores that possibility.

In this project, I developed a novel dataset of jazz improvisation solos large enough to train a model. I used a probabilistic model called an interpolated n-gram to learn the transition structure, both rhythmic and melodic, of jazz improvisation solos. The model made predictions about which notes would come next in solos it hadn't seen before, and I quantified its performance with a measure that computational linguists call "perplexity."
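For readers unfamiliar with the setup, here is a minimal sketch of an interpolated n-gram over notes and its perplexity. It mixes a bigram model with an add-one-smoothed unigram model and works over pitch names only (the original model also covered rhythm); the interpolation weight, the toy solos, and the class name are assumptions for illustration. Perplexity is the exponential of the average negative log-probability the model assigns to each note of an unseen solo, so lower is better.

```python
import math
from collections import Counter


class InterpolatedBigram:
    """Bigram model over notes, linearly interpolated with an add-one unigram model."""

    def __init__(self, solos, lam=0.7):
        self.lam = lam  # weight on the bigram estimate (normally tuned on held-out data)
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.contexts = Counter()
        for solo in solos:
            tokens = ["<s>"] + list(solo)
            self.unigrams.update(tokens[1:])
            for prev, note in zip(tokens, tokens[1:]):
                self.bigrams[(prev, note)] += 1
                self.contexts[prev] += 1
        self.total = sum(self.unigrams.values())
        self.vocab_size = len(self.unigrams) + 1  # +1 reserves mass for an unseen note

    def prob(self, prev, note):
        p_uni = (self.unigrams[note] + 1) / (self.total + self.vocab_size)
        if self.contexts[prev] == 0:
            return p_uni  # unseen context: fall back to the unigram estimate
        p_bi = self.bigrams[(prev, note)] / self.contexts[prev]
        return self.lam * p_bi + (1 - self.lam) * p_uni

    def perplexity(self, solo):
        """exp of the average negative log-probability per note; lower is better."""
        tokens = ["<s>"] + list(solo)
        log_prob = sum(math.log(self.prob(p, n)) for p, n in zip(tokens, tokens[1:]))
        return math.exp(-log_prob / (len(tokens) - 1))


# Hypothetical training solos written as pitch names.
train = [["C", "E", "G", "E", "C"], ["C", "D", "E", "G", "E", "D", "C"], ["E", "G", "A", "G", "E"]]
model = InterpolatedBigram(train)
familiar = model.perplexity(["C", "E", "G", "E", "D", "C"])
unusual = model.perplexity(["F#", "B", "F#", "B"])
```

Interpolation is what keeps the model from assigning zero probability to a transition it never saw in training: the unigram term always contributes some mass, so perplexity stays finite on new solos.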


I did this work under the supervision of Prof. Alan Yuille as part of my honors thesis in the Department of Psychology at UCLA.

Hierarchical Reasoning in Distributed Semantic Representations

Researchers in natural language processing have developed systems that can learn to solve analogies. For example, given "King is to Queen as Man is to ___," the system can tell you the correct answer is "Woman." The only information the system is given is word coöccurrence statistics. It implicitly learns the gender distinction between "king" and "queen" because the two words occur in different contexts. It's pretty cool.

In this project, we showed that these systems can also perform hierarchical reasoning. For example, you can ask the system what Plato, Aristotle, Descartes, Locke, and Hume have in common, and it will spit out the answer "philosopher." The computational idea is that if you sum the vector representations of these words and then normalize the result, the resulting vector (which points in the same direction as their average) is most similar to the vector for "philosopher" among all the word vectors in the vocabulary.
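A toy sketch of the sum-and-normalize idea, using hypothetical three-dimensional vectors in place of real learned embeddings and cosine similarity as the measure of "most similar." Excluding the query words themselves from the candidates is an assumption of this sketch, as are the function names.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def common_category(words, vectors):
    """Sum the words' vectors, normalize, and return the closest other word in the vocabulary."""
    dim = len(next(iter(vectors.values())))
    centroid = normalize([sum(vectors[w][i] for w in words) for i in range(dim)])
    candidates = [w for w in vectors if w not in words]
    return max(candidates, key=lambda w: cosine(centroid, vectors[w]))


# Hypothetical 3-d embeddings; real ones are learned from co-occurrence statistics
# and typically have hundreds of dimensions.
vectors = {
    "Plato": [0.9, 0.1, 0.2],
    "Aristotle": [0.8, 0.2, 0.1],
    "Descartes": [0.85, 0.15, 0.2],
    "Locke": [0.8, 0.1, 0.3],
    "Hume": [0.9, 0.2, 0.1],
    "philosopher": [0.95, 0.1, 0.1],
    "musician": [0.1, 0.9, 0.2],
    "athlete": [0.1, 0.1, 0.95],
}
answer = common_category(["Plato", "Aristotle", "Descartes", "Locke", "Hume"], vectors)
```

Normalizing the sum does not change its direction, so ranking candidates by cosine similarity gives the same answer as using the average vector.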

Kommers, C., Ustun, V., Demski, A., & Rosenbloom, P. S. (2015). Hierarchical Reasoning with Distributed Vector Representations. In Proceedings of the 37th Annual Meeting of the Cognitive Science Society.