How do we know whether a finding is legitimate or not?
There is a distinction that one learns about on the first day of a course in psychological research methods. It is the difference between internal and external validity.
Internal validity is scientific validity. The extent to which a researcher devises a solid experiment, controls for confounding variables, and executes the procedure as planned determines a finding’s internal validity. If it were to come to the researcher’s attention that a confounding variable for which they did not control could also explain their result, then the finding’s internal validity would be called into question. This validity is concerned with what happens inside the lab, while the experiment is happening.
External validity, in contrast, is ecological validity. How well does the researcher’s finding generalize to the world outside the lab? You could control for all the variables perfectly, execute your procedure flawlessly, and run a pristine experiment. But if the stimuli you’re using aren’t representative of what people are likely to encounter in real life, then the experiment lacks external validity. This validity is concerned with what happens outside the lab, what to make of the result after all the nitty-gritty has been finely tuned.
Ideally, a researcher would conduct an experiment that is unassailably valid both internally and externally. However, in practice, these considerations usually involve a tradeoff. The more internally valid your stimuli, meaning the more precisely they can be measured and controlled, the more sterile they become, and thereby the less they reflect the inherent messiness of our everyday experience. Conversely, the more realistic you make your stimuli, the less meticulous you're able to be about what exactly you're showing your participant.
The upshot is that without internal validity, you can’t draw scientific conclusions. But without external validity, you have nothing worth drawing conclusions about. Practically speaking, the best a researcher can hope for is a healthy and reasonable balance between the two. But how well does psychological research actually balance the tension between these two considerations? Is it fifty-fifty, equal parts external and internal? Or does one get prioritized at the expense of the other?
There is an important asymmetry between internal and external validity, which gives insight into the answer to this question. It has to do with how these different kinds of validity are measured. Scientists are trained every day of their professional lives to be sensitive to internal validity. They can spot a confounding variable in a study from a mile away. And once one is identified, it’s difficult to shake it off as inconsequential to the study’s findings. Perhaps more importantly, it’s embarrassing for a scientist to run a shoddy experiment in which people can easily point out procedural flaws. It is, in short, relatively obvious how to optimize for internal validity.
But external validity is not so easily optimized for. It is much more difficult to point at an experiment and claim that it bears little resemblance to the real world in a crucial and undeniable way. Such an objection will be regarded as perhaps a good point, but ultimately just an opinion. The experimental stimuli aren't intended to be representative of the whole of the human cognitive experience, after all, but only a specific part of it. That's what makes the experimental variables so well controlled and the theoretical predictions so parsimonious in the first place. Nor does an accusation of weak external validity carry any comparable embarrassment; there is, rather, a certain pride associated with being a hardline scientist who studies her phenomena of interest with clinical precision and ardent fastidiousness.
The result is that psychological research is biased toward internal rather than external validity. Internal validity can more easily be measured, and consequently it is a much more apparent badge that proclaims, "here is the work of a legitimate scientist." Scientists who prioritize internal validity and scientific legitimacy are more likely to be promoted, and this shapes the constituency of the institution of psychological research as a whole to favor those who care more for internal considerations than external ones. The problem is that as psychological research becomes more ostensibly scientific, it becomes less connected to that which it intends to study. Human behavior is a fundamentally messy topic, and psychology benefits from existing in the tension between these two kinds of validity. Both are, after all, necessary for truly valid psychological research, and neither is sufficient on its own. If we lose our sense of external validity because it is a trickier metric to optimize, then psychological research suffers just as much as if we failed to construct solid experiments.
This is a consideration that gets overlooked in the “replication crisis” in which psychological research currently finds itself. The usual approach to addressing this crisis lies with better statistical analyses, which decrease the probability of a false or misleading finding. While this is surely a critical contribution to the internal validity of psychological findings, it takes external validity out from under the spotlight of attention. Psychology, the thinking goes, will not become healthy by an attempt to render it more externally valid, but only by making it more rigorously scientific.
This thinking, to my mind, is only one side of the story. Sure, psychology can improve its statistical methodology in the service of the field's betterment. But too stringent a focus on the internal runs the risk of leading to an equally illegitimate state of the science, one that pursues a significance that is more statistical than psychological. Our aversion to getting our hands dirty with the veridical messiness of the human experience may lead us to forfeit the opportunity to go out there and actually work with nature itself, opting instead for the safety and clarity of the laboratory setting. That hardly seems like a psychology worth replicating.