There’s been a recent flurry of activity on the Internet about a paper written by Harvard social psychologist Jason Mitchell, the full text of which can be read here: http://wjh.harvard.edu/~jmitchel/writing/failed_science.htm. The crux of the issue is that Dr. Mitchell apparently sees little value in replication studies or in the publication of negative results. This is a notable and alarming reversal of the current trend among reputable scientists, who decry the lack of those very types of publications in most major journals for reasons I will discuss briefly (though by no means completely) in this response.
Dr. Mitchell received his B.A. and M.S. from Yale and his
Ph.D. from Harvard, and is now a professor of psychology at Harvard where he is
the principal investigator at the University’s Social Cognitive and Affective
Neuroscience Lab (http://www.wjh.harvard.edu/~scanlab/people.html). I say this to point out that Dr.
Mitchell’s credentials appear impeccable, at least on paper. He’s a professor at one of the world’s most prestigious universities (the merit of such prestige in education is often called into question, but that is a discussion for another day), and appears to have a consistent flow of publications in the scientific literature,
much of which, though I am completely unfamiliar with his work beyond this
single paper in question, appears to be of significant interest. Having established those credentials,
the duty now falls upon my shoulders to convince you that, despite an apparently productive career in social science, Dr. Mitchell appears never to have received even the most rudimentary education in the basics of the scientific method, whether through oversight on the part of his instructors or, more likely, through inattention on Dr. Mitchell’s part during those key lectures.
It is strongly recommended that you either read Dr. Mitchell’s paper, “On the emptiness of failed replications,” in its entirety before returning to this document or read it alongside this discussion, so that his argument can be made to you in his own words. I would not wish to be accused of misrepresenting his argument.
Nevertheless, I will proceed through the article point-by-point,
providing significant commentary along the way and quoting the source material,
though sparingly, so as to provide direct refutations.
Dr. Mitchell’s article begins with a bullet-pointed list of six postulates, each one of which is dead wrong. I will explore the faults in Dr. Mitchell’s paper by examining each of these points in turn. The bulk of the paper is simply Dr.
Mitchell’s supporting arguments and evidence (such as they are) for these six
points. As such, the bulk of the
paper, though not often directly quoted here, will be addressed under the
headings of the six claims.
1) “Recent
hand-wringing over failed replications in social psychology is largely
pointless, because unsuccessful experiments have no meaningful scientific
value.”
Several years ago, I
had a chance encounter on the Internet with a gentleman who was pursuing his
doctorate in applied physics, specializing in acoustics. We became acquainted through commentary
on my girlfriend’s page on a social media website during a discussion of
evolutionary science and creationist dogma, during which debate this gentleman
revealed that, despite his scientific training, he was a young earth
creationist and that, further, he believed physics supported his position. Amongst his misunderstandings were
claims that because the Sun is burning up, it should be getting smaller, and a
belief that Einstein’s theory of special relativity suggests that as an object
approaches the speed of light, it loses mass (when in reality, objects
approaching light-speed approach infinite mass). I mention this frustrating conversation because until now,
it was the greatest misunderstanding of science I have ever heard from someone
claiming any degree of professional training in the sciences. Dr. Mitchell has the dubious honor of
having surpassed that creationist’s achievement. This creationist, at least, made a show of doing real science and claiming the evidence supported his arguments (however misguided those claims were). Dr. Mitchell’s
approach to science, if I dare call it an approach to science, appears to
suggest that any study failing to confirm the experimenter’s hypothesis is useless.
For those of you who
aren’t already either rolling off your chair in fits of uncontrollable laughter
at Dr. Mitchell’s expense or banging your head against your desk in frustration
for much the same reason, I will pause for a moment to explain the
ludicrousness of Dr. Mitchell’s position (and offer the promise of further
hilarity to follow).
To begin with, the “hand-wringing,” as Dr. Mitchell dismissively refers to a growing collective concern amongst scientists, is very well-deserved. If you follow the scientific world, you may have heard of
something called “publication bias.”
The idea is that journals tend to like to publish positive results of
exciting experiments because those grab headlines and help sell the publication
to professional readers. There’s
nothing particularly evil about this on its face, except when you realize that
replication is a key part of the scientific process for reasons we’ll discuss
in greater depth later on (but it basically comes down to being sure that a
published result wasn’t just a phantom due to random chance or experimenter
error), and that these replication studies (being the “un-sexy” sort of work
that just sets out to question or to establish the credibility of previously
published work) find extremely limited venues for their publication. When they are published, and there is
certainly no guarantee they will be, it is often in obscure journals that fail
to reach even a sizeable fraction of the readership of the original paper. The result of this, concern over which is dismissed by Dr. Mitchell as “pointless” and “hand-wringing,” is that erroneous papers which reach publication (yes, despite all the best efforts, erroneous information does get published, whether through oversight or, more rarely, through deliberate misrepresentation of research) may wait a considerable time before they are corrected--if, indeed, they are ever corrected. This means there is a
distinct possibility (nay: probability) that some indeterminate amount of the
information accepted into the body of scientific knowledge is wrong.
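To make that worry concrete, here is a small, purely illustrative simulation in Python. It is my own sketch, not anything taken from Dr. Mitchell’s paper or from real data; the proportion of true hypotheses, the statistical power, and the significance threshold are all assumed values chosen only for the example. It shows how a literature that publishes only positive results can end up with a surprisingly large share of false findings.

```python
import random

# Purely illustrative numbers -- assumptions for this sketch, not data
# from any real study or from Dr. Mitchell's paper.
random.seed(42)

N_STUDIES = 100_000   # hypothetical experiments run across a field
P_TRUE    = 0.10      # assume only 10% of tested hypotheses are actually true
ALPHA     = 0.05      # false-positive rate when a hypothesis is false
POWER     = 0.80      # chance of detecting an effect that really exists

published_true = 0    # significant results for real effects
published_false = 0   # significant results for non-effects (false positives)

for _ in range(N_STUDIES):
    effect_is_real = random.random() < P_TRUE
    significant = random.random() < (POWER if effect_is_real else ALPHA)
    if significant:   # journals in this sketch only publish "positive" results
        if effect_is_real:
            published_true += 1
        else:
            published_false += 1

total_published = published_true + published_false
print(f"Share of published findings that are false: "
      f"{published_false / total_published:.1%}")
# With these assumed numbers, roughly a third of the published record is
# wrong, even though every single test used the conventional 5% threshold.
```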
None of this is
intended to cast doubt upon science as a method of knowing. Indeed, the
scientific method, when properly applied, is specifically designed to avoid
just this sort of situation. The
problem we currently face with the issue of publication bias in the sciences is
not a problem with the science, but with the politics that have come to
dominate within the halls of academia, and to which science unfortunately often
takes a backseat in the minds of the administrators who perpetuate the
problem. This, however, is not
intended to be a referendum on politics in academia, but a discussion of the
flaws with Dr. Mitchell’s little paper, so I will refrain from heading down the
rabbit hole (some might call it a black hole) of academic politics.
Even if replication
studies were not of any importance, however--even if Dr. Mitchell’s apparent
assumption that original research is always flawless were completely and
undeniably true--there would still be much to find fault with in just this
first bullet point. He claims that
“unsuccessful experiments have no meaningful scientific value.” There is a bit of an ambiguity in that
statement, and the Principle of Charity would compel me to address the best
possible interpretation of his claim.
I will do so, though I will then explore the more troubling interpretation, because I believe it to be the interpretation Dr. Mitchell originally intended.
The ambiguity has to
do with the phrase “unsuccessful experiments.” By that does Dr. Mitchell mean an experiment which has been
compromised by error? Or does he
mean an experiment which yields negative results?
Let us examine the
former. If he does indeed mean to
discuss experiments which have gone wrong, and yielded inaccurate information
due to some experimental error (or even chance fluctuations), then he is
arguably correct (though barely so) in suggesting that these experiments have
no meaningful scientific value.
The problem, however, is that by conflating this statement with a
condemnation of replication studies, he betrays an assumption that original
research is always performed with greater accuracy than replication
studies. To be sure, this is
sometimes the case. I am by no
means suggesting that a replication study is of greater merit than its
predecessor. What I am saying, and
what I believe any competent scientist would say, is that when two studies show
up with contradictory results, it indicates that at least one of them contains some kind of error. It is then for the scientific community to conduct further
examination (whether that is a closer reexamination of the data or a completely
new experiment) in order to determine which. Certainly it is of scientific value to determine which of
two contradictory studies is invalid, even if that means we then determine that
this particular study is completely
invalid and without value. Unless
we assume the infallibility of original research, these negative replication
studies do provide scientific value because they help us to determine which of
the original studies need to be reexamined. Furthermore, even completely failed experiments often lead
scientists to explore new, previously unconsidered hypotheses, so there is
indirect scientific value in that way as well.
I do not, however,
suspect that this is what Dr. Mitchell intended to say. Rather, it is my assumption, based on
phrasing later in the article equating the term “scientific failure” with “an
experiment [that] is expected to yield certain results, and yet… fails to do so,”
that Dr. Mitchell means an “unsuccessful experiment” to refer to any experiment
which fails to support the researcher’s hypothesis. This is a much more troubling interpretation of his words,
however, for two primary reasons.
Make no mistake, if
Scientist B wishes to conduct the study as a replication study, she is
well-advised to do so. Replication
is essential. It’s very possible
that Scientist A made some mistake in his original experiment, and Scientist B
might be able to correct that mistake.
However, such replications become meaningless when negative results are
never published. This view that
negative results are of no scientific value dooms generations of scientists to
endlessly follow the same dead-end trails. It slows scientific progress, costs millions of dollars of
grant money which could be better spent elsewhere, and wastes the productive
time of countless scientists.
Let’s not pretend we have an overabundance of qualified scientists,
either. Every man-hour is
precious, especially in a world where so much of the general population is far
more content to spend their lives watching television than working in a
laboratory.
I will close this discussion
of Dr. Mitchell’s first bullet-point (oh yes, we still have five more of his
inane bullet-points, plus several points from the main body of the article to
get through before we draw this discussion to an end) with a personal story. Some years back, I was asked to
participate as a judge for a local private school’s science fair, a duty I was
happy to perform. While wandering
from presentation to presentation with my fellow judges, I noticed something of
a trend amongst the entries.
Namely, most were very traditional (one might be tempted to say clichéd)
science fair projects. This is about what one would expect from a school limited to kindergarten through eighth grade, so I did not judge particularly harshly, but I did make a mental note
that for many of the students, the science fair was about producing a flashy
display. There was a remote
controlled robot or two, several volcanoes, and many presentations along those
lines. The quality of display was
occasionally impressive, but there was very little science actually being
done. Then I happened across one
of the last entries of the day. It
was from a student whose family had recently immigrated from Mexico. His English, though far more impressive
than my Spanish would be given a similar amount of time to study, was extremely
limited, and his family had very little money with which to purchase supplies,
but he wanted to enter the science fair nonetheless. Unable to afford flashy props, he did a simple
experiment. He filled basketballs
to various levels of air pressure to determine which was the most bouncy. He hypothesized that the fullest ball
would be the bounciest. To test
this, he filled one ball to regulation pressure, overfilled one, and
underfilled another. He found,
contrary to his hypothesis, that the medium-filled ball was actually the
bounciest. Granted, this was not a
rigorously controlled scientific experiment that would be worthy of publication
in even the most lenient of journals.
However, this student was the only one of the many entries to actually
do real science. He conducted a
proper experiment, achieved a result that did not support his hypothesis, and
wrote up his display (with his teacher’s help to get his English right) to tell
us all about what he had found. I
do not recall the results of the science fair once all the judges’ scores were
compiled, but he received my highest marks. If he had taken Dr. Mitchell’s postulate that “failed”
experiments are of no scientific value to heart, that would never have taken
place.
2) “Because experiments can be undermined by a
vast number of practical mistakes, the likeliest explanation for any failed
replication will always be that the replicator bungled something along the
way. Unless direct
replications are conducted by flawless experimenters, nothing interesting can
be learned from them.”
Upon reading this statement, I held out some hope that clarification would be forthcoming in the body of the text, clarification that might serve to negate the glaring oversight in Dr. Mitchell’s claim.
Indeed, further clarification was provided, but instead of negating his
error, Dr. Mitchell doubled down on his mistake.
Lest I get ahead of
myself as I explore this idea (albeit in much briefer terms than the previous
point), allow me to bludgeon you, dear reader, with the obvious: Dr. Mitchell
fails to account for the fact that the replicator may be a more skilled
experimenter than the scientist who produced the original finding.
Dr. Mitchell is
correct about one thing in this analysis.
It is clearly possible that the replicator might have “bungled something
along the way.” It happens. As humans, we err. This is undeniable and hardly worth
pointing out. Except, it seems that Dr. Mitchell struggles not only with the philosophical side of science, but also with the self-evident traits of humanity. Certainly, this is a forgivable oversight. He is, after all, only a scientist working in a discipline dedicated to understanding the traits of humanity. But I digress.
The problem is that
the statement can easily be reversed.
Let me give it a try: “Because experiments can be undermined by a vast
number of practical mistakes, the likeliest explanation for any positive
experimental result will always be that the researcher bungled something along
the way. Unless original research
is conducted by flawless experimenters, nothing interesting can ever be learned
from it.” If that sounds to you
like absolute garbage, you are absolutely correct. Dr. Mitchell’s great failure is in assuming inerrancy on the
part of original researchers and incompetence on the part of replicators. In reality, replicators and original
researchers are often the very same people. As a reputable scientist, it should be part of every
researcher’s job to do both original research and replication studies as the
need arises for either. There would be nothing wrong with specializing in one or the other, but a well-balanced approach that includes some of both is probably the best way to advance not only the collective scientific knowledge but one’s personal knowledge of one’s own discipline. Putting aside the old bugaboo of academic politics, doing a bit of both serves the goal not of career advancement but of scientific advancement. Never mind all that, though. Let’s assume for the moment that we have entered into a
fantasy world where scientists are allowed to do either original research or
replications but not both. Is
there some magical force that bestows competence disproportionately upon one
rather than the other? Of course
not. There will be incompetents
and geniuses on both sides, and the average will always be average.
Dr. Mitchell is correct that experimental error is a problem that needs to be addressed in any replication study, and though he seems to forget that the same is true of original research, he is correct to suggest that examining replications for experimental error is a worthwhile pursuit.
What Dr. Mitchell
seems not to understand is that replication is not an argument that an
experiment is somehow better the second time it is performed or when done in a
different laboratory than in the first case. The point of replication is that, just as he argues that
there can be mistakes in replication experiments, there are mistakes or unknown
factors in original research, too.
Replication is essential to determine the robustness of a finding. If ten studies show a finding to be valid and a new study fails to replicate it, we still examine all eleven, though we do so with the assumption that the fault most likely lies in the new study. However, if only two
studies have been done, we must examine both very carefully to determine which
is more likely correct. There is
the further possibility that all of the studies, even with their conflicting
results, can be valid, and that there is just some small change in experimental
conditions that renders the studies different. This could lead to entirely new discoveries.
I will illustrate
with this example (note: these studies are fictitious and not based on any real
data of any kind). Let us imagine
that Scientist X from the University of Timbuktu conducts an experiment and
finds that when given 12-volt electric shocks, people perform better at chess
than a control group. Then,
Scientist Y from the University of Nantucket conducts a replication trial. The experiment is performed in exactly
the same conditions, but Scientist Y finds no such effect. What could be happening? Scientist Z
from the University of Neverland reads both papers. He writes letters to both scientists to make sure the
experiments were identical, and reexamines the raw data from both experiments
to determine which of the studies was wrong, but he finds no experimental error
on either side, no problems with data entry, certainly no fraud, and nothing at
all to indicate which study was correct.
Can you solve this little problem?
Certainly it would seem that Dr. Mitchell would immediately assume that
Scientist X is correct and Scientist Y has made some undetectable mistake. However, perhaps the real solution is
that they are both correct. There
is no flaw in the University of Timbuktu study, but it is incomplete. It fails to account for the fact that,
in Nantucket, they rather enjoy electric shocks due to some previously
undiscovered environmental factor, so they are immune to the effects of the
experimental manipulation in the study by Scientist X. Of course it’s a stupid example, but I
think it vividly illustrates the point that Dr. Mitchell, for all his laudable
attempts to avoid experimental error reaching the literature, has ignored the
possibility that replication studies can bring new insights in addition to oversight.
If an original study
is superior to the replication study that finds different results, it should be
very easy for the original researchers to defend their work. They could point out the flaws in the
replication, or they could conduct further research or call for independent
research. Any of these approaches
could vindicate the original study and show the replication to be
incorrect. Instead of taking this
proper approach, Dr. Mitchell suggests that we should ignore replication
entirely because sometimes a replicator might get it wrong. He forgets that in science, truth is
determined not by who published first but by who has the best evidence. All of his anticipated problems with replication are easily dismissed simply by providing the evidence that shows the original study to be correct.
3) “Three standard rejoinders to this critique
are considered and rejected. Despite claims to the contrary, failed replications do not provide
meaningful information if they closely follow original methodology; they do not
necessarily identify effects that may be too small or flimsy to be worth
studying; and they cannot contribute to a cumulative understanding of
scientific phenomena.”
More so than the other five points, this one relies heavily on the body of the essay to understand its meaning. The basic
idea is that Dr. Mitchell is considering three responses to his critique. While I’m sure that these responses are
real ones, I question his selection because they were not the first three that
came to my mind. Could Dr.
Mitchell be attempting to subtly erect a straw man? At the very least, he seems not to be arguing against the
best form of his opponents’ arguments.
Nevertheless, these three points are worth examining.
The first point is
one which I must, unfortunately, rely upon quoting in its entirety, so that you
may fully appreciate the ineptitude of the argument:
There are three standard rejoinders to these points. The first is to argue that because the
replicator is closely copying the method set out in an earlier experiment, the
original description must in some way be insufficient or otherwise
defective. After all,
the argument goes, if someone cannot reproduce your results when following your
recipe, something must be wrong with either the original method or in the
findings it generated.
This is a barren defense. I
have a particular cookbook that I love, and even though I follow
the recipes as closely as I can, the food somehow never quite looks as good as
it does in the photos. Does this mean that the recipes are deficient,
perhaps even that the authors have misrepresented the quality of their
food? Or could it be that
there is more to great cooking than simply following a recipe? I do wish the authors would specify
how many millimeters constitutes a “thinly” sliced onion, or the maximum torque
allowed when “fluffing” rice, or even just the acceptable range in degrees
Fahrenheit for “medium” heat. They
don’t, because they assume that I share tacit knowledge of certain culinary
conventions and techniques; they also do not tell me that the onion needs to be
peeled and that the chicken should be plucked free of feathers before
browning. If I do not
possess this tacit know-how—perhaps because I am globally incompetent, or am relatively new to
cooking, or even just new to cooking Middle Eastern food specifically—then naturally, my outcomes will differ from
theirs.
Likewise, there is more to being a successful experimenter than merely
following what’s printed in a method section. Experimenters develop a sense, honed over
many years, of how to use a method successfully. Much of this knowledge is implicit. Collecting meaningful neuroimaging data, for
example, requires that participants remain near-motionless during scanning, and
thus in my lab, we go through great lengths to encourage participants to keep
still. We whine about
how we will have spent a lot of money for nothing if they move, we plead with
them not to sneeze or cough or wiggle their foot while in the scanner, and we
deliver frequent pep talks and reminders throughout the session. These experimental events, and countless more
like them, go unreported in our method section for the simple fact that they
are part of the shared,
tacit know-how of competent researchers in my field; we also fail to report
that the experimenters wore clothes and refrained from smoking throughout the
session. Someone
without full possession of such know-how—perhaps because he is globally incompetent, or new to science, or even
just new to neuroimaging specifically—could well be expected to bungle one or more of these important, yet unstated, experimental
details. And because
there are many more ways to do an experiment badly than to do one well,
recipe-following will commonly result in failure to replicate.
Of course, the
myriad problems with Dr. Mitchell’s analogy should not require great lengths to
expose.
The first problem is
the same problem encountered above.
Dr. Mitchell assumes that all providers of original research are, as if
by some divine right, more competent practitioners than providers of
replication studies. This is
simply not so. It should be
clearly stated that cooking and science are two entirely different practices
and that any analogy is bound to be imperfect (cooking is, after all, much more
of an art than a science).
However, in the interest of proceeding along established terms, allow me
to offer a better analogy. Dr.
Mitchell compared replication studies to his amateur attempts to reproduce
recipes from his favorite cookbook.
I fancy myself a rather good cook, but I can sympathize--my food doesn’t
always come out looking as good as the photo in the cookbook. Do I think that this means the authors
misrepresented their recipes?
No. Dr. Mitchell is right
to think not. As an amateur, he is
not expected to cook as well as the professionals who wrote his cookbook. However, if Chef Gordon Ramsay or Chef Wolfgang Puck (or whoever your favorite chef might be) attempted to recreate the recipes, following them precisely and combining the detailed descriptions with the established culinary knowledge that Dr. Mitchell points out is generally understood but not explicitly stated, and the food still came out significantly worse than the photograph would indicate, then I might begin to suspect that the cookbook has some flaw.
Dr. Mitchell assumes in his argument that he is the one trying to
recreate the recipe. The reality
of replication is that it could just as easily be Chef Ramsay.
None of this is to
say that science should be judged based on the fame or credentials of the
scientist. No, scientific
questions must be determined based on the evidence. But it is the height of both arrogance and short-sightedness
to assume that anyone who would bother to replicate a study must be new to
science and thus less worthy of attention than the author of the original
paper.
Replication is
essential precisely because (amongst other reasons), people who are new to a
particular discipline conduct original research as well, and their mistakes
could lead to erroneous papers.
However, there is
another claim within this section worthy of attention. This is the idea that some of the “real
work” (to borrow a phrase from the magicians) is not explicitly published. There is both truth and falsehood to
this. It is certainly true that
the most mundane details of experimental practice are not explicitly stated in
every paper. However, if there is
a practice which is not expected to be common knowledge, it should be
explicitly stated. Dr. Mitchell
explains that subjects must remain near-motionless during neuroimaging scans,
and alludes to techniques used in his lab to make sure this is the case. It needn’t be stated, because anyone
doing such a scan will already know, that the subject needs to remain
motionless. However, specific actions taken to ensure this motionless state should be noted, either in the paper reporting original research or in a separate paper establishing experimental methodology which can be cited when that methodology is used in such research. I do not suspect this to be the case with the methods detailed in Dr. Mitchell’s footnote (in which he lists several such techniques that are never mentioned explicitly in the methods sections of his papers), but it is an ever-present possibility that an experimental result could be affected by conditions the experimenter considers unimportant. If such notes make a
paper too long for publication, they should be published elsewhere (perhaps on
the same website that would be better used for experimental methodological tips
than mindless ramblings about how useless replication is), so that both
potential replicators and the merely interested can fully understand the
experimental procedure in place during any experiment upon which they will base
a scientific belief. In Dr.
Mitchell’s case, it is common knowledge and needn’t be stated that the subject
must remain still. The phrasing
used to achieve this, while apparently innocent enough, can vary from laboratory
to laboratory and should probably be noted somewhere so that no errors are
made. Similarly, though Dr.
Mitchell’s cookbook probably doesn’t say so, I’m sure there is a publication
somewhere that would gladly specify that important detail that a bird must be
plucked of feathers prior to cooking.
The second argument
is that a phenomenon which has a small effect size or is difficult to replicate
might nonetheless be real.
True. But how does one
determine that? Through further studies.
The studies should be replicated using both the same techniques and new ones to tease out the reality of the situation. No one has ever suggested that a failed replication
necessarily means an unreal phenomenon in every case. It means an attempt at replication has failed, nothing more
and nothing less. The implications
of that failure are a subject both for discussion and for further experimental
investigation. Dr. Mitchell’s
examples fall short because in the very same paragraph where he decries
replication because it might have “killed” fields of inquiry we now know to be
important, he makes reference to further study validating the original
findings. It would seem that Dr.
Mitchell only objects to replication when it falsifies original research, and
frankly, that’s just bad science.
It’s also worth
noting that if there is flimsy evidence, it would be unwise to believe a
claim. That doesn’t mean it’s
wrong, but the scientific method is based upon skeptical inquiry. We should have been skeptical about
those findings Dr. Mitchell uses as his examples because evidence was flimsy in
the early days. It wasn’t until
new methods were found to investigate these phenomena (as Dr. Mitchell points
out) that the original studies were vindicated. So the time to believe them is now that the evidence is
in. The time to believe them was
not early on when they were little more than promising hypotheses. But it is not our side that is trying
to shut down inquiry. It is Dr. Mitchell’s side (if indeed there is more than one lone misguided soul who subscribes to his view) that would seek to stifle inquiry by tacitly accepting original research without even considering its replicability. Replicability is not the only factor
that makes a theory robust, but it is certainly an important factor.
The final counterargument
that Dr. Mitchell attempts to address is, I think, one of the stronger
points. As I mentioned earlier
when I explained publication bias, there is an asymmetry between positive and
negative results, even in studies of the very same phenomenon. Dr. Mitchell claims that science
requires an asymmetry between positive and negative results, harking back to
that old chestnut that absence of evidence is not evidence of absence. He claims that no matter how many
papers might be published claiming that swans are only white, it only took one
study to prove that there can be black ones. This is all very true, but a better analogy would be
Sasquatch (or Bigfoot or Yeti, depending upon your region). Would Dr. Mitchell seriously suggest
that if one person publishes a photograph of a Sasquatch that we should
immediately ignore any paper which argues to the contrary? Certainly it is true that there could be such a being, but it, like
everything else in science, should be treated with the same skepticism that is
necessary for science to work. We believe in claims when there is sufficient
robustness of evidence to outweigh the skeptical counterarguments. No one is saying that we should believe
scientific claims based entirely upon the number of papers suggesting one
position or the other (although certainly that is an important factor to bear
in mind when formulating opinions).
But it is certainly important to read those papers that show a published
effect might not really exist. If
the evidence in one paper is stronger than the other, believe that one. If the evidence in one is not clearly
stronger than the other, we need a new experiment. But we can’t possibly begin to even consider all of this
until replication has been attempted and either succeeded or failed.
Dr. Mitchell then
offers this nugget of wisdom: “After
all, the argument goes, if an effect has been reported twice, but hundreds of
other studies have failed to obtain it, isn’t it important to publicize that
fact? No, it isn’t.” Actually, that’s exactly the kind of
information the scientific community needs. We needn’t know the numbers of studies on one side or the
other. We need to know the quality
of research on both sides, and we can only do that when all of that research is
published. It’s quite possible
there could be two great positive studies and hundreds of other studies all of
which were conducted by idiots or baboons. It’s more likely that either two researchers made a mistake,
or that there is some other factor causing the difference. If the latter is the case, it’s
important to have all of the information on the table, so we can attempt to
isolate that other factor.
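To see why, a bit of back-of-the-envelope arithmetic helps. The numbers below are hypothetical, invented purely for illustration: if an effect does not exist at all, a handful of the many attempts to find it will still come up “significant” by chance, and two positives among a couple hundred attempts is no more than chance alone predicts.

```python
from math import comb

# Hypothetical counts, chosen only to illustrate the arithmetic.
n_studies = 200     # suppose 200 attempts at the effect have been made in all
alpha     = 0.05    # conventional significance threshold
k_hits    = 2       # and only 2 of them reported a significant effect

# Probability of seeing at least k_hits "significant" results among
# n_studies honest tests of an effect that does not exist at all.
p_at_least_k = sum(
    comb(n_studies, k) * alpha**k * (1 - alpha)**(n_studies - k)
    for k in range(k_hits, n_studies + 1)
)
print(f"P(at least {k_hits} positives by chance alone) = {p_at_least_k:.4f}")
# Close to 1.0: two positives out of two hundred tries is exactly what chance
# alone predicts, which is why the count of unpublished failures matters.
```

Publicizing only the two successes while burying the failures paints a badly distorted picture of exactly this arithmetic.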
4) “Replication efforts appear to reflect strong
prior expectations that published findings are not reliable, and as such, do
not constitute scientific output.”
Well, I didn’t realize that a scientist’s intentions were how we judged whether or not a paper constituted scientific output. I thought the validity of scientific claims was judged based on the strength of the evidence. Silly me.
The basis of this
argument is that, if a belief in the hypothesis can result in a bias in favor
of positive results, then if the replicator believes the result to be invalid,
this can result in a bias toward negative results. These biases are real. And it is possible that many replicators are
interested only in falsifying results that disagree with their preconceptions,
though Dr. Mitchell seems to have an abnormally low view of scientists when he
assumes that this is almost universally the case. Indeed, the two main reasons to replicate a study are either to detect possible errors, if one thinks the study was in error, or to offer further independent support, if one thinks the original work was valid. But the scientific process is
specifically designed to minimize the impacts of these biases.
Once again, I must
allow Dr. Mitchell’s own words to condemn him: “But consider how the replication project inverts this procedure—instead of trying to locate the sources of
experimental failure, the replicators and other skeptics are busy trying to
locate the sources of experimental success. It is hard to imagine how this makes any sense unless one has a strong
prior expectation that the effect does not, in fact, obtain. When an experiment
fails, one will work hard to figure out why if she has strong expectations that
it should succeed. When
an experiment succeeds, one will work hard to figure out why to the extent that
she has strong expectations that it should fail. In other words, scientists try to explain
their failures when they have prior expectations of observing a phenomenon, and
try to explain away their successes when they have prior expectations of that
phenomenon’s nonoccurrence.”
It is perfectly
valid to explore either causes of positive or negative results (I refuse to
consider this in terms of experimental success or failure for reasons detailed
above). The point of the
experiment is to isolate cause and effect, so if there is another possible
cause for an effect (whether that effect is a positive or a negative result),
it is within the proper purview of the scientist to try to find it. This is a good thing. Dr. Mitchell seems to think that the
point of science is to offer proof of one’s predetermined conclusions, but this
is not the case at all. While
supporting a pet hypothesis or falsifying a rival hypothesis may be the initial
motivation to embark upon a study, any reputable scientist places truth above personal
preference and seeks the best explanation for a given phenomenon.
Unfortunately, Dr. Mitchell has shown Professor Dawkins to be wrong on one small point: apparently, not all scientists even bother to pay lip-service to the scientific ideal. Real scientists have no
interest in explaining away results they dislike, whether positive or
negative. They may be initially
skeptical, and they certainly demand evidence, and they may even embark upon a
replication study in order to further examine that evidence. But once that evidence is in, if it
conflicts with their views, they must admit they had been wrong.
5) “The field of social psychology can be
improved, but not by the publication of negative findings. Experimenters should be encouraged to
restrict their "degrees of freedom," for example, by specifying
designs in advance.”
Actually, putting aside a few phrases, Dr. Mitchell is to be commended for this small section of his essay. For the reasons already discussed and for the reasons I will discuss as this conversation continues below, he is dead wrong in his opposition to the publication of negative findings. However, setting aside his claim that publishing negative findings is not the way to improve the field of social psychology, the suggestions he does make are quite reasonable ones. I won’t rehash everything he said in
that section here, but it boils down to increased standards for published
research. On that point, we can
all agree.
There is a phrase
that bothers me a bit, though, and I want to address it: “All scientists are motivated to find positive
results, and social psychologists are no exception.” This is true, of course, but I think it is problematic and
that Dr. Mitchell would have us completely ignore the problem behind it. Scientists are motivated to find
positive results partly because they like to confirm their pet hypotheses. This is true. However, this is small motivation indeed when one realizes
that most people become scientists because they want to understand the
world. If that means rejecting a
pet hypothesis, most scientists (as Richard Dawkins points out) at the very
least pay lip-service to the ideal.
For me, rejecting a pet hypothesis may be unpleasant for a day or two, but that feeling soon gives way to a much more profound satisfaction when I realize that, having done so, I have eliminated a false belief and may now substitute a true one. I think
most scientists understand and agree with that desire to follow the evidence
wherever it leads and to always seek to discover the truth.
So why, then, are
scientists so motivated to find positive results? Precisely because there is such a bias against publishing
negative results. In academia, if
you don’t publish research, your career is doomed to be a short one. But if you find negative results, you
often find yourself with work that can’t find a market in which to publish. Never mind that this research might be
the result of five years’ work involving dozens of collaborators and research
assistants--if it’s negative, it doesn’t get published. So of
course there’s a bias toward finding positive results. But it’s not necessarily a
philosophical bias. Indeed, there
are lots of us (I know--I’ve spoken to them) who actually like negative results
because they show us there is more to be learned (“My dear fellow, I wish to
thank you…”). But if we’re trying
to meet publication requirements for career advancement, negative results are
politically (not scientifically) undesirable.
6) “Whether they
mean to or not, authors and editors of failed replications are publicly
impugning the scientific integrity of their colleagues. Targets of failed replications are
justifiably upset, particularly given the inadequate basis for replicators’
extraordinary claims.”
Whether he means to
or not, I think Dr. Mitchell is revealing his true motivation for writing this
article here. He has conflated
replication studies with accusations of deliberate misrepresentation of data! A replication study, even if it is
negative, does not impugn anything.
Nor is a replication study an attempted pissing contest between the
replicator and the author of the original research. Indeed, it is possible to perform a replication study while
maintaining the greatest of respect for the original author or while having no
opinion of him or her at all.
Failed replication does not, need not, and should not be considered an
insult to the integrity of the original author unless there is very good reason
to suspect deliberate fraud.
Let us imagine a
failed replication has been published. What are some possible reasons for this
eventuality?
a) The original research is valid, and the replicator made a mistake.
b) The original research is valid, and the replication study failed due to chance.
c) The original research is valid, and the replicator falsified his findings.
d) The original research is invalid; the original author made a mistake.
e) The original research is invalid; the original author falsified his findings.
f) The original research is invalid; the original finding was due to chance.
g) The original research is valid but incomplete; there are other factors at work.
In only one of those
situations is the original author’s integrity challenged. In only one other is his competence
even slightly called into question.
It may be uncomfortable to have your work questioned, but that’s just
part of science. It shouldn’t be
taken as an attack unless it is coupled with a direct accusation of
impropriety. Those accusations should not be taken lightly. They should be taken seriously, but false accusations should also be met with strict consequences. Science is an honorable profession, and fraud is rare but intolerable. False accusations of fraud are similarly rare but equally intolerable. This is
not what replication is about, however.
Replication is simply about determining whether original findings hold
up.
By convention, we
consider a finding to be statistically significant at a p-level of less than
0.05. That means that, when there is no real effect to find, we accept a 5% chance of a false positive due simply to statistical chance (not considering experimental error). All else being equal, then, a non-trivial fraction of what gets published could be wrong just based on accepted standards for publication--and because journals preferentially publish positive results, the proportion of false findings in the published record can easily climb well above that nominal 5%. We could restrict our p-levels to less than 0.01 if we wanted to, but that would still leave some published research that is wrong purely by chance.
Replication, if nothing else, is about minimizing those probabilities by
re-running the experiments to see if the same results happen again. Even if we put aside all possibility of
experimental error, misrepresentation, or incomplete understanding of
contributory factors, we must replicate research in order to weed out
statistical anomalies. Restricting
p-levels to prohibitively low probabilities won’t do, either, because the more
restrictive our statistical tests, the more likely we are to reject findings
that are actually real. That’s
just as bad. So what do we do? We replicate.
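For the numerically inclined, here is a minimal sketch of what replication buys us even if we set aside experimental error entirely. The numbers are assumptions of my own, not drawn from Dr. Mitchell’s paper: requiring one successful independent replication sharply reduces the chance that a pure statistical fluke survives, without tightening the significance threshold.

```python
import random

# Illustrative sketch of replication as a statistical filter. The numbers
# are assumptions for the example, not data from any actual field.
random.seed(1)

ALPHA  = 0.05
TRIALS = 100_000    # hypothetical experiments on effects that do not exist

fluke_once  = 0     # null effects that reach significance in the original study
fluke_twice = 0     # null effects that also survive an independent replication

for _ in range(TRIALS):
    if random.random() < ALPHA:          # original study: false positive
        fluke_once += 1
        if random.random() < ALPHA:      # replication: false positive again
            fluke_twice += 1

print(f"Flukes surviving the original study alone: {fluke_once / TRIALS:.2%}")
print(f"Flukes surviving study plus replication:   {fluke_twice / TRIALS:.2%}")
# Requiring one successful independent replication cuts the chance that a
# pure statistical fluke persists from about 1 in 20 to about 1 in 400,
# without tightening the p-threshold and sacrificing power on real effects.
```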
Dr. Mitchell himself
points out, “On the occasions
that our work does succeed, we expect others to criticize it mercilessly, in
public and often in our presence.”
No doubt, it can be quite uncomfortable. Science is hard work, and it’s a tough business. If someone thinks you’re wrong, they
have no problem saying so, and they expect the same of you. That’s the way it should be. There’s no ill will about it--it’s just
a matter of subjecting all claims to the strictest of scrutiny. Anyone who has ever so much as
presented a poster understands the feeling of coming under fire. Anyone who has defended a thesis knows
it better than the rest. When we
think someone is wrong, we say so.
When we aren’t sure, we test it, and then we say what the results
were. There’s very little coddling
or hand-holding in this field, and there needn’t be. Scientists are adults, and as such should be able to take
professional criticism for what it is and avoid taking it personally. Replication studies are one more type
of potential criticism (though they can also support the original research, as
Dr. Mitchell regularly forgets).
He concludes his
essay with the following line: “One senses either a profound naiveté or a
chilling mean-spiritedness at work, neither of which will improve social
psychology.”
It seems that exactly one senses such things at work here and that one is called Dr. Jason Mitchell. The rest of the scientific community seems to understand that replication is not a mean-spirited personal attack, but just part of the job. Dr. Mitchell’s complaints seem, though I admittedly speak only of a general impression and not from any sort of evidence here, to be the whiny complaints of someone whose pet theory has been called into question. Instead of calling replicators (who, need I remind you, are just other scientists, just like anyone else, and most often also producers of their own original research) “mean-spirited,” the mature scientist realizes that replication is an essential component of the scientific process and that we neglect it at our peril.
This essay prompted
science journalist Ben Lillie to take to Twitter with this comment (quoted in:
http://io9.com/if-you-love-science-this-will-make-you-lose-your-sh-t-1601429885?utm_campaign=socialflow_io9_facebook&utm_source=io9_facebook&utm_medium=socialflow
): “Do you get points in social psychology for publicly declaring you have no
idea how science works?” I think
that sums up the quality of Dr. Mitchell’s essay quite nicely, though I object
to the association of Dr. Mitchell with the rest of the field of social
psychology. The social and
behavioral sciences have struggled long and hard to achieve strict scientific
standards. Ill-informed tirades
like Dr. Mitchell’s contribute to a popular misconception that these fields are
not “true” sciences. They are and
they should be. It is unfortunate
that many of their practitioners seem to disagree, but let us not besmirch the
image of entire fields based on the “contributions” of a few of their members
who prefer not to follow the rules of science.
Throughout this
response, harsh though I may have been (though I assure you, my commentary is
no more biting than what is generally expected of any controversial statement
among scientists), I have striven to avoid making any sort of personal attack
or commentary about Dr. Mitchell.
I don’t know him personally, so it would be improper to do so. I have attempted to restrict my
commentary to his arguments themselves and to his apparent lack of
understanding of the scientific process.
However, since he chose to close his article by calling scientists who
conduct replication studies “naïve” and “mean-spirited,” I feel no guilt at
closing my response by pointing out one additional quotation buried in Dr.
Mitchell’s essay: “I was mainly
educated in Catholic schools….”
Yeah, we can
tell. Which might explain why Dr.
Mitchell prefers to treat social psychology as a religion rather than a
science.
4 comments:
Brilliant response. How ironic that a Colorado magician understands science 1000 times better than a Harvard professor!!
Thanks, Mark.
In fairness, I think of myself as a scientist even above and beyond being a magician. I'm still at university, so one would have expected a Harvard professor's knowledge to be more advanced, though. But it was, actually, my interest in psychological science that is largely responsible for getting me started in magic. I wear a lot of different hats (scientist, writer, magician, etc), but they all seem to work quite well together for me.
Extraordinarily good response. Very, very well done. Also wanted to add that I completely agree with your comments on my tweet. That was done in an ill-advised moment of snark, and I didn't think it would get nearly the exposure that it did. I did follow it up with a number of tweets about how good most psych researchers are, but of course those didn't get much attention. C'est la Twitter.
Thanks for this, Bob! What an exceptionally bright response!