The Science of Counting the Dead
By Rebecca Goldin, Ph.D.
A recent study published in the Lancet claims that over
650,000 “excess” deaths have occurred in Iraq since the invasion
in March 2003. STATS looks at how scientists arrive at these
numbers, how their methods compare to other counts, and
whether criticism of the numbers is justified. A companion
article examines the media coverage.
10/17/06 "STATS" -- If you want to know the number
of people who died in 2005 from heart disease in the United
States, you need go no further than a website hosted by the
Centers for Disease Control (CDC), which collects the
information every year. Every death in the United States is
recorded by the National Center for Health Statistics, as is the
main cause of death.
There are, of course, imperfections. There can be more than
one cause of death or the cause can be unknown; a suicide might
have been a murder; sometimes a body is never found; and there
have been times when the system failed, such as when AIDS first
emerged.
War-torn countries often lack functioning central registries to record
deaths. People do not necessarily die in hospitals, and their
bodies are not necessarily sent to morgues. While the press
makes no claim to having actually seen all the deaths that
occur, the website
Iraq Body Count (IBC) keeps a database of “media-reported
civilian deaths in Iraq that have resulted from the 2003
military intervention by the USA and its allies.” The IBC does
not count excess deaths due to a deterioration of
infrastructure, lack of hospitals or clean water. Nor does it
count deaths not reported by the media. At least in theory,
innumerable deaths occur quietly, under the radar screen of any
accounting office.
The Iraqi health ministry also counts deaths. However, the
BBC reported in 2005 that the recorded deaths were based on
hospital records, which are unreliable when records and even
hospitals are being destroyed. And in December 2003, the
ministry ordered a halt to all attempts to count civilian
deaths, according to the
Associated Press. Currently, the official number of dead is
about 50,000, based on hospital and morgue data.
Public health researchers have rejected this official tally
of deaths in favor of an epidemiological approach. In a careful
study published in the Lancet, a prestigious
British medical journal, professors from Johns Hopkins
University and the School of Medicine at Al Mustansiriya
University in Baghdad estimated, from a random sample of Iraqi
households, that over 650,000 deaths have occurred in Iraq since
the 2003 invasion that would not have occurred had there been no
war.
While the Lancet numbers are shocking, the study’s
methodology is not. The scientific community is in agreement
over the statistical methods used to collect the data and the
validity of the conclusions drawn by the researchers conducting
the study. When
a prequel to this study by the same authors appeared two years
ago (at that time, 100,000 excess deaths were reported), the
Chronicle of Higher Education published a long article
explaining the
support within the scientific community for the methods
used.
President Bush, however, says he does “not
consider it a credible report,” and the media refer to the
study as “controversial.”
Yet even as the Associated Press reported
mixed reviews, all the scientists quoted in its piece on the
“controversy” were solidly behind the methods used. Indeed, the
Washington Post points out that this and the earlier study
are the “only ones to estimate mortality in Iraq using
scientific methods.”
How can science be done by surveys, and is cluster sampling nonsense?
Surveys are at the heart of epidemiological studies in which
prevalence information (how often a disease or trait – or death –
occurs) is not available through centralized sources. One of the
most widely cited surveys in the US is the National Health and
Nutrition Examination Survey, which estimates a variety of
things, from how many Americans have diabetes to who uses
pesticides. It is carried out under the auspices of the
National Center for Health Statistics, which is in turn part of the CDC.
While, in theory, some of this information is available through
other sources – doctors, for example, could report how many of
their patients are treated for diabetes – there is no way of
centrally recording the information and making sure that
everyone with diabetes is actually counted. Statistical survey
methods have been developed to solve exactly this problem.
Cluster sampling is a well-established technique in statistics, and is
routinely used to estimate casualties in natural disasters or
war zones. For the Iraq study the researchers randomly chose
starting households, interviewed a cluster of neighboring households
around each one about deaths in their families, and then extrapolated the
results to the whole population. There is nothing controversial
in the method itself, though people can certainly question
whether the sampling was done correctly.
As with all surveying, the result is an estimate, not
an exact number, because only a sample of the population
was interviewed rather than every person. Thus, the authors of
the Lancet study didn’t find 650,000 dead people – they found
some 547 deaths after talking to about 12,800 people and
extrapolated to how many they would have found had they talked
to all 27 million Iraqis. They compared this to how many would have died at
the mortality rate that prevailed before March 2003. The estimate is only
as good as the sample’s resemblance to the whole
population. But, other things being equal, the more people you survey, the more precise
the estimate.
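A crude back-of-the-envelope version of that extrapolation can be sketched in Python. It is not the authors’ method (they worked with person-years and cluster-level rates); it simply scales the sample’s deaths up to the population and subtracts the deaths expected at the prewar rate of 5.5 per 1,000 per year that the study reports. The 40-month post-invasion period is an assumption, not a figure from this article, and the function name is hypothetical.

```python
# Crude extrapolation sketch. ASSUMPTIONS (not from the article): the
# post-invasion period is about 40 months, and deaths scale linearly with
# population and time. This is NOT the Lancet authors' person-year method.

SAMPLE_DEATHS = 547          # post-invasion deaths found in the survey
SAMPLE_PEOPLE = 12_800       # approximate number of people covered
POPULATION = 27_000_000      # estimated population of Iraq
PREWAR_RATE = 5.5 / 1000     # prewar deaths per person per year
PERIOD_YEARS = 40 / 12       # ASSUMED length of the post-invasion period


def crude_excess_deaths(sample_deaths, sample_people, population,
                        prewar_rate, period_years):
    """Scale sample deaths up to the population, then subtract the deaths
    expected at the prewar rate over the same period."""
    observed = sample_deaths / sample_people * population
    expected = prewar_rate * population * period_years
    return observed - expected


estimate = crude_excess_deaths(SAMPLE_DEATHS, SAMPLE_PEOPLE, POPULATION,
                               PREWAR_RATE, PERIOD_YEARS)
print(f"{estimate:,.0f}")  # prints 658,828
```

Under these assumptions the result lands near the published 654,965, but that closeness depends on the assumed period and should not be read as a check on the study. The subtraction also shows why a lower prewar rate would raise the estimate of excess deaths.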
Thus, 650,000 deaths is only an estimate; the 95 percent confidence
interval runs from 392,979 to 942,636. What this means, loosely,
is that we can be 95 percent confident that the number of excess
deaths is in this range, while our best estimate is 654,965.
You can think of this as a roughly bell-shaped curve that peaks
at 654,965. The other values in the range are less
likely to be the “true value,” though not nearly as unlikely
as numbers outside the range.
How good is the science in this particular study?
There has been a wealth of material on the web
attacking the Lancet study. Most of it is devoid of science, and
ranges from outrage at the numbers (it’s impossible to believe
it could be so high), to accusations of bias based on the
authors’ views of U.S. foreign policy. Interested parties such
as the Iraqi government responded quickly by calling the numbers
“inflated” and “far from the truth,” rather than putting forward
any real reasons why these numbers are unlikely. The
Washington Post reported that the Defense Department’s
response was that coalition forces take “enormous precautions,”
and suggested that the deaths are the “result of insurgent
activity.”
In statistics, “error” does not mean “mistake”; it is, rather,
a measure of how certain we can be of the results. In the
Lancet study, and studies of a similar kind, there are two types
of possible error: one coming from built-in bias and one coming
from chance, through the use of statistics itself. While bias can hardly ever be
teased out if it is intrinsic to the study, there are many
techniques to minimize the error due to chance. The Lancet
authors took care to interview enough families (about 1,800
households) so that the possibility that they randomly chose
families more affected by violence than others would be small
enough not to affect their overall message. That message is
essentially that other estimates of deaths due to the war are
off by an order of magnitude.
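The claim that chance error shrinks as more families are interviewed can be made concrete with the textbook standard error of a proportion, sqrt(p(1 - p)/n). The sketch below assumes simple random sampling and a hypothetical 10 percent share of households reporting a death; it ignores the clustering of the real design, which widens the error, so the numbers are illustrative only.

```python
import math

P_HYPOTHETICAL = 0.10  # hypothetical share of households reporting a death


def standard_error(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


for n in (10, 100, 1_800):
    margin = 1.96 * standard_error(P_HYPOTHETICAL, n)  # approx. 95% margin
    print(f"{n:>5} households: {P_HYPOTHETICAL:.2f} +/- {margin:.3f}")
```

Because the error falls with the square root of the sample size, quadrupling the sample only halves the margin; and no sample size, however large, removes bias built into the study.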
The error intrinsic to statistics is often a target of
criticism: if there’s error no matter what we do, how can we
know anything? That line of reasoning makes about as much
sense as saying “since I’m not going to get exactly half heads
and half tails if I flip a coin, I can’t say anything at all
about whether a coin is biased.” Of course we can: we can
calculate the likelihood that flipping a coin will be
heads or tails. We can even calculate that the likelihood of
getting all heads or all tails when flipping a coin ten times is
less than one in 500. This leads us to the conclusion: if
someone happens to flip a coin ten times and gets all heads, the
coin is probably biased.
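The coin-flip figures above follow from the binomial distribution, and a few lines of Python using only the standard library reproduce them: exactly five heads in ten flips happens only about a quarter of the time, while all heads or all tails happens 2 times in 1,024.

```python
from math import comb

FLIPS = 10


def prob_heads(k, n=FLIPS):
    """Probability of exactly k heads in n flips of a fair coin."""
    return comb(n, k) / 2 ** n


exactly_half = prob_heads(5)
all_same = prob_heads(0) + prob_heads(FLIPS)  # all tails or all heads

print(exactly_half)        # 0.24609375
print(all_same)            # 0.001953125
print(all_same < 1 / 500)  # True
```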
Since a survey does not actually interview everyone, it is
possible that, purely by chance, the sample does not represent
the whole population. For example, in conducting a poll between
two candidates who are actually neck-and-neck, a pollster could,
inadvertently, interview only Democrats. The survey would then
get the result that the public is hugely in favor of one of the
candidates and not the other – contrary to what the population
actually feels. However, the chance of this happening is
practically zero if there are enough people surveyed. If there
are only ten people surveyed, it wouldn’t be so surprising if
they were all Democrats. But if 1,000 randomly chosen people are
interviewed, it is practically impossible to end up with all
Democrats.
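The polling example can be checked the same way. This sketch assumes, hypothetically, that exactly half of the electorate are Democrats and that respondents are drawn independently; it computes the chance that a sample is entirely Democratic, or at least 70 percent Democratic, for samples of 10 and 1,000 people.

```python
from math import comb


def prob_at_least(k_min, n):
    """Chance that at least k_min of n random respondents are Democrats,
    assuming (hypothetically) that exactly half the electorate is."""
    favorable = sum(comb(n, k) for k in range(k_min, n + 1))
    return favorable / 2 ** n  # exact integer counts, one final division


for n in (10, 1_000):
    seventy_pct = (7 * n + 9) // 10  # ceil(0.7 * n) without float rounding
    print(f"n={n}: all Democrats {prob_at_least(n, n):.3g}, "
          f"at least 70% Democrats {prob_at_least(seventy_pct, n):.3g}")
```

With ten respondents, an all-Democratic sample is rare (about 1 in 1,000), but 7 or more Democrats out of 10 occurs roughly one time in six; with 1,000 respondents both outcomes are, for practical purposes, impossible.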
In the same way, it is theoretically possible for
the scientists in the Lancet study to have interviewed 1,800
households that just happened to be wracked by violence, while
the rest of the country was not. Or it could happen that the
specific regions randomly chosen by the scientists were more
heavily affected by violence than the rest of the country. The
main point here is that these scenarios are extremely
unlikely, even though no one can rule them out
entirely.
The error coming from the use of statistics is captured by the
confidence interval. In the case of the Iraqi deaths
study, the confidence interval for the number of excess deaths
is 392,979 to 942,636 people. What this means is that, if the
survey were repeated many times, about 95 percent of the intervals
calculated this way would contain the true number of excess deaths.
The most likely number of excess deaths is 654,965.
In terms of probabilities, it means that re-doing the interviews
would result in a number that is much more likely to be near
this figure than it is to be near 400,000 or near 900,000. We
can be very confident that the number of excess deaths was not
less than 392,979 (there is less than a 2.5 percent
chance). For those who question the very technique of sampling,
Cervantes, a medical and health sociologist, explains
how the methods are standard fare for those doing this kind of
research, as does any basic text on how to conduct polls.
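The repeated-survey meaning of a 95 percent confidence interval (about 95 percent of intervals built this way from repeated surveys contain the true value) can be illustrated with a small simulation. This is a hedged sketch, not the Lancet analysis: it draws simple random samples with no clustering from a population with a hypothetical, known death rate, builds the usual normal-approximation interval each time, and counts how often the interval contains the true rate. The rate, sample size, and function names are assumptions chosen for illustration.

```python
import math
import random


def wald_interval(deaths, n, z=1.96):
    """Normal-approximation 95% confidence interval for a death proportion."""
    p_hat = deaths / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def coverage(true_rate, n, trials, seed=1):
    """Fraction of simulated surveys whose interval contains true_rate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One simulated survey: each of n people died with probability true_rate.
        deaths = sum(rng.random() < true_rate for _ in range(n))
        low, high = wald_interval(deaths, n)
        hits += low <= true_rate <= high
    return hits / trials


TRUE_RATE = 0.04  # hypothetical share of people who died
result = coverage(TRUE_RATE, n=2_000, trials=1_000)
print(result)  # close to 0.95; the exact value depends on the seed
```

Any single interval either contains the true value or does not; the 95 percent describes how often the procedure succeeds.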
Does anyone disagree with the study based on scientific principles?
At
The Questionable Authority, blogger Mike Dunford points out
some possible biases that might have led the researchers to
numbers higher than they should be. First, he argues that the
Lancet study used population estimates obtained by
a joint Iraqi/UN population study, rather than those of the
Iraqi Ministry of Health, which the same authors had used two
years earlier. Dunford points out that if the total population
(estimated to be approximately 27 million people) is invalid,
then so is the estimate of 650,000. This is certainly true, but
there is no reason to suspect that these organizations would be
biased towards reporting a larger population than there actually
is. Dunford seems to imply that there are competing estimates out
there, but he cites only information from 18 months earlier. If
Dunford is correct that the population has been overestimated by
as much as 11 percent, then the excess deaths should be
estimated at about 580,000 instead of 650,000.
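Because the extrapolation scales the sample up in proportion to the total population, the adjustment in the last sentence is simple proportional arithmetic. This sketch reads “overestimated by 11 percent” as “the true population is 89 percent of the figure used,” which is the reading that gives about 580,000; the function name is hypothetical.

```python
def rescale_for_population(excess_estimate, overestimate_fraction):
    """Shrink an excess-death estimate in proportion to a population
    overestimate, since the extrapolation is linear in population."""
    return excess_estimate * (1 - overestimate_fraction)


print(round(rescale_for_population(650_000, 0.11)))  # 578500
print(round(rescale_for_population(654_965, 0.11)))  # 582919
```

Reading the 11 percent instead as “the figure used is 1.11 times the true population” would give 650,000 / 1.11, or roughly 586,000; either way the estimate stays well above half a million.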
Dunford also points out that the excess deaths attributed to
nonviolent causes were not statistically significant, and that,
therefore, they should not be included in the total. This
is simply a question of standard statistical protocol. The main
purpose of the study is to measure excess deaths, without regard
to cause. For this, the nonviolent causes are relevant, even if
not statistically significant by themselves. The authors did
find that the increase in violent deaths was (highly)
statistically significant, which is why they are reported
separately. Thus it would be difficult to argue from this study
that Iraq’s infrastructure is falling apart and that people are
dying from a lack of hospitals. But the authors have not made
such claims in their paper.
Flares into Darkness argues that the sampling method would
invariably favor densely populated areas, and that these areas
would suffer disproportionately from bombings. It is certainly
true that densely populated areas are more likely to be sampled
– but only in proportion to their population. In other words, if
ten times as many people live in Region A as live in rural
Region B, then Region A is ten times as likely to be chosen as a
sampling destination. Overall, this will not have the effect of
oversampling cities; it will have the effect of sampling cities
in proportion to their population, and rural areas in proportion
to theirs. Flares into Darkness insists that the latitude these
scientists had to change whom they interviewed, based on perceived
threats, gave them just enough leeway to cheat and pick places
with more deaths. But this is tantamount to accusing them of
fixing the data; it simply doesn’t address the core findings of
the study.
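Selection with probability proportional to population, as described above, is easy to simulate. In this sketch the two regions and their populations are hypothetical, and Python’s random.choices performs the weighted draw; over many draws Region A is picked about ten times as often as Region B, so the number of selections per resident comes out about the same in both.

```python
import random
from collections import Counter

# Hypothetical populations: Region A has ten times as many people as Region B.
POPULATIONS = {"Region A": 1_000_000, "Region B": 100_000}


def pick_clusters(populations, draws, seed=0):
    """Choose cluster locations with probability proportional to population."""
    rng = random.Random(seed)
    names = list(populations)
    weights = [populations[name] for name in names]
    return Counter(rng.choices(names, weights=weights, k=draws))


counts = pick_clusters(POPULATIONS, draws=110_000)
ratio = counts["Region A"] / counts["Region B"]
print(f"A chosen {ratio:.2f} times as often as B")
for name, people in POPULATIONS.items():
    # Selections per 100,000 residents: roughly equal under this scheme.
    print(name, round(counts[name] / people * 100_000))
```

If every selected cluster then contributes the same number of households, a city dweller and a villager have roughly the same chance of being interviewed, which is why this design does not oversample cities.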
Flares into Darkness also claims that overall rates of death
could be affected by the fact that deaths with specific causes
could be correlated: a car bomb, for example, could kill several
people at once in neighboring houses. If the sample happened to
take in a neighborhood that took a bad hit from car bombs, then
it could lead to an incorrect extrapolation to the whole
population, when in fact the researchers just happened to sample a
badly hit area.
Yet again, it is standard statistical protocol in a cluster
sampling survey to take this into account. The authors adjust
for the fact that there is higher correlation within clusters
than across clusters. As the authors point out in their analysis
section, “The SE (standard error) for mortality rates were
calculated with robust variance estimation that took into
account the correlation between rates of death within the same
cluster over time.”
While the authors did consider the issue of correlated
deaths, it should also be noted that even if the authors did not
correctly account for these correlations, the effect would be to
widen the confidence interval, not lower the estimate. For just
as correlated deaths could mean that what the observers saw was
a fluke, it could also mean that the observers missed the
truly bad areas.
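Why correlated deaths mainly widen the interval, rather than move the estimate, can be seen in a toy simulation. This sketch is an assumption-laden illustration, not the authors’ robust variance estimator: it builds 50 hypothetical clusters of 40 households, makes every fifth cluster “hard hit” so that deaths are correlated within clusters, and compares a naive standard error (treating households as independent) with a simple between-cluster standard error.

```python
import math
import random
import statistics

N_CLUSTERS = 50  # hypothetical number of clusters
HOUSEHOLDS = 40  # hypothetical households per cluster


def simulate_cluster_means(seed=0):
    """Share of households with a death in each cluster; every fifth
    cluster is 'hard hit', so deaths are correlated within clusters."""
    rng = random.Random(seed)
    means = []
    for c in range(N_CLUSTERS):
        rate = 0.30 if c % 5 == 0 else 0.05
        deaths = sum(rng.random() < rate for _ in range(HOUSEHOLDS))
        means.append(deaths / HOUSEHOLDS)
    return means


means = simulate_cluster_means()
n = N_CLUSTERS * HOUSEHOLDS
p_hat = statistics.mean(means)  # the point estimate is the same either way
naive_se = math.sqrt(p_hat * (1 - p_hat) / n)  # pretends households independent
cluster_se = statistics.stdev(means) / math.sqrt(N_CLUSTERS)  # between clusters

print(f"estimate {p_hat:.3f}")
print(f"naive SE {naive_se:.4f}, cluster SE {cluster_se:.4f}")
```

Both standard errors describe the same point estimate; accounting for the clustering only makes the interval around it noticeably wider.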
The last criticism that has spread widely in the blogosphere
is that the pre-war mortality rates were underestimated. Since
this study used prewar death rates to estimate how many deaths
would have occurred anyway – and subtracted these off to obtain
the “excess deaths,” a lower pre-war death rate makes for a
higher estimate of excess deaths. There is little compelling
reason to believe that the prewar death rates were
underestimated, as they were corroborated by the study
itself.
The study found a prewar death rate of 5.5 per 1,000 people
per year, which was roughly the same as that found by the CIA
and the U.S. Census Bureau, according to Gilbert Burnham, one of
the authors of the study from Johns Hopkins University. In other
words, the prewar death rate was not just “invented” or taken
from an unreliable source; it was supported by data from the
same interviews.
What should we not take from this study?
The Lancet study does an excellent job of counting the dead, but
its purpose is not to point fingers. While the study
reports that 31 percent of excess deaths were caused by
coalition forces, it is possible that those reporting the deaths
might be biased by anti-coalition sentiment. Those families may
be more likely to believe and report that a violent death was
attributable to coalition forces. Of course, bias could go the
other direction as well – we simply do not know. We also cannot
assess who died – civilians or those involved with the armed
conflict. Again, it would be easy to see how bias would affect
reports by family members.
The methods used by this study are the only scientific
methods we have for discovering death rates in war-torn
countries without the infrastructure to report all deaths
through central means. Instead of dismissing over half a million
dead people as a
political ploy, as did Anthony Cordesman of the Center for
Strategic & International Studies in Washington, we ought to
embrace science as opening our eyes to a tragedy whose
scale has been vastly underestimated until now.