Freakonomics Revised and Expanded Edition Read online

Page 25


  There is one thing I would take Bennett to task for: first saying that he doesn’t believe our abortion-crime hypothesis but then revealing that he does believe it with his comments about black babies. You can’t have it both ways.

  As an aside, the caller’s initial hypothesis is completely wrong. If abortion were illegal, our Social Security problems would not be solved. As noted above, most abortions simply delay a birth: the child who is not born today is instead born to the same mother a few years later.

  —SDL (Sept. 30, 2005)

  “Back to the Drawing Board for Our Latest Critics”

  Thanks to recent articles in the Wall Street Journal and the Economist, a working paper by Chris Foote and Chris Goetz that is sharply critical of John Donohue and me has gotten an enormous amount of attention.

  In that working paper, Foote and Goetz criticized the analysis underlying one of the tables in our original article that suggested a link between legalized abortion and crime. (It is worth remembering that the approach they criticize was one of four distinct pieces of evidence we presented in that paper; they offer no criticisms of the other three approaches.)

  Foote and Goetz made two basic changes to the original analysis we did. First, they correctly noted that the text of our article stated that we had included state-year interactions in our regression specifications, when indeed the table that got published did not include these state-year interactions. Second, they correctly argue that without controlling for changes in cohort size, the original analysis we performed provided a test of whether cohorts exposed to high rates of legalized abortion did less crime, but did not directly afford a test of whether “unwantedness” was one of the channels through which this crime reduction operated. (Note: we didn’t claim that this particular analysis was a direct test of the “unwantedness” hypothesis. This last section of the paper was the most speculative analysis of everything that we did, and frankly we were surprised it worked at all, given the great demands it put on the data.) They found that once you made those changes, the results in our original Table 7 essentially disappear.

  There is, however, a fundamental problem with the Foote and Goetz analysis. The abortion data that are available are likely to be quite noisy. As one adds more and more control variables (e.g., nearly 1,000 individual state-year interactions), the meaningful variation in abortion rates gets eaten away. The signal-to-noise ratio in what remains of the variation in measured abortions gets worse and worse. That will lead the measured impact of abortions on crime to dwindle. Because this work uses a state/year/single year of age (e.g., 19-year-olds in Ohio in 1994) as the unit of analysis, the analyses performed are highly saturated with interactions: state-age interactions, age-year interactions, and state-year interactions. Together, these interactions account for more than 99 percent of the variance in arrest rates and more than 96 percent of the variation in the abortion proxy. It is an exercise that is very demanding of the data.
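  The signal-erosion argument can be illustrated with a toy simulation (hypothetical numbers, not the actual crime panel): when the true signal in a noisy proxy varies only at the group level, demeaning within groups (the equivalent of absorbing group fixed effects) strips out the signal and leaves mostly measurement error, so the estimated coefficient collapses toward zero.

```python
import numpy as np

# Toy illustration with simulated data: a noisy proxy whose true signal
# varies only at the group level. Absorbing group fixed effects (here,
# demeaning within groups) removes the signal, leaving mostly noise.
rng = np.random.default_rng(0)
n_groups, per_group, beta = 200, 10, -1.0

g = np.repeat(np.arange(n_groups), per_group)
signal = np.repeat(rng.normal(size=n_groups), per_group)  # group-level signal
noise = rng.normal(size=g.size)                           # idiosyncratic error
x = signal + noise                                        # observed proxy
y = beta * signal + rng.normal(scale=0.5, size=g.size)    # outcome driven by true signal

def ols_slope(a, b):
    a, b = a - a.mean(), b - b.mean()
    return (a * b).sum() / (a * a).sum()

def demean_within_groups(v):
    return v - (np.bincount(g, weights=v) / per_group)[g]

b_raw = ols_slope(x, y)                                   # attenuated: roughly beta/2
b_fe = ols_slope(demean_within_groups(x), demean_within_groups(y))  # near zero
print(b_raw, b_fe)
```

The raw regression already understates the true effect because of measurement error; once the group-level variation is absorbed, essentially nothing is left to estimate with.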

  In light of this, it seems uncontroversial that one would want to do the best one could in measuring abortion when carrying out such an exercise.

  The abortion measure used by Foote and Goetz is produced by the Alan Guttmacher Institute, which estimates, based on surveys of abortion providers, the number of abortions performed per live birth in each state and year.

  To proxy for the abortion exposure of, say, 19-year-olds arrested in California in 1993, Foote and Goetz use the abortion rate in California in 1973. This is not an unreasonable first approximation (and indeed is the one we used in most parts of our original paper because it is simple and transparent), but it is just an approximation for a number of reasons:

  1. There is a great deal of cross-state mobility. Therefore, many of the 19-year-olds arrested in California in 1993 were not born in California. They were born in other states, or possibly other countries. Indeed, I believe that recent figures suggest that more than 30 percent of those in their late teens do not reside in the state in which they were born.

  2. Using a date of 20 years earlier to proxy for the abortion exposure of a 19-year-old induces an enormous amount of noise. If I am a 19-year-old sometime in 1993, I may have been born as early as Jan. 2, 1973 (that would make me still 19 on Jan. 1, 1993) or as late as Dec. 31, 1974 (that would have me turning 19 on Dec. 31, 1993). Abortions occur sometime in advance of birthdays, typically about 13 weeks into a pregnancy. So the relevant date (roughly) of when those who are 19 in 1993 would have been exposed to legalized abortion is about six months before they were born, or July 2, 1972, through June 30, 1974. While that window overlaps with the year 1973 (which is what Foote and Goetz use as their time period of abortion exposure), note that it also includes half of 1972 and half of 1974!

  3. A non-trivial fraction of abortions performed in the United States, especially in the time when legalization was taking place, involved women crossing state lines to get an abortion. As a consequence, measuring abortions in terms of the state in which the abortion is performed (that’s what the Foote/Goetz data does), rather than the state of residence of the woman getting the abortion, induces further measurement error into their abortion proxy.

  4. The Alan Guttmacher abortion numbers are, even by the admission of the people who collect the data, far from perfect. Indeed, the correlation between these abortion estimates and another time series collected by the CDC is well below one, suggesting that even if problems 1, 2, and 3 didn’t exist, there would be substantial measurement error. The correlation between the Alan Guttmacher measure and the CDC measure, not surprisingly, gets lower and lower the more control variables that are included. This is exactly what one would expect if the controls are taking the signal out of the abortion measures and leaving behind mostly noise.
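  As a rough check on the exposure-window arithmetic above, one can count, for each birthdate that makes someone 19 at some point during 1993, which calendar year falls roughly six months earlier (a sketch; the flat 183-day offset is an approximation):

```python
from datetime import date, timedelta

# For everyone who is 19 at some point during 1993 (born Jan. 2, 1973
# through Dec. 31, 1974), tally the calendar year of the date roughly
# six months before birth, i.e., the abortion-relevant exposure date.
start, end = date(1973, 1, 2), date(1974, 12, 31)
counts = {}
d = start
while d <= end:
    exposure_year = (d - timedelta(days=183)).year
    counts[exposure_year] = counts.get(exposure_year, 0) + 1
    d += timedelta(days=1)

total = sum(counts.values())
weights = {yr: round(c / total, 2) for yr, c in counts.items()}
print(weights)  # roughly a quarter of the exposure falls in each of 1972 and 1974
```

Only about half of the relevant exposure falls in 1973, the single year Foote and Goetz use.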

  What John Donohue and I have done (with fantastic research assistance from Ethan Lieber) is to attempt to address as best we can these four problems with the abortion measure that Foote and Goetz are using. In particular, we do the following:

  1. As we describe in our original paper on abortion, one can deal with cross-state mobility by using the decennial censuses to determine the state of birth for the current residents of a state. (The results from carrying out this correction in our crime regressions are reported in Table 5 of the original 1999 paper.) This is possible to do because the census micro data reports the state of birth and current state of residence for a 5 percent sample of the U.S. population. Note that the correction we are able to make is unlikely to be perfect, so it may not fully solve the problem, but it clearly moves us in the right direction.

  2. Given that the window of abortion exposure faced by 19-year-olds in 1993 spans the years 1972 to 1974, the obvious solution to this problem is to allow abortions performed in 1972, 1973, and 1974 to influence arrests of 19-year-olds in 1993. It is straightforward to work out roughly the weights that one wants to put on the different years’ abortion rates—or one can do it non-parametrically and let the data decide; the answers are virtually identical.

  3. In order to deal with the fact that many women were crossing state lines to get abortions in the 1970s, we use the Guttmacher Institute’s estimates of abortions performed on women residing in a state relative to live births in that state. (We were unaware of the existence of these better data when we wrote the initial paper, otherwise we would have used them at that time.) There is little question that measuring abortions by state of residence is superior to measuring them by where the procedure is performed.

  4. The standard solution to measurement error is to perform an instrumental-variables analysis in which you use one noisy proxy of the phenomenon that is poorly measured as an instrument for another noisy proxy. (I recognize that most readers of this blog will not understand what I mean by this.) In this setting, the CDC’s independently generated measure of legalized abortions is likely to be an excellent instrument. Because there is so much noise in each of the measures, the standard errors increase when doing this I.V. procedure, but under a standard set of assumptions, the estimates obtained will be purged of the attenuation bias that will be present due to measurement error.
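  The logic of that instrumental-variables correction can be sketched with simulated data (hypothetical numbers, not the actual Guttmacher or CDC series): given two independently noisy proxies of the same unobserved variable, a regression on one proxy is attenuated toward zero, but instrumenting it with the other recovers the true coefficient.

```python
import numpy as np

# Two noisy proxies of the same unobserved variable, with independent
# errors. OLS on one proxy is biased toward zero; using the other proxy
# as an instrument purges the attenuation bias.
rng = np.random.default_rng(1)
n, beta = 50_000, -1.0

true = rng.normal(size=n)            # unobserved true exposure
x1 = true + rng.normal(size=n)       # noisy proxy (think Guttmacher-style)
x2 = true + rng.normal(size=n)       # second noisy proxy (think CDC-style)
y = beta * true + rng.normal(size=n)

def cov(a, b):
    return ((a - a.mean()) * (b - b.mean())).mean()

b_ols = cov(x1, y) / cov(x1, x1)     # attenuated: roughly beta/2
b_iv = cov(x2, y) / cov(x2, x1)      # instrumented: roughly beta
print(b_ols, b_iv)
```

The instrument works because the two proxies share the true signal but not the noise, so their covariance isolates the signal.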

  I think that just about any empirical economist would tend to believe that each of these four corrections we make to the abortion measure will lead us closer to capturing the true impact of legalized abortion on crime. So the question becomes: What happens when we replicate the specifications reported in Foote and Goetz, but with this improved abortion proxy?

  The results are summarized in the following table, which has two panels. The top panel shows the results for violent crime. The bottom panel corresponds to property crime.

  Starting with the first panel, the top row reports the same specifications as Foote and Goetz (I don’t bother showing their estimates excluding state-age interactions because it makes no sense to exclude these and they themselves say that their preferred specifications include state-age interactions). We are able to replicate their results. As can be seen, the coefficients shrink as one adds state-year interactions and population controls.

  The second row of the table presents the coefficients one obtains with our more thoughtfully constructed abortion measure (changes 1–3 on pages 259–60 having been made to their abortion measure). With a better measure of abortion, as expected, all the estimated abortion impacts increase across the board. The results are now statistically significant in all of the Foote and Goetz specifications. Even in the final, most demanding specification, the magnitude of the coefficient is about the same as in the original results we published that didn’t control for state-year interactions or population. The only difference between what Foote and Goetz did and what we report in row 2 is that we have done a better job of really measuring abortion. Everything else is identical.

  The third row of the table reports the results of instrumental variables estimates using the CDC abortion measure as an instrument for our (more thoughtfully constructed) Guttmacher Institute proxy of abortions. The results all get a little bigger but are more imprecisely estimated.

  The bottom panel of the table shows results for property crime. Moving from Foote and Goetz’s abortion measure in the top row to our more careful one in the second row (leaving everything else the same), the coefficients become more negative in three of the four specifications. Doing the instrumental variables estimation has a bigger impact on property crime than on violent crime. All four of the instrumental variables estimates of legalized abortion on property crime are negative (although again less precisely estimated).

  The simple fact is that when you do a better job of measuring abortion, the results get much stronger. This is exactly what you expect of a theory that is true: doing empirical work closer to the theory should yield better results than empirical work more loosely reflecting the theory. The estimates without population controls, but including state-year interactions, are as big or bigger than what is in our original paper. As would be expected (since the unwantedness channel is not the only channel through which abortion is acting to reduce crime), the coefficients we obtain shrink when we include population controls. But, especially for violent crime, a large impact of abortion persists even when one measures arrests per capita.

  The results we show in this new table are consistent with the impact of abortion on crime that we find in the three other types of analyses we presented in the original paper using different sources of variation. These results are consistent with the unwantedness hypothesis.

  No doubt there will be future research that attempts to overturn our evidence on legalized abortion. Perhaps some of it will even succeed. But this attempt does not.

  —SDL (Dec. 5, 2005)

  3. WHAT DO THE KANSAS CITY ROYALS HAVE IN COMMON WITH AN iPOD?

  One useful purpose of the Freakonomics blog (of any blog, really) is to make random reflections on random subjects—including, as it turns out, the subject of randomness itself.

  “What Do the K.C. Royals and My iPod Have in Common?” On the surface, not much. The Royals have lost 19 straight games and are threatening to break the all-time record for futility in major-league baseball. My iPod, on the other hand, has quickly become one of my most beloved material possessions.

  So what do they have in common? They both can teach us a lesson about randomness.

  The human mind does badly with randomness. If you ask the typical person to generate a series of “heads” and “tails” to mimic a random sequence of coin tosses, the series doesn’t really look like a randomly generated sequence at all. You can try it yourself. First, before you read further, write down what you expect a random series of 20 coin tosses to look like. Then spend 15 or 20 minutes flipping coins (or use a random number generator in Excel). If you are like the typical person, the “random” sequence you generated will have many fewer long streaks of “all heads” or “all tails” than actually arise in real life.
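  The streakiness of genuinely random sequences is easy to verify by simulation (a sketch using a fair coin): most random sequences of 20 tosses contain a run of four or more identical outcomes, far streakier than what people typically write down.

```python
import random

random.seed(42)

def longest_run(seq):
    # length of the longest stretch of identical consecutive outcomes
    best = cur = 1
    for prev, nxt in zip(seq, seq[1:]):
        cur = cur + 1 if prev == nxt else 1
        best = max(best, cur)
    return best

trials = 10_000
hits = sum(longest_run([random.random() < 0.5 for _ in range(20)]) >= 4
           for _ in range(trials))
share = hits / trials
print(share)  # roughly three-quarters of sequences contain a streak of 4+
```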

  My iPod shuffle reminds me of this every time I use it. I’m consistently surprised at how often it plays two, three, or even four songs by the same artist, even though I have songs by dozens of different artists on it. On a number of occasions, I’ve even become mistakenly convinced I don’t have the iPod on shuffle, but rather I’m playing all the songs by one artist. If someone is really bored, maybe they can repeatedly have the iPod shuffle the songs, record the data, and see if the shuffle function really is random. My guess is that it is, because what would be the point of Apple doing something different? I have a friend, Tim Groseclose, a professor of political science at UCLA, who was convinced that the random button on his CD player knew which songs were his favorites and disproportionately played those. So I bet him one day, made him name his favorite songs in advance, and won lunch.
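  A quick simulation makes the same point about shuffled playlists (a sketch with a hypothetical library of 15 artists and 10 songs each, not Apple's actual shuffle algorithm): a truly random shuffle plays the same artist back to back surprisingly often.

```python
import random

random.seed(7)
songs = [artist for artist in range(15) for _ in range(10)]  # 15 artists x 10 songs

def back_to_back(playlist):
    # count adjacent pairs of songs by the same artist
    return sum(a == b for a, b in zip(playlist, playlist[1:]))

trials = 2_000
total = 0
for _ in range(trials):
    random.shuffle(songs)
    total += back_to_back(songs)
avg = total / trials
print(avg)  # a full random shuffle averages about 9 same-artist repeats
```

With 149 adjacent pairs and a 9-in-149 chance that any pair matches, about nine same-artist repeats per full shuffle is exactly what randomness predicts.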

  Which brings us to the Kansas City Royals. When a team loses 19 straight games, it seems so extreme that it can’t reasonably be the result of randomness. Clearly coaches, sportswriters, and most fans believe that to be true. How often have you heard of a coach holding a closed-door meeting to try to turn a team around? But if you look at it statistically, you expect 19-game losing streaks to occur, simply by randomness, about as often as they do.

  The following calculations are admittedly crude, but they give you the basic idea. Each year, there are about two teams in the major leagues with a winning percentage of around 35 percent. (Sometimes no team is that bad, and in other years there are real stinkers like Detroit in 2003; they won only 26.5 percent of their games.) For a team that has a 35 percent chance of winning each game, the chance of losing its next 19 games is about one in 4,000. Each team plays 162 games a year and so has 162 chances to start such a streak. (Streaks that begin in one year and carry over into the next still count, so it is correct to use all 162 games.)

  So each year, for these two bad teams that win 35 percent of their games, there are a total of 324 chances to have a 19-game losing streak. It takes about 12 or 13 years for these two bad teams to have a total of 4,000 chances for a 19-game losing streak. Thus we would expect a losing streak this long a little less than once a decade.
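  The arithmetic above can be reproduced in a few lines (using the numbers from the text; the unrounded product comes out near one in 3,600 rather than one in 4,000, so the direct calculation gives a streak roughly every 11 or 12 years):

```python
# Chance that a team winning 35 percent of its games loses its next 19,
# and how long we expect to wait for such a streak to show up.
p_lose_19 = 0.65 ** 19
chances_per_year = 2 * 162          # two such bad teams, 162 starting points each
years_per_streak = (1 / p_lose_19) / chances_per_year
print(round(1 / p_lose_19), round(years_per_streak, 1))
```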

  In practice, we see, if anything, slightly fewer long losing streaks than these calculations predict. The last really long losing streak was that of the Cubs in 1996–1997, at 16 games. (There is actually a good reason that long streaks occur a little less often than in the simple model I was using: a team that wins 35 percent of its games doesn’t have the same likelihood of winning every game. Sometimes it has a 50 percent chance and sometimes a 20 percent chance, and that sort of variability lessens the likelihood of long streaks.)

  So, one doesn’t need to resort to explanations like “lack of concentration,” being “snakebit,” or “demoralization” to explain why the Royals are losing so many games in a row. It’s just that they are a bad team getting some bad luck.

  —SDL (Aug. 20, 2005)

  “Wikipedia? Feh!”

  I know, I know, I know: Wikipedia is one of the wonders of the online world. But if anyone ever needs a reason to be deeply skeptical of Wikipedia’s dependability, I urge you to click on the entry for “List of Economists,” which is introduced thusly: “This is an alphabetical list of well-known economists. Economists are scholars conducting research in the field of economics.”

  It is true that the list includes George Akerlof and Paul Samuelson and Jeffrey Sachs and even Steve Levitt. But if you want to see how truly pathetic Wikipedia can be, check out the sixth “economist” listed under “D”: that’s right, yours truly. Although some of my best friends are economists, I am very much not. (Note: soon after I posted this entry, a reader was helpful/mischievous enough to quickly amend the Wikipedia entry, deleting my name.) The point is that the greatest strength of Wikipedia is also its greatest weakness: pretty much anyone can contribute anything anytime to an “encyclopedia” that most casual users will assume is in fact encyclopedic, but which changes regularly, depending on the input of its users. For instance:

  In Freakonomics, we make a passing reference to the Chicago Black Sox, the name given to the Chicago White Sox after eight players were found to have colluded with gamblers to throw the 1919 World Series.

  A reader recently wrote: The 1919 white sox were not known as the black sox because they threw the world weries [sic]. They were called that because their owner (whose name i do not have) was too stingy to have their uniforms cleaned regularly so that they frequently showed up on the diamond in dirty uniforms. You’re welcome.