The Percentage Increase in the US Death Rate in 2020 Was Higher than in the Worst Year of the Spanish Flu

The Covid-19 pandemic this year has often been described as the worst public health crisis in the US since the Spanish Flu of 1918.  And there has certainly been nothing since then that compares.  But early estimates indicate that the increase in deaths this year may have even been worse than during the worst year of the Spanish Flu.

Population has of course grown, so the relevant figures to compare over time are death rates – the share of the population that died each year.  In addition, while it should not make much of a difference to the annual percentage changes, for longer-term trends one should look at the age-adjusted figures, which control for changes in the age structure of the population (as older people are more likely to die in any given year).

The National Center for Health Statistics, part of the CDC, provides such figures for the years 1900 to 2018.  The CDC also has separate figures for deaths and population for recent years, up to 2019, and hence the crude (not age-adjusted) death rate can be calculated for 2018 and 2019.  From this one can estimate what the percentage change in the death rate was for 2019, which should be close to the percentage change in the age-adjusted rate as the population age structure changes only slowly over time.

Finally, the estimate for what deaths in 2020 will be was obtained from a December 22 Associated Press article reporting on recent CDC figures.  It reports that the US “is on track to see more than 3.2 million deaths this year”.  And this may be a conservative estimate.  It would imply an increase of 345,162 over the number of deaths in 2019.  But as I write this, the number of deaths reported by Johns Hopkins from Covid-19 alone was 345,271.  While a share of those who have died due to Covid-19 this year would have died later in the year from other causes, analyses of “excess deaths” have consistently found that the increase in deaths this year has been well in excess of the number that has been attributed specifically to Covid-19.  For example, a CDC study in October estimated that for the period from late January (when Covid-19 was first recognized in the US) to October 3, there were 299,028 more who had died than one would have expected under normal conditions, while (over that period) only 198,081 deaths were directly attributed to Covid-19.  That is, the increase in the number of deaths during the period (over what would have been expected had this been a normal year) was 51% higher than the number of recorded deaths due to Covid-19.
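As a rough check on the arithmetic in the paragraph above, the short sketch below recomputes the implied number of deaths in 2019 and the ratio of excess deaths to deaths attributed to Covid-19.  It uses only the figures quoted in the text; the variable names are made up for illustration.

```python
# Figures quoted in the text (AP article and the October CDC study).
projected_deaths_2020 = 3_200_000
implied_increase_over_2019 = 345_162
excess_deaths_to_oct3 = 299_028
covid_attributed_to_oct3 = 198_081

# Number of deaths in 2019 implied by the two AP-based figures.
implied_deaths_2019 = projected_deaths_2020 - implied_increase_over_2019

# How far excess deaths exceeded the deaths directly attributed to Covid-19.
pct_above_attributed = (excess_deaths_to_oct3 / covid_attributed_to_oct3 - 1) * 100

print(f"Implied 2019 deaths: {implied_deaths_2019:,}")
print(f"Excess deaths above attributed deaths: {pct_above_attributed:.0f}%")
```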

We will not have the final figures for 2020 for some time.  But even accepting the conservative estimate of 3.2 million deaths in the US in 2020, the percentage increase in the number of deaths in the US per 100,000 of population was about the same as (and indeed very slightly higher than) in 1918, the year of the Spanish Flu.  In both years, the increase in the death rate was close to 12%.

I was surprised that the increase in the death rate this year was anywhere close to what it was during the Spanish Flu.  It is truly astounding.  But it is especially surprising given that, as the AP article cites, the number of deaths of Americans in 1918 rose by 46%.  How can that be consistent with the 12% increase in the death rates?  There are several factors that account for this.

To start, the increase (using another CDC table) was from 981,239 deaths in 1917 to 1,430,079 in 1918, or an increase of 448,840 (or 45.7%).  But note that a footnote to that CDC table indicates that these figures include military deaths in World War I.  This differs from the practice in later years, where military deaths are excluded.  World War I military deaths totalled 116,516 (according to official Department of Veterans Affairs figures), and almost all came in 1918.  While some share of these were due to soldiers dying from the Spanish Flu, I do not have figures for what those might have been.  Leaving that aside, the increase in non-military deaths in 1918 would have been close to 34%, and adjusting for US population growth in those years the increase in the death rate would have been 32%.
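The first two steps of that calculation can be reproduced from the figures quoted above.  The sketch below makes the simplifying assumption that all of the World War I military deaths fell in 1918 (the text says "almost all").  The further step to 32% requires population figures that are not quoted here, so it is left out.

```python
# Figures quoted in the text.
deaths_1917 = 981_239
deaths_1918 = 1_430_079
ww1_military_deaths = 116_516  # simplification: treated as all occurring in 1918

# Increase in total recorded deaths, military included.
total_increase = deaths_1918 - deaths_1917
pct_total = total_increase / deaths_1917 * 100

# Increase once the military deaths are taken out of the 1918 figure.
non_military_increase = total_increase - ww1_military_deaths
pct_non_military = non_military_increase / deaths_1917 * 100

print(f"Total increase: {total_increase:,} ({pct_total:.1f}%)")
print(f"Excluding military deaths: {non_military_increase:,} ({pct_non_military:.1f}%)")
```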

That is still significantly higher than the close to 12% increase in the death rate in the CDC figures.  There are two reasons for this.  First, note, as was discussed above, these CDC figures adjust for changes in the age structure of the US population over the century, and use population weights of the year 2000.  The US age structure has shifted markedly since 1918, with a far higher share of the population now in the older age groups.  Had the age structure in 1917 and 1918 been the same as it was in 2000, the base overall death rates in those years, excluding the impact of the Spanish Flu, would have been higher.  Second, the Spanish Flu was especially lethal for those in the middle age groups – those in their 20s, 30s, and 40s.  The young and the old were less affected.  But those in the middle age groups account for a smaller share of the age structure using the weights of the year 2000 than they had in 1918.  While I do not have the data to allow me to decompose the specific numbers, simple simulations with plausible parameters suggest that at the year 2000 population shares, the increase in the age-adjusted death rate of 12% can be consistent with the 32% increase observed in 1918 when military deaths are excluded, or the 46% overall increase when military deaths and population growth are included.
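To illustrate the mechanism only, here is a toy version of direct age standardization with three age groups.  It is not the simulation referred to above, and every number in it is hypothetical: the base death rates, the extra deaths from the flu, and both sets of population shares were invented so that the example is easy to follow.  The point is simply that the same age-specific shock produces a smaller percentage increase when weighted by an older age structure.

```python
AGE_GROUPS = ("young", "middle", "old")

# HYPOTHETICAL death rates per 100,000 in a normal year, by age group.
base_rates = {"young": 800, "middle": 700, "old": 4000}
# HYPOTHETICAL extra deaths per 100,000 from the flu, concentrated in the middle ages.
flu_excess = {"young": 150, "middle": 800, "old": 100}

# HYPOTHETICAL population shares: a younger 1918-style structure versus an
# older year-2000-style structure with a smaller middle group.
shares_1918 = {"young": 0.40, "middle": 0.46, "old": 0.14}
shares_2000 = {"young": 0.29, "middle": 0.43, "old": 0.28}


def weighted_rate(rates, shares):
    """Overall death rate per 100,000 for a population with the given age shares."""
    return sum(rates[g] * shares[g] for g in AGE_GROUPS)


def pct_increase(shares):
    """Percentage rise in the overall death rate in the flu year, for given weights."""
    before = weighted_rate(base_rates, shares)
    flu_year = {g: base_rates[g] + flu_excess[g] for g in AGE_GROUPS}
    after = weighted_rate(flu_year, shares)
    return (after / before - 1) * 100


crude_increase = pct_increase(shares_1918)     # weighted by the actual-year structure
adjusted_increase = pct_increase(shares_2000)  # weighted by the standard population

print(f"Increase with 1918-style weights: {crude_increase:.1f}%")
print(f"Increase with 2000-style weights: {adjusted_increase:.1f}%")
```

Two things drive the gap: the older standard population raises the base death rate (the denominator), and it gives less weight to the middle age groups where the flu deaths were concentrated.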

Few expected when Covid-19 was first detected in the US that the increase in the death rate this year would be anywhere close to what it was during the Spanish Flu.  But it has been.  Sadly, the crisis was severely exacerbated by the singularly incompetent management of the Trump administration.  The Washington Post had a particularly good summary of the many things the Trump administration got wrong in addressing Covid-19 this year in a December 26 editorial.  Or one can compare the US record to that of other countries.  The US this year had 1,040 deaths due to Covid-19 per million of population.  Canada had 412 per million, or 40% of the US rate.  If the US had simply matched what its neighbor to the north was able to do, the US would have had 137,000 deaths instead of more than 345,000, or 208,000 fewer deaths.  And others have done far better still.  New Zealand has had only 5.0 deaths per million, and Taiwan just 0.3 per million.
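The comparison with Canada is a simple scaling of the US total by the ratio of the two per-million rates.  A minimal sketch, using the rounded figure of 345,000 US deaths from the text:

```python
# Figures quoted in the text.
us_deaths_per_million = 1_040
canada_deaths_per_million = 412
us_covid_deaths = 345_000  # rounded from "more than 345,000"

# Canada's rate as a share of the US rate.
ratio = canada_deaths_per_million / us_deaths_per_million

# US deaths had the US matched Canada's per-million rate, and the difference.
counterfactual_deaths = us_covid_deaths * ratio
deaths_avoided = us_covid_deaths - counterfactual_deaths

print(f"Canada's rate as a share of the US rate: {ratio:.0%}")
print(f"US deaths at Canada's rate: {counterfactual_deaths:,.0f}")
print(f"Fewer deaths: {deaths_avoided:,.0f}")
```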

Better management would have been possible.  But the Trump administration failed, and hundreds of thousands of Americans have died.  As the Post editorial noted:

“It was always going to be hard. But the worst did not have to happen. It happened because Mr. Trump failed to respect science, meet the virus head-on and be honest with the American people.”

The Pattern of Unemployment: Fewer on Temporary Layoff, but More of the Rest

A.  Introduction

The economic downturn this year has been unprecedented in many ways.  Millions were laid off in March and April as the country desperately went into lockdowns to limit the spread of the virus that causes Covid-19, following the failure of the Trump administration to recognize the extent of the crisis.  But it was always known that those lockdowns would be temporary (albeit with differing views on how long they would be needed), and hence those laid off in March and April were generally put on temporary layoff.

The number on temporary layoff then started to decline in May, with this continuing (although at a diminishing rate) through November.  This has brought down the headline figure on total unemployment – the figure most people focus on – from 14.7% in April to 6.7% as of November.  But while that focus on the overall rate of unemployment is normally appropriate (as the number on temporary layoff has usually been steady and low, while the labor force has fluctuated little), the unusual conditions of the downturn this year have masked important aspects of the story.  Unemployment is a good deal worse than the traditional measures appear to suggest.

One key issue is what happened to those who were unemployed but not on temporary layoff.  The Bureau of Labor Statistics (the source of the data used here) defines those on temporary layoff to be those who are unemployed but who either have been given a date for when they will be able to return to their job, or expect to return to it within six months.  All other unemployed (defined by the BLS as being in the labor force but not employed, not on temporary layoff, and having taken concrete actions within the previous four weeks to look for a job) include those who were permanently laid off, who completed some temporary job, who left a job by choice (quit), or who have newly entered (or re-entered) the labor force actively seeking a job but do not yet have one.

That distinction – treating separately the unemployed on temporary layoff and the rest – will be examined in this post.  Also important to the story is how many are counted in the official statistics to be in the labor force at all, as that has also changed in this unprecedented downturn.  That will be examined as well.

B.  The Unemployed on Temporary Layoff Spiked Up and Then Came Back Down, but Other Unemployed Rose Steadily

The chart at the top of this post shows the unemployment rates (as a percent of the labor force) for all who were unemployed (in black), for those on temporary layoff (in blue), and for all others who were unemployed (in red).  Unemployment surged, at an unprecedented rate, in March and April of this year.  The increase in those on temporary layoff accounted for this – indeed for all of this in those months in the estimated figures.  The total increase in unemployment in March and April compared to February was 17.25 million; the increase in those on temporary layoff was almost exactly the same at 17.26 million.  (But keep in mind that these figures are estimates based on household surveys, and thus that there will be statistical noise.  That the numbers were almost exactly the same was certainly in part a coincidence.  Still, they were definitely close.)

The total unemployment rate then came down sharply from its April peak of 14.7% to 6.7% as of November.  It was led, once again, by changes in those on temporary layoff, but this time the number unemployed for reasons other than temporary layoff rose.  Their rate was 3.0% in February, which then rose to 5.0% by September.  It has stayed at roughly this rate since (although so far with data for only two more months).

That increase – of 2.0 percentage points – is significant but modest.  With all the disruption this year, one might have expected to see more.  Certainly important and effective in partially alleviating the crisis was the $3.1 trillion in several packages approved by Congress in March and April (of new government spending, tax cuts, and new loan facilities).  While adding to the public debt, such spending is needed when confronted with a crisis such as this.  The time to reduce the fiscal deficit would have been when the economy was at full employment.  But Trump added to the fiscal deficit in those years (with both higher spending and massive tax cuts) instead of using that opportunity to prepare for when a crisis would necessitate higher spending.

C.  But the Number in the Labor Force Also Fell, Which Had a Significant Impact on the Reported Unemployment Rates

There is, however, another factor important to the understanding of why the unemployment rate (for those other than on temporary layoff) rose only by this modest amount.  And that is that the number in the labor force abruptly changed.  This was another unusual development in this unprecedented crisis.

The labor force (formally the civilian labor force, as those on active military duty are excluded) changes only slowly.  It is driven primarily by demographic factors, coupled with long-term decisions such as when to retire, whether to attend college rather than seek a job, whether both spouses in a married couple will seek to work or whether one (usually in this society the wife) will choose to remain at home with the children, and so on.

But it was different in this crisis:

The number in the labor force fell abruptly in March and April – by 8.1 million compared to February, or 4.9% of the labor force.  There has never before been such an abrupt fall, at least since 1948 when such data first began to be collected.  The largest previous two-month fall was just 1.0 million, in 1953 when this was 1.6% of the labor force.  (And the month to month “squiggles” seen in the chart above should not be taken too seriously.  They likely reflect statistical noise in the household surveys.)

Those who drop out of the labor force are not counted as unemployed, as formally defined by the BLS, as they are not actively seeking a job.  And the sharp collapse in available jobs in March and April probably contributed to some dropping out of the labor force, as that scarcity of jobs would, by itself, induce some not even to try to find a job if they lost one.  But probably more important in this unprecedented crisis is a parent (and usually the wife) dropping out of the labor force in order to take care of their children when the schools and/or daycare centers closed.  This has never happened before.

Since April, the number in the labor force has recovered some but only partially.  Compared to what the labor force likely would have been by November 2020, based on a simple extrapolation of the January 2015 to January 2020 trend (growth at an annual rate of 0.95%), the labor force in November was 5.4 million less than what it otherwise would have been.

This will have a significant impact on the unemployment figures.  Since the number unemployed is, by definition, equal to the number in the labor force less the number employed, the number unemployed will be substantially higher if one counts those who abruptly dropped out of the labor force to take care of their children.  These, along with others who dropped out of the labor force but would prefer to be employed if labor market conditions were more hospitable, should be counted when assessing how much slack there may be in the economy.  And they can be considered as part of those who are unemployed for reasons other than temporary layoff (as they are similar in nature to those who had, or in this case would have, re-entered the labor force but do not have a job).

Counting such individuals as among those who are in fact unemployed, the labor market does not look to be nearly as strong as the headline figures would suggest.  Assuming that the labor force in 2020 would have continued to grow at the trend rate of the previous several years, that the number employed would have been the same as was recorded, and that the number on temporary layoff would have also been as recorded, the chart on unemployment rates then becomes:

Superficially, this chart may appear similar to that at the top of this post.  But there are two important differences.  First, note the scale is different.  Instead of peaking in April at an overall unemployment rate of 14.7%, the unemployment rate would instead have reached over 19%.  Furthermore, it would still be at 9.7% as of November, which is high.  It is not far from the peak 10.0% rate reached in 2009 following the 2008 economic collapse.

Second, both the path and the levels of the unemployment rate for those other than on temporary layoff are now quite different.  That rate jumps abruptly in March and April to 8.2% of the labor force, from 3.1% before, and then remains at around 7 1/2 to 8% since then.  This is a much more worrisome level than was seen above when no correction was made for what has happened to the labor force this year.  There is also no downward trend.  All the gains in the reduction of overall unemployment since April would have been due to the reduction in those on temporary layoff.
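The adjustment described in this section can be sketched as follows.  Employment is kept as recorded, the labor force is raised to its trend level, and everyone in the gap is counted as unemployed.  The reported 6.7% rate and the 5.4 million gap come from the text.  The November labor force of 160.5 million is an assumption added for illustration (an approximate figure, not one quoted in this post), and the function name is made up.

```python
def adjusted_unemployment_rate(labor_force, unemployed, labor_force_gap):
    """Unemployment rate (%) if those missing from the labor force count as unemployed.

    All arguments must be in the same units (millions here).
    """
    trend_labor_force = labor_force + labor_force_gap
    employed = labor_force - unemployed  # employment is kept as recorded
    return (trend_labor_force - employed) / trend_labor_force * 100


labor_force_nov = 160.5   # ASSUMPTION: approximate level in millions, not from the post
reported_rate_nov = 6.7   # headline unemployment rate quoted in the text (%)
labor_force_gap = 5.4     # shortfall relative to trend quoted in the text (millions)

unemployed_nov = labor_force_nov * reported_rate_nov / 100
adjusted_rate = adjusted_unemployment_rate(labor_force_nov, unemployed_nov, labor_force_gap)

print(f"Reported rate: {reported_rate_nov:.1f}%")
print(f"Adjusted rate: {adjusted_rate:.1f}%")
```

Under that assumed labor force level, the adjusted rate comes out close to the 9.7% cited above for November.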

D.  Conclusion

The economy remains weak.  And president-elect Joe Biden is certainly correct that a necessary (although not sufficient) condition for the economy to recover fully will be that Covid-19 be addressed.  Australia, New Zealand, and the countries of East Asia have shown that this can be done, and how it could have been done.  Simply wearing masks would have been central.  Dr. Robert Redfield, the head of the CDC, has noted that wearing a mask could very well be more effective in stopping the spread of the virus that causes Covid-19 than some of the vaccines now under development, if everyone wore them.  But Trump has been unwilling to call on all Americans, including in particular his supporters, to wear a mask.  Indeed, he has even repeatedly mocked those who choose to wear a mask.

As a longer-term solution, however, vaccinations will be key.  But this also depends on most Americans (probably a minimum of 70 to 80%, but at this point still uncertain) being vaccinated.  Even under the most optimistic of circumstances, constraints on vaccine availability alone mean this will not be possible before the summer.  But this also assumes that, once available, 70 to 80% of the population (or whatever the minimum share required will be) will choose to be vaccinated.  Given how the simple wearing of face masks was politicized by Trump (and turned into a signal of whether one supports him or not), plus controversies among some on both the left and the right on vaccinations that pre-date Trump’s presidency, it is hard to be optimistic that such a vaccination share will soon be reached.

Hopefully a sufficiently large share of the population will at some point have chosen to be vaccinated to end the spread of the virus.  But until that happens, further support to the economy, and not least relief to those most affected by the crisis, needs to be passed by Congress and signed by the president.  The House passed such a measure already last May, but Mitch McConnell, the Republican Majority Leader in the Senate, has so far blocked consideration of anything similar.  As I write this, there appears to be a possibility of some compromise being considered in the Senate, but it remains to be seen if that will happen (and if Trump then will sign it).

It is certainly desperately needed.

Was Sturgis a Covid-19 Superspreader Event?: Evidence Suggests That It May Well Have Been

A.  Introduction

The Sturgis Motorcycle Rally is an annual 10-day event for motorcycle enthusiasts (in particular of Harley-Davidsons), held in Sturgis, a normally small town in far western South Dakota.  It was held again this year, from August 7 to August 16, despite the Covid-19 pandemic, and drew an estimated 460,000 participants.  Motorcyclists gather from around the country for lots of riding, lots of music, and lots of beer and partying.  And then they go home.  Cell phone data indicate that fully 61% of all the counties in the US were visited by someone who attended Sturgis this year.

Due to the pandemic, the town debated whether to host the event this year.  But after some discussion, it was decided to go ahead.  And it is not clear that town officials could have stopped it even if they wanted.  Riders would likely have shown up anyway.

Despite the on-going covid pandemic, masks were rarely seen.  Indeed, many of those attending were proud in their defiance of the standard health guidelines that masks should be worn and social distancing respected, and especially so in such crowded events.  T-shirts were sold, for example, declaring “Screw Covid-19, I Went to Sturgis”.

Did Sturgis lead to a surge in Covid-19 cases?  Unfortunately, we do not have direct data on this because the identification of the possible sources of someone’s Covid-19 infection is incredibly poor in the US.  There is little investigation of where someone might have picked up the virus, and far from adequate contact tracing.  And indeed, even those who attended the rally and later came down with Covid-19 found that their state health officials were often not terribly interested in whether they had been at Sturgis.  The systems were simply not set up to incorporate this.  And those attending who were later sick with the disease were also not always open about where they had been, given the stigma.

One is therefore left only with anecdotal cases and indirect evidence.  Recent articles in the Washington Post and the New York Times were good reports, but could only cover a number of specific, anecdotal, cases, as well as describe the party environment at Sturgis.  One can, however, examine indirect evidence.  It is reasonable to assume that those motorcycle enthusiasts who had a shorter distance to get to Sturgis from their homes would be more likely to go.  Hence near-by states would account for a higher share (adjusted for population) of those attending Sturgis and then returning home than would be the case for states farther away.  If so, then if Covid-19 was indeed spread among those attending Sturgis, one would see a greater degree of seeding of the virus that causes Covid-19 in the near-by states than would be the case among states that are farther away.  And those near-by states would then have more of a subsequent rise in Covid-19 cases as the infectious disease spread from person to person than one would see in states further away.

This post will examine this, starting with the chart at the top of this post.  As is clear in that chart, by early November states geographically closer to Sturgis had far higher cases of Covid-19 (as a share of their population) than those further away.  And the incidence fell steadily with geographic distance, in a relationship that is astonishingly tight.  Simply knowing the distance of the state from Sturgis would allow for a very good prediction (relative to the national average) of the number of daily new confirmed cases of Covid-19 (per 100,000 of population) in the 7-day period ending November 6.

A first question to ask is whether this pattern developed only after Sturgis.  If it had been there all along, including before the rally was held, then one cannot attribute it to the rally.  But we will see below that there was no such relationship in early August, before the rally, and that it then developed progressively in the months following.  This is what one would expect if the virus had been seeded by those returning from Sturgis, who then may have given this infectious disease to their friends and loved ones, to their co-workers, to the clerks at the supermarkets, and so on, and then each of these similarly spreading it on to others in an exponentially increasing number of cases.

To keep things simple in the charts, we will present them in a standard linear form.  But one may have noticed in the chart above that the line in black (the linear regression line) that provides the best fit (in a statistical sense) for a straight line to the scatter of points, does not work that well at the two extremes.  The points at the extremes (for very short distances and very long ones) are generally above the curve, while the points are often below in the middle range.  This is the pattern one would expect when what matters to the decision to ride to the rally is not some increment for a given distance (of an extra 100 miles, say), but rather for a given percentage increase (an extra 10%, say).  In such cases, a logarithmic curve rather than a straight (linear) line will fit the data better, and we will see below that indeed it does here.  And this will be useful in some statistical regression analysis that will examine possible explanations for the pattern.

It should be kept in mind, however, that what is being examined here are correlations, and with correlations one cannot say with certainty that the cause was necessarily the Sturgis rally.  And we obviously cannot run this experiment over repeatedly in a lab, under varying conditions, to see whether the result would always follow.

Might there be some other explanation?  Certainly there could be.   Probably the most obvious alternative is that the surge in Covid-19 cases in the upper mid-west of the US between September and early November might have been due to the onset of cold weather, where the states close to Sturgis are among the first to turn cold as winter approaches in the US.  We will examine this below.  There is, indeed, a correlation, but also a number of counter-examples (with states that also turned colder, such as Maine and Vermont, that did not see such a surge in cases).  The statistical fit is also not nearly as good.

One can also examine what happened across the border in the neighboring provinces of Canada.  The weather there also turned colder in September and October, and indeed by more than in the upper mid-west of the US.  Yet the incidence of Covid-19 cases in those provinces was far less.

What would explain this?  The answer is that it is not cold weather per se that leads to the virus being spread, but rather cold weather in situations where socially responsible behavior is not being followed – most importantly mask-wearing, but also social distancing, avoidance of indoor settings conducive to the spread of the virus, and so on.  As examined in the previous post on this blog, mask-wearing is extremely powerful in limiting the spread of the virus that causes Covid-19.  But if many do not wear masks, for whatever reason, the virus will spread.  And this will be especially so as the weather turns colder and people spend more time indoors with others.

This could lead to the results seen if states that are geographically closer to Sturgis also have populations that are less likely to wear masks when they go out in public.  And we will see that this was likely indeed a factor.  For whatever reason (likely political, as the near-by states are states with high shares of Trump supporters), states geographically close to Sturgis have a generally lower share of their populations regularly wearing masks in this pandemic.  But the combination of low mask-wearing and falling temperatures (what statisticians call an interaction effect) was supplemental to, and not a replacement of, the impact of distance from Sturgis.  The distance factor remained highly significant and strong, including when controlling for October temperatures and mask-wearing, consistent with the view that Sturgis acted as a seeding event.

This post will take up each of these topics in turn.

B.  Distance to Sturgis vs. Daily New Cases of Covid-19 in the Week Ending November 6

The chart at the top of this post plots the average daily number of confirmed new cases of Covid-19 over the 7-day period ending November 6 in a state (per 100,000 of population), against the distance to Sturgis.  The data for the number of new cases each day was obtained from USAFacts, which in turn obtained the data from state health authorities.  The data on distance to Sturgis was obtained from the directions feature on Google Maps, with Sturgis being the destination and the trip origin being each of the 48 states in the mainland US (Hawaii and Alaska were excluded), plus Washington, DC.  Each state was simply entered (rather than a particular address within a state), and Google Maps then defaulted to a central location in each state.  The distance chosen was then for the route recommended by Google, in miles and on the roads recommended.  That is, these are trip miles and not miles “as the crow flies”.

When this is done, with a regular linear scale used for the mileage on the recommended routes, one obtains the chart at the top of this post.  For the week ending November 6, those states closest to Sturgis saw the highest rates of Covid-19 new cases (130 per 100,000 of population in South Dakota itself, where Sturgis is in the far western part of the state, and 200 per 100,000 in North Dakota, where one should note that Sturgis is closer to some of the main population centers of North Dakota than it is to some of the main population centers of South Dakota).  And as one goes further away geographically, the average daily number of new cases falls substantially, to only around one-tenth as much in several of the states on the Atlantic.

The model is a simple one:  The further away a state is from Sturgis, the lower its rate (per 100,000 of population) of Covid-19 new cases in the first week of November.  But it fits extremely well even though it looks at only one possible factor (distance to Sturgis).  The straight black line in the chart is the linear regression line that best fits, statistically, the scatter of points.  A statistical measure of the fit is called the R-squared, which varies between 0% and 100% and measures what share of the variation observed in the variable shown on the vertical axis of the chart (the daily new cases of Covid-19) can be predicted simply by knowing the regression line and the variable shown on the horizontal axis (the miles to Sturgis).

The R-squared for the regression line calculated for this chart was surprisingly high, at 60%.  This is astonishing.  It says that if all we knew was this regression line, then we could have predicted 60% of the variation in Covid-19 cases across states in the week ending November 6 simply by knowing how far the states are from Sturgis.  States differ in numerous ways that will affect the incidence of Covid-19 cases in their territory.  Yet here, if we know just the distance to Sturgis, we can predict 60% of how Covid-19 incidence will vary across the states.  Regressions such as these are called cross-section regressions (the data here are across states), and the R-squared values in such regressions are rarely higher than 20%, or at most perhaps 30%.
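For readers who want to see how a regression line and its R-squared are computed, here is a minimal sketch in plain Python.  The seven data points are hypothetical, invented only to have something to fit; they are not the state data used in the chart and will not reproduce the 60% figure.

```python
def linear_fit(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in ys that the fitted line accounts for."""
    mean_y = sum(ys) / len(ys)
    total_variation = sum((y - mean_y) ** 2 for y in ys)
    unexplained = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - unexplained / total_variation


# HYPOTHETICAL data: road miles to Sturgis and daily new cases per 100,000.
miles = [200, 400, 600, 900, 1200, 1500, 1800]
cases = [150, 105, 88, 60, 48, 30, 25]

slope, intercept = linear_fit(miles, cases)
r2 = r_squared(miles, cases, slope, intercept)

print(f"Slope: {slope:.4f} cases per extra mile")
print(f"R-squared: {r2:.0%}")
```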

But as was discussed above in the introduction, trip decisions involving distances often work better (fit the data better) when the scale used is logarithmic.  On a logarithmic scale, what enters into the decision to make the trip or not is not some fixed increment of distance (e.g. an extra 100 miles) but rather some proportional change (e.g. an extra 10%).  A statistical regression can then be estimated using the logarithms of the distances, and when this estimated line is re-calculated back on to the standard linear scale, one will have the curve shown in blue in the chart:

The logarithmic (or log) regression line (in blue) fits the data even better than the simple linear regression line (in black), including at the two extremes (very short and very long distances).  And the R-squared rises to 71% from the already quite high 60% of the linear regression line.  The only significant outlier is North Dakota.  If one excludes North Dakota, the R-squared rises to 77%.  These are remarkably high for a cross-section analysis.
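The comparison between the two fits can be sketched in the same way.  The example below regresses cases on distance and then on the natural logarithm of distance, and compares the two R-squared values.  The data are again hypothetical, and the semi-log form used here (cases against log distance) is an assumption about the specification, chosen for illustration.

```python
import math


def fit_and_r_squared(xs, ys):
    """Least squares fit of y on x; returns (slope, intercept, R-squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    total_variation = sum((y - mean_y) ** 2 for y in ys)
    unexplained = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, 1 - unexplained / total_variation


# HYPOTHETICAL data: road miles to Sturgis and daily new cases per 100,000.
miles = [200, 400, 600, 900, 1200, 1500, 1800]
cases = [150, 105, 88, 60, 48, 30, 25]

# Fit 1: cases against miles on the ordinary linear scale.
_, _, r2_linear = fit_and_r_squared(miles, cases)

# Fit 2: cases against the logarithm of miles.
log_miles = [math.log(m) for m in miles]
log_slope, log_intercept, r2_log = fit_and_r_squared(log_miles, cases)


def predicted_cases(distance_miles):
    """Fitted curve from the log regression, expressed back on the linear scale."""
    return log_intercept + log_slope * math.log(distance_miles)


print(f"R-squared, linear in miles: {r2_linear:.0%}")
print(f"R-squared, linear in log(miles): {r2_log:.0%}")
print(f"Fitted cases at 500 miles: {predicted_cases(500):.0f}")
```

With data shaped like these, where cases fall quickly at short distances and then flatten out, the log fit has the higher R-squared.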

This simple model therefore fits the data well, indeed extremely well.  But there are still several issues to consider, starting with whether there was a similar pattern across the states before the Sturgis rally.

C.  Distance to Sturgis vs. Daily New Cases of Covid-19 in the Week Ending August 6, and the Progression in Subsequent Months

The Sturgis rally began on August 7.  Was there possibly a similar pattern as that found above in Covid-19 cases before the rally?  The answer is a clear no:

In the week ending August 6, the relationship of Covid-19 cases to distance from Sturgis was about as close to random as one can ever find.  If anything, the incidences of Covid-19 cases in the 10 or so states closest to Sturgis were relatively low.  And for all 48 states of the Continental US (plus Washington, DC), the simple linear regression line is close to flat, with an R-squared of just 0.4%.  This is basically nothing, and is in sharp contrast to the R-squared for the week ending November 6 of 60% (and 71% in logarithmic terms).

One should also note the magnitudes on the vertical scale here.  They range from 0 to 40 cases (per 100,000 of population) per day in the 7-day period.  In the chart for cases in the 7-day period ending on November 6 (as at the top of this post), the scale goes from 0 to 200.  That is, the incidence of Covid-19 cases was relatively low across US states in August (relative to what it was later in parts of the US).  That then changed in the subsequent months.  Furthermore, one can see in the charts above for the week ending November 6 that the states further than around 1,400 miles from Sturgis still had Covid new case rates of 40 per day or less.  That is, the case incidence rates remained in that 0 to 40 range between August and early November for the states far from Sturgis.  The states where the rates rose above this were all closer to Sturgis.

There was also a steady progression in the case rates in the months from August to November, focused on the states closer to Sturgis, as can be seen in the following chart:

Each line is the linear regression line found by regressing the number of Covid-19 cases in each state (per 100,000 of population) for the week ending August 6, the week ending September 6, the week ending October 6, and the week ending November 6, against the geographic distance to Sturgis.  The regression lines for the week ending August 6 and the week ending November 6 are the same as discussed already in the respective charts above.  The September and October ones are new.

As noted before, the August 6 line is essentially flat.  That is, the distance to Sturgis made no difference to the number of cases, and they are also all relatively low.  But then the line starts to twist upwards, with the right end (for the states furthest from Sturgis) more or less fixed and staying low, while the left end rotated upwards.  The rotation is relatively modest for the week ending September 6, is more substantial in the month later for the week ending October 6, and then the largest in the month after that for the week ending November 6.  This is precisely the path one would expect to find with an exponential spread of an infectious disease that has been seeded but then not brought under effective control.
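The widening gap described above can be illustrated with a toy calculation.  This is only a sketch: the baseline rate, the seeding amount, the monthly growth factor, and the function name `daily_cases` are all hypothetical values chosen for illustration, not estimates from the data.  It assumes that the rally seeds extra cases in proportion to how close a state is to Sturgis, and that those seeded cases then multiply by a fixed factor each month.

```python
def daily_cases(distance_miles, months_after_rally,
                baseline=10.0, max_seed=4.0,
                monthly_growth=2.5, reach_miles=2000.0):
    """Hypothetical daily cases per 100,000 for a state at a given distance.

    All parameter defaults are made-up illustrative values.
    """
    if months_after_rally <= 0:
        # Before the rally: the same low rate everywhere (a flat line).
        return baseline
    # Seeding falls off linearly with distance and is zero at reach_miles.
    seed = max_seed * max(0.0, 1.0 - distance_miles / reach_miles)
    # Seeded cases grow by a constant factor each month (exponential spread).
    return baseline + seed * monthly_growth ** (months_after_rally - 1)


for month, label in enumerate(["Aug 6", "Sep 6", "Oct 6", "Nov 6"]):
    near = daily_cases(200, month)    # a state close to Sturgis
    far = daily_cases(2000, month)    # a state far from Sturgis
    print(f"{label}: near={near:.1f}  far={far:.1f}  gap={near - far:.1f}")
```

Under these assumptions the far end of the line stays fixed while the near end rises by a larger amount each month, which is the rotation pattern seen in the chart.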

D.  Might Falling Temperatures Account for the Pattern?

The charts above are consistent with Sturgis acting as a seeding event that then led to increases in Covid-19 cases that were especially high in near-by states.  But one needs to recognize that these are just correlations, and by themselves cannot prove that Sturgis was the cause.  There might be some alternative explanation.

One obvious alternative would be that the sharp increase in cases in the upper mid-west of the US in this period was due to falling temperatures, as the northern hemisphere winter approached.  These areas generally grow colder earlier than in other parts of the US.  And if one plots the state-wide average temperatures in October (as reported by NOAA) against the average number of Covid-19 cases per day in the week ending November 6 one indeed finds:

There is a clear downward trend:  States with lower average temperatures in October had more cases (per 100,000 of population) in the week ending November 6.  The relationship is not nearly as tight as that found for the one based on geographic distance from Sturgis (the R-squared is 35% here, versus 60% for the linear relationship based on distance), but 35% is still respectable for a cross-state regression such as this.

However, there are some counterexamples.  The average October temperatures in Maine and Vermont were colder than all but 7 or 10 states (for Maine and Vermont, respectively), yet their Covid-19 case rates were the two lowest in the country.

More telling, one can compare the rates in North and South Dakota (with the two highest Covid-19 rates in the country in the week ending November 6) plus Montana (adjacent and also high) with the rates seen in the Canadian provinces immediately to their north:

The rates are not even close.  The Canadian rates were all far below those in the US states to their south.  The rate in North Dakota was fully 30 times higher than the rate in Saskatchewan, the Canadian province just to its north.  There is clearly something more than just temperature involved.

E.  The Impact of Wearing Masks, and Its Interaction With Temperature

That something is the actions followed by the state or provincial populations to limit the spread of the virus.  The most important is the wearing of masks, which has proven to be highly effective in limiting the spread of this infectious disease, in particular when complemented with other socially responsible behaviors such as social distancing, avoiding large crowds (especially where many do not wear masks), washing hands, and so on.  Canadians have been far more serious in following such practices than many Americans.  The result has been far fewer cases of Covid-19 (as a share of the population) in Canada than in the US, and far fewer deaths.

Mask wearing matters, and could be an alternative explanation for why states closer to Sturgis saw higher rates of Covid-19 cases.  If a relatively low share of the populations in the states closer to Sturgis wear masks, then this may account for the higher incidence of Covid-19 cases in those near-by states.  That is, perhaps the states that are geographically closer to Sturgis just happen also to be states where a relatively low share of their populations wear masks, with this then possibly accounting for the higher incidence of cases in those states.

However, mask-wearing (or the lack of it), by itself, would be unlikely to fully account for the pattern seen here.  Two things should be noted.  First, while states that are geographically closer to Sturgis do indeed see a lower share of their population generally wearing masks when out in public, the relationship to this geography is not as strong as the other relationships we have examined:

The data in the chart for the share who wear masks by state come from the COVIDcast project at Carnegie Mellon University, and were discussed in the previous post on this blog.  The relationship found is indeed a positive one (states geographically further from Sturgis generally have a higher share of their populations wearing masks), but there is a good deal of dispersion in the figures and the R-squared is only 27.5%.  This, by itself, is unlikely to explain the Covid-19 rates across states in early November.

Second, and more importantly:  While the states closer to Sturgis generally have a lower share of mask-wearing, this would not explain why one did not see similarly higher rates of Covid-19 incidence in those states in August.  Mask-wearing was likely similar.  The question is why did Covid-19 incidence rise in those states between August (following the Sturgis rally) and November, and not simply why they were high in those states in November.

However, mask-wearing may well have been a factor.  But rather than accounting for the pattern all by itself, it may have had an indirect effect.  With the onset of colder weather, more time would be spent with others indoors, and wearing a mask when in public is particularly important in such settings.  That is, it is the combination of both a low share of the population wearing masks and the onset of colder weather which is important, not just one or the other.

These are called interaction effects, and investigating them requires more than can be depicted in simple charts.  Multiple regression analysis (regression analysis with several variables – not just one as in the charts above) can allow for this.  Since it is a bit technical, I have relegated a more detailed discussion of these results to a Technical Annex at the conclusion of this post for those who are interested.

Briefly, a regression was estimated that includes miles from Sturgis, average October temperatures, the share who wear masks when out in public, plus an interaction effect between the share wearing masks and October temperatures, all as independent variables affecting the observed Covid-19 case rates of the week ending November 6.  And this regression works quite well.  The R-squared is 75.4%, and each of the variables (including the interaction term) is either highly significant (miles from Sturgis) or marginally so (a significance level of between 6 and 8% for the other variables, which is slightly worse than the 5% level commonly used, but not by much).

Note in particular that the interaction term matters, and matters even while each of the other variables (miles to Sturgis, October temperatures, and mask-wearing) are taken into account individually as well.  In the interaction term, it is not simply the October temperatures or the share wearing masks that matter, but the two acting together.  That is, the impact of relatively low temperatures in October will matter more in those states where mask-wearing is low than they would in states where mask-wearing is high.  If people generally wore masks when out in public (and followed also the other socially responsible behaviors that go along with it), the falling temperatures would not matter as much.  But when they don’t, the falling temperatures matter more.

From this overall regression equation, one can also use the coefficients found to estimate what the impact would be of small changes in each of the variables.  These are called elasticities, and based on the estimated equation (and computing the changes around the sample means for each of the variables):  a 1% reduction in the number of miles from Sturgis would lead to a 1.0% rise in the incidence of Covid-19 cases; a 1% reduction (not a 1 percentage point increase, but rather a 1% reduction from the sample mean) in the share of the population wearing masks when out in public would lead to a 1.7% rise in the incidence of Covid-19 cases; and a 1% reduction in the average October temperature across the different states would lead to a 1.2% rise in the incidence of Covid-19 cases.  All of these elasticity estimates look quite plausible.

These results are consistent with an explanation where the Sturgis rally acted as a significant superspreader event that led to increased seeding of the virus in the locales, in near-by states especially.  This then led to significant increases in the incidence of Covid-19 cases in the different states as this infectious disease spread to friends and family and others in the subsequent months, and again especially in the states closest to Sturgis.  Those increases were highest in the states that grew colder earlier than others when the share of the population regularly wearing masks in those states was relatively low.  That is, the interaction of the two mattered.  But even with this effect controlled for, along with controlling also for the impact of colder temperatures and for the impact of mask-wearing, the impact of miles to Sturgis remained and was highly significant statistically.

F.  Conclusion

As noted above, the analysis here cannot and does not prove that the Sturgis rally acted as a superspreader event.  There was only one Sturgis rally this year, one cannot run repeated experiments of such a rally under various alternative conditions, and the evidence we have consists simply of correlations of various kinds.  It is possible that there may be some alternative explanation for why Covid-19 cases started to rise sharply in the weeks after the rally in the states closest to Sturgis.  It is also possible it is all just a coincidence.

But the evidence is consistent with what researchers have already found on how the virus that causes Covid-19 is spread.  Studies have found that as few as 10% of those infected may account for 80% of those subsequently infected with the virus.  And it is not just the biology of the disease and how a person reacts to it, but also whether the individual is then in situations with the right conditions to spread it on to others.  These might be as small as family gatherings, or as large as big rallies.  When large numbers of participants are involved, such events have been labeled superspreader events.

Among the most important of the conditions that matter is whether most or all of those attending are wearing masks.  It also matters how close people are to each other, whether they are cheering, shouting, or singing, and whether the event is indoors or outdoors.  And the likelihood that at least one attendee who is infectious might be there rises steeply with the number of attendees, so the size of the gathering very much matters.
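To see why the size of the gathering matters so much, here is a minimal sketch.  It assumes (as a simplification) that each attendee independently has the same probability of being infectious, in which case the chance that at least one infectious person is present is 1 − (1 − p)^n.  The 1% prevalence figure and the function name are hypothetical and purely illustrative; the attendance figures are the ones cited in this post.

```python
def prob_at_least_one_infectious(n_attendees, prevalence):
    """Chance that at least one of n attendees is infectious.

    Assumes each attendee is infectious independently with the same
    probability `prevalence` (a simplifying assumption).
    """
    return 1 - (1 - prevalence) ** n_attendees


ASSUMED_PREVALENCE = 0.01  # hypothetical: 1% of attendees infectious

for label, size in [("family gathering", 10),
                    ("White House event", 150),
                    ("election night gathering", 200),
                    ("Sturgis rally", 460_000)]:
    p = prob_at_least_one_infectious(size, ASSUMED_PREVALENCE)
    print(f"{label:>25} ({size:>7,} people): {p:.1%}")
```

The probability climbs quickly toward 100% as the crowd grows, so for a very large event it is all but certain that infectious attendees are present, whatever the exact prevalence.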

A number of recent White House events matched these conditions, and a significant number of attendees soon after tested positive for Covid-19.  In particular, about 150 attended the celebration on September 26 announcing that Amy Coney Barrett would be nominated to the Supreme Court to take the seat of the recently deceased Ruth Bader Ginsburg.  Few wore masks, and at least 18 attendees later tested positive for the virus.  And about 200 attended an election night gathering at the White House.  At least 6 of those attending later tested positive.  While one can never say for sure where someone may have contracted the virus, such clusters among those attending such events are very unlikely unless the event was where they got the virus.  It is also likely that these figures are undercounts, as White House staff have been told not to let it become publicly known if they come down with the virus.  Finally, as of November 13 at least 30 uniformed Secret Service officers, responsible for security at the White House, have tested positive for the coronavirus in the preceding few weeks.

There is also increasing evidence that the Trump campaign rallies of recent months led to subsequent increases in Covid-19 cases in the local areas where they were held.  These ranged from studies of individual rallies (such as 23 specific cases traced to three Trump rallies in Minnesota in September), to a relatively simple analysis that looked at the correlation between where Trump campaign rallies were held and subsequent increases in Covid-19 cases in that locale, to a rigorous academic study that examined the impact of 18 Trump campaign rallies on the local spread of Covid-19.  This academic study was prepared by four members of the Department of Economics at Stanford (including the current department chair, Professor B. Douglas Bernheim).  They concluded that the 18 Trump rallies led to an estimated extra 30,000 Covid-19 cases in the US, and 700 additional deaths.

One should expect that the Sturgis rally would act as even more of a superspreader event than those campaign rallies.  An estimated 460,000 motorcyclists attended the Sturgis rally, while the campaign rallies involved at most a few thousand at each.  Those at the Sturgis rally could also attend for up to ten days; the campaign rallies lasted only a few hours.  Finally, there would be a good deal of mixing of attendees at the multiple parties and other events at Sturgis.  At a campaign rally, in contrast, people would sit or stand at one location only, and hence only be exposed to those in their immediate vicinity.

The results are also consistent with a rigorous academic study of the more immediate impact of the Sturgis rally on the spread of Covid-19, by Professor Joseph Sabia of San Diego State University and three co-authors.  Using anonymous cell phone tracking data, they found that counties across the US that received the highest inflows of returning participants from the Sturgis rally saw, in the immediate weeks following the rally (up to September 2), an increase of 7.0 to 12.5% in the number of Covid-19 cases relative to the counties that did not contribute inflows.  But their study (issued as a working paper in September) looked only at the impact in the immediate few weeks following Sturgis.  They did not consider what such seeding might then have led to.  The results examined in the analysis here, which is longer-term (up to November 6), are consistent with their findings.

It is therefore fully plausible that the Sturgis rally acted as a superspreader event.  And the evidence examined in this post supports such a conclusion.  While one cannot prove this in a scientific sense, as noted above, the likelihood looks high.

Finally, as I finish writing this, the number of deaths in the US from this terrible virus has just surpassed 250,000.  The number of confirmed cases has reached 11.6 million, with this figure rising by 1 million in just the past week.  A tremendous surge is underway, far surpassing the initial wave in March and April (when the country was slow to discover how serious the spread was, due in part to the botched development in the US of testing for the virus), and far surpassing also the second, and larger, wave in June and July (when a number of states, in particular in the South and Southwest, re-opened too early and without adequate measures, such as mask mandates, to keep the disease under control).  Daily new Covid-19 cases are now close to 2 1/2 times what they were at their peak in July.

This map, published by the New York Times (and updated several times a day) shows how bad this has become.  It is also revealing that the worst parts of the country (the states with the highest number of cases per 100,000 of population) are precisely the states geographically closest to Sturgis.  There is certainly more behind this than just the Sturgis rally.  But it is highly likely the Sturgis rally was a significant contributor.  And it is extremely important if more cases are to be averted to understand and recognize the possible role of events such as the rally at Sturgis.

Average Daily Cases of Covid-19 per 100,000 Population

7-Day Average for Week Ending November 18, 2020

Source:  The New York Times, “Covid in the US:  Latest Map and Case Count”.  Image from November 19, with data as of 8:14 am.

 


Technical Annex:  Regression Results

As discussed in the text, a series of regressions were estimated to explore the relationship between the Sturgis rally and the incidence of Covid-19 cases (the 7-day average of confirmed new cases in the week ending November 6) across the states of the mainland US plus Washington, DC.  Five will be reported here, with regressions on the incidence of Covid-19 cases (as the dependent variable) as a function of various combinations of three independent variables: miles from Sturgis (in terms of their natural logarithms), the average state-wide temperature in October (also in terms of their natural logarithms), and the share of the population in the respective states who reported they always or most of the time wore masks when out in public.  Three of the five regressions are on each of the three independent variables individually, one on the three together, and one on the three together along with an interaction effect measured by multiplying the October temperature variable (in logs) with the share wearing masks.  The sources for each variable were discussed above in the main text.

The basic results, with each regression by column, are summarized in the following table:

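Regressions on State Covid-19 Cases – November 6

     Miles to Sturgis and Temperatures are in natural logs

| | Miles only | Temp only | Masks only | Miles, Temp, & Masks | All with Interaction |
|---|---|---|---|---|---|
| Miles to Sturgis: Slope | -54.9 | | | -41.9 | -36.6 |
| Miles to Sturgis: t-statistic | -10.7 | | | -5.2 | -4.3 |
| Avg Temperature: Slope | | -133.3 | | -45.5 | -516.8 |
| Avg Temperature: t-statistic | | -5.5 | | -2.0 | -1.9 |
| Share Wear Masks: Slope | | | -3.1 | -0.8 | -22.4 |
| Share Wear Masks: t-statistic | | | -3.9 | -1.3 | -1.8 |
| Interaction Temp & Masks: Slope | | | | | 5.44 |
| Interaction Temp & Masks: t-statistic | | | | | 1.8 |
| Intercept | 425.5 | 572.5 | 309.4 | 582.5 | 2,422.5 |
| Intercept: t-statistic | 11.9 | 6.0 | 4.5 | 7.1 | 2.3 |
| R-squared | 71.0% | 39.4% | 24.2% | 73.7% | 75.4% |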

In the regressions with each independent variable taken individually, all the coefficients (slopes) found are highly significant.  The general rule of thumb is that a significance level of 5% is adequate to call the relationship statistically “significant” (i.e. that an estimated coefficient this far from zero would be unlikely to arise just from random variation in the data).  A t-statistic of 2.0 or higher in absolute value, in a large sample, would signal significance at least at the 5% level (that is, if the true coefficient were zero, random variation alone would produce an estimate this far from zero less than 5% of the time), and the t-statistics are each well in excess of 2.0 in absolute value in each of the single-variable regressions.  The R-squared is quite high, at 71.0%, for the regression on miles from Sturgis, but more modest in the other two (39.4% and 24.2% for October temperature and mask-wearing, respectively).
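For readers who want to check how a t-statistic maps to the rule of thumb, the sketch below converts a t-statistic into an approximate two-sided p-value.  It uses the large-sample (normal) approximation, which is my simplifying assumption here; with only 49 observations the exact t-distribution would give somewhat larger p-values.  The function name is illustrative.

```python
import math


def two_sided_p_value(t_stat):
    """Approximate two-sided p-value for a t-statistic.

    Uses the standard normal distribution as a large-sample
    approximation: p = erfc(|t| / sqrt(2)).
    """
    return math.erfc(abs(t_stat) / math.sqrt(2))


# t-statistics taken from the regression table above
for label, t in [("Miles only", -10.7),
                 ("Temp, three-variable regression", -2.0),
                 ("Temp, with interaction", -1.9),
                 ("Masks, with interaction", -1.8),
                 ("Masks, three-variable regression", -1.3)]:
    print(f"{label:>34}: t = {t:6.1f}, approx p = {two_sided_p_value(t):.3f}")
```

A t-statistic of about 1.96 corresponds to a p-value of 5% under this approximation, which is where the "2.0 or higher" rule of thumb comes from.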

The estimated coefficients (slopes) are also all negative.  That is, the incidence of Covid-19 goes down with additional miles from Sturgis, with higher October temperatures, and with higher mask-wearing.  The actual coefficients themselves should not be compared to each other for their relative magnitudes.  Their size will depend on the units used for the individual measures (e.g. miles for distance, rather than feet or kilometers; or temperature measured on the Fahrenheit scale rather than Centigrade; or shares expressed as, say, 80 for 80% instead of 0.80).  The units chosen will not matter.  Rather, what is of interest is how the predicted incidence of Covid-19 changes when there is, say, a 1% change in any of the independent variables.  These are elasticities and will be discussed below.

In the fourth regression equation (the fourth column), where the three independent variables are all included, the statistical significance of the mask-wearing variable drops to a t-statistic of just 1.3 (in absolute value).  The significance of the temperature variable also falls, to a t-statistic of 2.0, which is at the borderline for the general rule of thumb of a 5% significance level.  The miles from Sturgis variable remains highly significant (its t-statistic also fell, but remains extremely high).  If one stopped here, it would appear that what matters is distance from Sturgis (consistent with Sturgis acting as a seeding event), coupled with October temperatures falling (so that the thus seeded virus spread fastest where temperatures had fallen the most).

But as was discussed above in the main text, there is good reason to view the temperature variable acting not solely by itself, but in an interaction with whether masks are generally worn or not.  This is tested in the fifth regression, where the three individual variables are included along with an interaction term between temperatures and mask-wearing.  The temperature, mask-wearing, and interaction variables now all have a similar level of significance, although each falls just short of the 5% threshold (at 6% to 8% for each).  While not quite 5%, keep in mind that the 5% is just a rule of thumb.  Note also that the positive sign on the interaction term (the 5.44) is an indication of curvature.  The positive sign, coupled with the negative signs for the temperature and mask-wearing variables taken alone, indicates that the curves are concave facing upwards (the effects of temperature and mask-wearing diminish at the margin at higher values for the variables).  Finally, the miles to Sturgis variable remains highly significant.
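What the interaction term implies can be seen directly from the two fifth-regression coefficients reported above (-516.8 on the log of October temperature and 5.44 on the interaction term).  In the sketch below, the effect of log temperature on cases is the sum of the direct coefficient and the interaction coefficient times the mask-wearing share.  I am assuming the mask share enters in percentage units (for example 85 for 85%); the function name is illustrative.

```python
B_TEMP = -516.8         # slope on ln(October temperature), fifth regression
B_INTERACTION = 5.44    # slope on ln(temperature) x mask share, fifth regression


def temperature_slope(mask_share_pct):
    """Change in daily cases per unit change in ln(temperature).

    Assumes the mask-wearing share is measured in percent (0 to 100).
    """
    return B_TEMP + B_INTERACTION * mask_share_pct


for share in (80, 85, 90):
    print(f"mask share {share}%: slope on ln(temperature) = "
          f"{temperature_slope(share):.1f}")
```

The slope is negative at each of these shares (colder means more cases), but it shrinks toward zero as the mask-wearing share rises.  That is the sense in which falling temperatures matter less where masks are widely worn.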

Based on this fifth regression equation, with the interaction term allowed for, what would be the estimated response of Covid-19 cases to changes in any of the independent variables (miles to Sturgis, October temperatures, and mask-wearing)?  These are normally presented as elasticities, with the predicted percentage change in Covid-19 cases when one assumes a small (1%) change in any of the independent variables.  In a mixed equation such as this, where some terms are linear and some logarithmic (plus an interaction term), the resulting percentage change can vary depending on the starting point chosen.  The conventional starting point taken is normally the sample means, and that will be done here.

Also, I have expressed the elasticities here in terms of a 1% decrease in each of the independent variables (since our interest is in what might lead to higher rates of Covid-19 incidence):

Elasticities from Full Equation with Interaction Term

      Percent Increase in Number of Covid-19 Cases from a 1% Decrease Around Sample Means

| Variable | Elasticity |
|---|---|
| Miles to Sturgis | 1.02% |
| October Temperature | 1.16% |
| Share Wearing Masks | 1.69% |

All these estimated elasticities are quite plausible.  If one is 1% closer in geographic distance to Sturgis (starting at the sample mean, and with the other two variables of October temperature and mask-wearing also at their respective sample means), the incidence of Covid-19 cases (per 100,000 of population) as of the week ending November 6 would increase by an estimated 1.02%.  A 1% lower October temperature (from the sample mean) would lead to an estimated 1.16% increase in Covid-19 cases.  And the impact of the share wearing masks is important and stronger, where a 1% reduction in the share wearing masks would lead to an estimated 1.69% increase in cases, with all the other factors here taken into account and controlled for.
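The mechanics of such an elasticity calculation can be sketched as follows, using the fifth-regression coefficients from the table above.  Two caveats apply.  First, the sample means are not reported in this post, so the values passed in below (36 cases per 100,000 per day, 54°F, and an 87% mask-wearing share) are round-number placeholders I have assumed, and the output will only roughly track the table.  Second, the sketch assumes the mask share is in percentage units and uses the derivative of the equation as the approximation for a 1% change.

```python
import math

# Coefficients from the fifth regression (all variables plus interaction)
COEF = {"miles": -36.6, "temp": -516.8, "masks": -22.4, "interaction": 5.44}


def elasticities(mean_cases, mean_temp_f, mean_mask_pct):
    """Percent rise in cases from a 1% decrease in each variable.

    Point elasticities evaluated at the supplied (assumed) sample means.
    Miles and temperature enter the equation in natural logs; the mask
    share enters linearly, in percent (0 to 100).
    """
    ln_temp = math.log(mean_temp_f)
    # d(cases) / d ln(miles)
    d_ln_miles = COEF["miles"]
    # d(cases) / d ln(temp), which depends on the mask share
    d_ln_temp = COEF["temp"] + COEF["interaction"] * mean_mask_pct
    # d(cases) / d(mask share), which depends on temperature
    d_mask = COEF["masks"] + COEF["interaction"] * ln_temp
    return {
        "miles": -d_ln_miles / mean_cases,
        "temp": -d_ln_temp / mean_cases,
        "masks": -d_mask * mean_mask_pct / mean_cases,
    }


# Placeholder sample means (assumed, not taken from the data)
for name, value in elasticities(36.0, 54.0, 87.0).items():
    print(f"{name:>6}: {value:.2f}% rise in cases per 1% decrease")
```

Note that the temperature and mask-wearing elasticities each depend on the level of the other variable, which is the interaction effect at work.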

These results are consistent with a conclusion that the Sturgis rally led to a significant seeding of cases, especially in near-by states, with the number of infections then growing over time as the disease spread.  The cases grew faster in those states where mask-wearing was relatively low, and in states with lower temperatures in October (leading people to spend more time indoors).  When the falling temperatures were coupled with a lower share (than elsewhere) of the population wearing masks, the rate of Covid-19 cases rose especially fast.

More Evidence on the Effectiveness of Masks in Limiting the Spread of Covid-19

A.  Introduction

States where a high share of the population normally wear face masks when out in public also have a significantly lower transmission of the virus that causes Covid-19.  The chart above shows the relationship between the wearing of face masks and the prevalence of Covid-19 in the community (measured in ways that will be discussed below).  It is remarkable how tight that relationship is, as well as how steep.  Wearing masks has a large effect.  States differ between each other in dozens of different ways that can significantly affect the transmission of Covid-19.  Yet the share of the population who report that they wear face masks most or all of the time when they go out in public can explain by itself most of the variation in the prevalence of Covid-19 across the states.

The data also show a remarkably strong consistency between the share of the population in a state that wear masks and whether that state voted for Clinton or Trump in 2016.  That there is such a relationship is not surprising.  But what is surprising is that the relationship is close to perfect.  All but one of the states that voted for Clinton in 2016 report a mask-wearing share of 88% or above.  The one exception is Colorado, with a share of 87.4%.  And every single Trump-voting state has a reported share that is below 88%.  Furthermore, several of the states where the vote margin was close (and where current polling indicates Biden would receive the most votes) are on the borderline.  Such states include Pennsylvania, Michigan, and Wisconsin, each with a share between 87 and 88%.

This post will explain where this data comes from, the statistical significance of the relationships, and how one can appropriately interpret the results – for the chart above and two more below.  And I should note that the idea for a chart similar to that above, using this data set, came from an article by the Washington Post reporter Christopher Ingraham that appeared on October 23 at the Washington Post website.  The analysis here extends what Ingraham had.

B.  A Higher Share of People Wearing Masks is Associated With A Lower Incidence of Covid-19 in the Community

The chart at the top of this post shows a remarkably tight relationship between the share of the population who say they normally or always wear a mask when out in public, and the prevalence of Covid-19 in those states (or more precisely, the share of the population who are personally aware of someone in the local community with Covid-19 like symptoms – this will be discussed below).  With a higher share wearing masks, the prevalence is lower.  There are qualifiers that need to be considered on the source of the data and how one should interpret the apparent relationship, but that there is such an association is clear.

The data underlying the analysis comes from a new set assembled as part of the COVIDcast project at Carnegie Mellon University.  With the onset of the Covid-19 crisis, this group at Carnegie Mellon designed a simple survey that participants could sign on to via Facebook, to provide data on the spread of Covid-19.  While the questionnaire has evolved over time, the most recent version (that they call Wave 4) was launched on September 8, and includes questions on mask usage.  What makes the survey particularly interesting is that they receive a huge number of responses daily (averaging over 40,000 per day from September 8 to October 7).  This allows for a statistically significant sample at not just the state level (which I focus on here), but also for most counties in the US.

There are, of course, potential biases in such a sample that must be corrected for.  Those using Facebook, and in particular those willing to participate in such a survey seen via Facebook, will not necessarily be representative of the population.  But the Carnegie Mellon analysts use various methods, including adjusting for the demographic characteristics of the respondents, to correct for this.  It cannot be perfect, but is likely to be reasonable.

One should also recognize that the behavior respondents report and what they actually do (such as on mask usage) may differ.  Respondents may exaggerate the consistency with which they in fact use masks.  But the Carnegie Mellon researchers have compared their results with those found from other sources, and have concluded they are consistent.  Furthermore, if there is a bias, one might expect that bias to be similar across states.  Perhaps all the responses (on, say, mask usage) are biased upwards – we may all say that we use masks more frequently than we in fact do.  But if that bias is similar (on average) across all of us, then the variation across states would remain.  They would just all be shifted upwards.  Still, one should remain cognizant that the findings are based on self-reported responses, and may be biased.

The Wave 4 questionnaire had questions on a variety of topics.  The specific question on mask usage was whether, in the past five days, the respondent had worn a mask when in public:  all of the time, most of the time, some of the time, a little of the time, or none of the time.  A mask wearer was classified as one who said that they wore a mask all or most of the time.

For whether the respondent might have Covid-19, the questionnaire asked whether they or someone in their immediate household suffer from Covid-like symptoms – specifically, whether they have a fever of 100℉ or more plus at least one of several additional possible conditions (sore throat, cough, shortness of breath, or difficulty breathing).  Thus, while they also ask later whether the person has had a formal test for Covid-19 (they may or may not have), the response reported here is for whether they have Covid-like symptoms.  Similarly, the figure for the share reporting possible cases of Covid-19 in the community (as in the chart at the top of this post), is based on whether the respondent was aware of others in their local community – who they know personally – who are suffering from Covid-19 like symptoms (with the conditions as defined for the individual).

The survey was designed this way in part because one purpose was to see whether such self-reported conditions could help local health authorities determine whether Covid-19 might be spreading in their communities, and to know this even before testing might find it.  And the results were encouraging.  The Carnegie Mellon researchers found that the daily and highly localized monitoring that was possible with the extremely large sample size of their survey generally performed well in tracking what was later found, via confirmed tests, on the spread of Covid-19 in that locality.

The resulting relationship between the respondents reporting that they wore masks when out in public all or most of the time (in the past five days), and the share reporting that they were personally aware of people in their community exhibiting Covid-19 like symptoms, is what is plotted (in terms of state averages) in the chart above.  To smooth out possible day to day statistical noise in the data (and also to be consistent with 7-day averages for reported confirmed cases of Covid-19, to be discussed below), the data shown in the chart is for the 7-day average covering October 15 to October 21 (the most recent days available when I downloaded this).

The straight line in black in the chart is the ordinary least squares regression line – the line that best fits the scatter of observations.  And from this one can calculate the statistical measure commonly referred to as the R-squared, which can vary between 0 and 1 (or 0% to 100%).  The R-squared indicates what share of the variation in the scatter of observations would be predicted by simply knowing where this straight regression line passes.  If the scatter points are all close to that line, the R-squared will be high.  In the limit, if they all lie precisely at that line, the R-squared will equal 1.  At the other extreme, if the scatter is all over and basically random, then the R-squared will be close to 0.
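
As a minimal sketch of how such a regression line and its R-squared can be computed, the following uses plain Python.  The numbers are hypothetical and chosen only for illustration (they are not the survey data behind the chart), and the function names are my own.  The first set of points lies exactly on a straight line, so its R-squared should come out at essentially 1; the second is scattered around a downward-sloping line, so its R-squared should fall between 0 and 1.

```python
def ols_fit(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys):
    """Share of the variation in ys that is predicted by the fitted line."""
    slope, intercept = ols_fit(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical illustration only: NOT the survey data shown in the chart.
mask_share = [70.0, 75.0, 80.0, 85.0, 90.0, 95.0]
on_the_line = [50.0 - 0.4 * x for x in mask_share]   # points exactly on a line
scattered = [23.0, 19.0, 19.5, 15.0, 15.5, 11.0]     # points near a line

print(r_squared(mask_share, on_the_line))  # essentially 1
print(r_squared(mask_share, scattered))    # between 0 and 1
```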

R-squared values are normally low for what are termed cross-section analyses (such as this, i.e. across the different states).  There are numerous reasons states differ from each other, and just knowing one factor (in this case the share who wear masks) will normally produce only a loose correlation with the result of interest (in this case the share reporting they are personally aware of people with Covid-19 like symptoms in the community).  Economists and other analysts would normally be happy to find an R-squared of 20% or so in such cross-state analyses, and elated if it is 30%.

In the chart here, the R-squared was 66%.  This is remarkable.  It indicates that if all one knows is the share of those wearing masks, we could predict 66% of the variation in the share reporting that they are aware of Covid-19 like symptoms in the community.  Despite the many reasons why states may differ in their incidence of Covid-19, this one factor (the share of those wearing masks) will by itself predict two-thirds of the variation.  Furthermore, one state (Wyoming) is an outlier.  If one runs the regression over the full sample but with this single case removed, the R-squared rises to an astonishing 76%.

There are further reasons to be surprised that such a strong statistical relationship comes through.  One is that the data come from a survey.  Poor (possibly misunderstood) responses, or lack of knowledge on whether others in the community are suffering from Covid-19 like symptoms (due, perhaps, to not knowing many in the community, or not being in touch with them) will normally add statistical noise.   But it appears that the extremely large sample sizes here have offset that.  We still see a clear and strong relationship.

One should also recognize that states in the US are not isolated from each other.  There is a substantial amount of travel from one to the other.  Thus even if mask-wearing is common in one state, with infection rates then low, there may be a continual “re-seeding” of the infection brought in by travelers from states that are not as conscientious in wearing masks.  This would weaken the relationship between local mask-wearing and local infection rates.  Yet despite this, we still see a strong and highly significant effect.

One must also always note that what is being examined is a correlation between two variables, and that correlation does not necessarily indicate causation.  One must examine whether it may in each individual analysis.  In the case here, however, one can readily see a mechanism where a higher share of the population wearing masks will lead to a lower share of the population in the community being infected with the virus that causes Covid-19.  But what would be the mechanism where a higher incidence of Covid-19 in the community would affect the share wearing masks?  There might well be such a causal relationship, but one would then expect it to act in precisely the opposite way to the relationship found in the data:  When a high share of the local community is infected with Covid-19, one would expect a high share of the population then to wear masks.  It would be rational to be extra careful.  But the relationship seen in the data is the opposite:  The data show that a high share of the community being infected is associated with a low share of the population wearing masks.  The line slopes downwards.  It is reasonable to conclude that the causation goes from the wearing of masks to the share infected, not the reverse.

There is, however, a factor in the statistical analysis which may well be quite important.  The data here show a high degree of correlation (negative correlation, as the line slopes downwards) between the wearing of masks and the incidence of Covid-19 in the locality.  But the data on the wearing of masks may itself be, and indeed likely will be, highly correlated with other actions that may be taken to limit the spread of Covid-19.  Responsible individuals who wear masks likely also are careful to practice social distancing, to wear gloves when shopping, to avoid crowded bars and nightclubs, and to avoid crowded events where many of the attendees do not wear masks (such as Trump rallies).  Thus it may not simply be the wearing of masks that explains why a high share of the local population wearing masks in an area is correlated with a more limited spread of Covid-19:  It may well be the whole set of socially responsible behaviors that matters.

This is true and should be recognized.  While the direct measure here is the share of the population that mostly or always wear masks, such behavior likely goes together with a full set of socially responsible behaviors that together lead to a lower spread of Covid-19.  While we will often refer to the wearing of masks as the factor that is associated with a limited spread of Covid-19, we should recognize that the wearing of masks likely goes together with a broader set of behaviors that together are important.

C.  A Higher Share of People Wearing Masks is Associated With A Lower Incidence of Self-Reported Cases of Covid-19, and a Lower Official Count of Confirmed Cases of Covid-19 

Two other charts are of interest.  The first examines the association between the share reporting they mostly or always wear masks, and whether they (or someone in their household) is exhibiting the symptoms of Covid-19:

One again sees a strong (negative) association between the wearing of masks and cases of those with symptoms consistent with Covid-19 (in this case of the survey respondents themselves).  And the R-squared measures of the degree of correlation are even higher:  70% for the full sample, and 78% if the single case of Wyoming is removed.  This again suggests that the wearing of masks (along with other responsible behaviors such as social distancing, etc.) is associated with a more limited spread of Covid-19.  Furthermore, the impact is not simply statistically significant, but also large.  Based just on the values on the regression line, a state with a reported 69% who wear masks (such as South Dakota) compared to a state (or locale) with a reported 97% who wear masks (such as Washington, DC) would be expected to have more than 6.1 times the share of cases.  (The actual South Dakota vs. DC ratio is even higher, at over 7, as South Dakota is above the regression line and DC a bit below).

The findings are also consistent with the official counts of new confirmed cases of Covid-19 per 100,000 of population:

The data on the official counts were downloaded from the COVIDcast site, but they in turn were obtained from compilations at USAFacts.  And USAFacts obtained the figures from state public health agencies.

The relationship between those reporting that they wear masks most or all of the time, and the number of confirmed new cases by state (per 100,000 of population, and a seven-day average covering the October 15 to October 21 week), remains significant, negative, and strong.  The states where a higher share of the population routinely wear masks (as reported in the surveys) see a significantly lower incidence of confirmed new cases of Covid-19.  The statistical relationship is not as strong as before (the R-squared is 47%), but this is not surprising.  The average number of daily new confirmed cases over the 7-day period (October 15 to 21) counts only those with a test result, for a new case, reported over those seven days.  The number of people who are sick with Covid-19 will include not just those newly-tested individuals, but also others who have been sick for some time plus individuals with Covid-19 like symptoms who may have the disease but have not (or not yet) been tested.  It is not surprising that the correlation of mask-wearing with just a slice of the population who are sick with Covid-19 will be weaker.  But the R-squared of 47% is still quite high.

D.  Conclusion:  The Effectiveness of Wearing Masks

Masks work by reducing the transmission of an infectious disease to and from others.  They are not perfect.  But neither do they need to be perfect, as one can see from the simple arithmetic of the spread of an infectious disease.

Infectious diseases such as Covid-19 are caused by viruses, which cannot survive on their own but can only survive by spreading from person to person.  Any individual will have a disease such as Covid-19 for a finite period of time (a few weeks, normally, in the case of Covid-19) beyond which they would either have recovered or (in a small percentage of the cases) have died.  And they will normally only be able to infect others for about a week (starting one week after they themselves had become infected), although possibly for up to two weeks.

Any such infectious disease will therefore spread when, on average, each individual with the disease spreads the disease on to more than one other person.  And given the arithmetic of compounding, that number can grow to be very large very quickly.  If each individual on average infects 2 other individuals in each cycle, then after just 10 cycles the one individual with the disease would have led to the infections of over 1,000.  It doubles in each cycle.  If each cycle is, on average, a week and a half (one week for the virus to multiply in the individual, and then one week during which the person can be infectious, so on average will infect others at the mid-point of the second week), those 10 cycles will require only 15 weeks.

But if the wearing of masks (along with other socially responsible behaviors, such as social distancing) reduces the average number of people that an individual with the disease will infect to less than one, then the disease will die out.  And again, with the arithmetic of compounding, this can be quite quick.  Suppose one starts out with 100 individuals with the disease in some locality.  If, on average, each infected individual spreads the disease to another person only half the time, then 100 individuals will spread it to 50 during the first cycle, to 25 in the next, and so on.  One can calculate that if this continues at such a rate, then less than one new person would become infected after just 7 cycles (or 10 1/2 weeks if each cycle is on average a week and a half).  And the disease would have been stopped.
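
The compounding arithmetic in these two examples can be checked with a short sketch.  It assumes, as in the text, that each infected individual passes the disease on to a fixed average number of others in each cycle and that a cycle lasts a week and a half; the function and variable names are my own.

```python
WEEKS_PER_CYCLE = 1.5  # assumption taken from the text: one cycle is a week and a half


def infections_per_cycle(initial, r, cycles):
    """New infections in each successive cycle when each case infects r others on average."""
    counts = []
    current = float(initial)
    for _ in range(cycles):
        current *= r
        counts.append(current)
    return counts


# Doubling: one case, each case infecting 2 others, over 10 cycles.
growth = infections_per_cycle(1, 2.0, 10)
print(growth[-1])                     # 1024.0 new infections in the 10th cycle
print(10 * WEEKS_PER_CYCLE)           # 15.0 weeks

# Halving: 100 cases, each case infecting another person only half the time.
decay = infections_per_cycle(100, 0.5, 10)
first_below_one = next(i + 1 for i, c in enumerate(decay) if c < 1)
print(first_below_one)                    # 7 cycles
print(first_below_one * WEEKS_PER_CYCLE)  # 10.5 weeks
```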

Masks work because they can bring down that reproduction rate (what epidemiologists call Rt) from something above 1.0 to something below.  The example here is that masks (along with other socially responsible behaviors) reduced the Rt to 0.5.  This would be a 75% reduction if the Rt is 2.0 when nothing is done to stop the spread of the disease.  That is not perfect, but it does not need to be perfect to stop the spread.  And 70 to 80% is a reasonable estimate of how effective masks are.  If the US were to reduce the Rt to 0.5 going forward, then the daily number of new cases (currently, as I write this, about 80,000 each day) would fall to less than 100 in just 10 cycles (15 weeks).
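
The two figures in this paragraph follow from simple arithmetic, sketched below.  The sketch assumes the Rt stays constant at 0.5 over all 10 cycles, which is of course a simplification, and the function names are my own.

```python
def percent_reduction(rt_before, rt_after):
    """Percentage fall in the reproduction rate."""
    return 100.0 * (rt_before - rt_after) / rt_before


def daily_cases_after(current_daily, rt, cycles):
    """Daily new cases after a number of cycles at a constant reproduction rate."""
    return current_daily * rt ** cycles


print(percent_reduction(2.0, 0.5))         # 75.0 (percent)
print(daily_cases_after(80_000, 0.5, 10))  # 78.125, i.e. fewer than 100 a day
```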

This is of course just arithmetic, but the power of compounding is extremely important to recognize when addressing how to bring an infectious disease under control.  Masks do not need to be 100% effective – they merely need to bring the Rt down to less than 1.0.  And in this they are similar to vaccines.  No vaccine is 100% effective.  For the virus that causes Covid-19, the FDA has issued guidelines stating that a vaccine that is safe and has a minimum effectiveness of just 50% would be approved.  It is hoped that the vaccines currently being tested will have a greater degree of effectiveness, but the expectation is that they might at most be perhaps 80% effective, and probably 70% or less is more likely.

That does not mean such vaccines would not be valuable.  As just noted, a vaccine that brought the Rt down to 0.5 would lead to the disease dying out in a relatively short time.  But as Dr. Robert Redfield, the head of the CDC, noted in testimony before Congress on September 16, the effectiveness of masks is similar to, if not greater than, what is expected for a vaccine.  In that testimony he stated, as he has in other fora in recent months (see here and here, for example), that if Americans wore these simple masks, then in “six, eight, 10, 12 weeks we’d bring this pandemic under control.”  And further in that testimony: “I might even go so far as to say this face mask is more guaranteed to protect me against COVID than when I take a COVID vaccine, because the immunogenicity might be 70%, and if I don’t get an immune response the vaccine’s not going to protect me. This face mask will.”

But there is an important proviso.  These effectiveness percentages, whether for masks or for vaccines, reflect how likely they will protect an individual who is exposed to the virus.  But their effectiveness in reducing Rt will then depend on what share of the population wears a mask or is vaccinated.  Usage of masks or vaccinations will never cover 100% of the population, and the reduction in Rt will then be less.  If not enough people follow responsible social behaviors – most importantly wearing masks – or choose not to be vaccinated once a vaccine becomes available, the virus will continue to spread.
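
To illustrate this proviso, the sketch below uses a deliberately simple model that is my own assumption and not something estimated above:  The Rt is taken to fall in proportion to the effectiveness of the measure times the share of the population using it.  Actual transmission is more complicated (a mask, for example, protects both the wearer and those around them), so the numbers should be read as illustrative only.  The starting Rt of 2.0 and the 75% effectiveness are the values used in the example above.

```python
def effective_rt(rt_unmitigated, effectiveness, coverage):
    """Illustrative assumption: Rt falls in proportion to effectiveness x coverage."""
    return rt_unmitigated * (1.0 - effectiveness * coverage)


def coverage_needed(rt_unmitigated, effectiveness):
    """Share of the population that must be covered to bring Rt down to exactly 1.0."""
    return (1.0 - 1.0 / rt_unmitigated) / effectiveness


print(effective_rt(2.0, 0.75, 1.0))  # 0.5: everyone covered, the disease dies out
print(effective_rt(2.0, 0.75, 0.5))  # 1.25: half covered, the disease keeps spreading
print(coverage_needed(2.0, 0.75))    # about two-thirds must be covered
```

Under these assumptions, a measure that is 75% effective would still leave the disease spreading if only half the population adopted it.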

Political leadership is therefore critical, but Trump has been unwilling to provide it.  Despite the uniform advice of medical professionals in the field, Trump has been unwilling to call on all Americans, and in particular all of his supporters, to wear masks.  He rarely wears a mask himself, makes a big show of pulling it off when he has had to wear one (such as when he returned to the White House from Walter Reed Hospital, where he had been treated for Covid-19), and continues to organize large political rallies where few wear masks (but with participants required to sign legal waivers saying that should they become infected as a result, they cannot sue the Trump campaign).  And Trump continues to mock Joe Biden and others who are conscientious in wearing masks when in public.

Why?  Wearing a mask makes it obvious that an infectious disease is circulating.  It makes it obvious that Trump and his administration have failed to bring this terrible disease under control.  Trump continues to assert instead, as he has from the start as well as more recently (during, for example, the second, October 22, debate with Joe Biden), that all is under control and that while there have been “spikes” they are all either “gone” or “will soon be gone”.  From the start in January, Trump has repeatedly asserted that it was “totally under control”, that “It’s going to be just fine”, that it was just a hoax (indeed, a “new hoax” of the Democrats), and that it would soon (Trump asserted in February) just disappear (“like a miracle”).  And Trump’s repeated assertion that “it’s going away” is well-documented in this Washington Post video compilation.

But cases are in fact rising as I write this, and rising rapidly.  Confirmed cases hit over 83,000 on October 23 and then over 83,000 again on October 24 – they had never before exceeded 77,300 in a single day in the US.  Hospitalizations are rising as well, and the surge in hospitalizations is starting again to overwhelm hospitals in parts of the country.  It is absurd to say, as Trump repeatedly insists, that cases are rising only because more testing is being done.  (As one wag put it:  “I stopped gaining weight as soon as I stopped weighing myself.”)

The number of dead in the US from this disease now exceeds (as I write this) 228,000.  That exceeds the number of soldiers who died in battle in the US Civil War (Union plus Confederate together) of 214,938.  It is 70% greater than the 134,575 Americans who died in battle in World War I plus the Korean War plus the Vietnam War, combined.  This has been the worst public health crisis in the US in more than a century.  Yet Trump claims he has been a great success.

The widespread wearing of masks would be an obvious signal of Trump’s failure.  It is understandable (but not defensible) that he would want to hide such overt signs of his failure before the upcoming election.  But to put short-term politics above public health concerns is deplorable.

The US Has Hit Record High Fiscal and Trade Deficits

A.  Introduction

The final figures to be issued before the election for the federal government fiscal accounts and for the US trade accounts have now been published.  The US Treasury published earlier today the Final Monthly Treasury Statement for the FY2020 fiscal year (fiscal years end September 30), and earlier this month the BEA and the Census Bureau issued their joint monthly report on US International Trade in Goods and Services, with trade data through August.  The chart above shows the resulting fiscal deficit figures (as a share of GDP) for all fiscal years since FY1948, while a chart for the trade deficit will be presented and discussed below.  The figures here update material that had been presented in a post from last month on Trump’s economic record.

The accounts show that the federal fiscal deficit as a share of GDP has reached a record level (other than during World War II), while the trade deficit in goods (in dollar amount, although not as a share of GDP) has also never been so high.  Trump campaigned in 2016 arguing that these deficits were too high, that he would bring them down sharply, and indeed would pay off the entire federal government debt (then at over $19 trillion) within eight years.  Paying off the debt in full in such a time frame was always nonsense.  But with the right policies he could have at least had them go in the directions he advocated.  However, they both have moved in the exact opposite direction.  Furthermore, this was not only a consequence of the economic collapse this year.  They were both already increasing before this year.  The economic collapse this year has simply accelerated those trends – especially so in the case of the fiscal deficit.

B.  The Record High Fiscal Deficit

The federal deficit hit 15.2% of GDP in FY2020 (using the recently issued September 2020 estimate by the CBO of what GDP will be in FY2020).  The highest it had been before (other than during World War II) was 9.8% of GDP in FY2009, in the final year of Bush / first year of Obama, due to the economic collapse in that final year of Bush.  In dollar terms, the deficit this fiscal year hit $3.1 trillion, which was not far below the entire amount collected in tax and other revenues of $3.4 trillion.

This deficit is incredibly high, which does not mean, however, that an increase this year was not warranted.  The US economy collapsed due to Covid-19, but with a downturn sharper than it otherwise would have been had the administration not mismanaged the disease so badly (i.e. had it not neglected testing and follow-up measures, plus had it encouraged the use of masks and social distancing rather than treat such measures as a political statement).  With such positive actions to limit the spread of Covid-19 neglected, the only alternative was to limit economic activity, whether by government policy or by personal decision (i.e. to avoid being exposed to this infectious disease by those unwilling to wear masks).

The sharp increase in government spending this year was therefore necessary.  The real mistake was the neglect by this administration of measures to reduce the fiscal deficit during the period when the economy was at full employment, as it had been since 2015.  Instead of the 2017 tax cut, prudent fiscal policy to manage the debt and to prepare the economy for the risk of a downturn at some point would have been to call for a tax increase under such conditions.  The tax cut, coupled also with an acceleration in government spending, led fiscal deficits to grow under Trump well before Covid-19 appeared.  Indeed, they grew to record high levels for periods of full employment (they have been higher during downturns).  As the old saying goes:  “The time to fix the roof is when the sun is shining.”  Trump received from Obama an economy where jobs and GDP had been growing steadily and unemployment was just 4.7%.  But instead of taking this opportunity to reduce the fiscal deficit and prepare for a possible downturn, the fiscal deficit was increased.

The result is that federal government debt (held by the public) has jumped to 102% of GDP (using the CBO estimate of GDP in FY2020):

The last time the public debt to GDP ratio had been so high was at the end of World War II.  But the public debt ratio will soon certainly surpass that due to momentum, as fiscal deficits cannot be cut to zero overnight.  The economy is weak, and fiscal deficits will be required for some time to restore the economy to health.

C.  The US Trade Deficit is Also Hitting Record Highs in Dollar Terms

In the 2016 campaign, Trump lambasted what he considered to be an excessively high US trade deficit (specifically the deficit in goods, as the US has a surplus in the trade in services), which he asserted was destroying the economy.  He asserted these were due to the various trade agreements reached over the years (by several different administrations).  He would counter this by raising tariffs, on specific goods or against specific countries, and through this force countries to renegotiate the trade deals to the advantage of the US.  Deficits would then, he asserted, rapidly fall.  They have not.  Rather, they have grown:

Trump has, indeed, launched a series of trade wars, unilaterally imposing high tariffs and threatening to make them even higher (proudly proclaiming himself “Tariff Man”).  And his administration has reached a series of trade agreements, including most prominently with South Korea, Canada, Mexico, Japan, the EU, and China.  But the trade deficit in goods reached $83.9 billion in August.  It has never been so high. The deficit in goods and services together is not quite yet at a record high level, although it too has grown during the Trump period in office.  In August that broader deficit hit $67.1 billion, a good deal higher than it ever was under Obama but still a bit less than the all-time record of a $68.3 billion deficit reached in 2006 during the Bush administration, at the height of the housing bubble.

The fundamental reason the deficits have grown despite the trade wars Trump has launched is that the size of the overall trade deficit is determined not by whatever tariffs are imposed on specific goods or on specific countries, nor even by what trade agreements have been reached, but rather by underlying macro factors.  As discussed in an earlier post on this blog, the balance in foreign trade will be equal to the difference between aggregate domestic savings and aggregate domestic investment.  Tariffs and trade agreements will not have a significant direct impact on those macro aggregates.  Rather, tariffs applied to certain goods or to certain countries, or trade agreements reached, may lead producers and consumers to switch from whom they might import items or to whom they might export, but not the overall balance.  Trade with China, for example, might be reduced by such trade wars (and indeed it was), but this then just led to shifts in imports away from China and towards such countries as Viet Nam, Cambodia, Bangladesh, and Mexico.  Unless aggregate savings in the US increases or aggregate investment falls, the overall trade deficit will remain where it was.
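
The identity behind this argument can be sketched with the standard national accounts relationship, GDP = consumption + investment + government spending + net exports.  The numbers below are hypothetical (not actual US figures), the function names are my own, and the sketch ignores net income and transfers from abroad, so that the trade balance stands in for the broader current account.

```python
def net_exports_from_expenditure(gdp, consumption, investment, government):
    """Net exports implied by GDP = C + I + G + (X - M)."""
    return gdp - consumption - investment - government


def net_exports_from_saving(gdp, consumption, investment, government):
    """Net exports as aggregate domestic savings minus domestic investment."""
    savings = gdp - consumption - government  # private plus government saving
    return savings - investment


# Hypothetical economy, in percent of GDP (NOT actual US data).
gdp, consumption, investment, government = 100.0, 68.0, 18.0, 18.0

print(net_exports_from_expenditure(gdp, consumption, investment, government))  # -4.0
print(net_exports_from_saving(gdp, consumption, investment, government))       # -4.0
```

Both routes give the same balance by construction.  A tariff that changes only which country an import comes from changes none of the four aggregates, and hence leaves the overall balance where it was.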

Tariffs and trade agreements can thus lead to switches in what is traded and with whom.  Tariffs are a tax, and are ultimately paid largely by American households.  Purchasers may choose either to pay the higher price due to the tariff, or switch to a less desirable similar product from someone else (which had been either more expensive, pre-tariff, or less desirable due to quality or some similar issue), but unless the overall savings / investment balance in the economy is changed, the overall trade deficit will remain as it was.  The only difference resulting from the trade wars is that American households will then need to pay either a higher price or buy a less desirable product.

It is understandable that Trump might not understand this.  He is not an economist, and his views on trade are fundamentally mercantilist, which economists had already moved beyond over 250 years ago.  But Trump’s economic advisors should have explained this to him.  They have either been unwilling, or unable, to do so.

Are the growing trade deficits nevertheless a concern, as Trump asserted in 2016 (when the deficits were lower)?  Actually, in themselves probably not.  In the second quarter of 2020 (the most recent period where we have actual GDP figures), the trade deficit in goods reached 4.5% of GDP.  While somewhat high (generally a level of 3 to 4% of GDP would be considered sustainable), the trade deficit hit a substantially higher 6.4% of GDP in the last quarter of 2005 during the Bush administration.  The housing bubble was then in full swing, households were borrowing against their rising home prices with refinancings or home equity loans and spending the proceeds, and aggregate household savings was low.  With savings low and domestic investment moderate (not as high as a share of GDP as it had been in 2000, in the last year of Clinton, but close), the trade deficit was high.  And when that housing bubble burst, the economy plunged into the then largest economic downturn since the Great Depression (largest until this year).

Thus while the trade deficit is at a record level in dollar terms (the measure Trump refers to), it is at a still high but more moderate level as a share of GDP.  It is certainly not the priority right now.  Recovering from the record economic slump (where GDP collapsed at an annualized rate of 31% in the second quarter of 2020) is of far greater concern.  And while expectations are that GDP bounced back substantially (but only partially) in the third quarter (the initial estimate of GDP for the third quarter will be issued by the BEA on October 29, just before the election), the structural damage done to the economy from the mismanagement of the Covid-19 crisis will take substantial time to heal.  Numerous firms have gone bankrupt.  They and others who may survive but who have been under severe stress will not be paying back their creditors (banks and others), so financial sector balance sheets have also been severely weakened.  It will take some time before the economic structure will be able to return to normal, even if a full cure for Covid-19 magically appeared tomorrow.
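
A note on the 31% figure:  It is an annualized rate, that is, the rate at which GDP would fall over a full year if the one-quarter decline were repeated, with compounding, for four quarters in a row.  The sketch below converts between the two; the function names are my own, and the input is the rounded 31% cited above.

```python
def quarterly_from_annualized(annualized_rate):
    """One-quarter change implied by an annualized (compounded) rate."""
    return (1.0 + annualized_rate) ** 0.25 - 1.0


def annualized_from_quarterly(quarterly_rate):
    """Annualized rate implied by compounding a one-quarter change four times."""
    return (1.0 + quarterly_rate) ** 4 - 1.0


q = quarterly_from_annualized(-0.31)
print(round(100 * q, 1))                             # about -8.9 (percent)
print(round(100 * annualized_from_quarterly(q), 1))  # -31.0 (percent)
```

That is, GDP in the second quarter was roughly 9% below its level in the first quarter, not 31% below.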

D.  Conclusion

Trump promised he would set records.  He has.  But the records set are the opposite of what he promised.