An improved methodology for estimating the prevalence of SARS-CoV-2

Virag Patel, Catherine McCarthy, Rachel A. Taylor, Ruth Moir, Louise A. Kelly, Emma L. Snary

Research output: Working paper



Since the identification of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), in China in December 2019, there have been more than 17 million cases of the disease in 216 countries worldwide. Comparisons of prevalence estimates between different communities can inform policy decisions regarding safe travel between countries, help to assess when to implement (or remove) disease control measures, and identify the risk of over-burdening healthcare providers. Estimating the true prevalence can, however, be challenging because officially reported figures are likely to be significant underestimates of the true burden of COVID-19 within a community. Previous methods for estimating the prevalence fail to incorporate differences between populations (such as younger populations having higher rates of asymptomatic cases), so comparisons between, for example, countries can be misleading. Here, we present an improved methodology for estimating COVID-19 prevalence. We use the reported numbers of cases and deaths, together with population size, as the raw prevalence for the population. We then apply an age adjustment, which allows the age distribution of that population to influence the case-fatality rate and the proportion of asymptomatic cases. Finally, we calculate the likely underreporting factor for the population and use this to adjust our prevalence estimate further. We use our method to estimate the prevalence for 166 countries (or states of the United States of America, hereafter referred to as US states) where sufficient data were available. Our estimates show that, as of 30 July 2020, the three countries with the highest estimated prevalence are Brazil (1.26%, 95% CI: 0.96 - 1.37), Kyrgyzstan (1.10%, 95% CI: 0.82 - 1.19) and Suriname (0.58%, 95% CI: 0.44 - 0.63).
Brazil is predicted to have the largest share of current global cases (30.41%, 95% CI: 27.52 - 30.84), followed by the USA (14.52%, 95% CI: 14.26 - 16.34) and India (11.23%, 95% CI: 11.11 - 11.24). Amongst the US states, the highest prevalence is predicted to be in Louisiana (1.07%, 95% CI: 1.02 - 1.12), Florida (0.90%, 95% CI: 0.86 - 0.94) and Mississippi (0.77%, 95% CI: 0.74 - 0.81), whereas amongst European countries, the highest prevalence is predicted to be in Montenegro (0.47%, 95% CI: 0.42 - 0.50), Kosovo (0.35%, 95% CI: 0.29 - 0.37) and Moldova (0.28%, 95% CI: 0.23 - 0.30). Our results suggest that, among the 25 countries with the highest estimated prevalence, Kyrgyzstan (0.04 tests per predicted case), Brazil (0.04 tests per predicted case) and Suriname (0.29 tests per predicted case) have the highest underreporting. In comparison, Israel (34.19 tests per predicted case), Bahrain (19.82 tests per predicted case) and Palestine (9.81 tests per predicted case) have the least underreporting. The results of this study may be used to understand how risk differs between geographical areas and to highlight regions where the prevalence of COVID-19 is increasing most rapidly. The method described is quick and easy to implement. Prevalence estimates should be updated on a regular basis to allow for rapid fluctuations in disease patterns.
Original language: English
Number of pages: 14
Publication status: Published - 6 Aug 2020


  • SARS-CoV-2
  • prevalence estimates
  • Coronavirus
  • COVID-19
  • severe acute respiratory syndrome coronavirus 2
  • likely underreporting factor

