Pandemics and epidemics on Twitter

The prospect of a global pandemic wiping out a large percentage of the population is regarded as a genuine threat, and it was recently reported that an outbreak of a drug-resistant infection could kill 80,000 people in the UK.


A man protests for the mandatory quarantine of everyone who has returned from Ebola-affected countries in front of the White House in Washington, D.C., on Oct. 24, 2014. Photographer: Mark Wilson/Getty Images.

To take a threat that has only just passed, think back to September of last year, at the peak of the Ebola outbreak. Conversations about the virus on Twitter started to increase, driven by the accumulation of news reports and sensationalised headlines similar to the one above.

The actual threat, however, as opposed to the threat perceived by the public, remained low. Fear and hysteria may prevent people from thinking or acting logically during an outbreak, so it is crucial to be aware of how people are communicating about an infectious disease. Using real-time data from Twitter, researchers can gauge public opinion on infectious disease outbreaks.

Why use Twitter?

Twitter potentially offers researchers millions of views on an outbreak, available in real time. This allows the examination of how a subset of the population may react to an infectious disease outbreak. There is ongoing research, for example, into why people may hold negative views towards vaccines, as this could affect the spread of a disease.

Picture:  Getty Images

Gauging public opinion at the precise time of an outbreak may not be feasible using traditional methods, as designing a survey or questionnaire is an expensive and time-consuming process. Most research suggests, though, that data from Twitter is best used in combination with traditional methods rather than as a substitute, especially for research that predicts the occurrence of an infectious disease.

Challenges of using Twitter

Not all adult internet users are on Twitter, although the proportion is increasing. According to the Pew Research Center, 23% of adult internet users also use Twitter (up from 18% in 2013), which is 19% of the entire adult population. Twitter, however, is most popular with those who are under 50 and college educated.

Compared with Facebook, Twitter does not stack up well: 71% of adult internet users are on Facebook, which is 58% of the entire adult population, and 56% of internet users aged 65 and over use Facebook. Those who tweet about outbreaks may be over-represented relative to the national offline population, but these people may be under-represented in survey data.
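As a quick check on how the two kinds of percentage above relate, the short sketch below divides each "share of all adults" figure by the matching "share of adult internet users" figure. The function name is hypothetical, and the resulting internet-use share is an inference from the four figures quoted above, not a number reported in the post.

```python
def implied_internet_share(pct_of_all_adults, pct_of_internet_users):
    """Share of all adults who use the internet, implied by one pair of figures.

    If p% of internet users are on a platform and that equals q% of all
    adults, then internet users make up roughly q / p of all adults.
    """
    return 100 * pct_of_all_adults / pct_of_internet_users


# Figures quoted in the text: Twitter 23% / 19%, Facebook 71% / 58%.
twitter_implied = implied_internet_share(19, 23)
facebook_implied = implied_internet_share(58, 71)

print(f"Implied by the Twitter figures:  {twitter_implied:.1f}%")
print(f"Implied by the Facebook figures: {facebook_implied:.1f}%")
```

Both pairs point to roughly 82 to 83% of adults being internet users, so the Twitter and Facebook figures are at least consistent with each other.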

It is also difficult to obtain Twitter data, as Twitter only provides a sample of data to researchers, and obtaining full Twitter data can be quite costly for small to medium-sized research groups. There are also issues surrounding spam on the platform, and developing methods for separating useful content from spam can be quite challenging.

Current research on infectious diseases using Twitter

Current research on infectious disease outbreaks suggests that Twitter offers a method of understanding, in real time, what a subset of the population communicates about, the misconceptions that people may hold, and whether these will be harmful in a public health epidemic or pandemic.


A man dressed in protective hazmat clothing leaves after treating a nurse in Texas who was diagnosed with the Ebola virus. Photographer: Mike Stone/Getty Images

Specifically on the Ebola outbreak, early research indicates that there may have been medical misinformation present on the platform regarding vaccines, the role of health officials, and the cure and transmission of Ebola.

My own research involves using Twitter data related to the Ebola outbreak to better understand the content on the platform, how people communicate about Ebola, and to examine the types of information that are present on the platform.

Research teams are currently developing better methods for analysing social media data, so this type of research will become more sophisticated in the future.

Language frequency of Ebola tweets

Ebola is a distinctive word, particularly for an infectious disease. By comparison, developing search queries for Bird Flu or Swine Flu, for example, may be difficult. In the case of Ebola, using the keyword on its own has, for me, been sufficient to gather an enormous number of tweets. Of the languages supported on Twitter, 15 use ‘Ebola’ and 7 have their own translation, as shown in the table below:

Keyword              Languages
Use ‘Ebola’          English, German, Spanish, Portuguese, French, Italian, Dutch, Turkish, Hungarian, Swedish, Polish, Danish, Norwegian, Finnish, Hindi
Different keyword    Russian, Japanese, Arabic, Korean, Thai, Urdu, Farsi

I found that my sample of tweets contains languages that have their own translation of Ebola, as Twitter users may opt to use ‘Ebola’ rather than that translation. For example, Russian tweeters may use ‘Ebola’ rather than ‘Эбола’.
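To make the point about keyword forms concrete, here is a minimal sketch of how one might check which form a tweet uses. It is an illustration only and not how the tweets in this post were gathered or analysed: the function name, the lookup table, and the example tweets are all hypothetical. Only the Russian form ‘Эбола’ is included, because it is the one quoted above; forms for the other six languages are left out rather than guessed.

```python
# Keyword forms: 'Ebola' and the Russian 'Эбола' both come from the text.
# Native forms for other languages are deliberately omitted, not guessed.
LATIN_FORM = "ebola"
NATIVE_FORMS = {"ru": "эбола"}


def keyword_form(text, lang):
    """Classify which form of the keyword a tweet uses.

    Returns 'latin', 'native', 'both', or 'none'. Matching is a
    case-insensitive substring test, so hashtags such as #Ebola count.
    """
    lowered = text.casefold()
    has_latin = LATIN_FORM in lowered
    native = NATIVE_FORMS.get(lang)
    has_native = native is not None and native in lowered
    if has_latin and has_native:
        return "both"
    if has_latin:
        return "latin"
    if has_native:
        return "native"
    return "none"


# Hypothetical example tweets, invented for illustration (not from the dataset).
examples = [
    ("Новости про Ebola сегодня", "ru"),
    ("Новости про Эбола сегодня", "ru"),
    ("Latest #Ebola update", "en"),
]
for text, lang in examples:
    print(lang, keyword_form(text, lang))
```

A search on ‘Ebola’ alone would pick up the first and third example but miss the second, which is why tweets in languages with their own translation can still appear in a sample gathered with the single keyword.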

In order to examine the percentage of English tweets relative to those in other languages, I gathered over a million tweets using Mozdeh, which uses Twitter’s Search API. The tweets were gathered over an 11-day period starting on the 27th of November and ending on the 7th of December 2014.

I used the language metadata to work out the frequency of each language using SPSS, and I have created a table to show the breakdown:

Language Breakdown

Language      Frequency    (%)
English          632112    (62.3)
Spanish          220566    (21.8)
Portuguese        59774    (5.9)
French            42242    (4.2)
Italian           20645    (2.0)
Dutch             12698    (1.3)
Turkish            5099    (0.5)
German             4899    (0.5)
Russian*           2267    (0.2)
Hungarian          1854    (0.2)
Swedish            1779    (0.2)
Japanese*          1649    (0.2)
Polish             1362    (0.1)
Arabic*            1303    (0.1)
Danish              586    (0.1)
Norwegian           465    (0.0)
Finnish             405    (0.0)
Korean*             366    (0.0)
Hindi               187    (0.0)
Thai*               170    (0.0)
Urdu*               116    (0.0)
Farsi*               36    (0.0)
Total           1010580
Missing**         37995
Total           1048575

*These languages have their own translation of ‘Ebola’, but users have still chosen to use ‘Ebola’.
**Not all tweets have language identifiers 
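For readers who do not use SPSS, the same kind of frequency table can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used for the table above: it assumes each tweet is a dictionary with an optional `lang` field (a hypothetical structure), it uses a toy sample rather than the real data, and it computes percentages over tweets that have a language identifier, which is an assumption about how the percentages are based.

```python
from collections import Counter


def language_breakdown(tweets):
    """Count tweets per language code and compute percentages.

    Tweets without a language identifier are counted as missing and left
    out of the percentage base.
    Returns (rows, valid_total, missing); rows are (lang, count, percent)
    tuples sorted from most to least frequent.
    """
    langs = [tweet.get("lang") for tweet in tweets]
    missing = sum(1 for lang in langs if not lang)
    counts = Counter(lang for lang in langs if lang)
    valid_total = sum(counts.values())
    rows = [
        (lang, n, round(100 * n / valid_total, 1))
        for lang, n in counts.most_common()
    ]
    return rows, valid_total, missing


# Hypothetical toy sample, invented for illustration (not the real dataset).
sample = [
    {"text": "tweet 1", "lang": "en"},
    {"text": "tweet 2", "lang": "en"},
    {"text": "tweet 3", "lang": "es"},
    {"text": "tweet 4", "lang": "en"},
    {"text": "tweet 5", "lang": None},
    {"text": "tweet 6", "lang": "es"},
    {"text": "tweet 7", "lang": "pt"},
    {"text": "tweet 8"},
]

rows, valid_total, missing = language_breakdown(sample)
for lang, count, percent in rows:
    print(f"{lang:<4}{count:>6}  ({percent})")
print(f"Total   {valid_total}")
print(f"Missing {missing}")
```

Keeping the missing tweets out of the percentage base means the listed languages sum to roughly 100%, with the missing count reported separately, as in the layout of the table above.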

The keyword ‘Ebola’ was picked up in 22 of the 29 languages that Twitter supports. It is interesting to note that 62.3% of Ebola tweets are in English, Spanish tweets are the second most frequent (21.8%), and Portuguese tweets are the third most frequent (5.9%). For my PhD research I am focusing on English-language tweets, and this type of analysis tells me that there is a sufficient number of English-language tweets related to the Ebola epidemic.

A limitation of this, however, is that I was only able to draw up frequencies for languages that are ‘supported’ by Twitter, for which there is metadata, and not for languages that do not have language identifiers, such as Sub-Saharan African languages.

In the next post I will look at the number of tweets on Ebola that have geolocation data and cross-tabulate these with language identifiers. These results form a part of a larger project which has ethics approval.

Using Twitter to gain an insight into public views and opinions on the Ebola epidemic

The World Health Organization writes that Ebola, a haemorrhagic fever, is a severe, often fatal illness with an average fatality rate of around 50%. The first outbreak of Ebola occurred in 1976. In the current outbreak, the first case diagnosed outside of West Africa was reported in the U.S. on September 30th 2014. The current Ebola outbreak has taken more lives and infected more people than all the other outbreaks combined, and Twitter provides a platform for people to express their views and opinions on Ebola.

Chew and Eysenbach, for example, used Twitter to monitor mentions of Swine Flu during the 2009 pandemic. They found that Twitter gave health authorities the potential to become aware of the concerns raised by the public. Similarly, Szomszor, Kostkova, and St Louis examined Swine Flu on Twitter and found that Twitter offers the ability to sample large populations for health sentiment (public views and opinions). Signorini, Segre, and Polgreen also found that by using Twitter it was possible to understand users’ interests and concerns during the Swine Flu outbreak.

In 2010, Chew and Eysenbach wrote that Swine Flu was the first global pandemic to occur in the age of Web 2.0, and argued that this was a unique opportunity to investigate the role of technology in public health. Fast forward to the current outbreak: this is the first time a global outbreak of Ebola has occurred in the age of Web 2.0. As the number of Twitter users has increased since 2010, it is possible to examine the recent Ebola outbreak on a larger scale.

Turning to research on the Ebola outbreak on Twitter, a study by Oyeyemi, Gabarron, and Wynn, published last year, examined misinformation about Ebola on Twitter. This study found that the most common types of misinformation were that ingesting the plant ‘Ewedu’, blood transfusions, or drinking salt water could cure Ebola. Another study by Jin et al., also published last year, found that there were conspiracy theories, innuendos, and rumours on Twitter related to Ebola. Jin et al. looked at the period from late September to late October 2014. Among the rumours reported were that the Ebola vaccine only worked on white people, that Ebola patients had risen from the dead, and that terrorists would contract Ebola and spread it around the world.

Therefore, Twitter has the potential to provide insight into public views and opinions related to the Ebola outbreak, which would allow health authorities to become aware of public concerns. Furthermore, by examining the rumours related to Ebola, health authorities will be able to dispel false information via new or existing health campaigns.

In the next post I will examine the language dynamics of tweets related to Ebola.


I would like to thank Jennifer Salter, from the health informatics research group, for reading and providing extremely valuable feedback on an earlier version of this blog post.


Chew, C., & Eysenbach, G. (2010). Pandemics in the age of Twitter: Content analysis of tweets during the 2009 H1N1 outbreak. PLOS ONE, 5(11).

Jin, F., Wang, W., Zhao, L., Dougherty, E., Cao, Y., Lu, C.-T., & Ramakrishnan, N. (2014). Misinformation propagation in the age of Twitter. Computer, 47(12), 90–94. doi:10.1109/MC.2014.361

Signorini, A., Segre, A. M., & Polgreen, P. M. (2011). The use of Twitter to track levels of disease activity and public concern in the U.S. during the influenza A H1N1 pandemic. PLoS ONE, 6(5), e19467. doi:10.1371/journal.pone.0019467

Szomszor, M., Kostkova, P., & St Louis, C. (2011). Twitter informatics: Tracking and understanding public reaction during the 2009 Swine Flu pandemic. In Proceedings – 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011 (Vol. 1, pp. 320–323). doi:10.1109/WI-IAT.2011.311

WHO. (2015). WHO | Ebola virus disease. [ONLINE] Available at: [Last accessed 20/01/2015].

Oyeyemi, S. O., Gabarron, E., & Wynn, R. (2014). Ebola, Twitter, and misinformation: a dangerous combination? BMJ, 349, g6178.

An outline of upcoming blog posts

Starting this week, I’m going to be posting a series of blog posts about my PhD research. I’m currently looking at Twitter to better understand public views and opinions related to the Ebola outbreak. I have gathered tweets on Ebola using both open-source and industry-specific software, and have monitored the international news coverage of Ebola very carefully. The posts I have lined up will cover some of the following topics:

  • Using Twitter to gather public views and opinions on Ebola
  • The different languages people use to tweet about Ebola
  • The number of tweets on Ebola that have geolocation data
  • The number of Ebola tweets that have geolocation and language identifiers
  • A comparison of Ebola tweets with geolocation data across different APIs
  • Popular hashtags, TAG and word clouds on Ebola for Firehose data
  • TAG and word cloud comparisons across the REST, Streaming, and Firehose APIs
  • Network analysis using NodeXL