Twitter data capture tools from a usability perspective

In a comment on an earlier blog post I was asked which tools are good from a usability and interface perspective, and I thought this would make for a good post of its own. The tools covered here were recommended to me by my PhD supervisor. Many of them have existing guides, videos or instructional tutorials, so rather than write my own I have provided links to these.

Users of these tools are reminded that the data obtained via the tools should be used in a fair and responsible manner. This means adhering to Twitter’s Rules of the Road as well as applicable ethical codes of practice and data protection laws.

TAGS (Twitter Archiving Google Spreadsheet)

System: TAGS is a web-based tool, so it will work on most operating systems.

Download TAGS:

TAGS Support Forums:


Mozdeh

System: Mozdeh only works on Windows, and it is advisable to use a desktop computer (there are 32-bit and 64-bit versions).

Download Mozdeh:

Mozdeh User Guide:

Mozdeh Theoretical overview:

Twitter query set generation with Mozdeh:


Chorus

System: Chorus only runs on Windows. It is also advisable to use Chorus on a desktop computer.

Request to download Chorus:

Chorus Tweetcatcher Desktop manual:

YouTube tutorial:

I made another list a while back ‘A list of tools to capture Twitter data’ at: 

Also be sure to check out Dr Deen Freelon’s curated list at:

You can catch me on Twitter @was3210 

Using Twitter to gain an insight into public views and opinions for the Ebola epidemic

The World Health Organisation writes that Ebola, a haemorrhagic fever, is a severe and often fatal illness with an average case fatality rate of around 50%. The first outbreak of Ebola occurred in 1976. The first case of Ebola outside of West Africa was reported in the U.S. on September 19th 2014. The current Ebola outbreak has infected more people and taken more lives than all the other outbreaks combined. Twitter provides a platform for people to express their views and opinions on Ebola.

Chew and Eysenbach, for example, used Twitter to monitor mentions of Swine Flu during the 2009 pandemic. They found that Twitter gave health authorities the potential to become aware of the concerns raised by the public. Similarly, Szomszor, Kostkova, and St Louis examined Swine Flu on Twitter and found that Twitter offers the ability to sample large populations for health sentiment (public views and opinions). Signorini, Segre, and Polgreen also found that by using Twitter it was possible to understand users’ interests and concerns during the Swine Flu outbreak.

In 2010, Chew and Eysenbach wrote that Swine Flu was the first global pandemic to occur in the age of Web 2.0, and argued that this was a unique opportunity to investigate the role of technology in public health. Fast forward to the current outbreak: this is the first time a global outbreak of Ebola has occurred in the age of Web 2.0. As the number of Twitter users has increased since 2010, it is now possible to examine the Ebola outbreak on a larger scale.

Turning to the Ebola outbreak on Twitter, a study by Oyeyemi, Gabarron, and Wynn published last year examined misinformation about Ebola on Twitter. It found that the most common types of misinformation were claims that ingesting the plant ‘Ewedu’, blood transfusions, or drinking salt water could cure Ebola. Another study, by Jin et al., also published last year, found conspiracy theories, innuendos, and rumours related to Ebola on Twitter. Jin et al. looked at the period from late September to late October 2014. Among the rumours reported were that the Ebola vaccine only worked on white people, that Ebola patients had risen from the dead, and that terrorists would contract Ebola and spread it around the world.

Therefore, Twitter has the potential to provide insight into public views and opinions related to the Ebola outbreak, which would allow health authorities to become aware of public concerns. Furthermore, by examining the rumours related to Ebola, health authorities will be able to dispel false information via new or existing health campaigns.

In the next post I will examine the language dynamics of tweets related to Ebola.


I would like to thank Jennifer Salter, from the health informatics research group, for reading and providing extremely valuable feedback on an earlier version of this blog post.


Chew, C., & Eysenbach, G. (2010). Pandemics in the age of Twitter: Content analysis of tweets during the 2009 H1N1 outbreak. PLOS ONE, 5(11).

Jin, F., Wang, W., Zhao, L., Dougherty, E., Cao, Y., Lu, C.-T., & Ramakrishnan, N. (2014). Misinformation propagation in the age of Twitter. Computer, 47(12), 90–94. doi:10.1109/MC.2014.361

Signorini, A., Segre, A. M., & Polgreen, P. M. (2011). The use of Twitter to track levels of disease activity and public concern in the U.S. during the Influenza A H1N1 pandemic. PLoS ONE, 6(5), e19467. doi:10.1371/journal.pone.0019467

Szomszor, M., Kostkova, P., & St Louis, C. (2011). Twitter informatics: Tracking and understanding public reaction during the 2009 Swine Flu pandemic. In Proceedings – 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011 (Vol. 1, pp. 320–323). doi:10.1109/WI-IAT.2011.311

WHO. (2015). WHO | Ebola virus disease. [ONLINE] Available at: [Last accessed 20/01/2015].

Oyeyemi, S. O., Gabarron, E., & Wynn, R. (2014). Ebola, Twitter, and misinformation: a dangerous combination? BMJ, 349, g6178.

An outline of upcoming blog posts

Starting this week, I’m going to be posting about my PhD research. I’m currently looking at Twitter to better understand public views and opinions related to the Ebola outbreak. I have gathered tweets on Ebola using both open source and industry-specific software, and I have monitored the international news coverage of Ebola very carefully. I have a series of blog posts lined up which will cover some of the following topics:

  • Using Twitter to gather public views and opinions on Ebola
  • The different languages people use to Tweet about Ebola
  • The number of tweets on Ebola that have geolocation data
  • The number of Ebola tweets that have geolocation and language identifiers
  • A comparison of Ebola tweets with geolocation data across different APIs
  • Popular hashtags, tag clouds and word clouds on Ebola for Firehose data
  • Tag and word cloud comparisons across the REST, Streaming, and Firehose APIs
  • Network analysis using NodeXL
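
Several of the topics above come down to simple tallies over tweet records. As a rough sketch, and assuming each tweet is available as a dictionary shaped like the Twitter API v1.1 JSON (with `lang`, `coordinates`, and `entities.hashtags` fields), the counts could be produced along the following lines. The function name `summarise_tweets` and the sample tweets are made up for illustration; they are not taken from my dataset.

```python
from collections import Counter


def summarise_tweets(tweets):
    """Tally languages, geotagged tweets, and hashtags for a list of tweet dicts."""
    languages = Counter()
    hashtags = Counter()
    geotagged = 0
    geotagged_with_language = 0

    for tweet in tweets:
        lang = tweet.get("lang")
        # "und" is the code used when the language could not be identified.
        has_language = lang is not None and lang != "und"
        if has_language:
            languages[lang] += 1

        # "coordinates" is None unless an exact location was attached.
        if tweet.get("coordinates") is not None:
            geotagged += 1
            if has_language:
                geotagged_with_language += 1

        # Lower-case hashtags so that #Ebola and #ebola are counted together.
        for tag in tweet.get("entities", {}).get("hashtags", []):
            hashtags[tag["text"].lower()] += 1

    return {
        "total": len(tweets),
        "languages": languages,
        "geotagged": geotagged,
        "geotagged_with_language": geotagged_with_language,
        "hashtags": hashtags,
    }


# Hypothetical sample records, for illustration only.
sample_tweets = [
    {
        "lang": "en",
        "coordinates": {"type": "Point", "coordinates": [-1.47, 53.38]},
        "entities": {"hashtags": [{"text": "Ebola"}]},
    },
    {
        "lang": "fr",
        "coordinates": None,
        "entities": {"hashtags": [{"text": "ebola"}, {"text": "OMS"}]},
    },
    {"lang": "und", "coordinates": None, "entities": {"hashtags": []}},
]

summary = summarise_tweets(sample_tweets)
print(summary["languages"].most_common())
print(summary["geotagged"], "of", summary["total"], "tweets are geotagged")
print(summary["hashtags"].most_common(5))
```

The hashtag counts from a tally like this are also the usual input for a tag cloud, where each tag is drawn in proportion to its count.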

A list of tools to capture Twitter data

A list of tools that I have used to capture data from Twitter and which worked:






Twython at:

KNIME with the Palladian Extension (obtained via the app). Instructions on set up here: . Using the Twitter nodes from the extension menu provided by KNIME is much better; the instructions on setting this up are here: . I could not figure out a way to extract the tweets.

NodeXL at:

Visibrain (Commercial):

More tools:

Nvivo/Ncapture at:

TweetMapper at:

Twitonomy at:

Webometrics at:

Follow the Hashtag at:

iScience Maps at:

More tools (these require programming knowledge) from Deen Freelon’s curated Google Sheets template at: It is a great list, and I make sure to add to it:


yourTwapperKeeper at:

140dev at:

Hosebird at:

Pattern at:

poll.emic at:

Social Feed Manager at:

SocialMediaMineR at:

streamR at:

tStreamingArchiver at:

twarc at:

tweepy at:

twitteR at:

Twitter-Tap at:

Twitter Stream Downloader at:

TWurl at:

Be sure to check out my other list: ‘A list of tools to capture Twitter data’ at:

Also be sure to check out Dr Deen Freelon’s curated list at:

You can catch me on Twitter @was3210

Almost 6 months of PhD!

My six-month progress report is due soon, so I decided to write a blog post about some of the topics and issues I have encountered and am currently battling with. I am looking at pandemics and epidemics on Web 2.0. More recently, I have been investigating the Ebola epidemic and collecting Ebola-related tweets.

Big data

Big data is a current buzzword within academia and is considered by some to be the new oil. However, keeping with the oil analogy, is it real oil or snake oil? This question was explored by Simon Moss in a Wired article, ‘Big Data: New Oil or Snake Oil?’, in which he discusses the issue of normalising big data in an organisational sense. My issue is one of information quality: the data is big but, at times, of poor quality. Once the data is filtered it is not as big as it once was, and so it becomes little data. However, this little data is much more valuable than the larger set.
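
To make the shrinking from big to little data concrete, here is a minimal sketch of one possible filtering pass. The rules used (drop retweets, drop tweets that do not mention a keyword, drop exact duplicates after normalising case and spacing) and the function name `filter_tweets` are assumptions for illustration, not a description of how my own dataset was cleaned.

```python
def filter_tweets(texts, keyword):
    """Return the tweets that survive a simple quality filter, in original order."""
    seen = set()
    kept = []
    for text in texts:
        # Normalise case and whitespace so near-identical copies compare equal.
        normalised = " ".join(text.lower().split())

        # Old-style retweets start with "RT @username".
        if normalised.startswith("rt @"):
            continue
        # Keep only tweets that actually mention the topic.
        if keyword.lower() not in normalised:
            continue
        # Skip texts that have already been kept once.
        if normalised in seen:
            continue

        seen.add(normalised)
        kept.append(text)
    return kept


# Hypothetical example texts, for illustration only.
raw = [
    "Ebola cases reported in new district",
    "RT @someone: Ebola cases reported in new district",
    "ebola cases reported   in new district",
    "Lovely weather today",
    "WHO briefing on Ebola this afternoon",
]

little_data = filter_tweets(raw, "ebola")
print(len(raw), "tweets before filtering,", len(little_data), "after")
```

Even on this toy example the set more than halves, which is the point: what remains is smaller but easier to trust.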


Ethical issues are ever present in social media research. The argument in favour of using Web 2.0 data for research centres on whether the data is in the public domain. This raises questions about informed consent. Do Twitter users know that I am gathering this data? If I asked for consent for a tweet on Ebola that I captured in August, would I even get a reply? There is a sense, as a Twitter user, that once you send a tweet it goes away after a while. Thus, it is imperative that Twitter users are involved in the decision process when discussing ethical issues. This was discussed at a conference I attended in November, Picturing the Social: Analysing Social Media Images.


I recently viewed a talk by Farida Vis which formed part of the digital culture conference Improving Reality. Farida provided a very well-articulated example of the human influence on an algorithm: an advert on Facebook which promoted an assisted reproduction program, with a picture of a baby. Farida argues that this reflects how those who programmed the algorithm understand gender normative issues. That is, those who wrote the code held a schema whereby they believed a woman of a certain age should have children. More recently, on Twitter I saw an advert for a laptop with the caption ‘Costs less than what you spend on Pizza last year’, which resulted in livid responses, e.g. ‘Twitter what are you trying to say?’ This advert could have been shown to all users, so it may not be the best example of a targeted algorithm. A further example is the adverts for educational courses I saw on Facebook before I started university. This leads to the question of how much influence social media has on young adults. There is also scope here to examine how websites such as Amazon create suggestions. How does their algorithm work? And where do human schemas fit into this?


When talking about methods, there is a tendency to select either a quantitative or a qualitative research philosophy. However, with regard to social media research, a mixed-methods approach will yield richer results. That is, a method such as network analysis should be complemented with content analysis. If we limit ourselves to a particular research philosophy, we will learn less from the data. So, I hope to employ a range of methods in analysing my own data. A related issue around methods is the cost of big data. Big data is certainly out of reach for most academics, and this is further exacerbated by stringent terms and conditions which restrict data sharing. Whether the data is available for free, and whether there is a tool to obtain it, is also shaping the platforms I look at.


In my dataset of tweets, images occur with great frequency and often appear as a block of web links when scrolling down a spreadsheet. When I start to filter the dataset, should I remove these links? Big data tends to be associated with words and not images. However, I would argue that images on Twitter form a larger network of big data. According to one estimate, there are 250 million images shared on Twitter daily, yet these are overlooked in the majority of Twitter research. During the 2009/2010 epidemic of H1N1, and the various subsequent outbreaks, images must have been shared on Twitter, and they would have formed an integral part of how a person subsequently thought about outbreaks. However, there was no evidence-based research examining these images. Comparing images from different time points allows us to see whether the narratives told via images remain the same or whether they change.
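
On the question of whether to remove the links, one option is to move them into a separate column rather than delete them, so that the images can still be traced later. Below is a minimal sketch, assuming the tweet text is a plain string and the links are ordinary http or https URLs. The function name `split_links` and the sample text are hypothetical, and a link on its own does not say whether it points to an image.

```python
import re

# Matches http or https URLs up to the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")


def split_links(text):
    """Return the tweet text with links removed, plus the list of removed links."""
    links = URL_PATTERN.findall(text)
    # Replace each link with a space, then collapse any repeated whitespace.
    cleaned = " ".join(URL_PATTERN.sub(" ", text).split())
    return cleaned, links


# Hypothetical example text, for illustration only.
text = "Health workers in protective suits https://example.com/photo1 #Ebola"
cleaned, links = split_links(text)
print(cleaned)
print(links)
```

Keeping the links in their own column means the text can be analysed as words while the images remain available for a separate analysis.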

In-text references:

The Wired News Article I mentioned can be found here:

The talk by Farida Vis on algorithmic culture I mentioned can be found here:

[Edited on 26/01/15]