Dr. Wasim Ahmed

Echosec: Location-Based Social Media Search – Potential For Academic Research And Industry

Echosec, simply put, allows you to navigate to almost any location in the world and examine the social media activity around that vicinity. Currently Echosec Pro allows users to access at least the following social data feeds:

Instagram
Twitter
Foursquare
Panoramio
AIS Shipping
Sina Weibo
Flickr
YouTube
VK

The Echosec platform provides enormous research potential as it is possible to select a specific geographical area and examine the social media activity around it.

Figure: Echosec Dashboard Layout

Users can draw a rectangle, a circle, or a custom shape almost anywhere in the world to display the social media activity in that area. Users can also use advanced date filtering features to ensure only relevant posts are displayed.

Echosec also has great potential for business intelligence, as it is possible to monitor chatter around a specific area: for instance, finding out that social media users in a particular area are complaining about the lack of a particular store or product, e.g., a coffee shop.

One of the biggest advantages of Echosec is that it is not based on a specific social media platform; it allows users to aggregate data from several popular social media networks.

In addition to location-based searching, it is also possible to search via keywords and examine where posts derive from. For instance, it is possible to find out whether users in certain geographical regions are mentioning a trending hashtag.

Echosec works by making use of location-based metadata to search for social media and other open source information. It relies mostly on a range of API requests made directly to the social media networks (Twitter, Instagram, Facebook, etc.), but also to third-party information repositories.
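To make the idea of searching by location metadata concrete, here is a minimal sketch of filtering geotagged posts by a circle or a rectangle. It is not Echosec's implementation: the post records, the field names (`lat`, `lon`, `source`) and the function names are all hypothetical, and the data is made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_circle(post, centre_lat, centre_lon, radius_km):
    """True if the post's coordinates fall within radius_km of the centre."""
    distance = haversine_km(post["lat"], post["lon"], centre_lat, centre_lon)
    return distance <= radius_km


def in_rectangle(post, south, west, north, east):
    """True if the post falls inside a simple bounding box.

    Assumption: the box does not cross the 180-degree meridian.
    """
    return south <= post["lat"] <= north and west <= post["lon"] <= east


# Hypothetical geotagged posts (made-up data, not real posts).
posts = [
    {"id": 1, "source": "twitter", "lat": 53.3811, "lon": -1.4701},    # Sheffield
    {"id": 2, "source": "flickr", "lat": 53.4808, "lon": -2.2426},     # Manchester
    {"id": 3, "source": "instagram", "lat": 51.5074, "lon": -0.1278},  # London
]

near_sheffield = [p["id"] for p in posts if in_circle(p, 53.38, -1.47, 20)]
northern_box = [p["id"] for p in posts if in_rectangle(p, 53.0, -3.0, 54.0, -1.0)]
print(near_sheffield, northern_box)
```

A custom shape would need a point-in-polygon test instead, which is left out here to keep the sketch short.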

Echosec is used by those within the Public Safety and Intelligence sector, the Corporate Security & Investigations sector, and within the Media and Journalism sector.

Used ethically and in the right hands, Echosec has great potential for public good. I also see it as having excellent potential for academic research projects.

Compared to some of the other social media analytics software out there, Echosec Pro is extremely affordable at only $89 per month annually. It’s definitely worth checking out. You can access the free version of Echosec here.

Disclaimer: No data was retrieved and/or analysed in the writing of this blog post.

Amplified messages: How hashtag activism and Twitter diplomacy converged at #ThisIsACoup – and won.

Check out my latest blog post for the LSE Impact blog:

Online activism is a frequently debated topic amongst journalists and researchers alike. What effect can a popular Twitter hashtag really have in achieving political or social change? Wasim Ahmed looks in depth at last year’s heavily tweeted #ThisIsACoup hashtag. While concrete outcomes may still be indeterminate, it is clear social media is now a rich space for activism, expressions of solidarity and information sharing.

It has received mainstream media attention, and is among the most read blog posts this week. It was recently mentioned on the Information School’s blog.

You can read it in full here.

An analysis of #ThisIsACoup

This is another blog post which seeks to analyse a viral hashtag, in this instance #ThisIsACoup, with data from Visibrain Focus (via Twitter’s Firehose API, which in theory provides all of the tweets) for the month of July. This particular hashtag is of interest because I recently received a message from the journalist Paul Mason, economics editor at Channel 4 News, asking whether he could use a heat-map I created when the hashtag was trending.

The hashtag organizers have written a statement explaining why they felt the need to start the hashtag. Below is a short extract from their statement:

“We decided to support Francesca’s call to launch an online campaign to support the democratic will of the Greek people in the face of extortion by the Eurogroup in its negotiations with Syriza,” the statement continued. “The scandalous Eurogroup proposals yesterday made last night the ideal moment to create a hashtag to express and, above all, coordinate, our outrage at the extortion the Greek government and its people were being subject to.” (quoted in the Guardian article ‘#ThisIsACoup: how a hashtag born in Barcelona spread across globe’).

Twitter, which boasts 316 million monthly active users with 500 million tweets per day, offers a route to raising awareness of world-wide events. In my area of research there are many health campaigns around the world which generate vast amounts of tweets, for example #WorldAutismAwarenessDay and #WorldSuicidePreventionDay, and often these hashtags start to trend and become visible to other Twitter users.

There is evidence to suggest that social media played a role during the Arab Spring. Increasingly, trending hashtags are reported within the mainstream media, and can drive the news and reach an even wider audience. This is especially the case with #ThisIsACoup, as the hashtag received wide coverage in the media, suggesting it reached a wider audience than just Twitter users. Below are a series of figures that analyse #ThisIsACoup using several different methods:

Figure 1 – Time series graph of #ThisIsACoup (11 July to 17 July)

As the figure above displays, at the peak of the trending hashtag there were 270,391 tweets sent and received by Twitter users. Overall, in the month of July there were 604,822 tweets sent and received by 140,794 users and 1,108,729,094 impressions (the number of times users saw the tweets). Of these tweets, 158,847 (26%) were original and 445,975 (74%) were retweets, indicating that this hashtag had a high retweet frequency.
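The percentages above can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures quoted in this post; the per-user and per-tweet averages are simply derived from those figures and were not reported by Visibrain Focus.

```python
# Figures quoted in the paragraph above (month of July).
total_tweets = 604_822
original_tweets = 158_847
retweets = 445_975
users = 140_794
impressions = 1_108_729_094

# Original tweets and retweets should add up to the overall total.
counts_are_consistent = (original_tweets + retweets == total_tweets)

original_share = 100 * original_tweets / total_tweets
retweet_share = 100 * retweets / total_tweets

# Derived averages (my own illustration, not reported figures).
tweets_per_user = total_tweets / users
impressions_per_tweet = impressions / total_tweets

print(f"Counts consistent: {counts_are_consistent}")
print(f"Original tweets: {original_share:.1f}%")
print(f"Retweets: {retweet_share:.1f}%")
print(f"Tweets per user: {tweets_per_user:.2f}")
print(f"Impressions per tweet: {impressions_per_tweet:.0f}")
```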

Figure 2 – World map of #ThisIsACoup (01 July to 31 July)

The figure above displays the location of users tweeting with the #ThisIsACoup hashtag, where the location information is taken from users who provide a valid country location in their biography. The largest share of tweets (24.8%) derived from Greece, followed by the United Kingdom (15%), Spain (10.7%), the United States (9.6%), France (5.8%), Germany (5.4%), Italy (3.6%), Ireland (3.5%), Canada (2.8%), and the Netherlands (2.5%).

Figure 3 – Top Domains used in #ThisIsACoup (01 July to 31 July)

The figure above displays the top domains linked within users’ tweets. Twitter, Facebook, and YouTube are all among the top domains used within tweets, indicating that users were linking to relevant material on these platforms. Users were also linking to news stories related to #ThisIsACoup, as the Guardian is a top domain, alongside a blog/opinion page by Paul Krugman, who comments on economics and politics for the New York Times.

Figure 3 – Top hashtags alongside #ThisIsACoup (01 July to 31 July)

The figure above displays the hashtags most frequently used alongside #ThisIsACoup in the month of July: #greece, #boycottgermany, #grexit, #oxi, #greekment, #greececrisis, and #germany.
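For readers who want to produce a similar count from their own data, here is a minimal sketch of counting the hashtags that appear alongside a focus hashtag. It is not how Visibrain Focus computes its figure; the function name and the sample tweets are hypothetical.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#(\w+)")  # \w also matches non-English letters


def co_occurring_hashtags(tweets, focus):
    """Count hashtags used in the same tweet as the focus hashtag.

    Matching is case-insensitive, and each hashtag is counted at most
    once per tweet.
    """
    focus = focus.lower().lstrip("#")
    counts = Counter()
    for text in tweets:
        tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
        if focus in tags:
            counts.update(tags - {focus})
    return counts


# Made-up tweet texts for illustration only.
sample_tweets = [
    "Solidarity with #Greece #ThisIsACoup",
    "#ThisIsACoup #oxi #greece",
    "#thisisacoup and the #grexit debate",
    "Unrelated post about #coffee",
]

counts = co_occurring_hashtags(sample_tweets, "#ThisIsACoup")
print(counts.most_common(3))
```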

Figure 4 – Top expressions of #ThisIsACoup (01 July to 31 July)

The figure above displays the most frequently occurring expressions as taken from users’ tweets. The term ThisIsACoup is the most frequently occurring, followed by Greece, Democracy, Europe, and German, among others.

Figure 5 – Network graph of the first two hours of #ThisIsACoup

In the month of July, both the most influential Twitter users and the top tweets derived from influential politicians and journalists. The network graph above is of the first two hours of #ThisIsACoup. The atsipras account (center of the graph) belongs to the current Prime Minister of Greece, and is particularly central in the network graph. The Guardian suggests that a tweet from Ada Colau drove the hashtag to become viral. However, I would argue that the collective nature of the hashtag, i.e., a number of Twitter users all tweeting at once, caused the hashtag to become viral, as did the high retweet percentage (74%) associated with this hashtag, a point that was highlighted in the discussion of Figure 1. Also, check out the NodeXL analysis of the hashtag, which I tweeted out when the hashtag was trending.
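One simple way to see why an account sits at the centre of a retweet network is to count how many distinct accounts retweeted it (its in-degree). The sketch below is only an illustration of that idea; it is not how the network graph in this post was produced, and the account names and edges are made up.

```python
from collections import Counter

# Hypothetical (retweeter, original_author) pairs; made-up data.
retweet_edges = [
    ("user_a", "hub_account"),
    ("user_b", "hub_account"),
    ("user_c", "hub_account"),
    ("user_c", "hub_account"),  # repeat retweet, counted once below
    ("user_c", "user_a"),
    ("user_d", "user_b"),
]


def retweet_in_degree(edges):
    """Count the distinct accounts that retweeted each author."""
    distinct_edges = set(edges)
    return Counter(author for _retweeter, author in distinct_edges)


in_degree = retweet_in_degree(retweet_edges)
most_central, retweeter_count = in_degree.most_common(1)[0]
print(most_central, retweeter_count)
```

In-degree is only one of several centrality measures; tools such as NodeXL offer others.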

Acknowledgments

This blog post was a collaborative effort, so there are a few people to thank (in chronological order):

A massive thank you to Alexandra Boutopoulou, a very talented Masters student who alerted me to this hashtag back in July 2015.

A big thanks to Paul Mason, economics editor from Channel 4, for re-invigorating my interest in this hashtag, and for covering the hashtag so well as the events unfolded.

A massive thank you to Visibrain Focus for providing access to the data via Twitter’s Firehose API, and a shout-out to the lovely Georgina Parsons, who has provided excellent user support.

A final thank you to John Swain, head of Data Science at Yang Brothers, for creating the network graph in Figure 5.

You can find out more about Visibrain Focus here.

Challenges of using Twitter as a data source: an overview of current resources

In one of my previous blog posts I outlined a number of software applications that could be used to capture and analyse data from Twitter. In this blog post I outline some of the methodological, ethical, privacy, and copyright issues associated with using Twitter as a data source.

Twitter can be used as a source of data, both current and historical, for social science research in and of itself, but it can also be used to complement more traditional data sources such as surveys and interviews. Twitter boasts 316 million monthly active users with 500 million tweets per day. Marc Smith, from the Social Media Research Foundation, noted at The Next Web conference (2014) that although the city squares and plazas of the world are still important, more and more people are now tweeting and posting about events.

Obtaining Twitter data need not require any advanced programming or computer science skills (see my blog post on software applications that can be used for this purpose). However, there are often specific challenges to using social media data in academic research, and in particular Twitter data, which social scientists may face for the very first time. Below is a list of some of the challenges that may be faced when using Twitter as a data source in academic research along with links to resources that provide advice and guidance on these issues:

Ethical issues: in collecting and retrieving data to form large datasets it may not be possible to obtain informed consent from all of the participants, simply due to the volume of tweets retrieved. There are also ethical issues if you decide to reproduce tweets in an academic publication, which have to be handled with care, especially for tweets related to sensitive topics, e.g., by obtaining consent before disclosing user IDs or tweets. See NatCen’s report on users’ views of research using social media.

Legal issues: sharing of datasets is prohibited under Twitter’s API Terms of Service; however, researchers can share the tweet identification numbers associated with each tweet, which other researchers can use to obtain the Twitter datasets. If, for any reason, it is not possible to share tweet IDs, then sharing the keywords and retrieval time of the data may allow researchers to obtain a similar dataset. There may also be specific requirements for reproducing tweets within a publication, e.g., following Twitter’s guidelines. See Twitter’s API Terms of Service.

Retrieving datasets: the use of certain keywords or hashtags may not retrieve all of the data related to a topic. When brainstorming search queries, it may help to select as many queries as is feasible and then filter the dataset for non-relevant keywords after data retrieval. This is because missing certain keywords or hashtags could introduce a systematic bias, which would lead to a biased sample. See the Demos and Ipsos MORI report on representivity. Datasets are also likely to be limited by the language that is used to retrieve data; for example, using the English keyword Ebola to retrieve data related to the Ebola epidemic will not gather tweets about Ebola from countries where a different keyword is used, i.e., in a different language.

Cost: Twitter data can cost a lot of money, and if it has not been possible to retrieve, or set up a system to retrieve, Twitter data within 7 days of a topic of interest, then it becomes difficult to obtain the data. This is because, using the free API ecosystem, it is only possible to retrieve Twitter data going back 7 days. However, it is possible to obtain this data using a licensed reseller of Twitter data. Historical Twitter data can range from relatively inexpensive to very expensive, depending on both the query and the time of retrieval. It is possible to generate free estimates for the cost of Twitter data using Sifter.

Representivity: Twitter users are not representative of the national offline population, Twitter users are not even representative of Internet users, and, most strikingly, Twitter data is not representative of Twitter users. This is because not all Twitter users will tweet on a topic of interest; for example, during the Ebola epidemic of last year not all Twitter users would have posted a tweet related to Ebola. It is also important to remember that it is not always individuals who are tweeting but also organizations and those tweeting in a non-personal capacity, for instance journalists. Moreover, as the Demos and Ipsos MORI report notes, the data that Twitter produces does not reflect Twitter users, as often a small number of vocal accounts are responsible for a significant proportion of any given dataset. See research by the Pew Research Center on the demographics of Twitter users.

Spam: there is a large amount of link-baiting in popular hashtags (i.e., tweets designed to get users to click through to a non-relevant website), and popular topics on Twitter can attract a large amount of spam. It may even be difficult to ascertain whether a user is real or fictitious. Fictitious accounts are often set up to (artificially) increase other users’ follower counts (e.g., those of celebrities or politicians), but they are also sold in retweet or favourite packages to feign popularity, where a large number of accounts will retweet or favourite a user’s tweets. The extent to which Twitter contains fake accounts, retweets, and favourites is not known exactly, but the fact that these packages are available cheaply and can be found via a Google search suggests that they are popular among users.

The unknown: there are most likely methodological issues around using social media data, and in particular Twitter data, within research that are not yet known. Therefore, caution is urged when drawing inferences from Twitter data in and of itself in this emerging field. Follow updates on NatCen’s New Social Media New Social Science (NSMNSS) blog, via their hashtag #NSMNSS, and my research blog.
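Three of the points in the list above (sharing tweet IDs rather than full datasets, the 7-day retrieval window, and a small number of vocal accounts dominating a dataset) can be sketched in a few lines. This is an illustration only: the tweet records, their field names (`id`, `user`, `created_at`) and the function names are hypothetical, and no Twitter API is called.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

FREE_SEARCH_WINDOW = timedelta(days=7)  # the 7-day limit described above


def tweet_ids_for_sharing(tweets):
    """Reduce a dataset to tweet IDs, which can be shared instead of the tweets."""
    return [str(tweet["id"]) for tweet in tweets]


def still_retrievable(created_at, now):
    """True if a tweet is recent enough to fall inside the 7-day window."""
    return now - created_at <= FREE_SEARCH_WINDOW


def top_account_share(tweets, n):
    """Fraction of the dataset posted by the n most active accounts."""
    if not tweets:
        return 0.0
    per_user = Counter(tweet["user"] for tweet in tweets)
    top_total = sum(count for _user, count in per_user.most_common(n))
    return top_total / len(tweets)


# Made-up records for illustration only.
utc = timezone.utc
now = datetime(2015, 8, 10, tzinfo=utc)
tweets = [
    {"id": 101, "user": "vocal_account", "created_at": datetime(2015, 8, 9, tzinfo=utc)},
    {"id": 102, "user": "vocal_account", "created_at": datetime(2015, 8, 8, tzinfo=utc)},
    {"id": 103, "user": "vocal_account", "created_at": datetime(2015, 8, 7, tzinfo=utc)},
    {"id": 104, "user": "occasional_user", "created_at": datetime(2015, 8, 1, tzinfo=utc)},
]

shareable_ids = tweet_ids_for_sharing(tweets)
recent_ids = [t["id"] for t in tweets if still_retrievable(t["created_at"], now)]
share = top_account_share(tweets, 1)
print(shareable_ids, recent_ids, share)
```

In this toy dataset a single account contributes three of the four tweets, which is the kind of skew the Demos and Ipsos MORI report warns about.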

Resources mentioned in the text above

Association of Internet Researchers (AoIR) link here

COSMOS Online Guide to Social Media Research and Ethics link here

New Social Media New Social Science (NSMNSS blog) link here

Pew Research Center link here

Research using Social Media; Users’ Views link here

Sifter (free estimate generation for Twitter data) link here

The road to representivity, a Demos and Ipsos MORI report on sociological research using Twitter link here

Twitter’s API Terms of Service link here

Unlocking the value of social media – a review of research ethics link here

Wasim Ahmed, a blog about my research link here

Twitter data capture tools from a usability perspective

In a blog post comment I was asked what tools are good from a usability and interface perspective. And I thought this would make for a good blog post. The tools covered in this blog were recommended to me by my PhD supervisor. Many of these tools have existing guides, videos or instructional tutorials and rather than provide my own I have provided the links to these.

Users of these tools are reminded that the data obtained via the tools should be used in a fair and responsible manner. And this means adhering to Twitter’s Rules of the Road as well as applicable ethical codes of practice and data protection laws.

TAGS (Twitter Archiving Google Sheet)

System: TAGS is a Web based tool so it will work on most operating systems.

Download TAGS: https://tags.hawksey.info/get-tags/

TAGS Support Forums: https://tags.hawksey.info/forums/

Mozdeh

System: Mozdeh only works on Windows and it is advisable to use a Desktop computer (there are 32 and 64 bit versions).

Download Mozdeh: http://mozdeh.wlv.ac.uk/installation.html

Mozdeh User Guide: http://mozdeh.wlv.ac.uk/resources/MozdehManual.docx

Mozdeh Theoretical overview: http://mozdeh.wlv.ac.uk/resources/TwitterTimeSeriesAndSentimentAnalysis.pdf

Twitter query set generation with Mozdeh: http://mozdeh.wlv.ac.uk/TwitterQuerySetGeneration.html

Chorus

System: Chorus only runs on Windows. It is also advisable to use Chorus with a desktop computer.

Request to download Chorus: http://chorusanalytics.co.uk/chorus/request_download.php

Chorus Tweetcatcher Desktop manual: http://chorusanalytics.co.uk/manuals/Chorus-TCD_usermanual.pdf

YouTube tutorial: https://www.youtube.com/watch?v=KmCrmiBOOvw

I made another list a while back ‘A list of tools to capture Twitter data’ at: https://wasimahmed1.wordpress.com/2015/01/30/a-list-of-tools-to-capture-twitter-data/

Also be sure to check out Dr Deen Freelon’s curated list at: https://docs.google.com/document/d/1UaERzROI986HqcwrBDLaqGG8X_lYwctj6ek6ryqDOiQ/edit

You can catch me on Twitter @was3210

Using Twitter to gain an insight into public views and opinions for the Ebola epidemic

The World Health Organisation writes that Ebola, a haemorrhagic fever, is a severe and often fatal illness with an average fatality rate of 50%. The first outbreak of Ebola occurred in 1976. The first case of Ebola outside of West Africa was reported in the U.S. on 19th September 2014. The current Ebola outbreak has taken more lives and infected more people than all the other outbreaks combined. Twitter provides a platform for people to express their views and opinions on Ebola.

Chew and Eysenbach, for example, used Twitter to monitor the mentions of Swine Flu during the 2009 pandemic. They found that Twitter provided health authorities with the potential to become aware of the concerns raised by the public. Similarly, Szomszor, Kostkova, and St Louis examined Swine Flu on Twitter and found that Twitter offers the ability to sample large populations for health sentiment (public views and opinions). Signorini, Segre, and Polgreen also found that by using Twitter it was possible to understand users’ interests and concerns during the Swine Flu outbreak.

In 2010, Chew and Eysenbach wrote that Swine Flu was the first global pandemic to occur in the age of Web 2.0, and argued that this was a unique opportunity to investigate the role of technology for public health. Fast forward to the current outbreak of Ebola: this is the first time a global outbreak of Ebola has occurred in the age of Web 2.0. As the number of Twitter users has increased since 2010, there is the possibility to examine the recent Ebola outbreak on a larger scale.

In relation to the Ebola outbreak on Twitter, a study by Oyeyemi, Gabarron, and Wynn published last year examined misinformation about Ebola on Twitter. This study found that the most common types of misinformation were that ingesting the plant ‘Ewedu’, blood transfusions, or drinking salt water could cure Ebola. Another study, by Jin et al., which was also published last year, found that there were conspiracy theories, innuendos, and rumours on Twitter related to Ebola. Jin et al. looked at the time period from late September to late October 2014. Among the rumours reported were that the Ebola vaccine only worked on white people, that Ebola patients had risen from the dead, and that terrorists would contract Ebola and spread it around the world.

Therefore, Twitter has the potential to provide insight into public views and opinions related to the Ebola outbreak, which would allow health authorities to become aware of public concerns. Furthermore, by examining the rumours related to Ebola, health authorities will be able to dispel false information via new or existing health campaigns.

In the next post I will examine the language dynamics of tweets related to Ebola.

Acknowledgements

I would like to thank Jennifer Salter, from the health informatics research group, for reading and providing extremely valuable feedback on an earlier version of this blog post.

References

Chew, C., & Eysenbach, G. (2010). Pandemics in the age of Twitter: Content analysis of tweets during the 2009 H1N1 outbreak. PLOS ONE, 5(11).

Jin, F., Wang, W., Zhao, L., Dougherty, E., Cao, Y., Lu, C.-T., & Ramakrishnan, N. (2014). Misinformation propagation in the age of Twitter. Computer, 47(12), 90–94. doi:10.1109/MC.2014.361

Signorini, A., Segre, A. M., & Polgreen, P. M. (2011). The use of Twitter to track levels of disease activity and public concern in the U.S. during the Influenza A H1N1 pandemic. PLoS ONE, 6(5), e19467. doi:10.1371/journal.pone.0019467

Szomszor, M., Kostkova, P., & St Louis, C. (2011). Twitter informatics: Tracking and understanding public reaction during the 2009 Swine Flu pandemic. In Proceedings – 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011 (Vol. 1, pp. 320–323). doi:10.1109/WI-IAT.2011.311

WHO. (2015). WHO | Ebola virus disease. [ONLINE] Available at: http://www.who.int/mediacentre/factsheets/fs103/en/ [Last accessed 20/01/2015].

Oyeyemi, S. O., Gabarron, E., & Wynn, R. (2014). Ebola, Twitter, and misinformation: a dangerous combination? BMJ, 349, g6178.

A list of tools to capture Twitter data

A list of tools that I have used to capture data from Twitter and which worked:

TAGS: http://tags.hawksey.info/

Mozdeh: http://mozdeh.wlv.ac.uk/

Chorus: http://chorusanalytics.co.uk/

Netlytic: https://netlytic.org

Facepager: http://www.ls1.ifkw.uni-muenchen.de/personen/wiss_ma/keyling_till/software.html

Twython at: https://github.com/ryanmcgrath/twython

KNIME: https://www.knime.org/ with the Palladian Extension (obtained via the app). Instructions on set up here: http://tech.knime.org/wiki/how-to-get-twitter-data-into-knime . Using the Twitter nodes from the extension menu provided by KNIME is much better. The instructions on setting this up are here : http://www.knime.org/blog/knime-twitter-nodes I could not figure out a way to extract the tweets.

NodeXL at: http://nodexl.codeplex.com/

Visibrain (Commercial): http://www.visibrain.com/en/

More tools:

Nvivo/Ncapture at: http://www.qsrinternational.com/products_nvivo_add-ons.aspx

TweetMapper at: http://tweetmapper.us

Twitonomy at: http://www.twitonomy.com

Webometrics at: http://lexiurl.wlv.ac.uk/index.html

Follow the Hashtag at: http://analytics.followthehashtag.com/#/

iScience Maps at: http://tweetminer.eu

More tools (which require programming knowledge) from Deen Freelon’s curated list at: https://docs.google.com/document/d/1UaERzROI986HqcwrBDLaqGG8X_lYwctj6ek6ryqDOiQ/edit . It is a great list and I make sure to add to it:

DMI-TCAT at: https://github.com/digitalmethodsinitiative/dmi-tcat

yourTwapperKeeper at: https://github.com/540co/yourTwapperKeeper

140dev at: http://140dev.com/

Hosebird at: https://github.com/twitter/hbc

Pattern at: http://www.clips.ua.ac.be/pattern

poll.emic at: https://github.com/sbenthall/poll.emic

Social Feed Manager at: http://gwu-libraries.github.io/social-feed-manager/

SocialMediaMineR at: http://cran.r-project.org/web/packages/SocialMediaMineR/

streamR at: http://cran.r-project.org/web/packages/streamR/index.html

tStreamingArchiver at: https://github.com/brendam/tStreamingArchiver

twarc at: https://github.com/edsu/twarc

tweepy at: https://github.com/tweepy/tweepy