Echosec: Location-Based Social Media Search – Potential For Academic Research And Industry

Echosec, simply put, allows you to navigate to almost any location in the world and examine the social media activity around that vicinity.  Currently Echosec Pro allows users to access at least the following social data feeds:

  • Instagram
  • Twitter
  • Foursquare
  • Panoramio
  • AIS Shipping
  • Sina Weibo
  • Flickr
  • YouTube
  • VK

The Echosec platform provides enormous research potential as it is possible to select a specific geographical area and examine the social media activity around it.

Echosec Dashboard Layout

Users can draw a rectangle, a circle, or a custom shape almost anywhere in the world to display the social media activity around that area. Users can also use advanced date filtering to ensure that only relevant posts are displayed.

Echosec also has great potential for business intelligence, as it is possible to monitor chatter around a specific area. For instance, a business could find out that social media users in a particular area are complaining about the lack of a particular store or product, e.g., a coffee shop.

One of the biggest advantages of Echosec is that it is not tied to a specific social media platform; it allows users to aggregate data from several popular social media networks.

In addition to location-based searching, it is also possible to search via keywords and examine where posts derive from, for instance, to find out whether users in certain geographical regions are mentioning a trending hashtag.

Echosec works by making use of location-based metadata to search for social media and other open source information. It relies mostly on a range of API requests made directly to the social media networks (Twitter, Instagram, Facebook, etc.), but also to third-party information repositories.
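
To make the idea of searching on location-based metadata more concrete, below is a minimal Python sketch of a circular search area. It is not Echosec's implementation, which is not public; the post fields (`id`, `lat`, `lon`), the function names, and the sample records are all assumptions made up for illustration. It keeps only posts whose coordinates fall within a given radius of a chosen point, using the haversine great-circle distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def posts_within_radius(posts, centre_lat, centre_lon, radius_km):
    """Keep only posts that carry coordinates inside the search circle."""
    selected = []
    for post in posts:
        lat, lon = post.get("lat"), post.get("lon")
        if lat is None or lon is None:
            continue  # no location metadata, so the post cannot be placed
        if haversine_km(centre_lat, centre_lon, lat, lon) <= radius_km:
            selected.append(post)
    return selected


# Hypothetical sample posts (made-up records, not real data).
sample_posts = [
    {"id": 1, "lat": 53.3811, "lon": -1.4701},  # near the search centre
    {"id": 2, "lat": 51.5074, "lon": -0.1278},  # roughly 230 km away
    {"id": 3, "lat": None, "lon": None},        # no geotag
]
nearby = posts_within_radius(sample_posts, 53.38, -1.47, radius_km=5)
print([post["id"] for post in nearby])
```

Note that posts without coordinates are silently dropped here, which mirrors a general limitation of location-based search: only geotagged content can be found this way.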

Echosec is used by those within the Public Safety and Intelligence sector, the Corporate Security & Investigations sector, and within the Media and Journalism sector.

Used ethically and in the right hands, Echosec has great potential for public good. I also think it has excellent potential for academic research projects.

Compared to some of the other social media analytics software out there, Echosec Pro is extremely affordable at only $89 per month when billed annually. It’s definitely worth checking out. You can access the free version of Echosec here.

Disclaimer: No data was retrieved and/or analysed in the writing of this blog post.

Amplified messages: How hashtag activism and Twitter diplomacy converged at #ThisIsACoup – and won.

Check out my latest blog post for the LSE Impact blog:

Online activism is a frequently debated topic amongst journalists and researchers alike. What effect can a popular Twitter hashtag really have in achieving political or social change? Wasim Ahmed looks in depth at last year’s heavily tweeted #ThisIsACoup hashtag. While concrete outcomes may still be indeterminate, it is clear social media is now a rich space for activism, expressions of solidarity and information sharing.

It has received mainstream media attention, and is among the most read blog posts this week. It was also recently mentioned on the Information School’s blog.

You can read it in full here.

My 2015 year in review for blogging

The WordPress.com stats helper monkeys prepared a 2015 annual report for this blog.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 8,300 times in 2015. If it were a concert at Sydney Opera House, it would take about 3 sold-out performances for that many people to see it.

Click here to see the complete report.

An analysis of #ThisIsACoup

This is another blog post which seeks to analyse a viral hashtag, in this instance #ThisIsACoup, with data from Visibrain Focus (via Twitter’s Firehose API, which in theory provides all of the tweets) for the month of July. This particular hashtag is of interest because I recently received a message from the journalist Paul Mason, economics editor at Channel 4 News, asking whether he could use a heat-map I created when the hashtag was trending.

The hashtag organizers have written a statement explaining why they felt the need to start the hashtag. Below is a short extract from their statement:

“We decided to support Francesca’s call to launch an online campaign to support the democratic will of the Greek people in the face of extortion by the Eurogroup in its negotiations with Syriza,” the statement continued. “The scandalous Eurogroup proposals yesterday made last night the ideal moment to create a hashtag to express and, above all, coordinate, our outrage at the extortion the Greek government and its people were being subject to.” (quoted in the Guardian article #ThisIsACoup: how a hashtag born in Barcelona spread across globe).

Twitter, which boasts 316 million monthly active users with 500 million tweets per day, offers a route to raising awareness of world-wide events. In my area of research there are many health campaigns around the world which generate vast numbers of tweets, for example #WorldAutismAwarenessDay and #WorldSuicidePreventionDay, and often these hashtags start to trend and become visible to other Twitter users.

There is evidence to suggest that social media played a role during the Arab Spring. Increasingly, trending hashtags are reported within the mainstream media, and can drive the news and reach an even wider audience. This is especially the case with #ThisIsACoup, as the hashtag received wide coverage in the media, suggesting it reached a wider audience than just Twitter users. Below are a series of figures that analyse #ThisIsACoup using several different methods:

Figure 1 – Time series graph of #ThisIsACoup (11 July to 17 July)

As the figure above displays, at the peak of the trending hashtag 270,391 tweets were sent. Overall, in the month of July there were 604,822 tweets sent by 140,794 users, generating 1,108,729,094 impressions (the number of times users saw the tweets). Of these, 158,847 tweets (26%) were original and 445,975 tweets (74%) were retweets, indicating that this hashtag had a high retweet frequency.
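
The original/retweet split quoted above can be checked with a few lines of Python. This is just a sanity-check sketch using the totals reported in this post; the helper name `share` is my own.

```python
def share(part, total):
    """Return part/total as a percentage rounded to the nearest whole number."""
    return round(100 * part / total)


total_tweets = 604_822
original_tweets = 158_847
retweets = 445_975

# The two categories should account for every tweet in the dataset.
assert original_tweets + retweets == total_tweets

original_pct = share(original_tweets, total_tweets)
retweet_pct = share(retweets, total_tweets)
print(f"original: {original_pct}%, retweets: {retweet_pct}%")
# prints "original: 26%, retweets: 74%"
```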

Figure 2 – World map of #ThisIsACoup (01 July to 31 July)

The figure above displays the location of users tweeting with the #ThisIsACoup hashtag, where the location information is taken from users who provide a valid country location in their biography. The largest share of tweets derived from Greece (24.8%), followed by the United Kingdom (15%), Spain (10.7%), the United States (9.6%), France (5.8%), Germany (5.4%), Italy (3.6%), Ireland (3.5%), Canada (2.8%), and the Netherlands (2.5%).

Figure 3 – Top Domains used in #ThisIsACoup (01 July to 31 July)

The figure above displays the top domains linked within users’ tweets. Twitter, Facebook, and YouTube are all among the top domains used within tweets, indicating that users were linking to relevant material on these platforms. Users were also linking to news stories related to #ThisIsACoup, as the Guardian is a top domain, alongside a blog/opinion page by Paul Krugman, who comments on economics and politics for the New York Times.

Figure 3 – Top hashtags alongside #ThisIsACoup (01 July to 31 July)

The figure above displays the hashtags most frequently used alongside #ThisIsACoup in the month of July: #greece, #boycottgermany, #grexit, #oxi, #greekment, #greececrisis, and #germany.

Figure 4 – Top expressions of #ThisIsACoup (01 July to 31 July)

The figure above displays the most frequently occurring expressions as taken from users’ tweets. The term ThisIsACoup is the most frequently occurring, followed by Greece, Democracy, Europe, and German, among others.

Figure 5 – Network graph of the first two hours of #ThisIsACoup

The most influential Twitter users, along with the top tweets in the month of July, derived from influential politicians and journalists. The network graph above is of the first two hours of #ThisIsACoup. The atsipras account (center of the graph) belongs to the current Prime Minister of Greece, and is particularly central in the network graph. The Guardian suggests that a tweet from Ada Colau drove the hashtag to become viral. However, I would argue that the collective nature of the hashtag, i.e., a number of Twitter users all tweeting at once, caused the hashtag to become viral, as did the high retweet percentage (74%) associated with this hashtag, a point that was highlighted in the discussion of Figure 1. Also, check out the NodeXL analysis of the hashtag which I tweeted out when the hashtag was trending.

Acknowledgments

This blog post was a collaborative effort, so there are a few people to thank (in chronological order):

A massive thank you to Alexandra Boutopoulou, a very talented Masters student who alerted me to this hashtag back in July 2015.

A big thanks to Paul Mason, economics editor from Channel 4, for re-invigorating my interest in this hashtag, and for covering the hashtag so well as the events unfolded.

A massive thank you to Visibrain Focus, for providing access to the data via Twitter’s Firehose API, and a shout-out to the lovely Georgina Parsons who has provided excellent user support.

A final thank you to John Swain, head of Data Science at Yang Brothers, for creating the network graph in Figure 5.

You can find out more about Visibrain Focus here.

Challenges of using Twitter as a data source: an overview of current resources

In one of my previous blog posts I outlined a number of software applications that could be used to capture and analyse data from Twitter. In this blog post I outline some of the methodological, ethical, privacy, and copyright issues associated with using Twitter as a data source.

Twitter can be used as a source of data for social science research, both current and historical, in and of itself, but it can also be used to complement more traditional data sources such as surveys and interviews. Twitter boasts 316 million monthly active users with 500 million tweets per day. Marc Smith, from the Social Media Research Foundation, noted at The Next Web conference (2014) that although the city squares and plazas of the world are still important, more and more people are now tweeting and posting about events.

Obtaining Twitter data need not require any advanced programming or computer science skills (see my blog post on software applications that can be used for this purpose). However, there are often specific challenges to using social media data in academic research, and in particular Twitter data, which social scientists may face for the very first time. Below is a list of some of the challenges that may be faced when using Twitter as a data source in academic research along with links to resources that provide advice and guidance on these issues:

  • Ethical issues: when collecting and retrieving data to form large datasets it may not be possible to obtain informed consent from all of the participants, simply due to the volume of tweets retrieved. There are also ethical issues if you decide to reproduce tweets in an academic publication; these have to be handled with care, especially for tweets related to sensitive topics, e.g., by obtaining consent before disclosing user IDs or tweets. See NatCen’s report on users’ views of research using social media.
  • Legal issues: sharing of datasets is prohibited under Twitter’s API Terms of Service. However, researchers can share the tweet identification numbers associated with each tweet, which other researchers can use to obtain Twitter datasets. If, for any reason, it is not possible to share tweet IDs, then sharing the keywords and retrieval time of the data may allow researchers to obtain a similar dataset. There may also be specific requirements for reproducing tweets within a publication, e.g., following Twitter’s guidelines. See Twitter’s API Terms of Service.
  • Retrieving datasets: use of certain keywords or hashtags may not retrieve all of the data related to a topic. When brainstorming search queries it may help to select as many queries as is feasible, and then to filter the dataset for non-relevant keywords after data retrieval. This is because missing certain keywords or hashtags could introduce a systematic bias, which would lead to a biased sample. See the Demos and Ipsos MORI report on representivity. Datasets are also likely to be limited by the language that is used to retrieve data. For example, using the English keyword Ebola to retrieve data related to the Ebola epidemic will not gather tweets about Ebola from users who use a different keyword, e.g., one in a different language.
  • Cost: Twitter data can be expensive, and if it has not been possible to retrieve, or set up a system to retrieve, Twitter data within 7 days of a topic of interest, then it becomes difficult to obtain the data. This is because using the free API ecosystem it is only possible to retrieve Twitter data going back in time 7 days. However, it is possible to obtain this data using a licensed re-seller of Twitter data. The cost of historical Twitter data can range from modest to very expensive depending on both the query and time of retrieval. It is possible to generate free estimates for the cost of Twitter data using Sifter.
  • Representivity: Twitter users are not representative of the national offline population, Twitter users are not even representative of Internet users, and most strikingly Twitter data is not representative of Twitter users. This is because not all Twitter users will tweet on a topic of interest; for example, during the Ebola epidemic of last year not all Twitter users would have posted a tweet related to Ebola. It is also important to remember that it is not always individuals that are tweeting but also organizations, and those tweeting in a non-personal capacity, for instance journalists. Moreover, as the Demos and Ipsos MORI report notes, the data that Twitter produces does not reflect Twitter users, as often a small number of vocal accounts account for a significant proportion of any given dataset. See research by the Pew Research Centre on the demographics of Twitter users.
  • Spam: there is a large amount of link-baiting in popular hashtags (i.e., tweets designed to get users to click through to a non-relevant website), and popular topics on Twitter can attract a large amount of spam. It may even be difficult to ascertain whether a user is real or fictitious. Fictitious accounts are often set up to (artificially) increase other users’ follower counts (e.g., those of celebrities or politicians), but they are also sold in retweet or favourite packages to feign popularity, where a large number of accounts will retweet or favourite a user’s tweets. The extent to which Twitter contains fake accounts, retweets, and favourites is not known exactly, but the fact that these packages are available cheaply and can be found via a Google search suggests that they are popular among users.
  • The unknown: there are most likely methodological issues around using social media data, in particular Twitter data, within research that are not yet known. Therefore, caution is urged when drawing inferences from Twitter data in and of itself in this emerging field. Follow updates on NatCen’s New Social Media New Social Science (NSMNSS) blog, via their hashtag #NSMNSS, and my research blog.
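
As an illustration of the point about sharing tweet identification numbers rather than full datasets, here is a minimal Python sketch. The record layout (`id`, `text`, `user`), the function name, and the sample tweets are hypothetical; the sketch reduces a collection to its tweet IDs plus the keywords and retrieval window, so that only this stripped-down record is shared.

```python
import json


def build_shareable_record(tweets, keywords, retrieved_from, retrieved_to):
    """Strip a collection down to tweet IDs plus the retrieval details."""
    # A set removes duplicate IDs; sorting gives a stable, reproducible order.
    tweet_ids = sorted({str(tweet["id"]) for tweet in tweets})
    return {
        "keywords": keywords,
        "retrieved_from": retrieved_from,
        "retrieved_to": retrieved_to,
        "tweet_ids": tweet_ids,
    }


# Hypothetical collected tweets (IDs, text and users are invented).
collected = [
    {"id": 1003, "text": "example tweet about #Ebola", "user": "user_a"},
    {"id": 1001, "text": "another example tweet", "user": "user_b"},
    {"id": 1003, "text": "example tweet about #Ebola", "user": "user_a"},  # duplicate
]
record = build_shareable_record(collected, ["Ebola"], "2015-08-01", "2015-08-07")
print(json.dumps(record, indent=2))
```

The tweet text and usernames never leave the original collection; whether sharing IDs is permitted for a given project should still be checked against the current Terms of Service.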

Resources mentioned in the text above

  • Association of Internet Researchers (AoIR)
  • COSMOS Online Guide to Social Media Research and Ethics
  • New Social Media New Social Science (NSMNSS) blog
  • Pew Research Centre
  • Research using Social Media; Users’ Views
  • Sifter (free estimate generation for Twitter data)
  • The road to representivity: a Demos and Ipsos MORI report on sociological research using Twitter
  • Twitter’s API Terms of Service
  • Unlocking the value of social media: a review of research ethics
  • Wasim Ahmed, a blog about my research

Using Visibrain Focus to analyse #ILookLikeASurgeon

In the previous blog post I examined the unrest in Ferguson using a commercial tool, Visibrain Focus. In this blog post I will outline some Twitter analytics related to the #ILookLikeASurgeon hashtag using Visibrain Focus, which has access to Twitter’s Firehose, i.e., all of the tweets. The results presented here are accurate at the time of writing, which was between 12PM and 1PM UK time (GMT) on the 27th of August 2015. The network graphs, however, cover two specific time periods: the first day of the hashtag, and the latest day (27th of August).

The #ILookLikeASurgeon hashtag attempts to challenge gender stereotypes, and was inspired by the #ILookLikeAnEngineer hashtag. Both of these hashtags attempt to break the male stereotype that can be associated with these two professions. There is now also a #ILookLikeAPhysicist hashtag which attempts to break the male stereotype that can be associated with the field of physics. The hashtags have received quite a lot of media attention, and you can read BBC Trending’s write up of the #ILookLikeASurgeon hashtag here.

Over the past week I have managed to speak to quite a lot of the Twitter users behind the hashtag, even finding myself in a 6-way conference call with surgeons across continents. There was a lot of passion and excitement, and I could tell that this was a hashtag that meant a lot to them. I also had the opportunity to interview Heather Logghe, MD, who provided some insight into how the hashtag came about.

In total, at the time of writing, over the last 30 days there have been at least 28,337 tweets by 6,005 users, with 89,936,713 impressions, i.e., the number of times users have seen the tweets. 5,878 (21%) of the tweets are original and 22,459 (79%) are retweets. This retweet percentage is quite high. The users behind the campaign have indicated that they look to retweet any mentions of the hashtag, which may be one reason for the high retweet ratio. Also interesting here is that 23,532 links have been shared.

Figure 1 – Timeline of tweets related to the #ILookLikeASurgeon hashtag

Figure 1 is a time series graph going back in time 30 days from the date this blog post was written. The awareness campaign was started on the 5th of August by Heather Logghe, MD. The largest peak occurred on the 12th of August, when at least 2,221 tweets were posted.

Figure 2 – World Map of tweets related to the #ILookLikeASurgeon hashtag

Figure 2 is a map that plots user locations related to #ILookLikeASurgeon using data provided within a user’s bio. However, this map only displays instances where users have used the English-language hashtag #ILookLikeASurgeon, rather than, say, an equivalent in another European or Asian language.

Figure 3 – Word cloud of related hashtags used in conjunction with the #ILookLikeASurgeon hashtag

Figure 3 is a word cloud of hashtags that are present within the tweets. The most frequently used hashtags alongside #ILookLikeASurgeon include #surgtweeting, #diversitymatters, and #challengestereotypes. Many of the hashtags are related to challenging gender stereotypes, which is not surprising considering the aim of the campaign.

Figure 4 – Top expressions within tweets related to the #ILookLikeASurgeon hashtag

Figure 4 displays the most frequently used expressions within the tweets. The interesting expressions within this word cloud include the phrases ‘awareness about women’, ‘diversity in surgery’, ‘women surgeons’, ‘female surgeon’, and ‘not me in heels’.

Figure 5 – Most frequently mentioned users in tweets related to the #ILookLikeASurgeon hashtag

Figure 5 displays the most frequently mentioned user-handles, which include @WomenSurgeons, @LoggheMD, and @DrKathy, who are among the users that helped raise the profile of the campaign.

Below are two network graphs. The first corresponds to one of the first days the hashtag was used (6th of August 2015), and the second is of a more recent day (27th of August 2015). Both network graphs represent data retrieved from Twitter’s Firehose.

Figure 6 – Network graph of #ILookLikeASurgeon from 06 Aug 2015 00:00 to 06 Aug 2015 23:00

This is a network graph created in Gephi using data obtained from Visibrain Focus from 06 Aug 2015 00:00 to 06 Aug 2015 23:00. The nodes are ranked by betweenness centrality and the graph is laid out using the Fruchterman-Reingold algorithm. Verbal consent was obtained from this community before the analysis was conducted.
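
For readers unfamiliar with betweenness centrality, the measure counts how often a node sits on the shortest paths between other pairs of nodes. The graphs in this post were produced in Gephi; the Python sketch below is only my own illustration of the measure, using Brandes' algorithm on a small, made-up, unweighted and undirected graph. The scores are unnormalised, so they will not necessarily match the numbers a particular tool reports.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph.

    adjacency maps each node to the set of its neighbours.
    Returns unnormalised scores, counting each unordered pair once.
    """
    scores = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search from source, counting shortest paths.
        order = []
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in adjacency[node]:
                if distance[neighbour] < 0:  # first visit
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    path_count[neighbour] += path_count[node]
                    predecessors[neighbour].append(node)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                ratio = path_count[pred] / path_count[node]
                dependency[pred] += ratio * (1 + dependency[node])
            if node != source:
                scores[node] += dependency[node]
    # In an undirected graph every path is found from both of its endpoints.
    return {node: score / 2 for node, score in scores.items()}


# Hypothetical star-shaped network: one hub account connected to three others.
graph = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
}
centrality = betweenness_centrality(graph)
print(centrality["hub"], centrality["a"])  # prints "3.0 0.0"
```

In the star example the hub lies on the shortest path between all three pairs of outer nodes, which is why it scores 3 and the others score 0.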

Figure 7 – Network graph of #ILookLikeASurgeon from 27 Aug 2015 00:00 to 27 Aug 2015 23:00

This is a network graph created in Gephi using data obtained from Visibrain Focus from 27 Aug 2015 00:00 to 27 Aug 2015 23:00. The nodes are ranked by betweenness centrality and the graph is laid out using the Fruchterman-Reingold algorithm.

What figures 6 and 7 demonstrate is that between the very beginning (06 August) and fairly recently (27 August) the number of users tweeting with the hashtag increased, indicating that the community has grown significantly.

Massive thank you to Heather Logghe, MD for letting me talk to her, and thanks also to all of the other surgeons that were on the conference call I mentioned earlier in the post. Thanks also to Mimi Poinsett, MD for suggesting that I analyse the #ILookLikeASurgeon hashtag. Massive thanks as always to the lovely Georgina Parsons from Visibrain Focus. This is a link to the platform for anyone interested in seeing what it is all about.

Using Visibrain Focus to analyse the unrest in Ferguson

In my previous blog post I outlined a number of free tools that could be used to capture and analyse data from Twitter. In the next series of posts I will look at more powerful commercial tools. Over the past few weeks I have had the opportunity to use Visibrain Focus (commercial), which is a Twitter monitoring platform for digital marketing professionals; however, it has several features which are useful for research purposes.

This blog post has two aims: firstly, to show the potential of Visibrain Focus, and secondly, to provide some Twitter insight related to the Ferguson unrest (using the ‘#Ferguson’ hashtag and the ‘Ferguson protests’ keyword). As I have the unique opportunity to access tweets from the Firehose API (i.e., all of the tweets), I hope it can also help those who are currently conducting research around these themes.

Over the last 30 days (i.e., 30 days going back from 22nd August 2015, 3.07PM, GMT), in total there are 1,715,534 tweets by 500,252 users. There are 13,337,415,455 impressions (that is to say, the number of times users have seen the tweets). The tweets are 36% original (n=618,772) and 64% retweets (n=1,096,762), and 74% of tweets contain a link (n=1,269,006). The retweet percentage is of interest here, indicating that tweets related to the Ferguson unrest have a high retweet ratio.

Figure 1 – Timeline of tweets containing the keywords ‘#Ferguson’ or ‘Ferguson protests’ 

As shown in the figure above, tweets start to increase on August 9th, which corresponds to the one-year anniversary of the fatal shooting of Michael Brown by a white police officer. The largest peak occurs on the 10th of August, when a total of 550,928 tweets are posted. There is a sharp increase as, during this time period, police in Ferguson, Missouri, shot and critically injured an African-American teenager.

Figure 2 – Most frequently occurring hashtags used in tweets related to the unrest in Ferguson 

In regards to the top three hashtags, #Ferguson is used 803,860 times, #blacklivesmatter is used 70,393 times, and #mikebrown is used 52,823 times. However, it is important to note that in order to retrieve this dataset the hashtag #Ferguson and the keyword ‘Ferguson Protests’ were used. It may be better to state that the word cloud above represents the most frequently occurring co-hashtags.
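
The idea of co-hashtags can be sketched in a few lines of Python. This is not how Visibrain computes its word cloud (I do not know its internals); the function name and the sample tweets are made up. The sketch counts, for tweets containing the query hashtag, which other hashtags appear alongside it, counting each hashtag at most once per tweet and ignoring case.

```python
import re
from collections import Counter

HASHTAG_PATTERN = re.compile(r"#(\w+)")


def co_hashtag_counts(tweet_texts, query_hashtag):
    """Count hashtags that co-occur with the query hashtag."""
    query = query_hashtag.lstrip("#").lower()
    counts = Counter()
    for text in tweet_texts:
        # A set means each hashtag is counted once per tweet.
        tags = {tag.lower() for tag in HASHTAG_PATTERN.findall(text)}
        if query in tags:
            counts.update(tags - {query})  # leave out the query hashtag itself
    return counts


# Hypothetical tweets (invented text, not taken from the dataset).
sample_tweets = [
    "Thinking of everyone in #Ferguson tonight #BlackLivesMatter",
    "#ferguson one year on #MikeBrown #blacklivesmatter",
    "Unrelated tweet with #football",
]
counts = co_hashtag_counts(sample_tweets, "#Ferguson")
print(counts.most_common(2))
```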

Figure 3 – Most commonly used expressions in tweets related to the unrest in Ferguson 

The above word cloud is generated from the most commonly recurring terms found in tweet content. In addition to the hashtags in the word cloud above (such as blacklivesmatter), other interesting expressions include ‘state of emergency’, ‘police’, ‘shots’ and ‘last year’. Also interesting here is the expression ‘Sir Alex Ferguson’, which is ‘noise’ in our dataset.
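
Noise of this kind can be filtered out after retrieval. Below is a minimal Python sketch of one way to do it, assuming a hand-picked exclusion list; the list, the function names, and the sample tweets are all hypothetical, and this step was not applied to the figures in this post.

```python
# Assumed exclusion list: phrases that signal an off-topic tweet.
NOISE_PHRASES = ("sir alex ferguson",)


def is_noise(text, noise_phrases=NOISE_PHRASES):
    """True if the tweet text contains any of the noise phrases."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in noise_phrases)


def split_relevant(tweet_texts):
    """Separate tweets into (relevant, noise) lists."""
    relevant, noise = [], []
    for text in tweet_texts:
        (noise if is_noise(text) else relevant).append(text)
    return relevant, noise


# Hypothetical tweets (invented text, not taken from the dataset).
sample_tweets = [
    "State of emergency declared in #Ferguson",
    "Sir Alex Ferguson on his best ever signing",
    "Police and protesters in #Ferguson tonight",
]
relevant, noise = split_relevant(sample_tweets)
print(len(relevant), len(noise))  # prints "2 1"
```

Keeping the removed tweets in a separate list, rather than discarding them, makes it easy to inspect what the filter caught and to report how much noise was excluded.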

Figure 4 – World map of tweets related to the unrest in Ferguson

The figure above is a map of where users are tweeting from, using the location provided within a user’s biography. The majority of tweets derive from the U.S. (69.3%, n=531,654), followed by the U.K. (5.3%, n=40,303) and Canada (2.7%, n=21,093). However, this is a distribution I have observed across topics on Twitter and may have more to do with overall use of Twitter, as well as access to the Internet and mobile devices.

In regards to language, the majority of tweets are in English (84.2%, n=1,445,680), followed by Spanish (4.2%, n=72,854) and German (4%, n=68,061). Taken with figure 4 above, this is not surprising as the majority of tweets derive from English-speaking countries.

Visibrain can also infer gender. In this instance, 22.2% of tweets derive from males (n=381,061) and 17.2% derive from females (n=296,120), with 60.6% (n=1,039,824) classified as other, i.e., where it is not possible to infer gender. This may be because the name provided by a Twitter user is not a real name or is in a format that cannot be processed by Visibrain’s algorithm.

Figure 5 – Audience and following numbers of tweets related to the unrest in Ferguson 

The figure above shows audience and following numbers of users that have tweeted about the unrest. The most interesting aspect is that users have an average of 7,617 followers, and 158,815 users have a following of over 158 thousand i.e., a high audience.

In terms of devices, 56.7% of users use a mobile (n=973,988), 2.6% (n=45,311) use a desktop, 1.8% (n=30,870) use a web related client, and 8.3% (n=141,812) use an automated method with 30.6% (n=525,024) classified as other.

The top 5 domains include twitter.com 54.6% (n=693,775), youtube.com 2.8% (n=35,523), nytimes.com 1.7% (n=21,705), theguardian.com 1.5% (n=19,267), and cnn.com 1.22% (n=815,694). Many videos are shared on Twitter so it is not surprising to see YouTube as the second most popular domain. However, it is interesting to see The New York Times, The Guardian, and CNN as popular domains.

The top content types include text 62.5% (n=793,983), photo 46.1% (n=584,916), video 10% (n=126,789), and audio 0.2% (n=2,806). Image and video sharing are quite high; however, text-based tweets outnumber both photo and video sharing. Also of interest is that 1,273,716 tweets contain a link.

Visibrain allows end-users to export mention data in GEXF format; the files can then be imported into Gephi to create network graphs. I extracted a mention graph from 12AM to 1AM on August 9th (i.e., 1 hour’s worth of tweets) in order to create a network graph, shown below.

Figure 6 – Network graph of 1 hour of tweets related to the unrest in Ferguson on August 9th 2015 created in Gephi

Visibrain has many other features. For instance, it is also possible to look at the most frequently occurring tweets and the most retweeted users, and to apply various filters to sort through users and tweets. I hope to tweet out the different features and types of analysis that are possible using Visibrain over the coming weeks.

Below is a more recent network graph, of tweets posted between 4PM and 5PM on the 22nd of August.

Figure 7 – Network graph of 1 hour of tweets related to the unrest in Ferguson on August 22nd 2015 created in Gephi

Special thanks go to the lovely Georgina Parsons (@G_Parsons33) from Visibrain Focus, who has provided excellent user support. Massive thanks also to Pierrette Mimi Poinsett, MD (@yayayarndiva) for providing the idea to examine the Ferguson unrest on Twitter. This is a link to the platform for anyone interested in seeing what it is all about.

Table of tested software that can gather data from Twitter without programming knowledge

This is a table of software that I have used and tested for my PhD research (so far) to either gather or analyse Twitter data. I use them all in combination as they complement each other very well. Some of the software can allow for data gathered from other platforms to be imported into the application so it is best to read the documentation thoroughly.

Tool                                         OS                           Platforms
Mozdeh                                       Windows (Desktop advisable)  Twitter
Webometric Analyst                           Windows                      Twitter (+image extraction), YouTube, Flickr, Mendeley, & other web resources
NodeXL                                       Windows                      Twitter, YouTube, & Flickr
Netlytic                                     Web based                    Twitter, Facebook, YouTube, & Instagram
Twitter Archiving Google Spreadsheet (TAGS)  Web based                    Twitter
Chorus                                       Windows (Desktop advisable)  Twitter
DiscoverText (free 30 day trial)             Web based                    Twitter, Facebook, Blogs, Forums, & Online news platforms
COSMOS Project                               Windows, Mac OS X            Twitter
Visibrain                                    Web based                    Twitter

Did I leave something out? Let me know, either in the comments section or via Twitter (@was3210). The table first appeared in an LSE Impact blog post, Using Twitter as a data source: An overview of current social media research tools.

Using @NodeXL to analyse @foodgov and associated hashtags

In this blog post I want to analyse the Food Standards Agency (FSA), specifically their Twitter handle @foodgov, by producing a network graph and associated analytics using the very powerful Microsoft Excel plugin NodeXL. I then want to further analyse the top 5 hashtags by creating 5 further network graphs.

I selected the FSA as I had the opportunity to attend an event at Twitter HQ, London, where the Head of Information Management, Dr Sian Thomas, provided some insight into the innovative and intuitive methods the FSA have applied, both in using social media data and as a method of reaching the public via allergy awareness campaigns and the use of influencers (that is to say, users who may have a bigger reach or a different type of user following compared to the FSA). My report on the event, which provides more context to the work by the FSA, can be found here.

In the network graphs, G1, G2, G3, etc. refer to different groups of users, and the words at the top of each group are those that occur most frequently. More analytics, such as the top URLs overall in the graph and in the separate groups, can be found by visiting the NodeXL graph gallery (in this blog post I have hyperlinked each of the graphs, i.e., clicking on Network graph 1, for example, will take you to the graph gallery version of the network graph).

I find network graphs useful in summarizing and providing a snapshot of what users are talking about on Twitter in relation to a keyword, hashtag, or user-handle at any given time. One topic, e.g., bird flu, may generate a range of conversations, and this would be represented in the network graph with a number of different groups and associated keywords and URLs. For each graph I have added a section where I briefly mention what I found interesting about it.

Network graph 1 – Tweets containing @foodgov
The graph above represents a network of 441 Twitter users whose recent tweets contained “@foodgov”, or who were replied to or mentioned in those tweets, taken from a data set limited to a maximum of 10,000 tweets.
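
To show what a network of users who "were replied to or mentioned" looks like as data, here is a minimal Python sketch that builds mention edges from tweet text. NodeXL does this for you; the sketch is only my illustration, with a hypothetical record layout (`user`, `text`) and invented tweets. Each edge runs from the author of a tweet to a user mentioned in it.

```python
import re
from collections import Counter

# Simple pattern for @handles; it will also match things like email fragments.
MENTION_PATTERN = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Return a Counter of (author, mentioned_user) pairs."""
    edges = Counter()
    for tweet in tweets:
        author = tweet["user"].lower()
        for mentioned in MENTION_PATTERN.findall(tweet["text"]):
            target = mentioned.lower()
            if target != author:  # ignore self-mentions
                edges[(author, target)] += 1
    return edges


def network_users(edges):
    """All users that appear at either end of an edge."""
    users = set()
    for source, target in edges:
        users.update((source, target))
    return users


# Hypothetical tweets (invented users and text).
sample_tweets = [
    {"user": "alice", "text": "@foodgov is raw milk safe?"},
    {"user": "bob", "text": "Thanks @foodgov and @alice for the info"},
    {"user": "alice", "text": "@foodgov any update?"},
]
edges = mention_edges(sample_tweets)
print(len(network_users(edges)), edges[("alice", "foodgov")])  # prints "3 2"
```

The edge weights (how many times one user mentioned another) are what a tool can then use for layout and for centrality measures.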

An interesting observation in this network graph: the top URLs include FSA advice about avian (bird) flu; FSA Board agrees restrictions on raw milk should remain; Suspected bird flu found on Lancashire poultry farm; Campylobacter Action Plan – Our Progress; and J & K Smokery Ltd recalls vacuum packed smoked fish because of concerns over Clostridium botulinum controls.

What I am interested in this post is the top 5 hashtags in the entire graph and these were:

fsaboard
birdflu
rawmilk
recall
foodallergy

So, one by one, I entered these hashtags into NodeXL to create 5 further network graphs.
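NodeXL reports the top hashtags directly, but for readers who want to see the idea behind such a ranking, the sketch below counts hashtags across a list of tweet texts. It is only an illustration under my own assumptions: the function name `top_hashtags` and the sample texts are invented, and NodeXL's own counting rules may differ (for example in how it treats repeated hashtags within one tweet).

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")

def top_hashtags(texts, n=5):
    """Return the n most frequent hashtags as (hashtag, tweet_count) pairs."""
    counts = Counter()
    for text in texts:
        # Lower-case the tags and count each one at most once per tweet.
        counts.update({tag.lower() for tag in HASHTAG.findall(text)})
    return counts.most_common(n)

# Made-up sample texts, not real tweets.
sample_texts = [
    "Board meeting today #fsaboard #rawmilk",
    "Suspected case on a poultry farm #birdflu",
    "Live from the meeting #FSAboard",
]
ranking = top_hashtags(sample_texts)
print(ranking)
```

Lower-casing the hashtags means that #fsaboard and #FSAboard are treated as the same tag, which seems a reasonable choice given that Twitter hashtag search is not case-sensitive.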

Network graph 2 – fsaboard

fsaboard

The graph represents a network of 100 Twitter users whose recent tweets contained “#fsaboard”, or who were replied to or mentioned in those tweets, taken from a data set limited to a maximum of 10,000 tweets.

An interesting observation in this network graph: The @foodgov account was most influential in this network graph (ranked by betweenness centrality). The top URL in G1 and overall in the graph was: FSA Board agrees restrictions on raw milk should remain and one of the top keywords in this group was ‘raw milk’ indicating that discussion revolved around this news article.
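Since several of the observations in this post rank accounts by betweenness centrality, a short illustration may help. Betweenness centrality measures how often a user sits on the shortest path between two other users, so accounts that bridge otherwise separate parts of a conversation score highly. The sketch below uses Brandes' algorithm on an undirected, unweighted graph. It is my own simplified version, not NodeXL's implementation, and the function name and the toy edge list are assumptions for illustration only.

```python
from collections import deque

def betweenness_centrality(edges):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    score = {node: 0.0 for node in adj}
    for source in adj:
        # Breadth-first search from source, counting shortest paths.
        order = []
        preds = {node: [] for node in adj}
        n_paths = {node: 0 for node in adj}
        dist = {node: -1 for node in adj}
        n_paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path shares.
        delta = {node: 0.0 for node in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += n_paths[v] / n_paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each pair of users was counted once from each end.
    return {node: s / 2 for node, s in score.items()}

# Toy network: three users mention foodgov, and user_c also talks to user_d.
toy_edges = [
    ("foodgov", "user_a"),
    ("foodgov", "user_b"),
    ("foodgov", "user_c"),
    ("user_c", "user_d"),
]
scores = betweenness_centrality(toy_edges)
most_influential = max(scores, key=scores.get)
print(most_influential, scores[most_influential])
```

In the toy network the foodgov node has the highest score because every path between user_a, user_b, and the other users passes through it, which mirrors the way a central account can come out as most influential in a hashtag network.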

Network graph 3 – birdflu

birdflu

The graph represents a network of 876 Twitter users whose recent tweets contained “#birdflu”, or who were replied to or mentioned in those tweets, taken from a data set limited to a maximum of 10,000 tweets.

An interesting observation in this network graph: In G1 a number of Twitter users (who are not connected to each other) are relaying the message, i.e., posting a tweet that contains the keyword or hashtag ‘birdflu’. The top URL in the entire graph and in G1 was Avian flu confirmed in Lancashire.

Network graph 4 – rawmilk

rawmilk

The graph represents a network of 175 Twitter users whose recent tweets contained “#rawmilk”, or who were replied to or mentioned in those tweets, taken from a data set limited to a maximum of 10,000 tweets.

An interesting observation in this network graph: In G2 a number of unconnected users are tweeting about a news article related to scientific risk assessments that were recently published in the Journal of Food Protection. Drawing on these results, the author of the article suggests that raw milk is ‘remarkably’ safe. The top 3 URLs overall, and the top URL in G2, are the aforementioned news story: New Science Confirms that Drinking Raw Milk is Remarkably Safe.

Network graph 5 – recall

recall

The graph represents a network of 1,777 Twitter users whose recent tweets contained “#recall”, or who were replied to or mentioned in those tweets, taken from a data set limited to a maximum of 10,000 tweets.

An interesting observation in this network graph: The most influential Twitter account in the entire graph is @usdafoodsafety (ranked by betweenness centrality). In G1 a number of unconnected users are relaying messages, i.e., tweeting about products (mostly food) being recalled. The top URL overall in the graph is a WordPress website which provides news and email alerts on product recalls. These are not always food products, which explains top keywords such as ‘gm’ and ‘India’: General Motors recently had to recall a large number of vehicles due to a wiring problem.

Network graph 6 – foodallergy

foodallergy

The graph represents a network of 662 Twitter users whose recent tweets contained “#foodallergy”, or who were replied to or mentioned in those tweets, taken from a data set limited to a maximum of 10,000 tweets.

An interesting observation in this network graph: The top hashtags in this graph were foodallergy, faact, and peanutallergy. The top co-words were: food, allergy; peanut, patch; foodallergy, friendly; phase, iii; iii, trials; and trials, foodallergy. The most influential Twitter account was @foodallergy. The top URL in the entire graph was ‘Peanut Patch’ Heads to Phase III Trials.

This blog post has presented some analytics on the @foodgov Twitter account and associated hashtags using NodeXL. There is much more going on within each graph, and I have only highlighted what I found interesting, particularly from a health informatics perspective. Written consent was obtained (via a tweet) to analyse the FSA’s Twitter account and/or associated keywords and hashtags related to the FSA’s Twitter account.
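As a rough illustration of what co-words capture, the sketch below counts pairs of words that appear next to each other in tweet texts. I am assuming here that a co-word pair means two adjacent words, which fits pairs such as phase, iii and iii, trials above, but NodeXL's exact definition and its handling of stop words may differ. The function name `top_word_pairs` and the sample texts are invented for illustration.

```python
import re
from collections import Counter

def top_word_pairs(texts, n=5):
    """Return the n most frequent adjacent word pairs across all texts."""
    counts = Counter()
    for text in texts:
        # Split into lower-case words, keeping # and @ attached to their word.
        words = re.findall(r"[a-z0-9#@']+", text.lower())
        # Pair every word with the word that follows it.
        counts.update(zip(words, words[1:]))
    return counts.most_common(n)

# Made-up sample texts, not real tweets.
sample_texts = [
    "Peanut patch heads to phase III trials",
    "Phase III trials for the peanut patch",
]
pair_counts = dict(top_word_pairs(sample_texts, n=20))
print(pair_counts[("peanut", "patch")], pair_counts[("phase", "iii")])
```

When many users share the same headline, the word pairs from that headline quickly dominate the ranking, which would explain why the top co-words in this graph read like fragments of the top URL's title.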

For anyone wanting to learn more about NodeXL and network graphs check out this video – Network Mapping the Ecosystem by Marc Smith (@marc_smith) and this excellent article Mapping Twitter Topic Networks: From Polarized Crowds to Community Clusters.

Why is there so much research on Twitter? And what does this mean for our methods?

I was asked on Twitter by a fellow PhD student what tools and methods there were for capturing and analysing data from Facebook, and although I was able to find a few, there were far more Twitter data capture tools. I also noticed that there are very few tools that can be used to obtain data from other social media platforms such as Pinterest, Google+, Tumblr, Instagram, Flickr, Vine, and Amazon, among others. This led me to wonder whether it was tool availability, or some other reason, that explains why there is more research on Twitter compared to other social media platforms.

I then asked the following question on Twitter:

Why is there so much research on Twitter? Is it because it’s difficult to get data from other platforms? Or is Twitter a special platform?

I received a range of responses:

  1. Twitter is a popular platform in terms of the media attention it receives and therefore it attracts more research due to this cultural status
  2. Twitter makes it easier to find and follow conversations which consequently makes it easier to research
  3. Twitter has hashtag norms which make it easier to gather, sort, and expand searches when collecting data
  4. Twitter data is easy to retrieve as major incidents, news stories and events on Twitter are normally centered around a hashtag
  5. The Twitter API is more open and accessible compared to other social media platforms, which makes Twitter more favorable to developers creating tools to access data. This consequently increases the availability of tools to researchers.

It is probable that a combination of responses 1 to 5 has led to more research on Twitter. However, this raises another distinct but closely related question: when research is focused so heavily on Twitter, what (if any) are the implications of this for our methods?

Can the methods that are currently used in analysing Twitter data, e.g., sentiment analysis, time series analysis (examining peaks in tweets), and network analysis, be applied to other platforms, or are different tools, methods, and techniques required?

I have used the following four methods in analysing Twitter data for the purposes of my PhD; below I consider whether these would work for other platforms:

  1. Sentiment analysis works well with Twitter data, as tweets are consistent in length (i.e., <= 140 characters). Would sentiment analysis work as well with, for example, Facebook data, where posts may be longer?
  2. Time series analysis is normally used to examine tweets over time to see when a peak of tweets may occur. Would examining time stamps in Facebook or Instagram posts, for example, produce the same results? Or is this only a viable method because of the real-time nature of Twitter data?
  3. Network analysis is used to visualize the connections between people and to better understand the structure of the conversation. Would this work as well on other platforms where users may not be connected to each other, e.g., public Facebook pages, or images from Instagram?
  4. Machine learning methods may work well with Twitter data due to the length of tweets (i.e., <= 140 characters), but would these work for longer posts, and for posts (e.g., on Instagram) where images may be present?
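To illustrate the time series method from the list above, here is a minimal sketch that counts posts per hour and reports the busiest hour. Nothing in it is specific to Twitter: in principle the same code would accept time stamps from Facebook or Instagram posts, although whether the resulting peaks mean the same thing on those platforms is exactly the open question. The function names, the time stamp format, and the sample time stamps are all assumptions made up for illustration.

```python
from collections import Counter
from datetime import datetime

def posts_per_hour(timestamps):
    """Count posts in each clock hour, given 'YYYY-MM-DD HH:MM:SS' strings."""
    counts = Counter()
    for stamp in timestamps:
        moment = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        # Round down to the start of the hour.
        counts[moment.replace(minute=0, second=0)] += 1
    return counts

def peak_hour(timestamps):
    """Return (hour, count) for the hour with the most posts."""
    counts = posts_per_hour(timestamps)
    return max(counts.items(), key=lambda item: item[1])

# Made-up time stamps, not taken from any real data set.
sample_stamps = [
    "2020-01-01 09:05:00",
    "2020-01-01 10:15:00",
    "2020-01-01 10:40:00",
    "2020-01-01 10:59:59",
    "2020-01-01 11:30:00",
]
hour, count = peak_hour(sample_stamps)
print(hour, count)
```

Hourly bins are an arbitrary choice here; a slower-moving platform might need daily or weekly bins before any peak becomes visible.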

It may well be that at least some of these methods can be applied to other platforms; however, they may not be the best methods, and other platforms may require the formulation of new methods, techniques, and tools. On the tool front, I would like to see more software that allows those in the social sciences to obtain data from a range of platforms, covering a range of data types, e.g., web links, images, and video. At the Masters and PhD level there should be more emphasis on training social science students to use existing software effectively to capture and analyse data from social media platforms.

Acknowledgements

I would like to thank Curtis Jessop, Blog Editor of NSMNSS and Senior Researcher at NatCen Social Research, for the suggestion to write this blog post and the idea to examine the methodological implications of focusing on certain social media platforms.
