A list of tools that I have used to capture data from Twitter and which worked:
Twython at: https://github.com/ryanmcgrath/twython
KNIME with the Palladian Extension (obtained via the app). Instructions on setting it up are here: http://tech.knime.org/wiki/how-to-get-twitter-data-into-knime . Using the Twitter nodes from KNIME's own extension menu is much better; the instructions for setting those up are here: http://www.knime.org/blog/knime-twitter-nodes . However, I could not figure out a way to extract the tweets.
NodeXL at: http://nodexl.codeplex.com/
Visibrain (Commercial): http://www.visibrain.com/en/
Nvivo/Ncapture at: http://www.qsrinternational.com/products_nvivo_add-ons.aspx
TweetMapper at: http://tweetmapper.us
Twitonomy at: http://www.twitonomy.com
Webometrics at: http://lexiurl.wlv.ac.uk/index.html
Follow the Hashtag at: http://analytics.followthehashtag.com/#/
iScience Maps at: http://tweetminer.eu
More tools (these require programming knowledge), drawn from Deen Freelon's curated Google Doc at: https://docs.google.com/document/d/1UaERzROI986HqcwrBDLaqGG8X_lYwctj6ek6ryqDOiQ/edit . It is a great list and I make sure to add to it:
DMI-TCAT at: https://github.com/digitalmethodsinitiative/dmi-tcat
yourTwapperKeeper at: https://github.com/540co/yourTwapperKeeper
140dev at: http://140dev.com/
Hosebird at: https://github.com/twitter/hbc
Pattern at: http://www.clips.ua.ac.be/pattern
poll.emic at: https://github.com/sbenthall/poll.emic
Social Feed Manager at: http://gwu-libraries.github.io/social-feed-manager/
SocialMediaMineR at: http://cran.r-project.org/web/packages/SocialMediaMineR/
tStreamingArchiver at: https://github.com/brendam/tStreamingArchiver
twarc at: https://github.com/edsu/twarc
tweepy at: https://github.com/tweepy/tweepy
Twitter-Tap at: https://github.com/janezkranjc/twitter-tap
Twitter Stream Downloader at: https://github.com/mdredze/twitter_stream_downloader
TWurl at: https://github.com/twitter/twurl
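Most of the programmatic tools above (twarc, tweepy, Twitter Stream Downloader, etc.) save tweets as newline-delimited JSON in the Twitter v1.1 tweet-object format. As a minimal, hedged sketch of what happens after capture, the snippet below flattens such JSON into a spreadsheet-friendly CSV using only the Python standard library. The field names (`created_at`, `user.screen_name`, `text`) follow the v1.1 tweet object; the sample tweet itself is invented for illustration.

```python
import csv
import io
import json

def tweet_to_row(tweet):
    """Flatten a v1.1-style tweet dict into (created_at, screen_name, text)."""
    return (
        tweet.get("created_at", ""),
        tweet.get("user", {}).get("screen_name", ""),
        # Collapse newlines so each tweet stays on one spreadsheet row.
        tweet.get("text", "").replace("\n", " "),
    )

def jsonl_to_csv(lines, out):
    """Convert newline-delimited JSON (as emitted by twarc and similar) to CSV."""
    writer = csv.writer(out)
    writer.writerow(["created_at", "screen_name", "text"])
    for line in lines:
        writer.writerow(tweet_to_row(json.loads(line)))

# Invented sample tweet, for illustration only.
sample = json.dumps({
    "created_at": "Mon Aug 04 10:00:00 +0000 2014",
    "user": {"screen_name": "example_user"},
    "text": "Ebola update\nwith a line break",
})
buf = io.StringIO()
jsonl_to_csv([sample], buf)
print(buf.getvalue())
```

The same two functions work unchanged on a real capture file: open it, pass the file handle as `lines`, and write the CSV to disk instead of a string buffer.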
Be sure to check out my other list: ‘A list of tools to capture Twitter data’ at: https://wasimahmed1.wordpress.com/2015/01/30/a-list-of-tools-to-capture-twitter-data/
Also be sure to check out Dr Deen Freelon's curated list at: https://docs.google.com/document/d/1UaERzROI986HqcwrBDLaqGG8X_lYwctj6ek6ryqDOiQ/edit . You can catch me on Twitter @was3210.
My six-month progress report is due in soon, so I decided to write a blog post about some of the topics and issues I have encountered and am currently battling with. I am looking at pandemics and epidemics on Web 2.0. More recently, I have been investigating the Ebola epidemic and collecting Ebola-related tweets.
Big data is a current buzzword within academia and is considered by some to be the new oil. However, keeping with the oil analogy, is it real oil or snake oil? This issue was chronicled by Simon Moss in a Wired article, Big Data: New Oil or Snake Oil?, in which he discusses the problem of normalising big data in an organisational sense. My issue is one of information quality: the data is big, but at times it is of poor quality. Once the data is filtered it is not as big as it once was, and so it becomes little data. However, this little data is far more valuable than the larger, unfiltered set.
Ethical issues are ever present in social media research. The case for using Web 2.0 for research centres on whether the data is in the public domain, which raises questions about informed consent. Do Twitter users know that I am gathering this data? If I asked for consent for a tweet on Ebola that I captured in August, would I even get a reply? As a Twitter user, there is a sense that once you send a tweet out, after a while it goes away. It is therefore imperative that Twitter users are involved in the decision process when discussing ethical issues. This was discussed at a conference I attended in November, Picturing the Social: Analysing Social Media Images.
I recently viewed a talk by Farida Vis, which formed part of the digital culture conference Improving Reality. Farida provided a well-articulated example of human influence on an algorithm: a Facebook advert promoting an assisted reproduction programme, accompanied by a picture of a baby. She argues that this reflects how those who programmed the algorithm understood gender norms; that is, those who wrote the code held a schema whereby they believed a woman of a certain age should have children. More recently, on Twitter, I saw an advert for a laptop with the caption 'Costs less than what you spent on pizza last year', which drew livid responses, e.g. 'Twitter, what are you trying to say?' This advert could have been shown to all users, so it may not be the best example of a targeted algorithm. A further example is the adverts for educational courses that Facebook showed me before I started university, which raises the question of how much influence social media has on young adults. There is also scope here to examine how websites such as Amazon create suggestions. How does their algorithm work, and where do human schemas fit into it?
When talking about methods there is a tendency to select either a quantitative or a qualitative research philosophy. In social media research, however, a mixed-methods approach will yield richer results: a method such as network analysis should be complemented with content analysis. If we limit ourselves to a single research philosophy, we will learn less from the data, so I hope to employ a range of methods in analysing my own. A related issue is the cost of big data. Big data is out of reach for most academics, and this is further exacerbated by stringent terms and conditions that restrict data sharing. Whether the data is available for free, or whether there is a tool to obtain it, is also shaping the platforms I look at.
In my dataset of tweets, images occur with great frequency and appear as blocks of web links when scrolling down a spreadsheet. When I start to filter the dataset, should I remove these links? Big data tends to be associated with words rather than images, yet I would argue that images on Twitter form a large part of the big data network. According to one estimate, 250 million images are shared on Twitter daily, but they are overlooked in the majority of Twitter research. During the 2009/2010 H1N1 epidemic, and the various subsequent outbreaks, images must have been shared on Twitter, and they would have formed an integral part of how people subsequently thought about outbreaks. However, there was no evidence-based research examining these images. Comparing images from different time points allows us to see whether the narratives told through images remain the same or change.
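One way to make that filtering decision concrete is not to delete the image links at all, but to split the dataset into tweets with and without them, so the visual subset can be analysed separately. A minimal sketch, assuming the links appear in the tweet text as pic.twitter.com URLs (the pattern Twitter uses for native photos); the sample tweets are invented for illustration:

```python
import re

# Native Twitter photo links appear in tweet text as pic.twitter.com/<id>.
IMAGE_LINK = re.compile(r"pic\.twitter\.com/\w+")

def split_by_image(tweets):
    """Partition tweet texts into (with_images, without_images)."""
    with_images, without_images = [], []
    for text in tweets:
        (with_images if IMAGE_LINK.search(text) else without_images).append(text)
    return with_images, without_images

# Invented sample tweets, for illustration only.
tweets = [
    "Ebola treatment centre opens pic.twitter.com/abc123",
    "WHO briefing on the outbreak at 3pm",
]
imgs, plain = split_by_image(tweets)
print(len(imgs), len(plain))  # prints "1 1"
```

Links shortened through t.co, or media attached without a visible link, would need a different check (e.g. the `entities.media` field of the tweet object), so this regex is a starting point rather than a complete filter.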
In text references:
The Wired News Article I mentioned can be found here: http://www.wired.com/2014/10/big-data-new-oil-or-snake-oil/
The talk by Farida Vis on algorithmic culture I mentioned can be found here: https://www.youtube.com/watch?v=WBXddqzIZTA
[Edited on 26/01/15]