I used Tweepy (a Python library) to pull in Twitter data in raw JSON format. This is shown in Figure 1 below:
Figure 1 – A Tweet with all of the accompanying metadata
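The retrieval step can be sketched roughly as below. The credentials are placeholders and the function name is my own invention; this uses Tweepy's classic v1.1-style interface (some method names changed in Tweepy 4.x), so treat it as a sketch rather than a definitive implementation:

```python
def fetch_raw_tweets(query, count=10):
    """Pull recent tweets matching `query` and return their raw JSON dicts.

    Hypothetical sketch: requires `pip install tweepy` and real API keys.
    """
    import tweepy  # imported locally so this sketch loads even without tweepy installed

    # Placeholder credentials -- replace with your own app's keys.
    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
    api = tweepy.API(auth)

    # Each Status object carries the full raw JSON payload in `_json`,
    # i.e. the tweet plus all of its accompanying metadata.
    return [status._json for status in api.search(q=query, count=count)]
```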
Code can then be written to extract (i.e., to lift) the tweet text, timestamp, author, and tweet ID out of the raw JSON. This is shown in Figure 2 below:
Figure 2 – Filtered JSON
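The filtering step can be sketched as a small function that keeps only those four fields from each raw tweet dict. The field names here follow Twitter's classic v1.1 payload (`text`, `created_at`, `user.screen_name`, `id_str`) and the output keys are my own choice:

```python
def filter_tweet(raw):
    """Keep only the tweet text, timestamp, author, and tweet ID."""
    return {
        "text": raw["text"],
        "created_at": raw["created_at"],
        "author": raw["user"]["screen_name"],
        "id": raw["id_str"],
    }

# Example with a heavily trimmed raw tweet dict:
raw = {
    "text": "Hello world",
    "created_at": "Mon Jan 01 12:00:00 +0000 2018",
    "id_str": "987654321",
    "user": {"screen_name": "example_user", "followers_count": 42},
    "retweet_count": 0,  # dropped by the filter, along with all other metadata
}
filtered = filter_tweet(raw)
```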
This reduction in metadata may be necessary because, for hardware reasons, it may not be feasible to store and/or process all of the data. The tweets can then be saved to a CSV file, for example, to be opened in a spreadsheet.
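Saving the filtered tweets to CSV needs only the standard library; the column names below are my own choice:

```python
import csv

def save_tweets_csv(tweets, path):
    """Write a list of filtered tweet dicts to a CSV file with a header row."""
    fieldnames = ["id", "created_at", "author", "text"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(tweets)

# Usage with one filtered tweet:
tweets = [
    {"id": "987654321",
     "created_at": "Mon Jan 01 12:00:00 +0000 2018",
     "author": "example_user",
     "text": "Hello world"},
]
save_tweets_csv(tweets, "tweets.csv")
```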
I thought it would be interesting to compare what metadata existing tools can provide and to see how they have been programmed for metadata retention. It is important to note that in 2013 Twitter increased the amount of metadata that was returned; therefore, if a tool was designed before this change, the developers may have chosen to retain only the original set of metadata, or may not have updated the software at all.
Texifter, a commercial provider of Twitter data (using the Firehose), supplies all of the metadata, whereas free tools provide only a subset. TAGS provides the most metadata (and can also be configured to provide further metadata), and Mozdeh the least.
The figure above demonstrates the varying levels of data that tools can provide, i.e., decisions about what metadata is included or excluded have already been made by the time the tool is used. This may have implications for the types of research that can be conducted: for example, it may not be possible to study the favouriting behaviour of users if a tool does not provide this data.