…and then the world

“where nothing we’ve actually seen has been mapped or outlined…”

Archive for the ‘methodology’ Category

challenges of tracking topical discussion networks online [ICA2010]

with one comment

I’m currently in Singapore, having spent the last few days at the now-concluded International Communication Association conference for 2010. As well as going to various interesting presentations covering a wide range of processes, subjects, and disciplines (including such topics as the uses of Twitter while watching television programmes and the anatomy of YouTube memes), I also gave a short presentation on some of the network mapping I’ve been doing recently, using data collected by Lars Kirchhoff and Thomas Nicolai of Sociomantic Labs. The final paper authored by the three of us, ‘Challenges of tracking topical discussion networks online’, will be available later, but for the moment here are the slides used yesterday morning at 8.30 (and, for more explanation, Axel Bruns was liveblogging both this session and the rest of the conference too):

[For details of the other presentation I was involved with, ‘Mapping the Australian Networked Public Sphere’ (Axel Bruns, Jean Burgess, Tim Highfield, Lars Kirchhoff, and Thomas Nicolai), Axel has the slides online here]


Written by Tim

26 June, 2010 at 4:08 pm

well I’ll be Bertied: Perth as meme

with 4 comments

A first attempt at an experiment, and not a particularly rigorous one at that, in tracking information flows through Twitter.

On Monday afternoon (31 August), Australian time, a new YouTube video was publicised*. There’s nothing particularly unusual in that, except that this particular video concerned Perth. The capital city of Western Australia, Perth is both extremely isolated and not always seen as the most exciting of places – being often scathingly referred to using terms such as ‘Dullsville’. So, when a three-minute video mocking aspects of Perth life and making up other information (possibly qualifying as what John Hartley describes as silly citizenship, but that’s for another time) hit YouTube, it quickly spread through Twitter, Facebook, and into the blogosphere, as Perth locals and expats (of which I am one) became aware of it.

Before going further, this is the video, made by Vincenzo Perrella and Dan Osborn and entitled This is Perth:

So, this gently mocking, amusing video was made, people watched it, and they told their friends. This can be tracked anecdotally; my personal experience of the video started at around 5pm Brisbane time, when Tama re-tweeted the link to the video. (All times from now on will be Brisbane time, despite this concerning Perth data: what I grabbed from Twitter was in my local time, and I did not want to overcomplicate things by converting times, especially since I was collecting the data manually. For Perth time, subtract two hours from Brisbane time.) At this point, the RT was at least three steps down the line from its source, and the video itself was at around 350 views. Within a couple of hours, it had appeared three times on my friend feed in Facebook; within 24 hours it was up to 9000 views on YouTube, within 48 it was at some 35,000, and it was at over 48,000 views at the time of writing. Links were also appearing in friends’ blog posts, and as the video spread, the media coverage grew too**. However, this isn’t the most precise or admissible way of measuring what had happened.

The most visible signs of people noticing the video and telling other people, at least from Brisbane, were through the likes of Twitter and Facebook. Searching Facebook for data was not the most successful of tasks, and indeed the variety of privacy settings can make content such as posted links hard to locate. Casually browsing livejournal posts and using blog search engines provided more results, but the re-tweeting activity on Twitter was the most immediately enticing option – it may be advantageous to return to the blogs and grab that data too, for comparison, but for now the only data source is Twitter.

The data set covering ‘This is Perth’-related tweets was obtained through multiple searches of Twitter, repeated over a couple of days to track new tweets. Without claiming to be exhaustive, these searches attempted to locate as many as possible of the tweets made between 31 August and 3 September that linked to the video or discussed it or the articles on the West and PerthNow already covering it. Search terms included ‘This is Perth’, #thisisperth, and the various bit.ly and tinyurl addresses linking to the video, while further tweets were found by following the RT trails. The advantage of Twitter as opposed to Facebook was the prevalence of publicly accessible tweets; where locked posts were found, they were not included in the sample. However, if an RT included a user who had locked posts, the user was still included in the network created to show, where possible, the Twitter users acting as source nodes and hubs.

After the latest round of searches, carried out at 2pm Brisbane time, 227 tweets had been collected, not including those made by bots***. These had been made by, or took material from, 201 Twitter users. Of these users, 149 had specified a unique location, or made it apparent in their tweets – unsurprisingly, the majority of posts from which location could be determined came from Perth (92 tweets), with Sydney (16) and Melbourne (12) the next highest contributing cities. Outside of Australia, only nine tweets were from users declaring they were located internationally, with content being posted from the US, UK, Singapore, Canada, the Netherlands, and Malaysia. Such behaviour may be because of the localised nature of the video – for example, without knowing anything about Perth, the video may not be entertaining or interesting. Similarly, for people in or from Perth, seeing a video sending up their town may have meant some kind of connection with the video, and subsequently meant that it was passed on to friends, sharing the joke.

[Image: tiptweets, graph of tweets per hour]

While geographically the mentions of the video were centred on Perth, time-wise the four hours after the video was first tweeted saw the highest activity; the earliest mention found in these searches was at 3.55pm on 31 August, with 25 additional tweets by 5pm and 41 between 5pm and 6pm. These coincided with the novelty of the video, spreading it when there was a good chance other people hadn’t seen it, and also with the end of the working day in Perth (peaking between 3pm and 4pm Perth time). The WA-dominance of the coverage can be seen in the graph above. The graph depicts the number of tweets in hourly blocks; the periods of little or no activity correspond with the early hours of the morning, while the small increases in posting on Tuesday are during the work day and, in particular, the 7pm – 10pm period – however, these periods still contain fewer than 10 tweets an hour relating to the video. [The graph does not feature the last tweets from Wednesday night, when A Current Affair had a story on the video, as the exact time posted could not be determined, being in the format ‘around 16 hours ago’.]
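For anyone wanting to repeat the hourly count without doing it by hand, the following is a minimal sketch of the binning step, not the process actually used here (the data was collected and counted manually, and the graph made in ManyEyes). The timestamps are made-up examples in Brisbane time, and the function name `tweets_per_hour` is purely illustrative.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample: posting times (Brisbane time) of collected tweets.
timestamps = [
    "2009-08-31 15:55",
    "2009-08-31 16:10",
    "2009-08-31 16:40",
    "2009-08-31 17:05",
    "2009-08-31 17:30",
    "2009-09-01 19:20",
]

def tweets_per_hour(stamps):
    """Count tweets in hourly blocks, keyed by the start of each hour."""
    counts = Counter()
    for stamp in stamps:
        posted = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        # Dropping the minutes puts every tweet into its hourly block.
        counts[posted.replace(minute=0)] += 1
    return counts

hourly = tweets_per_hour(timestamps)
for hour in sorted(hourly):
    print(hour.strftime("%Y-%m-%d %H:00"), hourly[hour])
```

Tweets whose time is only known as something like ‘around 16 hours ago’ would have to be left out of such a count, just as they were left out of the graph.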

[Image: fullnetwork, network visualisation of all Twitter users in the sample]

While the video hits continued to increase over the period covered here, Twitter coverage died down quickly, with occasional flurries of re-tweets as people who had not seen it earlier discovered it and passed it on. However, the longest chains of re-tweets occurred in the first hours of the Twitter activity. The network visualisation above shows each Twitter user (excluding bots) featured in the sample as a node. The visualisation uses directed edges – the connections are not necessarily reciprocal links between users, but show a one-way link from one source user to a second user who may have either directly replied to a tweet or re-tweeted the work of the first user. Many nodes are not connected to others, having posted once and not been re-tweeted or not discussing it further with other users (at least, in a way that the particular searches used here would have found). There are also several small groups of two or three nodes, showing one user responding to or re-tweeting the post of another user. Most notably, there is a large, connected system of nodes in the middle of the visualisation, and for the most part these are connections that were made, or built on those made, in the first few hours of the Twitter coverage.
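The structure described above (unconnected nodes, small groups, one large connected system) can be recovered from a plain list of directed edges. The sketch below is only an illustration with invented user names, not the GUESS workflow used for the visualisation; it ignores edge direction when grouping users, which is an assumption about how ‘connected’ should be read here.

```python
from collections import defaultdict

# Hypothetical users and directed edges:
# (source user, user who replied to or re-tweeted them).
users = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]

def weak_components(nodes, links):
    """Group nodes into connected components, ignoring edge direction."""
    neighbours = defaultdict(set)
    for source, target in links:
        neighbours[source].add(target)
        neighbours[target].add(source)
    seen, components = set(), []
    for node in nodes:
        if node in seen:
            continue
        # Depth-first walk outwards from an unvisited node.
        stack, group = [node], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbours[current] - group)
        seen |= group
        components.append(group)
    return components

components = weak_components(users, edges)
isolated = [group for group in components if len(group) == 1]
largest = max(components, key=len)
print(len(components), "components;", len(isolated), "isolated;", sorted(largest))
```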

[Image: main_network_nolabels, closer view of the main connected group in the network visualisation]

This closer look at the visualisation shows several paths for information flows, originating at a few source nodes. The longest paths contain nine nodes – starting at SixThousand, the Perth edition of a national network of subcultural e-newsletters and guides, re-tweets flowed through people connected with The West Australian, and eventually crossed the country, reaching, for example, Fake Stephen Conroy, a popular Australian user satirising the Federal Communications Minister. Getting to the end of these longest paths took only three hours from when SixThousand posted the first link – and by that point the number of tweets per hour covering the video was already declining.
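Finding the longest re-tweet chains can also be done programmatically. The following is a rough sketch on invented data, assuming the re-tweet edges contain no cycles (reasonable if each edge points from an earlier tweet to a later one); the user names and the helper `longest_from` are hypothetical.

```python
from functools import lru_cache

# Hypothetical re-tweet edges: (source user, user who re-tweeted them).
edges = [
    ("source", "u1"), ("u1", "u2"), ("u2", "u3"),
    ("source", "u4"), ("u4", "u3"), ("u3", "u5"),
]

# Map each user to the users who re-tweeted them.
children = {}
for src, dst in edges:
    children.setdefault(src, []).append(dst)

@lru_cache(maxsize=None)
def longest_from(user):
    """Return the longest chain of users starting at `user`, as a tuple.

    Assumes the edges contain no cycles.
    """
    best = ()
    for child in children.get(user, []):
        candidate = longest_from(child)
        if len(candidate) > len(best):
            best = candidate
    return (user,) + best

all_users = {user for edge in edges for user in edge}
longest = max((longest_from(user) for user in all_users), key=len)
print(len(longest), "nodes:", " -> ".join(longest))
```

Path length is counted in nodes here, matching the ‘nine nodes’ figure above rather than the number of re-tweet steps.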

The point of this exercise was not to claim anything about the nature of interpersonal communication using Twitter, or in Perth, or anything of that nature. For one thing, the data set is far too small to draw any conclusions about information flows, while not looking at other data from additional sources such as Facebook or blogs means that a wider overview of the spread of the This is Perth video is lacking. Similarly, private communication such as email (the primary way I personally told friends about it) is not represented here. The main aim, instead, was to examine how to mine data from Twitter and what to do with it. The work here is a useful starting point for carrying out larger processes, ideally using automated tools such as NodeXL. One particular aspect I would have liked to cover here, and may do so later, is a comparison of the main connected group in the visualisation above and the actual followers of these users, to see whether what is depicted above shows information crossing groups or whether there is a high degree of interlinking amongst a group of friends.

In the meantime, what is shown is a short-lived burst of activity surrounding an amusing video about Perth, that quickly spread amongst a number of people either from or with connections to Perth, and then became a less prominent topic. While some coverage, such as last night’s A Current Affair story, and discussion of the video has appeared since the peak buzz surrounding it, activity hit a definite peak very early on – possibly reaching saturation point amongst a small audience? – and as the video itself has continued to gain hits, there just might not be any need to keep publicising it…

The network visualisation was made using GUESS, and the graph through ManyEyes.

* And possibly uploaded; the video’s page says 30 August, as opposed to 31, but there may be time difference issues.
** For example, stories posted on PerthNow and the West online, radio coverage on Nova 93.7, and a story on A Current Affair.
*** This may be a point of contention, as bots may be seen as further publicising the content and making it visible to more users, but for this initial work they have been excluded as the chain of re-tweets ended with them.

Written by Tim

3 September, 2009 at 6:25 pm

what to do with blog posts: another test


With my confirmation seminar next week (Tuesday to be precise, more details on that later), I’ve been thinking about what I’m trying to get out of this research project, which bits of the data to study, and how these might be represented within my thesis (and any other outcomes). Because I am quite possibly insane, over the last two days I’ve grabbed (manually) the full text of each blog post made on Pineapple Party Time – a blog hosted on Crikey and run by Mark Bahnisch of Larvatus Prodeo, William Bowe of the Poll Bludger, and Possum (Scott Steel) from Pollytics. I chose this blog mainly because it had a brief, and complete, lifespan – it ran for a month, from its launch on Tuesday 24 February 2009, when the Queensland state election was called, until Monday 23 March (two days after the election itself, enough time for a few final analyses). Of course, that didn’t mean there were only a few posts: there were around 130 in total (each one copied and pasted into its own document), of which Bahnisch contributed the most.

So, with all the posts in raw(ish) text format (except for the election day liveblog – see below), and not worrying about links or comments just yet (I didn’t save comments, but I’ll probably get some graphs happening comparing number of posts per day and comments per day, both for the whole blog history and per author), what should be done with this data? Well, textual/content analysis of some description, but something quick would be preferable for the moment. I’m going to run everything through Leximancer a bit later, but earlier in the week ManyEyes (featured here previously) added a new data visualisation tool to its range of options: phrase net. This method allows you to upload your data set of many words and find common combinations of phrases along the lines of ‘x is y’, ‘x’s y’, ‘x of the y’, ‘x and y’, and so on. So, in the name of research, I’ve been testing it out. Here’s the visualisation (currently of the ‘x is y’ format) for posts from the entire blog:

[Image: ppt, ManyEyes phrase net visualisation (‘x is y’) for the entire blog]

Given the general themes of the election coverage – Premier Anna Bligh calling it early, the LNP looking to gain a big swing of voters away from the ALP, polls being seen as giving the LNP a slim victory or making the contest too close to call – some of the combinations showing up are unsurprising (‘Labor is worried/scared/vulnerable’ for example).
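To give a rough idea of what an ‘x is y’ search involves, here is a small sketch that counts word pairs joined by a connector word. It is only an approximation of the idea, not how ManyEyes implements phrase net; the sample text and the function name `phrase_pairs` are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for the collected blog posts.
text = (
    "Labor is worried. Labor is vulnerable. "
    "The poll is close. Labor is worried about the swing."
)

def phrase_pairs(source, connector="is"):
    """Count (x, y) word pairs joined by `connector`, as in 'x is y'."""
    pattern = re.compile(
        r"\b(\w+)\s+" + re.escape(connector) + r"\s+(\w+)\b",
        re.IGNORECASE,
    )
    return Counter((x.lower(), y.lower()) for x, y in pattern.findall(source))

pairs = phrase_pairs(text)
for (x, y), count in pairs.most_common():
    print(x, "is", y, count)
```

Swapping the connector (for example `phrase_pairs(text, "and")`) gives the other single-word patterns; multi-word patterns such as ‘x of the y’ would need the whitespace inside the connector handled more carefully.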

Going on an author by author basis, though, this changes a little, given Possum and Bowe’s focus on, for example, poll analysis and electoral data.

Written by Tim

26 March, 2009 at 12:17 pm

the Olympics and the French (political) blogosphere: first crawls

with 2 comments

Following on from my previous post, the diagrams below are the results of my first topic-oriented experiments with IssueCrawler. As mentioned before, I set up nine crawls in total, three each looking for French blog posts concerning Tibet, censorship, and human rights. These crawls were queued at different times during the Olympic Games in Beijing – once during the first week of competition, once during the second week, and once a few days after the closing ceremony.

After a flurry of activity on the IssueCrawler servers over the weekend, only one crawl of the nine remains to be completed. However, the delay in crawls does raise the question of whether the networks depicted would be different if the crawls happened on the same day, or within a few days, of being enqueued – of the first three crawls, the Tibet crawl (first to be enqueued) was started and completed on 19 August, censorship on 19-20 August, and human rights on 27-28 August. While the first two were complete before the second crawls were prepared, the third round of crawls had been enqueued before the first human rights crawl had even started. As such, there may be a risk that material not yet published, or written, on 14 August may influence the network of that crawl because of the delay, and the other human rights networks may end up looking rather similar. On the latter concern, the third crawl is still to be completed so I’m not sure how close the fit will be, but the list of seed sites for each crawl was different. As this is still experimental, for these crawls I did not carry over the seed sites from one crawl to another or have a master list of sites. Instead, before each crawl I went through the top 100 results from Google Blog Search (French) using the search terms in the diagrams below, and manually included or rejected the posts based on their content and whether or not they were blog posts. I will attempt another topical experiment later this month, in which I will test out including previous sites in the seed list, for comparison between methods.

Tibet
JO Pekin Tibet - 13/08/08

Censorship
JO Pekin censure - 13/08/08

Human Rights
JO Pekin "droits de l'homme" - 14/08/08

One other concern with these crawls, as opposed to the next two rounds, is that I didn’t keep the same settings for the crawls: censorship has three iterations in its crawl, as does human rights, but Tibet has only two, with a greater crawl depth than the other two crawls. As such, any comparison of these diagrams with the later results is affected by the variable settings used.

Some notable aspects of the diagrams:
There is a large cluster of blogs to the bottom-right of the human rights diagram. However, these are all blogs hosted by one service, 20 minutes, and although the page being linked to is a news article, as nearly all of the 17 links to the page come from 20 minutes blogs this cluster may not be as significant to the issue discussion as other parts of the diagram.

The censorship network has, as its largest node, Wikio.fr – in particular, the top 100 political blogs page, with 42 in-links. This may be a result of the three iterations of the crawl, following links from blogs (such as icons showing where the particular blog is in the rankings) to that page, and later crawls with only two iterations may reduce the presence of Wikio in the network.

More notes to come, but one other concern with IssueCrawler is the legend – the colours used for particular top-level domains (.com, .fr, etc.) vary between diagrams. Although the legend is clearly noted on each diagram, it does mean that quick comparisons also have to be careful ones!

Written by Tim

1 September, 2008 at 3:50 pm

test: the Beijing Olympics and the French (political) blogosphere

with one comment

The Games of the 29th Olympiad finished in Beijing over the weekend, and most of the coverage of the Games as they came to a conclusion focussed on the exploits of a certain M. Phelps, U. Bolt, and, if you’re in either Great Britain or Australia, the respective positions of those countries on the medal table*. And Stephanie Rice. However, in the lead-up to the Games, and especially surrounding the torch relay earlier this year, political topics were more frequent – keywords including Tibet, censorship, human rights… Although the sporting side of the Olympics seemed to dominate coverage over the past few weeks, rather than the political topics, as part of testing out the tools available for my research I’ve been using IssueCrawler to track politically-oriented discussions of the Beijing Olympics in French blogs. As my method is still rather experimental, I’m not sure what will turn up results-wise – but that’s part of the fun, and as the results come in I’ll post some more comments about the content and any issues with this approach.

Most of my crawls are still in the queue for IssueCrawler’s servers, so it may be a while before I get all the results, although I’m hoping for the final crawl in my first round of searches to come up soon. Basically, I used the French language version of Google Blogsearch to find blog entries containing a few keywords. Three separate searches were made, one for each keyword: the searches looked at the Olympics in Beijing and either Tibet, censorship, or human rights. The first 100 results were found, viewed, and those that were not politically-oriented or not a blog were discarded. The remaining URLs were harvested by IssueCrawler and used as the seed list for the crawl.
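The selection step described above could be sketched as follows. This is an illustration only: the review itself was done by hand, the URLs and the `is_blog` / `political` flags are invented stand-ins for those manual judgements, and dropping duplicate URLs is an extra assumption not mentioned in the post.

```python
# Hypothetical reviewed search results; the flags record manual judgements.
results = [
    {"url": "http://blog-a.example/post-1", "is_blog": True, "political": True},
    {"url": "http://news.example/article", "is_blog": False, "political": True},
    {"url": "http://blog-b.example/sport", "is_blog": True, "political": False},
    {"url": "http://blog-a.example/post-1", "is_blog": True, "political": True},
]

def build_seed_list(reviewed, limit=100):
    """Keep politically-oriented blog posts from the first `limit` results."""
    seeds = []
    for item in reviewed[:limit]:
        keep = item["is_blog"] and item["political"]
        # Skipping repeated URLs is an assumption added for this sketch.
        if keep and item["url"] not in seeds:
            seeds.append(item["url"])
    return seeds

seeds = build_seed_list(results)
print(seeds)
```

The resulting list is what would then be handed to the crawler as seed URLs.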

The searches took place three times: once during the first week of the Olympics, once during the second week, and finally two days after the closing ceremony, to collect any Olympic summaries or commentaries. While I’m still testing out the methods and tools, and making silly mistakes like not using the same settings for crawls, I’m hopeful that the exercise will be useful both for my methodology and for getting more of an idea of what the French blogosphere is like.

Wikio.fr Top 100 Political Blogs (August 2008)

Speaking of which, the above is another test of IssueCrawler, this time using Wikio.fr’s top 100 political blogs for August 2008 as the seeds. Again, I’ll go through this properly when I’ve tested the method a bit more thoroughly, but it’s interesting to compare the shape of the automated network to the manual one I made a few months ago (using the May rankings) in ManyEyes (I’ll go through the pros/cons of both methods in another post):

*Forgot to mention it during the Olympics themselves, but Jonathan Crowe of The Map Room was running DFL again, noting all the last-place finishes in Beijing…

Written by Tim

27 August, 2008 at 3:09 pm