challenges of tracking topical discussion networks online [ICA2010]

I’m currently in Singapore, having spent the last few days at the now-concluded International Communication Association conference for 2010. As well as going to various interesting presentations covering a wide range of processes, subjects, and disciplines (including such topics as the uses of Twitter while watching television programmes and the anatomy of YouTube memes), I also prepared a short presentation on some of the network mapping I’ve been doing recently, using data collected by Lars Kirchhoff and Thomas Nicolai of Sociomantic Labs. The final paper authored by the three of us, ‘Challenges of tracking topical discussion networks online’, will be available later, but for the moment here are the slides used yesterday morning at 8.30 (and, for more explanation, Axel Bruns was liveblogging both this session and the rest of the conference too):

[For details of the other presentation I was involved with, ‘Mapping the Australian Networked Public Sphere’ (Axel Bruns, Jean Burgess, Tim Highfield, Lars Kirchhoff, and Thomas Nicolai), Axel has the slides online here]

several dots on a map

2010 is already looking like it’ll be fairly busy, not least because nearly a quarter of it is gone already. Over the next twelve months, I should finish my thesis, while other projects are also being developed and carried out: I’m tutoring in a first-year unit this semester, and am currently writing up new work on the French political blog research, first outlined at IR10 last year, for both my thesis and a conference presentation.

That presentation will be in June, at the International Communication Association conference in Singapore, as a paper co-authored with Lars Kirchhoff and Thomas Nicolai from Sociomantic Labs in Germany. Where my IR10 presentation looked at the text content of blog posts, this paper will be covering the links being made, in their various guises.

As part of this work, and indeed in preparation for research into topical networks, the links made around particular events or themes, I’ve been busy looking into the more permanent/static networks created by blogroll links from sites in the sample population. As with the IR10 work, I’m using data collected by Thomas Nicolai and Lars Kirchhoff over the first eight months of 2009, with 217 political blogs, media resources, and other related websites represented in the final collected data. For this stage, I’ve taken these sites as a starting point, making a list of each blogroll out-link from each of the 217 sites as a two-column spreadsheet (host site, site linked to), and then importing the final list into Gephi for visualisation purposes.
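As a minimal sketch of the two-column list described above (host site, site linked to), the following writes one row per blogroll out-link. The site names are made up, and the `Source`/`Target` header names are an assumption about what Gephi's spreadsheet import expects for an edge table, so adjust them to whatever your version asks for.

```python
import csv
import io


def write_edge_list(blogrolls, out):
    """Write one (host site, site linked to) row per blogroll out-link."""
    writer = csv.writer(out, lineterminator="\n")
    # Header names are an assumption, not something stated in the post.
    writer.writerow(["Source", "Target"])
    for host in sorted(blogrolls):
        # set() drops a link that appears twice in the same blogroll
        for target in sorted(set(blogrolls[host])):
            writer.writerow([host, target])


# Hypothetical sample data standing in for the 217 sites
sample = {
    "blog-a.example": ["blog-b.example", "media.example"],
    "blog-b.example": ["blog-a.example"],
}
buffer = io.StringIO()
write_edge_list(sample, buffer)
print(buffer.getvalue())
```

Writing to a file instead only needs `open("blogroll_edges.csv", "w", newline="")` in place of the `StringIO` buffer (the filename is, again, just an example).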

[Because I was using a slightly older version of Gephi, I was also converting the spreadsheet into Pajek’s .net format in order to import it into Gephi using Excel 2 Pajek. However, the latest version of Gephi imports .csv, with extra import options through the .gdf format too]
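For anyone without Excel 2 Pajek to hand, the conversion mentioned in the note above can be roughed out in a few lines. This is only a sketch of the basic .net layout (a numbered vertex list followed by arcs), not a reproduction of what Excel 2 Pajek does, and the site names are invented.

```python
def to_pajek(edges):
    """Return a list of (source, target) pairs as text in Pajek's .net layout."""
    ids = {}
    for source, target in edges:
        for site in (source, target):
            # Pajek vertices are numbered from 1, in order of first appearance
            ids.setdefault(site, len(ids) + 1)
    lines = [f"*Vertices {len(ids)}"]
    lines += [f'{number} "{site}"' for site, number in ids.items()]
    # *Arcs marks directed links, which suits blogroll out-links
    lines.append("*Arcs")
    lines += [f"{ids[s]} {ids[t]}" for s, t in edges]
    return "\n".join(lines) + "\n"


# Hypothetical edges for illustration
pajek_text = to_pajek([("a.example", "b.example"), ("a.example", "c.example")])
print(pajek_text)
```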

Having not used Gephi before (I couldn’t get it to work when I tested out visualisation options quite a long time ago), my success in testing it out was greatly aided by the Gephi team releasing a step-by-step tutorial for new users. Importing every individual link originating from the 217 sites and following each tutorial step led to something that looks rather spectacular, although it doesn’t really say much:

here comes sciencey

Of course, the risk with visualisation is that too much attention is spent on the ‘pretty’ side of things, or on preparing diagrams that look impressive (or ‘sciencey’), but don’t aid the research’s argument (or even confuse it further). While the initial aim of creating a blogroll network is to help me see the groups of sites that associate with each other, trying to get a handle on how these sites in the sample relate to each other, the warnings and advice from people such as Bernie Hogan at last year’s OII Summer Doctoral Programme have stayed in the back of my mind. As such, I’ve spent a fair amount of time over the last few weeks trying to clean up the data and improve the visualisations, not from an aesthetic point of view, but so I get a clearer sense of what I’m trying to describe.

here comes sciencey (part two)

With the full list of links containing over 5000 nodes, each receiving at least one in-link from one of the 217 initial sites, one of the main problems in the first visualisation is the sheer number of nodes, and the implied overimportance of sites with many out-links (especially when these sites are the only ones linking to many nodes – it leads to large groups of satellites around nodes). The next step then, as seen above, was to restrict the nodes to those sites receiving two or more in-links from the initial 217 sites. A number of loose groupings were immediately apparent (see, for example, the top-left of the diagram), and these were followed up after the next round of cleaning the data:
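The two-or-more in-links restriction described above can be sketched as a filter over the edge list. One assumption to flag: this version keeps an edge whenever its target qualifies, so an initial site with fewer than two in-links still appears as a source. Whether such sites stay in the graph is a choice the description leaves open.

```python
from collections import defaultdict


def filter_by_inlinks(edges, seeds, min_inlinks=2):
    """Keep edges whose target is linked to by at least min_inlinks seed sites."""
    linkers = defaultdict(set)
    for source, target in edges:
        # Count distinct seed sites per target, ignoring self-links
        if source in seeds and source != target:
            linkers[target].add(source)
    keep = {site for site, sources in linkers.items() if len(sources) >= min_inlinks}
    return [(s, t) for s, t in edges if s in seeds and t in keep]


# Hypothetical data: "lonely" is only linked to by one seed, so it drops out
seeds = {"a", "b", "c"}
edges = [("a", "hub"), ("b", "hub"), ("c", "lonely"), ("a", "b"), ("c", "b")]
filtered = filter_by_inlinks(edges, seeds)
print(filtered)
```

This is what removes the satellite groups: a node that only one prolific linker points to never reaches the threshold.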

here comes sciencey (5b)

here comes sciencey (part five)

In the first of these two visualisations, some nodes are coloured by their affiliation to particular political parties (either by being official sites or by containing the party name/acronym in their URL). A loose grouping of sites from the Front National (brown) and UMP (blue) in particular is apparent. In the second visualisation, I located sites that were members of three different blog communities or networks, organised around different themes or beliefs. Again, there is some loose grouping – unsurprising, considering this is a blogroll-oriented network, and often sites will have links either to the main page of the group or the other members in their blogrolls – but what is most interesting is the general location of the anti-Sarkozy group Les vigilants (in pink) between the left-wing and centrist party groupings (in the first of the two visualisations).

For more details and visualisations-in-progress, check out my Flickr (and look out for updates on the related paper over the next few months!). The next important step, particularly in terms of new information, is comparing the blogroll links to the topical networks, and seeing whether the same associations are in play regardless of time or topic – this will be investigated further over the next few weeks.

At this stage, in particular because of its ease of use (and not being restricted to the latest version of operating system-specific software), I’ll most likely continue to work with Gephi while I work on my thesis. I’d still like to try out Prefuse at some point, but that may have to wait until after all this work is out of the way…
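Going back to the party colouring in the first of the two visualisations: the URL-based half of that rule (party name or acronym in the address) could be sketched as below. The keyword lists and example URLs are hypothetical, official sites would still need a hand-made list, and the acronym match is deliberately done on whole tokens so that an address containing "jump" is not mistaken for a UMP site.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword lists; the colours follow the post (FN brown, UMP blue)
PARTIES = {
    "Front National": {"acronyms": {"fn"}, "names": {"frontnational"}, "colour": "brown"},
    "UMP": {"acronyms": {"ump"}, "names": set(), "colour": "blue"},
}


def party_for(url):
    """Guess a party from the name or acronym appearing in a URL, else None."""
    parsed = urlparse(url)
    text = (parsed.netloc + parsed.path).lower()
    tokens = set(re.split(r"[^a-z0-9]+", text))   # whole-word pieces of the URL
    squashed = re.sub(r"[^a-z0-9]", "", text)     # catches front-national etc.
    for party, rule in PARTIES.items():
        if tokens & rule["acronyms"] or any(name in squashed for name in rule["names"]):
            return party
    return None


print(party_for("http://ump-exemple.example/"))
print(party_for("http://www.front-national-exemple.example/"))
print(party_for("http://blog.example/jump"))
```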

Over the Australian summer, I’m working from the Scholars’ Centre in the UWA Reid Library in Perth, a few desks away from where I wrote the bulk of my honours thesis, hopefully writing and finishing various things that have been in the works for some time. While in Perth, as well as working on the phd, I have the opportunity to see what’s changed both in the city and here on campus (I spent five years here as an undergrad and a staff member before heading east to QUT), and to explore a little.

One major development at UWA that opened earlier this year is the new Science Library. Combining collections previously housed in separate buildings for Maths, Physical Sciences, Biology, or the Arts & Humanities library in the case of Geology, the library is an extension of the previous Physical Sciences library, but also a complete refit of the building. It is a really impressive construction and renovation project, with what looks like a decent balance between collaborative/social spaces on the ground floor with quiet and private zones on the upper levels. Of course, it is currently the summer recess, so the number of students using the new library at the moment is far fewer than would be in the middle of semester, but from the brief period I spent wandering the library yesterday, it certainly appears the very model of how new libraries should be designed. And I’m clearly not alone in this thinking:

Granted, many of the features on display are not new to other libraries or campuses – QUT do the displays of available computers rather well, for example – but it’s still pleasing when a new development turns out right. Or at least appears that way… One of the nice touches is the artwork found at the end of each shelf: a biographical poster of a scientist, with the words coloured to form a portrait of the scientist as well. Examples and a UWA news article can be found here, and of course it wouldn’t be UWA if one of the posters didn’t depict Barry Marshall.

Exploring the new Science Library reminded me of a project I came across on my trip around the US in October, which had slipped my mind after my return. While being shown around Seattle, I was introduced to the Seattle Central Library, where, behind the main librarian’s desk, is located a visualisation entitled ‘Making Visible the Invisible’ (the image above is from George Legrady’s site, as I wasn’t able to take my own photo). The work of George Legrady, Andreas Schlegel, August Black, Mark Zifchock and Rama Hoetzlein, the visualisation is in four parts, providing different representations of data around title, keyword, format, and Dewey Decimal call number. The visualisation is also dynamic, presenting items that were recently checked out from the library system. It’s not a new project, having been unveiled in 2005 and been the subject of a post at VisualComplexity, but it’s a great example of informative, data-oriented visualisation in public spaces (and wouldn’t look out of place in other libraries). The previous links should provide more information on the project itself; as a bonus, Legrady also has a very nice visualisation as an overview to the Dewey Decimal System, showing each section and (presumably) the number of items the Seattle Public Library system holds/held in that section. Given that one of the other, non-work-related projects I’m involved in uses the Dewey Decimal system, it’s of particular interest to me, but the approaches and use of dynamic data are noteworthy too.

into the eurosphere

I’m still behind on all my RSS feeds after October, so rather than try and catch up, here’s something new(er). Over the weekend just gone, the Personal Democracy Forum – Europe (PDFEU/#pdfeu if you want to trawl the twitter archives) was held in Barcelona. Having only found out about it on Friday evening Brisbane time, as it was getting underway in Spain, I wasn’t attending the conference itself, but through the wonders of live streaming (run by Civico and containing audio, twitter, and CoverItLive live blogging), I was able to listen to the first few sessions on Friday. [The other sessions from Friday and Saturday are archived on the site at the moment if there’s anything that looks interesting]

There were several interesting discussions and topics, some of which were unfortunately missed due to sleep needs or being break-out sessions not streamed live, although information on those might be available on the live streaming site now. However, the most immediately impressive presentation coming out of PDFEU (certainly given my research interests) was that by Anthony Hamelle and Clémence Lerondeau of linkfluence (leaders in social network mapping and mentioned here several times previously). In their presentation, they unveiled a new linkfluence project, moving beyond their previous studies of French/U.S. political blogs or (French language discussion of) European topics on the internet. Instead, the latest study (visualisation below) looks at the ‘Eurosphere’ – blogs and websites run by commentators, parties, think tanks, activists, journalists, and so on, from France, the Netherlands, Germany, and Italy (the analysis also features a Europe affairs-specific cluster, drawing from all four nations). For specific information, I’d recommend going through the presentation itself (with audio available from the PDFEU streaming site), and also the accompanying linkfluence blog post. There’s more information to come, obviously, but a few findings are already particularly interesting: first, the varying bridging/gatekeeping population found in the different national spheres (the French having the most bridging bloggers), and indeed the very presence and function of bridge bloggers (Ethan Zuckerman has written about this subject previously, although not for as specific a context as European (political) topics). 
The comparative lack of interaction between national spheres is also interesting (bridging happening more between the EU-specific cluster and the national spheres); language could possibly be a factor, although the greater tendency of a particular group (Euro-sceptics and anti-federalists) to engage in conversations across the boundaries of the national spheres makes this finding a particularly fascinating topic for future research (well, maybe)!

There will be more coming out of this project from linkfluence, as the final slide shows, but the teaser material unveiled at pdfeu – and the topical case study used in the presentation, looking at the EU Presidency as a discussion topic over the previous month – suggests that the scope of this study will provide some interesting information on discussions and interactions at an international level:

Eurosphere (2009) by linkfluence

[Also, from a purely aesthetic perspective, how great (and clean) does the visualisation itself look?]

well I’ll be Bertied: Perth as meme

A first attempt at an experiment, and not a particularly rigorous one at that, in tracking information flows through Twitter.

On Monday afternoon (31 August), Australian-time, a new YouTube video was publicised*. There’s nothing particularly unusual in that, except that this particular video concerned Perth. The capital city of Western Australia, Perth is both extremely isolated and not always seen as the most exciting of places – being often scathingly referred to using terms such as ‘Dullsville’. So, when a three-minute video mocking aspects of Perth life and making up other information (possibly qualifying as what John Hartley describes as silly citizenship, but that’s for another time) hit YouTube, it quickly spread through Twitter, Facebook, and into the blogosphere, as Perth locals and expats (of which I am one) became aware of it.

Before going further, this is the video, made by Vincenzo Perrella and Dan Osborn and entitled This is Perth:

So, this gently mocking, amusing video was made, people watched it, told their friends. This can be tracked anecdotally; my personal experience of the video started at around 5pm Brisbane-time (all times from now on will be Brisbane time, despite this concerning Perth data – what I grabbed from Twitter was in my local time, and I did not want to overcomplicate things by starting to change times, especially since I was manually collecting the data; for Perth time, subtract two hours from Brisbane time), when Tama re-tweeted the link to the video. At this point, the RT was at least three steps down the line from its source, and the video itself was at around 350 views. Within a couple of hours, it had appeared three times on my friend feed in Facebook; within 24 hours it was up to 9000 views on YouTube, within 48 hours it was at 35,000, and it was at over 48,000 views at the time of writing. Links were also appearing in friends’ blog posts, and as the video spread, the media coverage grew too**. However, this isn’t the most precise or admissible form of measuring what had happened.

The most visible signs of people noticing the video and telling other people, at least from Brisbane, were through the likes of Twitter and Facebook. Searching Facebook for data was not the most successful of tasks, and indeed the variety of privacy settings can make finding content such as posted links hard to locate. Casually browsing livejournal posts and using blog search engines provided more results, but the re-tweeting activity on Twitter was the most immediately enticing option – it may be advantageous to return to the blogs and grab that data too, for comparison, but for now the only data source is Twitter.

The data set covering ‘This is Perth’-related tweets was obtained through multiple searches of Twitter, repeated over a couple of days to track new tweets. Without claiming to be exhaustive, these searches attempted to locate as many tweets as possible made between 31 August and 3 September that linked to the video, or discussed it or the articles on the West and PerthNow already covering it. Search terms included ‘This is Perth’, #thisisperth, and the various bit.ly and tinyurl addresses linking to the video, while further tweets were found by following the RT trails. The advantage of Twitter as opposed to Facebook was the prevalence of publicly accessible tweets; where locked posts were found, they were not included in the sample. However, if an RT included a user who had locked posts, the user was still included in the network created to show, where possible, the Twitter users acting as source nodes and hubs.
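Following the RT trails was done by hand here, but the idea is easy to sketch in code. The snippet below assumes the informal "RT @user" convention, where each re-tweet prefixes the text it copies, and turns one tweet into source-to-retweeter links. The usernames and the link are invented, and tweets crediting a source in some other way would be missed.

```python
import re

# Matches the informal "RT @username" convention
RT_PATTERN = re.compile(r"\bRT\s+@(\w+)", re.IGNORECASE)


def retweet_edges(author, text):
    """Return (source, retweeter) pairs recovered from one tweet's RT trail."""
    chain = RT_PATTERN.findall(text)      # nearest source first
    path = [author] + chain               # the author re-tweeted chain[0], and so on
    return [(path[i + 1], path[i]) for i in range(len(path) - 1)]


# Hypothetical tweet: carol re-tweets bob, who had re-tweeted alice
tweet = "RT @bob: RT @alice: This is Perth http://example.invalid/video"
trail = retweet_edges("carol", tweet)
print(trail)
```

Because a locked account can still show up inside someone else's RT, this also recovers the source nodes mentioned above even when their own posts are not in the sample.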

After the latest round of searches, carried out at 2pm Brisbane time, 227 tweets had been collected, not including those made by bots***. These had been made by, or took material from, 201 Twitter users. Of these users, 149 had specified a unique location, or made it apparent in their tweets – unsurprisingly, the majority of posts from which location could be determined came from Perth (92 tweets), with Sydney (16) and Melbourne (12) the next highest contributing cities. Outside of Australia, only nine tweets were from users declaring they were located internationally, with content being posted from the US, UK, Singapore, Canada, the Netherlands, and Malaysia. Such behaviour may be because of the localised nature of the video – for example, without knowing anything about Perth, the video may not be entertaining or interesting. Similarly, for people in or from Perth, seeing a video sending up their town may have meant some kind of connection with the video, and subsequently meant that it was passed on to friends, sharing the joke.


While geographically the mentions of the video were centred on Perth, time-wise the four hours after the video was first tweeted saw the highest activity; the earliest mention found in these searches was at 3.55pm on 31 August, with 25 additional tweets by 5pm and 41 between 5pm and 6pm. These coincided with the novelty of the video, spreading it when there was a good chance other people hadn’t seen it, and also with the end of the working day in Perth (peaking between 3pm and 4pm Perth-time). The WA-dominance of the coverage can be seen in the graph above. The graph depicts the number of tweets in hourly blocks: the periods of little or no activity correspond with the early hours of the morning, while the small increases in posting on Tuesday are during the work day and, in particular, the 7pm – 10pm period – however, these periods still contain fewer than 10 tweets an hour relating to the video. [The graph does not feature the last tweets from Wednesday night, when A Current Affair had a story on the video, as the exact time posted could not be determined, being in the format ‘around 16 hours ago’]
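The hourly blocks behind the graph amount to truncating each timestamp to its hour and counting. A small sketch follows, using invented timestamps apart from the 3.55pm first mention. The year is an assumption, and the Perth conversion simply applies the subtract-two-hours rule given earlier rather than a proper timezone database.

```python
from collections import Counter
from datetime import datetime, timedelta


def tweets_per_hour(timestamps):
    """Count tweets in hourly blocks, keyed by the start of each hour."""
    return Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)


def to_perth(brisbane_time):
    """Brisbane to Perth time, using the post's rule of subtracting two hours."""
    return brisbane_time - timedelta(hours=2)


# Illustrative Brisbane-time timestamps (year assumed); only 15:55 is from the post
sample = [
    datetime(2009, 8, 31, 15, 55),
    datetime(2009, 8, 31, 17, 5),
    datetime(2009, 8, 31, 17, 40),
]
counts = tweets_per_hour(sample)
for hour in sorted(counts):
    print(hour, "Brisbane /", to_perth(hour).strftime("%H:%M"), "Perth:", counts[hour])
```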


While the video hits continued to increase over the period covered here, Twitter coverage died down quickly, with occasional flurries of re-tweets as people who had not seen it earlier discovered it and passed it on. However, the longest chains of re-tweets occurred in the first hours of the Twitter activity. The network visualisation above shows each Twitter user (excluding bots) featured in the sample as a node. The visualisation uses directed edges – the connections are not necessarily reciprocal links between users, but show a one-way link from one source user to a second user who may have either directly replied to a tweet or re-tweeted the work of the first user. Many nodes are not connected to others, having posted once and not been re-tweeted or not discussing it further with other users (at least, in a way that the particular searches used here would have found). There are also several small groups of two or three nodes, showing one user responding to or re-tweeting the post of another user. Most notably, there is a large, connected system of nodes in the middle of the visualisation, and for the most part these are connections that were made, or built on those made, in the first few hours of the Twitter coverage.
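The three kinds of structure described above (unconnected nodes, small groups, one large connected system) are what you get from grouping the directed network into weakly connected components, that is, ignoring edge direction when deciding who is connected to whom. GUESS drew the picture here; the sketch below is just one way of doing the grouping in plain Python, with invented usernames.

```python
from collections import defaultdict


def weak_components(nodes, edges):
    """Group users into components, ignoring the direction of each edge."""
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)
    all_nodes = set(nodes) | set(neighbours)
    seen, components = set(), []
    for node in all_nodes:
        if node in seen:
            continue
        component, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component
        components.append(component)
    return sorted(components, key=len, reverse=True)   # largest first


# Hypothetical users: one chain of three, one pair, one unconnected poster
users = ["a", "b", "c", "d", "e", "f"]
links = [("a", "b"), ("b", "c"), ("d", "e")]
components = weak_components(users, links)
isolates = [c for c in components if len(c) == 1]
print(len(components), "components; largest:", sorted(components[0]))
```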


This closer look at the visualisation shows several paths for information flows, originating at a few source nodes. The longest paths contain nine nodes – starting at SixThousand, the Perth edition of a national network of subcultural e-newsletters and guides, re-tweets flow through people connected with The West Australian, and eventually crossed the country, reaching, for example, Fake Stephen Conroy, a popular Australian user satirising the Federal Communications Minister. To get to the end of these longest paths only took three hours from when SixThousand posted the first link – and by that point the number of tweets per hour covering the video was already declining.
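Finding the longest paths was done by eye from the visualisation, but it can also be computed. The sketch below assumes the re-tweet network has no cycles (reasonable if each re-tweet comes after the tweet it copies, but an assumption all the same, and the function would recurse forever if it were false). The usernames are placeholders.

```python
from collections import defaultdict
from functools import lru_cache


def longest_chain(edges):
    """Return the longest source-to-retweeter path as a tuple of users."""
    children = defaultdict(list)
    nodes = set()
    for source, target in edges:
        children[source].append(target)
        nodes.update((source, target))

    @lru_cache(maxsize=None)
    def best_from(node):
        # Longest path starting at node; memoised so shared tails are reused
        best = (node,)
        for child in children[node]:
            candidate = (node,) + best_from(child)
            if len(candidate) > len(best):
                best = candidate
        return best

    return max((best_from(n) for n in sorted(nodes)), key=len, default=())


# Hypothetical chain: a is the source, with one longer and one shorter branch
chain = longest_chain([("a", "b"), ("b", "c"), ("a", "d"), ("c", "e")])
print(len(chain), "nodes:", " -> ".join(chain))
```

Counting nodes rather than edges matches the way the nine-node paths are described above.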

The point of this exercise was not to claim anything about the nature of interpersonal communication using Twitter, or in Perth, or anything of that nature. For one thing, the data set is far too small to make any conclusions about information flows, while not looking at other data from additional sources such as Facebook or blogs means that a wider overview of the spread of the This is Perth video is lacking. Similarly, private communication such as email (the primary way I personally told friends about it) is not represented here. The main aim, instead, was to examine how to mine data from Twitter and what to do with it. The work here is a useful starting point for carrying out larger processes, ideally using automated tools such as NodeXL. One particular aspect I would have liked to cover here, and may do so later, is a comparison of the main connected group in the visualisation above and the actual followers of these users, whether what is depicted above shows information crossing groups or whether there is a high degree of interlinking amongst a group of friends.

In the meantime, what is shown is a short-lived burst of activity surrounding an amusing video about Perth, that quickly spread amongst a number of people either from or with connections to Perth, and then became a less prominent topic. While some coverage, such as last night’s A Current Affair story, and discussion of the video has appeared since the peak buzz surrounding it, activity hit a definite peak very early on – possibly reaching saturation point amongst a small audience? – and as the video itself has continued to gain hits, there just might not be any need to keep publicising it…

The network visualisation was made using GUESS, the graph through ManyEyes

* And possibly uploaded; the video’s page says 30 August, as opposed to 31, but there may be time difference issues.
** For example, stories posted on PerthNow and the West online, radio coverage on Nova 93.7, and a story on A Current Affair.
*** This may be a point of contention, as bots may be seen as further publicising the content and making it visible to more users, but for this initial work they have been excluded as the chain of re-tweets ended with them.

more links from the tubes

A few things from around the traps that have come up recently (and have been noted elsewhere already!)

1. the 3rd International Conference on Weblogs and Social Media happened a few weeks ago in San Jose, California – going from the papers from last year and the provision of a dataset for people to use before submitting papers for this year’s conference, there may well be some interesting new work coming out of the proceedings. May try and get over to Washington D.C. for next year’s conference.

2. Sciences-Po in Paris unveiled their Medialab with presentations by Richard Rogers (govcom/issuecrawler), Yochai Benkler, the gephi team, and the webatlas team – with the rtgi group based out at Compiegne, north of Paris in Picardie, there’s a couple of exciting projects and labs taking shape in France at the moment.

3. Meanwhile, over at the Berkman Center at Harvard, the I&D team have launched an interactive version of the Iranian blogosphere map documented in a paper released early last year. Haven’t had much time to test it out yet, but given the other international projects happening over that way at the moment and the linkfluence/rtgi-type projects, this kind of interactive, rather than static, output may become more common in blog and internet network analysis and mapping.

4. Speaking of maps and internet networks, there’s been a bit of coverage of the new map of social (network) dominance over at techcrunch. Obviously, the general dominance, in western countries at least, of facebook over allcomers is a major talking point, but it’s also worth comparing the map to that produced two years ago. Again, facebook’s spread is particularly evident: whereas in 2007 myspace still had a majority, of whatever margin, of dominance in such countries as Australia, the US, Italy, and Greece, facebook has since usurped it in all four of those countries, as well as taking over most of western Europe and claiming a large chunk of Africa, leaving myspace’s sole outpost in 2009 as… Guam? The move of facebook into many languages has also meant that the previously language-specific clusters – such as skyblog’s control of francophone nations – are eroded. There’s more to be taken from both maps, and I haven’t looked at any of the numbers involved here – both maps use data from Alexa, but as noted in the Techcrunch post there’s some debate as to whether myspace or facebook is the leading social network in the US. However, I’ll leave it on one final, pleasing point – that the 2009 map, being zoomable and able to select and customise views, has been produced using ManyEyes (mentioned here many times previously).

linkfluence visualise the French blogosphere (or bits of it) (twice!)

Previously mentioned on several occasions, linkfluence/rtgi, who are leading the way in not just visualising maps of online networks but also giving several levels of information and scalability, have in the last week or so released two visualisations for different sites. Last year, of course, they produced PresidentialWatch08 for the US Presidential election, and in 2007 had Observatoire Presidentielle for the French equivalent. Now come two new maps, one blog-centric and the other providing a more topical view of website connections.


First is the Wikiopole, for Wikio (a search and ranking site, who have also been developing tools for researchers, including their Backlink Factory). Depicting the connections between the top 1500 ranked blogs, and with sites coded based on their category (political, science, sport, etc), the map provides another overview of the state of the French blogosphere, this time in May 2009 (and may be useful if a map comes out every month/several months – in which case, archiving each edition would be rather handy). It’s also good to see visualisations not just looking at the political side of things (not that that’s necessarily a bad thing, but there are plenty of ways to subdivide networks of blogs). Plus, as an overall blogosphere study, there’s scope to compare the statistical layout of the linkfluence map to the personal work from ouinon.net in 2007, despite the long period between the production of the particular maps.


The second map is for touteleurope.fr, looking at 2046 sites (not just blogs) discussing Europe(an politics) in French. There’s quite a bit of cross-over, understandably, between this map and the Observatoire Presidentielle, although it’s less concerned with the different political ideologies present and the types of site and separating the analysts from the ‘militants’, for example.

I’m on a rather slow internet connection at the moment (and unfortunately the two maps take a while to load for me), and still waiting for some information before looking further at the two maps – a lengthier write-up will come, but for the moment any new work in the French blogosphere, political or not, and in network studies and visualisations (even with reservations about methods or outputs, as the case may be) is welcome.

