“where nothing we’ve actually seen has been mapped or outlined…”

challenges of tracking topical discussion networks online [ICA2010]

I’m currently in Singapore, having spent the last few days at the now-concluded International Communication Association conference for 2010. As well as going to various interesting presentations covering a wide range of processes, subjects, and disciplines (including such topics as the uses of Twitter while watching television programmes and the anatomy of YouTube memes), I also prepared a short presentation on some of the network mapping I’ve been doing recently, using data collected by Lars Kirchhoff and Thomas Nicolai of Sociomantic Labs. The final paper authored by the three of us, ‘Challenges of tracking topical discussion networks online’, will be available later, but for the moment here are the slides used yesterday morning at 8.30 (and, for more explanation, Axel Bruns was liveblogging both this session and the rest of the conference too):

[For details of the other presentation I was involved with, ‘Mapping the Australian Networked Public Sphere’ (Axel Bruns, Jean Burgess, Tim Highfield, Lars Kirchhoff, and Thomas Nicolai), Axel has the slides online here]

several dots on a map

2010 is already looking like it’ll be fairly busy, not least because nearly a quarter of it is gone already. Over the next twelve months, I should finish my thesis, while other projects are also being developed and carried out: I’m tutoring in a first-year unit this semester, and am currently writing up new work on the French political blog research, first outlined at IR10 last year, for both my thesis and a conference presentation.

That presentation will be in June, at the International Communication Association conference in Singapore, as a paper co-authored with Lars Kirchhoff and Thomas Nicolai from Sociomantic Labs in Germany. Where my IR10 presentation looked at the text content of blog posts, this paper will be covering the links being made, in their various guises.

As part of this work, and indeed in preparation for research into topical networks, the links made around particular events or themes, I’ve been busy looking into the more permanent/static networks created by blogroll links from sites in the sample population. As with the IR10 work, I’m using data collected by Thomas Nicolai and Lars Kirchhoff over the first eight months of 2009, with 217 political blogs, media resources, and other related websites represented in the final collected data. For this stage, I’ve taken these sites as a starting point, making a list of each blogroll out-link from each of the 217 sites as a two-column spreadsheet (host site, site linked to), and then importing the final list into Gephi for visualisation purposes.
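As a rough illustration of the two-column list described above, here is a minimal Python sketch that writes one (host site, site linked to) row per blogroll link. The site names are made up, and the `Source`/`Target` header names are an assumption about what Gephi’s spreadsheet import expects, so check them against the version you are using.

```python
import csv
import io


def write_edge_list(blogrolls, out):
    """Write one (host site, site linked to) row per blogroll link."""
    writer = csv.writer(out)
    # Header names are an assumption about Gephi's edge-table import.
    writer.writerow(["Source", "Target"])
    for host in sorted(blogrolls):
        # Drop duplicate links and links from a site to itself.
        for linked in sorted(set(blogrolls[host])):
            if linked != host:
                writer.writerow([host, linked])


# Hypothetical sample data: each host site mapped to its blogroll.
blogrolls = {
    "blog-a.example": ["blog-b.example", "news.example"],
    "blog-b.example": ["blog-a.example", "news.example", "blog-b.example"],
}

buffer = io.StringIO()
write_edge_list(blogrolls, buffer)
edge_csv = buffer.getvalue()
print(edge_csv)
```

Writing to a file instead only needs `open("blogroll_edges.csv", "w", newline="")` in place of the `StringIO` buffer (the filename is just an example).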

[Because I was using a slightly older version of Gephi, I was also converting the spreadsheet into Pajek’s .net format in order to import it into Gephi using Excel 2 Pajek. However, the latest version of Gephi imports .csv, with extra import options through the .gdf format too]
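For anyone doing that conversion by script rather than with Excel 2 Pajek, here is a small sketch of turning a directed edge list into the basic Pajek .net layout (a `*Vertices` section with numbered labels, then an `*Arcs` section). It only covers the plainest form of the format, with no weights or coordinates, and the function name is my own.

```python
def to_pajek(edges):
    """Return a Pajek .net string for a list of (source, target) pairs."""
    labels = []
    index = {}
    for source, target in edges:
        for name in (source, target):
            if name not in index:
                index[name] = len(labels) + 1  # Pajek vertex ids start at 1
                labels.append(name)

    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{i} "{name}"' for i, name in enumerate(labels, start=1)]
    lines.append("*Arcs")  # *Arcs marks directed links
    lines += [f"{index[s]} {index[t]}" for s, t in edges]
    return "\n".join(lines) + "\n"


# Hypothetical sample links.
print(to_pajek([("a", "b"), ("b", "c")]))
```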

Having not used Gephi before (I couldn’t get it to work when I tested out visualisation options quite a long time ago), my success in testing it out was greatly aided by the Gephi team releasing a step-by-step tutorial for new users. Importing every individual link originating from the 217 sites and following each tutorial step led to something that looks rather spectacular, although doesn’t really say much:

here comes sciencey

Of course, the risk with visualisation is that too much attention is spent on the ‘pretty’ side of things, or on preparing diagrams that look impressive (or ‘sciencey’), but don’t aid the research’s argument (or even confuse it further). While the initial aim of creating a blogroll network is to help me see the groups of sites that associate with each other, trying to get a handle on how these sites in the sample relate to each other, the warnings and advice from people such as Bernie Hogan at last year’s OII Summer Doctoral Programme have stayed in the back of my mind. As such, I’ve spent a fair amount of time over the last few weeks trying to clean up the data and improve the visualisations, not from an aesthetic point of view, but so I get a clearer sense of what I’m trying to describe.

here comes sciencey (part two)

With the full list of links containing over 5000 nodes, receiving at least one in-link from one of the 217 initial sites, one of the main problems in the first visualisation is the sheer number of nodes, and the implied overimportance of sites with many out-links (especially when these sites are the only ones linking to many nodes – it leads to large groups of satellites around nodes). The next step then, as seen above, was to restrict the nodes to those sites receiving two or more in-links from the initial 217 sites. A number of loose groupings were immediately apparent (see, for example, the top-left of the diagram), and these were followed up after the next round of cleaning the data:
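The filtering step above (keeping only sites with two or more in-links from the initial sites) can be sketched in a few lines of Python. This is an illustration under my own assumptions rather than the exact procedure used here: repeated links from the same site count once, self-links are ignored, and the sample names are invented.

```python
from collections import defaultdict


def filter_by_inlinks(edges, seeds, minimum=2):
    """Keep links from seed sites whose target has enough distinct seed linkers."""
    linkers = defaultdict(set)
    for source, target in edges:
        if source in seeds and source != target:
            linkers[target].add(source)  # a set, so repeat links count once

    keep = {target for target, sources in linkers.items() if len(sources) >= minimum}
    return [(s, t) for s, t in edges if s in seeds and t in keep]


# Hypothetical sample: only "x" is linked to by two different seed sites.
seeds = {"s1", "s2", "s3"}
sample = [("s1", "x"), ("s2", "x"), ("s1", "y"), ("s3", "z"), ("s3", "z")]
print(filter_by_inlinks(sample, seeds))
```

Raising `minimum` thins out the satellite nodes further; setting it to 1 keeps every link from a seed site.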

here comes sciencey (5b)

here comes sciencey (part five)

In the first of these two visualisations, some nodes are coloured by their affiliation to particular political parties (either by being official sites or by containing the party name/acronym in their URL). A loose grouping of sites from the Front National (brown) and UMP (blue) in particular is apparent. In the second visualisation, I located sites that were members of three different blog communities or networks, organised around different themes or beliefs. Again, there is some loose grouping – unsurprising, considering this is a blogroll-oriented network, and often sites will have links either to the main page of the group or the other members in their blogrolls – but what is most interesting is the general location of the anti-Sarkozy group Les vigilants (in pink) between the left-wing and centrist party groupings (in the first of the two visualisations).

For more details and visualisations-in-progress, check out my Flickr (and look out for updates on the related paper over the next few months!). The next important step, particularly in terms of new information, is comparing the blogroll links to the topical networks, and seeing whether the same associations are in play regardless of time or topic – this will be investigated further over the next few weeks.

At this stage, in particular because of its ease of use (and not being restricted to the latest version of operating system-specific software), I’ll most likely continue to work with Gephi while I work on my thesis. I’d still like to try out Prefuse at some point, but that may have to wait until after all this work is out of the way…
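As an aside on the party colouring mentioned above (matching a party name or acronym in the URL), here is a hedged sketch of how that matching could be scripted. The token lists and example URLs are hypothetical, and official sites that don’t carry the party name would still need coding by hand. Matching on whole tokens rather than raw substrings is a choice I’ve made here so that an acronym like “ump” isn’t found inside an unrelated word.

```python
import re
from urllib.parse import urlparse

# Hypothetical lookup: tokens that, if present in a URL, suggest a party.
PARTY_TOKENS = {
    "UMP": {"ump"},
    "Front National": {"fn", "frontnational"},
}


def party_for(url):
    """Return the first party whose token appears in the URL, else None."""
    # urlparse only fills in the host when the URL has a '//' prefix.
    parsed = urlparse(url if "//" in url else "//" + url)
    text = (parsed.netloc + parsed.path).lower()
    tokens = set(re.split(r"[^a-z0-9]+", text))
    for party, names in PARTY_TOKENS.items():
        if tokens & names:
            return party
    return None


print(party_for("blog.ump-example.org/x"))
print(party_for("http://jumpers.example"))
```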

new Berkman Center study on the Arabic blogosphere

Last year, the Internet & Democracy team at the Berkman Center at Harvard released a study of the Persian language blogosphere, a key work in larger-scale, non-English-language blogosphere research. (And of course, although I’m not going to discuss current events in Iran, the use of social media, Twitter in particular, has been a subject of news reports recently, regardless of its relative importance, biases, etc.) The I&D team have just released a new study, this time of the Arabic blogosphere – taking in Egyptian, Saudi, Kuwaiti, and Syrian blogs, and covering clusters and bridges in three languages (Arabic, English, French – although my treatment of ‘French’ in the current version of my research is looking at the country rather than the language, the use of French in northern Africa provides an additional, possible group of sites to study). I’ve just downloaded the pdf of the new report, so no comments on that yet, but if you want to have a look at it, it’s at the I&D site.

personal internet maps

So, I haven’t quite been on top of my rss feeds lately, and only found out today [via a new post at Serial Mapper] of Kevin Kelly’s Internet Mapping Project. It’s a growing collection of hand-drawn maps of the internet, designed by all-comers. As Kelly explains, “I’ve become very curious about the maps people have in their minds when they enter the internet. So I’ve been asking people to draw me a map of the internet as they see it. That’s all.”

The template pdf to draw your map on is here, which can then be emailed to Kelly, if you’re interested. Or you can check out what’s already been sent in at the posts above or at Kelly’s flickr. (I haven’t made one, having only just found out about it, but maybe in the next week or so…)

more links from the tubes

A few things from around the traps that have come up recently (and have been noted elsewhere already!)

1. the 3rd International Conference on Weblogs and Social Media happened a few weeks ago in San Jose, California – going from the papers from last year and the provision of a dataset for people to use before submitting papers for this year’s conference, there may well be some interesting new work coming out of the proceedings. May try and get over to Washington D.C. for next year’s conference.

2. Sciences-Po in Paris unveiled their Medialab with presentations by Richard Rogers (govcom/issuecrawler), Yochai Benkler, the gephi team, and the webatlas team – with the rtgi group based out at Compiegne, north of Paris in Picardie, there’s a couple of exciting projects and labs taking shape in France at the moment.

3. Meanwhile, over at the Berkman Center at Harvard, the I&D team have launched an interactive version of the Iranian blogosphere map documented in a paper released early last year. Haven’t had much time to test it out yet, but given the other international projects happening over that way at the moment and the linkfluence/rtgi-type projects, this kind of interactive, rather than static, output may become more common in blog and internet network analysis and mapping.

4. Speaking of maps and internet networks, there’s been a bit of coverage of the new map of social (network) dominance over at techcrunch. Obviously, the general dominance, in western countries at least, of facebook over all-comers is a major talking point, but it’s also worth comparing the map to that produced two years ago. Again, facebook’s spread is particularly evident: in 2007 myspace still had a majority, of whatever margin, of dominance in such countries as Australia, the US, Italy, and Greece, but facebook has since usurped it in all four of those countries, as well as taking over most of western Europe and claiming a large chunk of Africa, which leaves myspace’s sole outpost in 2009 as… Guam? The move of facebook into many languages has also meant that the previously language-specific clusters – such as skyblog’s control of francophone nations – are eroded. There’s more to be taken from both maps, and I haven’t looked at any of the numbers involved here – both maps use data from Alexa, but as noted in the Techcrunch post there’s some debate as to whether myspace or facebook is the leading social network in the US. However, I’ll leave it on one final, pleasing point – that the 2009 map, being zoomable and able to select and customise views, has been produced using ManyEyes (mentioned here many times previously).

linkfluence visualise the French blogosphere (or bits of it) (twice!)

Previously mentioned on several occasions, linkfluence/rtgi, who are leading the way in not just visualising maps of online networks but also giving several levels of information and scalability, have in the last week or so released two visualisations for different sites. Last year, of course, they produced PresidentialWatch08 for the US Presidential election, and in 2007 had Observatoire Presidentielle for the French equivalent. Now come two new maps, one blog-centric and the other providing a more topical view of website connections.


First is the Wikiopole, for Wikio (a search and ranking site, who have also been developing tools for researchers, including their Backlink Factory). Depicting the connections between the top 1500 ranked blogs, and with sites coded based on their category (political, science, sport, etc), the map provides another overview of the state of the French blogosphere, this time in May 2009 (and may be useful if a map comes out every month/several months – in which case, archiving each edition would be rather handy). It’s also good to see visualisations not just looking at the political side of things (not that that’s necessarily a bad thing, but there are plenty of ways to subdivide networks of blogs). Plus, as an overall blogosphere study, there’s scope to compare the statistical layout of the linkfluence map to the personal work from ouinon.net in 2007, despite the long period between the production of the particular maps.


The second map is for touteleurope.fr, looking at 2046 sites (not just blogs) discussing Europe(an politics) in French. There’s quite a bit of cross-over, understandably, between this map and the Observatoire Presidentielle, although it’s less concerned with the different political ideologies present, the types of site, and separating the analysts from the ‘militants’, for example.

I’m on a rather slow internet connection at the moment (and unfortunately the two maps take a while to load for me), and still waiting for some information before looking further at the two maps – a lengthier write-up will come, but for the moment any new work in the French blogosphere, political or not, and in network studies and visualisations (even with reservations about methods or outputs, as the case may be) is welcome.

let me see you

A while ago, Sky asked me for suggestions for mapping/visualisation tools for one of her chapters, and she’s since been testing out IssueCrawler, which she discusses here. While writing a quick list of possible tools, I came across a couple of new visualisation programs that I hadn’t tested, so this morning is all about seeing what the various software and online tools can do.

For today’s experiments, I’m not using either IssueCrawler or ManyEyes – I’ve discussed both previously, anyway, and IssueCrawler is not actually useful in this context – I’ll try a second entry about crawls, scraping, and visualisation (the likes of IssueCrawler and VOSON) later this week, hopefully. For this post, though, I’m going to take data acquired by hand and put into a two-column spreadsheet in Excel (I know, I’m a terrible person for not doing it in Calc, but this will be relevant a little later). I’m using the spreadsheet I created manually from blogroll links of the Wikio.fr Top 100 French Political blogs in May 2008, rather than crawling the internet looking for connections. ManyEyes will be used as a reference, but as I’ve already visualised the data being used, I’m not going to redo that process today. I’m also not going to go through what the visualisations show from the data involved, but (however shallow this may be) I’m focussing more on the aesthetics, what the maps look like and how this can be customised, exported, and embedded.

For the purposes of comparison, here is the original ManyEyes visualisation of the blogroll links between blogs on the list of the top 100 French Political blogs (May 2008):

ManyEyes visualisation

Creating the visualisation above from the data was straightforward: simply select the relevant cells in the two columns, copy-paste, and let ManyEyes do the work. However, the customisation of the visualisation is an issue – the layout can be recomputed and the diagram embedded in other sites online, but any other changes are limited. So, in the interests of comparing tools, and given the likelihood of working with other data types later on in my research, I looked for other resources.

NetMap visualisation test

There is an add-on for Excel (2007 only, though) called .NetMap, which allows users to generate network maps from their data (the standard Excel chart options don’t do this, and neither do those in Calc). After a bit of playing around with options and updates to get everything working, I generated the above visualisation. The display options are heavily customisable – from vertex colour and shape to edge colour and opacity – but, for some reason, as you can see in the screenshot, the vertex labels did not show up. This is fine when using .NetMap itself, as the diagram is next to the spreadsheet itself, and when you select a vertex, it shows the edges connecting it to other vertices and highlights the relevant cells in the spreadsheet. Beyond that context, though, such as when I use the screenshot elsewhere, there was important information missing (admittedly, my brief tests may have just overlooked some settings, as is possible with any of the programs discussed here). [Edit: Indeed, after a helpful email and a bit more playing around, I’ve managed to display the labels alongside the vertices. This is what you get from not thoroughly exploring all settings…] A more useful aspect of .NetMap is the ability to generate subgraph images; basically, each vertex’s individual map, ignoring all the vertices it is not connected to. However, as .NetMap only works at the moment with Excel 2007, and my computer is destined to take on a Linux flavour around Christmas time, .NetMap is not an ideal long-term option for my personal visualisation needs. Nevertheless, for my research it will still be useful, and it’ll still be running on my work computer.
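The subgraph idea mentioned above (each vertex’s individual map, ignoring the vertices it is not connected to) is easy to sketch outside .NetMap too. The version below is only my reading of it: it keeps the chosen vertex, its direct neighbours in either direction, and any links among them. I’m not certain that matches what .NetMap draws, so treat that as an assumption.

```python
def ego_network(edges, vertex):
    """Return the links among a vertex and its direct neighbours."""
    neighbours = {vertex}
    for source, target in edges:
        if source == vertex:
            neighbours.add(target)
        elif target == vertex:
            neighbours.add(source)
    # Keep links whose two ends are both in the neighbourhood.
    return [(s, t) for s, t in edges if s in neighbours and t in neighbours]


# Hypothetical sample: "d" and "e" are not directly linked to "a".
links = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
print(ego_network(links, "a"))
```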

Cytoscape visualisation test

The above visualisation was created using Cytoscape, which has so far worked ideally – again, I haven’t tested it thoroughly, but it also allows display customisation and a range of layout algorithms. Importantly, it also allows direct import of data from an Excel spreadsheet. In the program itself I haven’t quite worked out how to get more information displayed, but the resulting visualisation is very pleasing and clear. I will be using Cytoscape more often, I think.

UCInet (Netdraw) visualisation test

One of the reasons I chose to use the reduced blogroll list is the focussed nature of edges and vertices – the first spreadsheet, of nearly all blogroll links, has many vertices that are only connected to one blog, which created rather large, messy maps. In addition, it’s easy to compare these maps through their small sample size and the presence of the tiny ‘island’ of five blogs not connected to the main network. After the Cytoscape test, I moved onto the ‘big two’ programs for social network analysis, UCINET (Netdraw) and Pajek. These two programs will be used for larger-scale analysis, using data from the crawling and scraping processes, for which the data will be in different formats. Excel spreadsheets, of course, are not preferred formats for either of these programs, so a bit of conversion had to take place. Luckily, this was not as problematic as trying to get an xml file from an Excel 2007 spreadsheet. Indeed, UCINET itself allows data to be imported from spreadsheets and saved as a matrix that Netdraw will be able to read. The above map, then, is the resulting Netdraw visualisation, using the Spring embedding option in Graph-Theoretic layout. Again, there are options for customising display and layout, and plenty of analytical tools that I haven’t tested yet (going for the visualisation angle first). A bit of refreshing the layout was required, though, to not have the vertices of the island lying on top of each other, thus only having three, rather than five, blogs visible (of course, you can also manually alter the position of vertices).
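The small ‘island’ mentioned above can also be found without drawing anything, by splitting the link list into connected groups. The sketch below ignores link direction (an assumption on my part, since a blogroll link in either direction is enough to join two blogs on the map) and uses invented blog names.

```python
from collections import defaultdict


def components(edges):
    """Group vertices into connected sets, ignoring link direction."""
    adjacent = defaultdict(set)
    for source, target in edges:
        adjacent[source].add(target)
        adjacent[target].add(source)

    seen = set()
    groups = []
    for start in adjacent:
        if start in seen:
            continue
        # Walk outwards from an unvisited vertex to collect its whole group.
        group = set()
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            group.add(node)
            for neighbour in adjacent[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(group)
    return sorted(groups, key=len, reverse=True)  # largest group first


# Hypothetical sample: a main network of three blogs and an island of two.
blog_links = [("a", "b"), ("b", "c"), ("x", "y")]
print(components(blog_links))
```

Any group after the first in the result is an island cut off from the main network.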

Pajek visualisation test

From Netdraw, the data could be converted into a Pajek-friendly format, although there is the risk that the layout used by Netdraw can influence that created by Pajek. A bit of playing around and recomputing different layouts negated that, though. Pajek also has the ability to draw the network in 3D, which is a nice option, especially when dealing with the implied three dimensions of the ‘blogosphere’. The customisation options are similar to those of the other programs, although from an aesthetic perspective there’s something rather pleasing about the thin lines and stark colours of the small version of the map. Again, as with UCINET, I’m more likely to use Pajek for larger-scale projects than for small maps like this, where I’d probably use a quicker route from spreadsheet to map (such as Cytoscape or ManyEyes), but the 3D aspect is handy (especially once I master the export options).

Mage (Netdraw) visualisation test screenshot

Finally, an accidental visualisation. I was testing some of UCINET’s export settings, and ended up somehow revisualising the network in Mage – which, as Pajek can, uses a 3D layout. I have hardly gone through the options with this visualisation, but after generating all those maps, I was rather taken with the easy ability to rotate the network, including various degrees of shading to further emphasise the position of vertices in the 3D layout. The screenshot doesn’t really do it justice, but again I still need to go through the export options.

All of the tools I tested generated usable maps, with various degrees of customisation. All except ManyEyes work offline, and all except UCINET (which has a free trial version) are freely available for download (however, .NetMap does require the rather not-free Excel 2007 for most of its stuff, although I think there is a standalone version too…). I imagine there are many other visualisation options available, too, although having more than five or so working options is possibly overkill. Nevertheless, the amount of data and the format used will dictate which visualisation program I use for my work. The ease of going from a basic two-column spreadsheet to the above maps is very pleasing, though, and even with my non-existent background in networks, informatics, stats, and other mathematical abilities, the ability to generate these will help my research.

[I also wanted to test out Gephi, but even after adding extensions to the version of Excel running on here, the xml file exported from Excel with the blog links has not yet been imported successfully by Gephi. Still, it’s another program that I will keep an eye on and try to get working later.]

