…and then the world

“where nothing we’ve actually seen has been mapped or outlined…”

Archive for November 2008

phrases of the blogosphere



Another visualisation of blog (and other media) data: MemeTracker offers an alternative to the likes of Blogpulse, tracking stories and events across the blogosphere and mainstream online media through the key quotes and phrases they contain. The resulting visualisation shows both the popularity and the lifetime of a particular story, for example the Obama quip “you can put lipstick on a pig”. Tracking quotes and phrases is a useful method, as the political one-liner can resurface years after the story itself has been dealt with, haunting later politicians and administrations. Indeed, a thread over at Larvatus Prodeo has reminded me of John Howard’s “If I were running Al-Qaeda in Iraq I would put a circle around March 2008 and pray as many times as possible for a victory, not only for Obama but also for the Democrats” (reported, for example, on the 7.30 Report back in February 2007)…

MemeTracker also ranks the sites used in its data gathering by how quickly they respond to stories, that is, whether they are ahead of the curve or not. The usual blog suspects, the likes of Huffington Post and Daily Kos, not to mention Drudge, are among the quickest to report stories containing the tracked phrases, and the Huffington Post in particular features nearly three out of four of them. Australian news sites vary in their response times. http://www.news.com.au is the quickest of those I glanced at, two hours ahead of the Fairfax duo of the Sydney Morning Herald and the Age (and also news.com.au …), while theaustralian.news.com.au on average only covers stories with said phrases at their peak popularity… The news.com.au, SMH and Age coverage also features over 50% of the phrases, compared to 34% for the Australian ABC and 30–50% for the majority of British news sites (news.bbc.co.uk, telegraph.co.uk, timesonline.co.uk), although the Guardian, one of the earliest UK sites to report the phrases, gets up to 70% (possibly due to its blog integration and amount of online-specific content?).

There are plenty of aspects of MemeTracker still to investigate: which sites are on the source list and which aren’t, particularly international blogs (as opposed to international news sites), given that the tracked phrases are, understandably, US-centric; and whether the sites that cover stories first influence the coverage of those that follow. Still, it’s another interesting approach to tracking and visualising political discussion online.
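To make the two measures above (how far ahead of the curve a site is, and what share of phrases it features) a little more concrete, here is a minimal Python sketch of how they could be computed. This is not MemeTracker's actual method: the input format (a list of `(site, phrase, hour)` mention tuples plus a `peaks` dictionary giving each phrase's peak hour), the function name `site_stats`, and the sample data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def site_stats(mentions, peaks):
    """Return {site: (mean lag in hours, share of tracked phrases covered)}.

    mentions: iterable of (site, phrase, hour) tuples (hypothetical format).
    peaks: {phrase: hour at which that phrase's overall volume peaked}.
    A negative lag means the site mentioned the phrase before its peak.
    """
    # Keep only each site's earliest mention of each phrase.
    first = {}
    for site, phrase, hour in mentions:
        key = (site, phrase)
        if key not in first or hour < first[key]:
            first[key] = hour

    # Lag of each first mention relative to that phrase's peak.
    lags = defaultdict(list)
    for (site, phrase), hour in first.items():
        lags[site].append(hour - peaks[phrase])

    total = len(peaks)
    return {site: (mean(values), len(values) / total) for site, values in lags.items()}


# Hypothetical sample data: two tracked phrases, two sites.
peaks = {"lipstick": 10.0, "circle": 20.0}
mentions = [
    ("siteA", "lipstick", 8.0),
    ("siteA", "lipstick", 6.0),
    ("siteA", "circle", 18.0),
    ("siteB", "lipstick", 10.0),
]
print(site_stats(mentions, peaks))
```

Here siteA picks both phrases up before their peaks (a negative mean lag) and covers all of them, while siteB only reports one phrase, at its peak, much like the at-peak coverage described above.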

[via information aesthetics]


Written by Tim

11 November, 2008 at 3:18 pm

exit polls


It’s a grey, rainy morning here in Brisbane, but despite the usual rush-hour activity outside, with people hurrying to work and cars and buses hurtling down the main road, attention is focused on what’s happening on the other side of the world, where it’s still yesterday.

I’m set up for a long day in front of my laptop and the television, in one of the Creative Industries buildings here at QUT. I have CNN running at the moment, and may switch between it, BBC World, and Sky News once their election coverage kicks off. It’s a big day, and not just in the US: the number of people, American or not, with election-themed Facebook statuses or tweets gives some idea of just how widespread the interest is in who will become President-elect.

No liveblogging here today; there will be plenty of that elsewhere on the internet. If you’re looking for that kind of thing, the likes of Larvatus Prodeo will have comments and a whole lot of links to people covering the whole damn thing; for a start, there’s a big liveblog happening at Crikey. I’ve spent the last few months watching all seven seasons of The West Wing, without much of a gap between them, so I’m conscious of avoiding spoilers in any commentary I might provide here. A lot of my initial knowledge of, or context for, American politics comes from there, too…

For your election map needs, information aesthetics has a list of tools ready for you. And, watching CNN, I suddenly realised I’m going to see the ‘real’ version of this in action.

First polls close in about 10 minutes. This may take a while…

Written by Tim

5 November, 2008 at 8:47 am

let me see you


A while ago, Sky asked me for suggestions for mapping/visualisation tools for one of her chapters, and she’s since been testing out IssueCrawler, which she discusses here. While writing a quick list of possible tools, I came across a couple of new visualisation programs that I hadn’t tested, so this morning is all about seeing what the various software and online tools can do.

For today’s experiments, I’m not using IssueCrawler or ManyEyes: I’ve discussed both previously, and IssueCrawler isn’t actually useful in this context. I’ll try to write a second entry about crawls, scraping, and visualisation (the likes of IssueCrawler and VOSON) later this week, hopefully. For this post, though, I’m taking data gathered by hand and entered into a two-column spreadsheet in Excel (I know, I’m a terrible person for not doing it in Calc, but this will be relevant a little later). Rather than crawling the internet for connections, I’m using the spreadsheet I created manually from the blogroll links of the Wikio.fr Top 100 French Political blogs in May 2008. ManyEyes will serve as a reference, but as I’ve already visualised this data there, I’m not going to redo that process today. Nor am I going to discuss what the visualisations reveal about the data; instead (however shallow this may be) I’m focussing on the aesthetics: what the maps look like and how they can be customised, exported, and embedded.
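A two-column spreadsheet like this is really just an edge list: each row says "blog in column A links to blog in column B". As a minimal sketch, assuming the sheet has been saved as CSV (the function names and sample rows below are hypothetical, not from the actual Wikio data), it can be read into a list of directed edges with Python's standard `csv` module:

```python
import csv
import io


def read_edge_list(text, has_header=False):
    """Parse two-column CSV text (blog, blog it links to) into directed edges.

    Blank or incomplete rows are skipped, and whitespace around names is trimmed.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if has_header:
        rows = rows[1:]
    edges = []
    for row in rows:
        if len(row) < 2 or not row[0].strip() or not row[1].strip():
            continue  # empty or half-filled spreadsheet row
        edges.append((row[0].strip(), row[1].strip()))
    return edges


def vertex_list(edges):
    """Every blog appearing on either side of an edge, sorted alphabetically."""
    return sorted({name for edge in edges for name in edge})


# Hypothetical CSV export; the blog names are placeholders, not real data.
sample = "source,target\nblogA,blogB\nblogA,blogC\n,\nblogB,blogC\n"
edges = read_edge_list(sample, has_header=True)
print(edges)
print(vertex_list(edges))
```

Most of the tools below ultimately want this same information (a vertex list plus an edge list), just in different file formats.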

For the purposes of comparison, here is the original ManyEyes visualisation of the blogroll links between blogs on the list of the top 100 French Political blogs (May 2008):

ManyEyes visualisation

Creating the visualisation above from the data was straightforward: select the relevant cells in the two columns, copy and paste, and let ManyEyes do the work. Customisation, however, is an issue: the layout can be recomputed and the diagram embedded in other sites, but little else can be changed. So, in the interests of comparing tools, and given that I’m likely to work with other data types later in my research, I looked for other resources.

.NetMap visualisation test

There is an add-on for Excel (2007 only, though) called .NetMap, which lets users generate network maps from their data (the standard chart options in Excel don’t do this, and neither do those in Calc). After a bit of playing around with options and updates to get everything working, I generated the above visualisation. The display options are heavily customisable, from vertex colour and shape to edge colour and opacity, but for some reason, as you can see in the screenshot, the vertex labels did not show up. This is fine within .NetMap, where the diagram sits next to the spreadsheet: selecting a vertex shows the edges connecting it to other vertices and highlights the relevant cells. Outside that context, though, such as when I use the screenshot elsewhere, important information is missing (admittedly, my brief tests may simply have overlooked some settings, as is possible with any of the programs discussed here). [Edit: Indeed, after a helpful email and a bit more playing around, I’ve managed to display the labels alongside the vertices. This is what you get from not thoroughly exploring all settings…] A more useful feature of .NetMap is the ability to generate subgraph images: basically, each vertex’s individual map, ignoring all the vertices it is not connected to. However, as .NetMap currently only works with Excel 2007, and my computer is destined to take on a Linux flavour around Christmas time, it is not an ideal long-term option for my personal visualisation needs. Nevertheless, it will still be useful for my research, as it will keep running on my work computer.
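The subgraph idea can be sketched in a few lines of Python. What exactly goes into each subgraph is an assumption here (the vertex, everything it links to or is linked from, plus any edges among those neighbours); this is not a description of how .NetMap builds its images, and `ego_subgraph`, `all_ego_subgraphs` and the sample edges are hypothetical.

```python
def ego_subgraph(edges, centre):
    """Edges of centre's one-step neighbourhood.

    Keeps centre, every vertex it links to or is linked from, and any edges
    among those vertices; all other vertices are ignored.
    """
    neighbours = {t for s, t in edges if s == centre} | {s for s, t in edges if t == centre}
    keep = neighbours | {centre}
    return [(s, t) for s, t in edges if s in keep and t in keep]


def all_ego_subgraphs(edges):
    """One neighbourhood edge list per vertex, keyed by vertex name."""
    vertices = sorted({v for edge in edges for v in edge})
    return {v: ego_subgraph(edges, v) for v in vertices}


# Hypothetical edges: a triangle plus a separate pair.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")]
for vertex, sub in all_ego_subgraphs(edges).items():
    print(vertex, sub)
```

Keeping the edges among neighbours shows how a blog's immediate circle links to itself; dropping them would leave a simple star centred on the chosen blog.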

Cytoscape visualisation test

The above visualisation was created using Cytoscape, which has worked well so far. Again, I haven’t tested it thoroughly, but it also allows display customisation and offers a range of layout algorithms. Importantly, it also allows data to be imported directly from an Excel spreadsheet. I haven’t quite worked out how to display more information within the program itself, but the resulting visualisation is very pleasing and clear. I think I will be using Cytoscape more often.
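Besides spreadsheet import, Cytoscape can also read plain-text SIF (Simple Interaction Format) files, with one `source interaction target` line per edge. A minimal sketch for turning an edge list into SIF follows; the interaction label `links_to`, the function name and the sample edges are my own hypothetical choices.

```python
def to_sif(edges, interaction="links_to"):
    """Serialise directed edges as SIF lines: source<TAB>interaction<TAB>target."""
    # Tabs rather than spaces, so blog names containing spaces stay intact.
    return "".join(f"{s}\t{interaction}\t{t}\n" for s, t in edges)


# Hypothetical edges.
edges = [("blogA", "blogB"), ("blogA", "blogC")]
print(to_sif(edges), end="")
```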

UCINET (Netdraw) visualisation test

One of the reasons I chose the reduced blogroll list is its focussed set of edges and vertices: the first spreadsheet, of nearly all blogroll links, has many vertices connected to only one blog, which made for rather large, messy maps. In addition, the small sample and the tiny ‘island’ of five blogs not connected to the main network make these maps easy to compare.

After the Cytoscape test, I moved on to the ‘big two’ programs for social network analysis, UCINET (Netdraw) and Pajek. I’ll be using these two for larger-scale analysis of data from the crawling and scraping processes, which will come in different formats. Excel spreadsheets, of course, are not the preferred format for either program, so a bit of conversion was needed. Luckily, this was not as problematic as trying to get an XML file from an Excel 2007 spreadsheet: UCINET itself can import data from spreadsheets and save it as a matrix that Netdraw can read. The above map, then, is the resulting Netdraw visualisation, using the Spring embedding option under the Graph-Theoretic layout. Again, there are options for customising display and layout, and plenty of analytical tools that I haven’t tested yet (I’m going for the visualisation angle first). I did have to refresh the layout a few times, though, as the island’s vertices initially lay on top of each other, leaving only three of its five blogs visible (of course, you can also reposition vertices manually).
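Two ideas from this step, the spreadsheet-to-matrix conversion and the disconnected ‘island’, are easy to sketch in Python. This is an illustration under assumptions, not UCINET's own conversion or file format: the edge list becomes a 0/1 adjacency matrix (row links to column), and islands are found as weakly connected components, i.e. groups of blogs connected when link direction is ignored. The function names and sample edges are hypothetical.

```python
from collections import deque


def adjacency_matrix(edges):
    """Return (labels, matrix) where matrix[i][j] == 1 if labels[i] links to labels[j]."""
    labels = sorted({v for edge in edges for v in edge})
    index = {v: i for i, v in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for s, t in edges:
        matrix[index[s]][index[t]] = 1  # directed: row blog links to column blog
    return labels, matrix


def weak_components(edges):
    """Groups of vertices connected when link direction is ignored, largest first."""
    undirected = {}
    for s, t in edges:
        undirected.setdefault(s, set()).add(t)
        undirected.setdefault(t, set()).add(s)
    seen, components = set(), []
    for start in sorted(undirected):
        if start in seen:
            continue
        # Breadth-first search from an unvisited vertex collects one component.
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            component.append(v)
            for w in undirected[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(sorted(component))
    return sorted(components, key=len, reverse=True)


# Hypothetical edges: a main network (a, b, c) and a two-blog island (d, e).
edges = [("a", "b"), ("b", "c"), ("d", "e")]
labels, matrix = adjacency_matrix(edges)
print(labels)
for row in matrix:
    print(row)
print(weak_components(edges))
```

Any component other than the first (largest) one is an island like the five-blog group in the maps above.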

Pajek visualisation test

From Netdraw, the data could be converted into a Pajek-friendly format, although there is a risk that Netdraw’s layout will influence the one Pajek creates. A bit of playing around and recomputing different layouts negated that, though. Pajek can also draw the network in 3D, which is a nice option, especially when dealing with the implied three dimensions of the ‘blogosphere’. Its customisation options are similar to those of the other programs, although from an aesthetic perspective there’s something rather pleasing about the thin lines and stark colours of the small version of the map. Again, as with UCINET, I’m more likely to use Pajek for larger-scale projects than for small maps like this, where I’d probably go for a quicker spreadsheet-to-map option (such as Cytoscape or ManyEyes), but the 3D aspect is handy (especially once I master the export options).
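Going via Netdraw is one route into Pajek; another is to write Pajek's plain-text `.net` format directly from the edge list, which also sidesteps any layout carried over from Netdraw. A minimal sketch (function name and sample edges hypothetical): vertices are numbered from 1 with quoted labels, and blogroll links, being directed, go under `*Arcs` (undirected links would go under `*Edges`).

```python
def to_pajek(edges):
    """Serialise directed edges in Pajek .net format."""
    labels = sorted({v for edge in edges for v in edge})
    number = {v: i + 1 for i, v in enumerate(labels)}  # Pajek vertex numbers start at 1
    lines = [f"*Vertices {len(labels)}"]
    # Labels are quoted; names containing double quotes are not handled here.
    lines += [f'{number[v]} "{v}"' for v in labels]
    lines.append("*Arcs")
    lines += [f"{number[s]} {number[t]}" for s, t in edges]
    return "\n".join(lines) + "\n"


# Hypothetical edges: two blogs linking to each other, one linking onward.
print(to_pajek([("blogA", "blogB"), ("blogB", "blogA"), ("blogB", "blogC")]), end="")
```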

Mage (Netdraw) visualisation test screenshot

Finally, an accidental visualisation. I was testing some of UCINET’s export settings, and somehow ended up revisualising the network in Mage, which, like Pajek’s 3D option, uses a three-dimensional layout. I have hardly gone through the options for this visualisation, but after generating all those maps, I was rather taken with how easily the network can be rotated, with varying degrees of shading further emphasising the position of vertices in the 3D layout. The screenshot doesn’t really do it justice, but again, I still need to go through the export options.

All of the tools I tested generated usable maps, with varying degrees of customisation. All except ManyEyes work offline, and all except UCINET (which has a free trial version) are freely available for download (although .NetMap does require the rather not-free Excel 2007 for most of its functionality; I think there is a standalone version too…). I imagine there are many other visualisation options available, too, although having more than five or so working options is possibly overkill. In the end, the amount of data and its format will dictate which visualisation program I use for my work. The ease of going from a basic two-column spreadsheet to the above maps is very pleasing, though, and even with my non-existent background in networks, informatics, stats, and other mathematical fields, being able to generate these maps will help my research.

[I also wanted to test out Gephi, but even after adding extensions to the version of Excel running on this computer, Gephi has not yet successfully imported the XML file exported from Excel with the blog links. Still, it’s another program that I will keep an eye on and try to get working later.]
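One possible workaround, which is an untested assumption rather than something tried here, is to skip Excel's XML export and write the edge list as GraphML, a graph-specific XML format that Gephi (at least in its later releases) can import. A minimal sketch using Python's standard `xml.etree.ElementTree`; the function name, node ids (`n0`, `n1`, …) and the `label` data key are hypothetical choices.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(edges):
    """Serialise directed edges as a GraphML document string, with blog names as node labels."""
    ET.register_namespace("", GRAPHML_NS)  # emit GraphML as the default namespace
    q = lambda tag: f"{{{GRAPHML_NS}}}{tag}"
    root = ET.Element(q("graphml"))
    # Declare a string-valued "label" attribute for nodes.
    ET.SubElement(root, q("key"), {"id": "label", "for": "node",
                                   "attr.name": "label", "attr.type": "string"})
    graph = ET.SubElement(root, q("graph"), {"id": "blogrolls", "edgedefault": "directed"})
    labels = sorted({v for edge in edges for v in edge})
    ids = {v: f"n{i}" for i, v in enumerate(labels)}
    for v in labels:
        node = ET.SubElement(graph, q("node"), {"id": ids[v]})
        ET.SubElement(node, q("data"), {"key": "label"}).text = v
    for i, (s, t) in enumerate(edges):
        ET.SubElement(graph, q("edge"), {"id": f"e{i}", "source": ids[s], "target": ids[t]})
    return ET.tostring(root, encoding="unicode")


# Hypothetical edges.
print(to_graphml([("blogA", "blogB"), ("blogB", "blogC")]))
```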

Written by Tim

3 November, 2008 at 5:09 pm