Briefly

1. I’m still alive.

2. I keep working as a journalist. Recently I’ve actually tried applying my newly acquired skills in my real job. There’s still much to work on, but at least I seem to have learnt something. In the first piece, I worked with some data on the death penalty in the US; in the second, I visualised some aspects (namely, kidnapping) of the Global Terrorism Database. Both pieces are in Russian, of course. Moreover, the website does not currently allow embedding interactive visualisations, so there are only screenshots, while the original interactive versions are published on my Blogger account (again, in Russian). Speaking of the Global Terrorism Database, there’s a whole course on Coursera based on this project. I don’t know much about it, so I can’t recommend it, but I’ll definitely have a look as soon as I can.

3. I keep following the activities of the Open Data School in Moscow. It’s an interesting project, both as an educational initiative and as a way of promoting openness. More on it later, as well as on DLMOOC, by the way, which is fascinating (sadly, I’ve been virtually unable to participate fully).

4. Meanwhile, I’m trying to keep up with Linear Algebra: Foundations to Frontiers and Statistical Learning.

5. Right now I’m in the middle of running yet another Russian-language data expedition (DE3), which began on 20 February. This one is a bit different from DE1 and DE2. First, this time we (Irina Radchenko and I from Datadrivenjournalism.ru) worked in partnership with Aleksey Sidorenko from the NGO “Теплица социальных технологий” (Teplitsa/Greenhouse of Social Technologies). It is also the first time we have taken on a socially meaningful subject: orphan diseases. DE3 is going to finish on March 5. Soon after, I’ll be able to tell more about it and its findings (we’re digging into data on the situation in Russia first and foremost). By the way, it would also be great to get some feedback from people in other countries who are aware of the local situation (Jakes?).

6. Last, but not least, I’m currently involved (unfortunately in a rather hibernating way at the moment) in developing an international project on national information resources. It all started with Team 10, but it’s going to grow. More on it later.

Preparing the first presentation in my life

This is supposed to be a complaining post. But I’ll also try to make it somewhat useful, at least through the links to helpful resources I find along the way. Now, to the point. As I have already mentioned (more than once, I think), I hate visual stuff. And presentations today are all based on slides, so I have to think not only about the structure, the opening and closing, and hooks for the audience, but also about making a decent visual background for my presentation.

So, learning again.

A couple of words about the circumstances. I’ve got to prepare this presentation for a conference on social computing that takes place in Moscow this Friday (on 21 June) and my topic is data journalism. Although I’ve got, say, 3 days ahead, I’m very short of time, because during these days I’ll also have to work and learn etc. So my most immediate target is to make at least a draft presentation to have some back-up in case I’m overwhelmed by work during the week.

So the first thing I did was go web-hunting for tips on what to do. And here’s what I’ve found instructive so far.

Now, to start the process, I decided to create some structure. To do this, I first put down some information blocks so that I could arrange them more logically later.

Here’s what I’ve got in the end:

[Image: information blocks for the data journalism presentation]

Feel free to see this monster full-size.

I was also testing this palette by GlueStudio (which I downloaded from ColourLovers, mentioned above).

[Screenshot: the palette on COLOURlovers.com, 18 June 2013]

OK, next I’ll have to fit all this into about 5 slides (I’ve got no more than 12 minutes for my presentation).

My first data-driven story ever

As this WordPress blog doesn’t allow embedding interactive visualisations, I’ll publish the full story on Blogspot. This is actually the final challenge of the Data Expedition at the School of Data, in which I was lucky to participate: I had to present the results of my data experiments as a data-driven story.

Any constructive feedback, recommendations and criticism are welcome, because it’s really hard to assess this work from a beginner’s position. Also, if you notice any mistakes, which I’m sure are numerous, please let me know.

So, below is actually the story. And here’s the full dataset behind the story.

An article by Simon Rogers and Lisa Evans on the Guardian Datablog showed that comparing total CO2 emissions with CO2 emissions per capita gives strikingly different results. The starting point of that analysis was a “world where established economies have large – but declining – carbon emissions. While the new economic giants are growing rapidly” (in terms of total CO2 emissions). But if we look at the per capita data, those rapidly growing economic giants show very modest figures compared to the USA, as well as to some really small economies like Qatar or Bahrain.

I decided to have a closer look at the data on total CO2 emissions, CO2 emissions per capita and GDP to see if there are any patterns, namely, whether there is any relationship between GDP growth and the volume of CO2 emissions, total or per capita. The general picture can be seen in the interactive visualisation on Blogspot or here. (Honestly, I don’t know why this Google chart prefers to speak Russian when published. The Russian phrase in the chart’s navigation means ‘same size’.) It is based on data for the top 10 CO2 emitters combined with the top 10 CO2 per capita emitters (only those for which the World Bank had GDP data), plus South Africa for the reasons described below, together with GDP data for the period from 2005 to 2009, which was the optimal range in terms of data availability.
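The sample selection described above (two top-10 lists merged, filtered by GDP availability, plus one manually added country) can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the story: the function name `build_sample`, the country-to-value dictionary layout and the toy figures are all hypothetical.

```python
def build_sample(total_co2, co2_per_capita, gdp, n=10, extra=()):
    """Return the countries in the top n by total CO2 or by CO2 per capita,
    plus any manually added `extra` countries, keeping only those with GDP data."""
    # Rank countries by each indicator, largest first, and take the top n
    top_total = sorted(total_co2, key=total_co2.get, reverse=True)[:n]
    top_per_capita = sorted(co2_per_capita, key=co2_per_capita.get, reverse=True)[:n]
    # Union of both lists plus manual additions (South Africa, in the story)
    combined = set(top_total) | set(top_per_capita) | set(extra)
    # Drop countries without GDP figures, then sort for a stable output
    return sorted(c for c in combined if c in gdp)


# Toy figures (hypothetical): "D" has no GDP data, "E" is added manually
total = {"A": 100, "B": 80, "C": 10, "D": 5, "E": 3}
per_capita = {"A": 1.0, "B": 2.0, "C": 50.0, "D": 40.0, "E": 0.5}
gdp = {"A": 1.0, "B": 2.0, "C": 3.0, "E": 4.0}
print(build_sample(total, per_capita, gdp, n=2, extra=["E"]))  # ['A', 'B', 'C', 'E']
```

Using a set for the union means a country that appears in both top-10 lists is counted only once, which is why the merged sample can be smaller than 20 before the manual addition.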

Now, is there any relationship between GDP growth (or decline) and the amount of CO2 emissions? Here are some observations.

From 2005 to 2008, all the economies in the sample were growing. After that, there was a massive decline in economic growth, which is quite predictable, since the global economic crisis began in 2008. And we can see a correspondingly massive decline in CO2 emissions. Generally speaking, up to 2008 about 30% of the 21 countries had a CO2 emissions growth rate below 100%; after 2008, about 60% did.
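A growth rate “below 100%” reads most naturally as a year’s value expressed as a percentage of the previous year’s value, so that anything under 100% means emissions fell. Assuming that definition (the original spreadsheet may compute it differently), the share of countries below 100% can be sketched like this. The function names and toy figures are hypothetical.

```python
def growth_index(series):
    """Map each year except the first to its value as a percentage of the
    previous available year's value: 100 means no change, below 100 a decline."""
    years = sorted(series)
    return {year: 100 * series[year] / series[prev]
            for prev, year in zip(years, years[1:])}


def share_below_100(data, year):
    """Share of countries whose growth index for `year` is below 100%.
    Countries with no index for that year are skipped; assumes at least one remains."""
    indices = [growth_index(series).get(year) for series in data.values()]
    indices = [i for i in indices if i is not None]
    return sum(i < 100 for i in indices) / len(indices)


# Toy emissions (hypothetical), country -> {year: value}
emissions = {
    "A": {2007: 100, 2008: 110, 2009: 99},
    "B": {2007: 50, 2008: 45, 2009: 40},
    "C": {2007: 20, 2008: 21, 2009: 22},
    "D": {2007: 10, 2008: 11, 2009: 9},
}
print(share_below_100(emissions, 2008))  # 0.25
print(share_below_100(emissions, 2009))  # 0.75
```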

Can we really insist that it was only the global economic decline that provoked this decline in CO2 emissions, and not, for example, the results of some green policies? Well, our data doesn’t provide enough information to draw this conclusion. But there is one peculiar thing worth mentioning.

After 2008, some economies (again, from our sample) actually continued to grow, namely China, India, Japan, Singapore and South Africa. Their CO2 emissions, in terms of growth or decline, behave rather differently, as can be seen below.

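[Chart: economies that kept growing after 2008 (China, India, Japan, Singapore, South Africa) and their CO2 emissions]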

There are also five economies that had a considerable GDP decline but nonetheless steady CO2 emissions growth.

[Chart: economies with a GDP decline but steady CO2 emissions growth]

Now, if we look at these ten countries together, we can see that only in three cases (Japan, Singapore and South Africa) is GDP growth accompanied by a decline in CO2 emissions, while in the other cases CO2 emissions keep increasing without any obvious connection to GDP trends.
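Grouping countries by the direction of GDP change and CO2 change, as in the comparison above, amounts to a simple two-way classification. The sketch below is an illustrative assumption rather than the method used for the charts: the helper names, the choice to compare only the first and last year of a period, and the toy data are all hypothetical.

```python
def direction(series, start, end):
    """'growth', 'decline' or 'flat', comparing the value in `end` with `start`."""
    change = series[end] - series[start]
    if change > 0:
        return "growth"
    if change < 0:
        return "decline"
    return "flat"


def classify(gdp, co2, start, end):
    """For each country present in both datasets, pair the direction of
    GDP change with the direction of CO2 change between `start` and `end`."""
    return {
        country: (direction(gdp[country], start, end), direction(co2[country], start, end))
        for country in gdp
        if country in co2
    }


# Toy figures (hypothetical), country -> {year: value}
gdp = {"X": {2008: 100, 2009: 105}, "Y": {2008: 100, 2009: 95}, "Z": {2008: 50, 2009: 55}}
co2 = {"X": {2008: 10, 2009: 9}, "Y": {2008: 10, 2009: 12}, "Z": {2008: 5, 2009: 6}}
pairs = classify(gdp, co2, 2008, 2009)
# Countries where GDP grew while CO2 emissions fell
decoupled = sorted(c for c, p in pairs.items() if p == ("growth", "decline"))
print(pairs)      # {'X': ('growth', 'decline'), 'Y': ('decline', 'growth'), 'Z': ('growth', 'growth')}
print(decoupled)  # ['X']
```

Comparing only the end points ignores what happens in between; a year-by-year comparison would be stricter but needs complete series.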

***

The last thing I want to mention is a very general observation. Just for the sake of it, I compared my initial CO2 emissions dataset from the U.S. Energy Information Administration (EIA) with another one from the Carbon Dioxide Information Analysis Center (CDIAC).

Here are the total values of the two datasets:

[Chart: total CO2 emissions according to the EIA and CDIAC datasets]

And here’s total world GDP according to the World Bank and the IMF. These two look much more similar to each other (as well as more up to date):

[Chart: total world GDP according to the World Bank and the IMF]

This is broadly in line with the observation that governments are paying less attention to information on CO2 concentration in the atmosphere.

Another observation is that although the overall trends in the two CO2 datasets do not seem to contradict each other in general (even though they differ), there can still be contradictions in particular cases. For instance, among the top 10 CO2 emitters in the EIA and CDIAC datasets as of 2009, South Africa is tenth in the CDIAC dataset but twelfth in the EIA dataset. When visualised, the two show contradictory trends: according to CDIAC, South Africa’s CO2 emissions increase, while according to EIA, they go down.

[Chart: South Africa’s CO2 emissions according to CDIAC and EIA]
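Checking one country’s rank and trend in two sources, as with South Africa above, can be sketched like this. The function names, the nested dictionary layout (country → year → emissions) and the numbers are hypothetical toy values, not EIA or CDIAC figures.

```python
def rank_in_year(dataset, year, country):
    """1-based rank of `country` by emissions in `year` (largest first)."""
    values = {c: series[year] for c, series in dataset.items() if year in series}
    ordered = sorted(values, key=values.get, reverse=True)
    return ordered.index(country) + 1


def trend(dataset, country, start, end):
    """'up', 'down' or 'flat' for one country between two years."""
    first, last = dataset[country][start], dataset[country][end]
    if last > first:
        return "up"
    if last < first:
        return "down"
    return "flat"


# Two hypothetical sources that disagree about country "R"
source_a = {"P": {2008: 10, 2009: 12}, "Q": {2008: 9, 2009: 8}, "R": {2008: 6, 2009: 5}}
source_b = {"P": {2008: 11, 2009: 12}, "Q": {2008: 8, 2009: 9}, "R": {2008: 9, 2009: 10}}
for name, source in (("A", source_a), ("B", source_b)):
    print(name, rank_in_year(source, 2009, "R"), trend(source, "R", 2008, 2009))
# A 3 down
# B 2 up
```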

Data journalism: Learning insights

Today my learning is focused on data journalism (I’ve got to finish my story as a challenge within the Data Expedition). Also, today I decided to look at the product rather than the technique, which is what I did previously. To this end, I went to read the Guardian Datablog, and it seems to be quite an enlightening experience.

But first off, I have to give credit to Kevin Graveman, whose post actually prompted me to think in this direction. Kevin gave some tips on learning CSS by looking at both the HTML and CSS sources of a page and comparing them to the way the page looks, in order to better understand how it works.

Now, this approach (quite natural, but not always obvious) can be replicated in many other areas. So today, I’m applying it to The Guardian by studying the anatomy of their data-driven materials (just as if I were looking at the source code of their product). And I’m also making notes about my observations along the way.

  1. They ALWAYS provide links to their datasets. Under each piece of visualisation, they post a link to a small particular spreadsheet with the data regarding this piece.
  2. After the article they also provide a link to the full spreadsheet.
  3. A spreadsheet contains not only data but also notes (on a separate sheet) with sources and some explanations. Like so (for this article).
  4. Guardian Datablog is a great, if somewhat random, source of datasets.
  5. But these datasets are not always very trustworthy.
  6. Their visualisations are normally interactive.
  7. Some entries to the blog are very short in terms of text but provide complicated visualisations. Others rely substantially on text.
  8. Most underlying datasets in the materials I’ve seen are organised as a single Google spreadsheet with several sheets (or tabs) containing particular tables. A good example is a recent piece by Simon Rogers and Julia Kollewe. The dataset is here.
  9. It seems to be a good idea to place some charts on separate sheets. (To do this, left-click anywhere on the chart to open the quick edit mode, then click the small triangle in the top right corner and choose ‘move to own sheet’.)

[Screenshot: moving a chart to its own sheet]