Still alive

I think this is the busiest summer I’ve ever had in my life. I’m trying hard to follow my schedule, but not always successfully. Thanks to the Python MOOC’s organisers, who have kindly included a week’s break in the middle of the sequence, I now hope to cover week 4 before the next bunch of tasks arrives. I’ll soon post some updates on my findings and experiences.

For now I’ll just save a couple of links here:

This is where MIT OCW hometasks (assignments) can be downloaded. I just keep losing this page. Now I seem to have fixed it.


And another link, which is not about Python, but I thought it might be interesting for some of my peers. It’s The Visual Display of Quantitative Information by Edward R. Tufte. The shortcoming is that the book is not free. Well, at least it is not supposed to be. Anyway, it was recommended by a person whose judgement I trust here.

Also (just boasting) we’re starting an experimental one-week data MOOC (or data expedition) in Russian in less than a week’s time. The subject will also be very narrow: we’ll only have to learn different ways of searching for data. I really wonder what it’ll turn out to be like. What I know for sure is that it’s going to be a huge pile of various information in addition to Python and my job. And there’ll have to be some additional analytical work afterwards, because we’ll have to sum up our results and understand what to improve in future iterations. The question is how I’m going to find time for all this. But I’ll have to.

Preparing the first presentation in my life

This is supposed to be a complaining post. But I’ll also try to make it at least somewhat useful through the links to helpful resources I find along the way. Now, to the point. As I have already mentioned (more than once, I think), I hate visual stuff. And presentations today are all based on slides, so I’ve got to think not only about the structure, the opening and closing, and hooks for the audience, but also about making some decent background for my presentation.

So, learning again.

A couple of words about the circumstances. I’ve got to prepare this presentation for a conference on social computing that takes place in Moscow this Friday (on 21 June) and my topic is data journalism. Although I’ve got, say, 3 days ahead, I’m very short of time, because during these days I’ll also have to work and learn etc. So my most immediate target is to make at least a draft presentation to have some back-up in case I’m overwhelmed by work during the week.

So, first thing I did, I went web-hunting to find some tips on what to do. And here’s what I’ve found instructive so far.

Now, to get started, I decided to create some structure. To do that, I first put down some information blocks so that I could arrange them more logically later.

Here’s what I’ve got in the end:


Feel free to see this monster full-size.

And I was actually testing this palette by GlueStudio (which I downloaded from ColourLovers I mentioned above).


OK, next I’ll have to fit all this into like 5 slides (I’ve got no more than 12 minutes for my presentation).

My first data-driven story ever

As this WordPress blog doesn’t want to embed interactive visualisations, I’ll publish the full story at Blogspot. This is actually the final challenge of the Data Expedition at School of Data, in which I was lucky to participate. I had to present the results of my data experiments as a data-driven story.

Any instructive feedback, recommendations and criticisms are welcome, because it’s really hard to assess this stuff from my beginner’s position. Also, if you notice any mistakes, which, I’m sure, are numerous, please let me know.

So, below is actually the story. And here’s the full dataset behind the story.

There was an article by Simon Rogers and Lisa Evans on the Guardian Datablog, which showed that if we compare the pure CO2 emissions data with the data on CO2 emissions per capita, we see strikingly different results. The starting point of their analysis was a “world where established economies have large – but declining – carbon emissions. While the new economic giants are growing rapidly” [in terms of CO2 emissions volume again]. But if we look at the CO2 per capita data, we can see that those rapidly growing economic giants have very modest results compared to the USA, as well as to some really small economies like Qatar or Bahrain.

I decided to have a closer look at the data on pure CO2 emissions, CO2 emissions per capita, and GDP, in order to see if there are any patterns, namely, whether there is any relationship between GDP growth and the volume of CO2 or CO2 per capita emissions. The general picture can be seen on the interactive visualisation at Blogspot or here. (Honestly, I don’t know why this Google chart prefers to speak Russian when published. Actually, the Russian phrase in the chart’s navigation means ‘same size’.) It is based on the data for the top-10 CO2 emitters combined with the top-10 CO2 per capita emitters (only those, though, for which the WB data on GDP had some information), together with the GDP data for the period from 2005 to 2009, which was the optimal range in terms of data availability. Plus South Africa, for the reasons described below.
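The selection described above (top emitters by total volume merged with top emitters per capita, keeping only the countries that have GDP figures) could be sketched in Python roughly like this. It is only an illustration under assumptions: the function names `top_n` and `select_countries`, the dictionary layout and all the numbers are made up and are not taken from the actual dataset.

```python
def top_n(values, n):
    """Return the n keys with the largest values, largest first."""
    return sorted(values, key=values.get, reverse=True)[:n]


def select_countries(total, per_capita, gdp, n=10):
    """Union of the top-n total and top-n per capita emitters that have GDP data."""
    candidates = set(top_n(total, n)) | set(top_n(per_capita, n))
    return sorted(c for c in candidates if c in gdp)


# Made-up illustrative numbers, NOT real data.
total = {"A": 900, "B": 700, "C": 50, "D": 10}
per_capita = {"A": 5, "B": 2, "C": 40, "D": 30}
gdp = {"A": 1.0, "B": 2.0, "C": 3.0}  # no GDP figure for "D"

print(select_countries(total, per_capita, gdp, n=2))
```

With real data the two top-10 lists may overlap, so the merged list can contain fewer than 20 countries.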

Now, is there any relationship between GDP growth (or decline) and the amount of CO2 emissions? Here are some observations.

During the period of 2005 – 2008, all of the presented economies were growing, after which there was a massive decline in the economic growth, quite predictably, because the global economic crisis began in 2008. And we can see a corresponding massive decline of the amounts of CO2 emissions. Generally speaking, by 2008, about 30% of the total of the 21 countries had CO2 emissions growth rate below 100%. After 2008, it was about 60% of the total that had CO2 emissions growth rate below 100%.
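A minimal sketch of how such a share could be computed, assuming that “growth rate” here means a year’s emissions as a percentage of the previous year’s, so that anything below 100% is a year-on-year fall. The function names and the numbers are invented for illustration only.

```python
def growth_rate(previous, current):
    """Current value as a percentage of the previous one (100 means no change)."""
    return current / previous * 100


def share_below_100(series_by_country, year):
    """Fraction of countries whose emissions in `year` fell relative to year - 1."""
    falling = [
        country
        for country, series in series_by_country.items()
        if growth_rate(series[year - 1], series[year]) < 100
    ]
    return len(falling) / len(series_by_country)


# Invented numbers, not the real dataset.
emissions = {
    "A": {2008: 100.0, 2009: 90.0},
    "B": {2008: 50.0, 2009: 55.0},
    "C": {2008: 20.0, 2009: 19.0},
    "D": {2008: 10.0, 2009: 12.0},
}

print(share_below_100(emissions, 2009))
```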

Can we really insist that it was only the global economic decline that provoked this decline in CO2 emissions, and not, for example, the results of some green policies? Well, our data doesn’t provide enough information to draw this conclusion. But there is a peculiar thing to mention though.

After 2008, there were actually some economies (again, from our sample list) that continued to grow, namely, China, India, Japan, Singapore, and South Africa. The corresponding CO2 emissions indicators, in terms of growth or decline, are rather different, as can be seen below.


And also, there are five economies that had a considerable GDP decline, but nonetheless a stable CO2 emissions growth.


Now, if we look at these ten countries together, we shall see that only in three cases (Japan, Singapore and South Africa) is GDP growth accompanied by a decline in CO2 emissions, while in the other cases CO2 emissions keep increasing without any obvious connection to the GDP trends.


The last thing I want to mention is a very general observation. Just for the sake of it, I compared my initial CO2 emissions dataset from the U.S. Energy Information Administration (EIA) with another one from the Carbon Dioxide Information Analysis Center (CDIAC).

Here are the total values of the two datasets:


And here’s the total world GDP, according to the data from the World Bank and IMF. These look much more similar (as well as up-to-date):


This is basically in line with the observation that governments are paying less attention to the information on CO2 concentration in the atmosphere.

Another observation is that although the overall trends in the two CO2 datasets seem to be non-contradictory (even though different) in general, it doesn’t mean that there are no contradictions in some particular cases. For instance, if we look at the top-10 CO2 emitters in both the EIA and CDIAC datasets as of 2009, we can see that in the CDIAC dataset South Africa takes the tenth position, while in the EIA dataset it is in the twelfth position. When visualised, the two datasets show contradictory trends: according to CDIAC, the volume of CO2 emissions from South Africa increases, and according to EIA, it goes down.


Visualisation progress

Done it! By pure chance, but I seem to have done it! An interactive Google visualisation of my data, which shows the correlation between CO2 emissions volume and GDP growth. Could be better and more detailed, I know, but wow, I didn’t even realize that Google was really capable of it, or that I was really capable of squeezing it out of Google.

Now, some details. First, due to a very complicated relationship between WordPress.com and embeddable stuff, I can’t publish it here. I can only provide a link to where this interactivity is available. So, here’s the original spreadsheet with both the data and the chart. And here’s my attempt (successful this time) to embed the chart into Blogspot. And it was really a happy coincidence that I got this result, because I didn’t know how to do it. What I was actually trying to do was to shape my data so that it could be processed in Tableau Public. And it wouldn’t work.

Then I realized that TP isn’t free software (only a 14-day trial version is free), which immediately made it rather unattractive in my eyes.

UPD: A commenter has kindly corrected me. Tableau has both free and paid versions (and the 14-day trial is for the latter). Tableau Public is free.

Today I tried to visualise this chart in Google Spreadsheets and here’s the result. So, our chief weapons are the tools used: Data Wrangler (free) and Google Spreadsheets (also free).

If somebody has any instructive tips or criticisms, I’ll be delighted to hear them.

Struggling with visualisation

I wasn’t going to post anything today, but now I see I’ll have to, just for the sake of saving what I’ve learnt about data visualisation, which now seems to me the most challenging part of my beginner’s data manipulation. My target now is to make a story based on the CO2 emissions data. I have already played with two CO2 datasets and found out that some values are rather different. For instance, when I compared the top-10 CO2 emitters (in 2009, the latest year for which CO2 emissions data is available) from two datasets (EIA and UN), I found not only certain differences, but also one obvious contradiction regarding South Africa. I’m not sure it’s really meaningful, but well, the lines obviously show contradictory trends for this particular country:


I have also noticed, by comparing IMF and WB data on GDP, that this kind of data is much more accurate than in the case of CO2. By accurate, I actually mean more similar. And more up-to-date, for that matter.

OK, that was the easiest part in fact. Next I’ve been trying to do some more visualisation using Tableau Public. With the help of visualisation, I want to find out whether there is any correlation between GDP growth and CO2 emissions volume; and I want to compare this correlation to that of GDP and CO2 per capita (which is strikingly different from CO2 emissions by country).
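Besides eyeballing a chart, one common way to put a number on such a relationship is the Pearson correlation coefficient. Below is a small sketch in plain Python; it is not what I actually did in Tableau, the function name `pearson` is my own choice, and the sample values are invented just to show the call.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented values standing in for yearly GDP and CO2 figures of one country.
gdp = [1.0, 2.0, 3.0, 4.0]
co2 = [2.0, 4.0, 6.0, 8.0]

print(pearson(gdp, co2))
```

A value close to 1 means the two series move together, close to -1 that they move in opposite directions, and close to 0 that there is no linear relationship. With only five yearly points per country, any such number should be read with a lot of caution.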

The key problem here is to format the spreadsheet correctly, so that it can be processed in Tableau Public. I haven’t done it yet and I’m not sure I’ll manage to tonight, so I just want to save a couple of links and tips for the future.
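As far as I understand it, the reshaping in question mostly means turning a “wide” table (one column per year) into a “long” one (one row per country and year). Here is a sketch of that step in Python with only the standard library, assuming this is the layout the tool wants; the column names and the values are made up.

```python
import csv
import io

# A tiny made-up "wide" table: one column per year.
WIDE_CSV = """country,2005,2006
A,1.0,1.5
B,2.0,2.5
"""


def wide_to_long(rows, id_column="country"):
    """Turn one row per country into one row per (country, year) pair."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            long_rows.append(
                {id_column: row[id_column], "year": int(column), "value": float(value)}
            )
    return long_rows


rows = list(csv.DictReader(io.StringIO(WIDE_CSV)))
for record in wide_to_long(rows):
    print(record)
```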

First, there’s a cool tool for data cleaning and shaping. It’s called Data Wrangler. You don’t have to download it, it works in your browser.

Second, Tableau Public website has a wonderful gallery of brilliant visualisations. They call it a source of inspiration. I’d rather call it a fascinating source of learning materials. You can download any visualisation you like and then extract the data from there and see how it’s shaped. And also, some authors tell how they did it. Among others, there’s a complicated interactive visualisation by Alex Kerin, which I downloaded as a sample and which I’m currently trying to analyse.

Tableau Public: trying out

My first visualisation ever. Just tested a tool. It’s called Tableau Public and it’s free.

Could be better, but practice makes perfect as they usually say in these cases. TP is really cool. But I can’t embed it into this blog, because:

There is also a service called WordPress.com which lets you get started with a new and free WordPress-based blog, but it is less flexible than the WordPress you download and install yourself. Blogs hosted on WordPress.com do not take advantage of tools like Tableau that use JavaScript.


OK. Here’s a screenshot preview.


And here’s its interactive version.

And now I’ll go’ n’ kill myself.