1. I’m still alive.

2. I keep working as a journalist. Recently I’ve actually tried applying my newly acquired skills to my real job. There is still much to work on, but at least I seem to have learnt something. In the first case I tried to work with some data on the death penalty in the US; in the second, I was visualising some aspects (namely, kidnapping) of the Global Terrorism Database. Both materials are in Russian, of course. Moreover, the website currently does not allow embedding interactive visualisations, so there are only screenshots, while the original interactive material is published on my Blogger account (but again, in Russian). Speaking of the Global Terrorism Database, there’s a whole course at Coursera based on this project. I don’t know much about it, so I can’t recommend it, but I’ll definitely have a look as soon as I can.

3. I keep tracking the activity of the Open Data School in Moscow. It’s an interesting project, both as an educational initiative and as part of promoting openness. More on it later, as well as on DLMOOC, by the way, which is fascinating (sadly, I’ve been virtually unable to participate in it full-scale).

4. Meanwhile, I’m trying to keep up with Linear Algebra: Foundations to Frontiers and Statistical Learning.

5. Right now I’m in the middle of running yet another Russian-language data expedition (DE3), which began on 20 February. This one is a bit different from DE1 and DE2. First, this time we (Irina Radchenko and myself) worked in partnership with Aleksey Sidorenko from the NGO “Теплица социальных технологий” (Teplitsa/Greenhouse of Social Technologies). It is also the first time that we have taken a socially meaningful subject, namely orphan diseases. DE3 is going to finish on 5 March. Soon after, I’ll be able to tell more about it, as well as about its findings (we are primarily digging into the data on the situation in Russia). By the way, it would also be great to have some feedback from people in other countries who are aware of their local situation (Jakes?).

6. Last, but not least, I’m currently involved (unfortunately in quite a hibernating way at the moment) in developing an international project on national informational resources. It all started with Team 10, but it’s going to grow. More on it later.

Second Data Expedition in Russian: Mission Accomplished

Not long ago, we completed the second Russian-language data expedition (DE2, ДЭ2) and here’s how it was.

The Russian-language version of this report can be found here.

Our first data expedition (DE1) was launched in July 2013. While organising DE2, we took that experience into account.

Brief overview

  • DE2 was launched on 9 December and finished on 23 December 2013.
  • The idea of a data expedition and its principles are based on the projects developed by P2PU and School of Data, which, as far as I know, actually coined the term data expedition.
  • Therefore, DE2 was an open P2P-learning project-based initiative, available for everyone, free of charge and based on the idea of mutual help and cooperation.
  • The declared purpose of DE2 was to go through the whole cycle of data processing, with the key emphasis on exploring the structure of the data and patterns within a data set.
  • Unlike DE1, DE2 offered a pre-planned scenario with a sequence of four tasks and instructions aimed at the facilitation of the research process. The tasks of this sequence actually reflected the approach described in the Data Analysis course by Jeff Leek at Coursera.
  • The scenario was based on a certain data set, namely Online Video survey conducted by PSRAI Omnibus and provided by PewInternet project.
  • However, the participants were absolutely free to come up with their own alternative projects. So the scenario was first and foremost a framework for those who have a hard time elaborating their own research pathway.
  • By default, DE2 suggested using Google Spreadsheets and Google/Open Refine as working tools, but the participants were free to use any tools they preferred (which they did).
  • Its communication activity was mainly concentrated in a Google Group, which could be used both as a forum and a mailing list.
  • DE2 required no prerequisites in terms of data processing experience.
  • 20 people signed up for participation.
  • DE2 was organised by Irina Radchenko and myself as part of our larger informal learning Russian-language project.


Just like in the case of DE1, the results were twofold:

Participants’ results:

  • a number of visualisations reflecting the associations and patterns within the data set;
  • some visualisations and spreadsheets reflecting the structure of the data;
  • a number of links to learning resources contributed by participants;
  • a published material (in Russian) based on the research conducted under an alternative (participant-initiated) project within DE2.

(The visualisations and links can be found at the Google Group forum)

Organisational results:

  • the messages at the Google Group forum;
  • two forms filled by participants (initial and final surveys)

DE2 participants used various tools, including:


1. While DE1 can be considered a relative success in terms of participants’ involvement, the main challenge in DE2 was to keep that involvement strategy and to supplement it with better structuring, so that the participants would feel more certain about what they should do, regardless of any previous experience in working with data.

To this end, we prepared a number of relatively short introductory/reference texts that explained both the project’s basic principles and particular aspects of building a data-driven story. We began posting these texts at the Group’s forum 5 days before the official start of DE2. Apart from the actual tasks, these texts included:

  • a brief intro into using Google Groups
  • DE2 scenario
  • an intro into a data expedition learning format
  • a description of a possible presentable output structure
  • types of data analysis and possible types of conclusions that can be made based on various analysis procedures
  • an invitation for the participants to introduce themselves
  • an intro to the data set we offered to work with

The four tasks were (briefly):

  1. explore the data description and the data set; find the meaning of the variable names; think about possible questions;
  2. provide a general description of the data set (how many observations, missing data etc.); start exploring possible associations between variables; think about more questions;
  3. continue exploring the associations; build exploratory charts; possibly do some statistical modelling, for those with the skills;
  4. create expository charts; write a story; publish it (for those who didn’t have any platform of their own we created a special DE2 blog).
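For anyone who prefers code to spreadsheets, the first two tasks map naturally onto a few lines of pandas. This is only a sketch with made-up column names (the real survey variables are different), but it shows the kind of description we asked for:

```python
import pandas as pd

# A tiny stand-in for the survey data set; the real variable names differ.
df = pd.DataFrame({
    "age": [25, 34, 51, 29, None, 42],
    "watches_online_video": ["yes", "yes", "no", "yes", "no", None],
})

# Task 2: a general description of the data set.
n_obs = len(df)                       # number of observations: 6
missing_age = df["age"].isna().sum()  # missing values in 'age': 1

# A first step towards exploring associations between variables:
mean_age_by_answer = df.groupby("watches_online_video")["age"].mean()
print(n_obs, missing_age)
```

From there, the exploratory charts of task 3 are a matter of plotting such group summaries.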

Apparently, some of these texts did a kind of facilitator’s job by simply initiating a space where a discussion could develop.

2. The introduction of the scenario seems to have worked, as most of the participants were trying to follow the tasks and discuss their findings or at least were toying with the provided data in their own way. On the other hand, there was an alternative research initiative, carried out by one of the participants on his own. Although he didn’t have a whole team working on the same project, he managed to receive some informational support and feedback at the forum.

The main objective of the ready-made scenario was twofold:

  • To make sure that those who prefer to work on their own, but have trouble building their own research pathway, still have something to do without necessarily having to get involved in the communication process;
  • To make sure that the participants with no experience have something to rely on, as mentioned above.

3. The choice of the data set for the scenario was a compromise. We did realise that for the Russian-speaking audience, data on Russia would be more interesting and probably easier to work with (we actually had to translate the data description provided at the data website into Russian to make sure everyone could understand it). But we also wanted the data set (a) to be rather clean, to spare inexperienced participants from spending much time on cleaning, as they only had two weeks at their disposal; and (b) to contain lots of variables reflecting different parameters of measurement, in order to provide plenty of opportunities to compare them. The data set we came up with in the end satisfied the latter two requirements, but was based on a US survey.

4. However, the alternative project did explore Russian material. Namely, it was aimed at measuring the effectiveness of the Russian legislation on blocking websites that are regarded as harmful for children (those deemed to be promoting child pornography, drug abuse, suicide, etc.). This law was passed in 2012 and was widely considered rather inefficient in terms of its declared objective, but very convenient as a censorship tool for blocking undesirable web resources. This is actually a very interesting direction of research, which could well be continued within our upcoming projects.

5. As to the collaboration activity during the expedition, we can mainly judge it by the forum messages. These messages do not reflect any activity outside the forum, which we cannot measure, but the forum seems fairly representative as it is. Here is the shape of the activity, which shows that the communication was not evenly distributed, but it covers the official DE2 period and actually extends beyond the official dates. The figures behind this chart include all forum messages, that is, both participants’ and organisers’ messages.


The red frame marks the official DE2 period.

And this is a chart showing the activity by day of the week, measured over the whole forum activity period. The most active exchange apparently happens on the first working days and then gradually tapers off, almost ceasing at the weekend.


Figures can be found here.

It seems that during the working week people didn’t have much time to work on DE2, so most of the work was done at the weekend, and people shared their findings afterwards.
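The weekday chart itself is easy to reproduce from the raw message dates. A minimal sketch in Python, with made-up dates (the real figures live in the linked spreadsheet):

```python
from collections import Counter
from datetime import date

# Hypothetical forum message dates; the real ones come from the Google Group.
messages = [date(2013, 12, 9), date(2013, 12, 9), date(2013, 12, 10),
            date(2013, 12, 14), date(2013, 12, 15), date(2013, 12, 16)]

# Count messages per day of the week.
per_weekday = Counter(d.strftime("%A") for d in messages)
print(per_weekday["Monday"])  # 3 (9 and 16 December 2013 were Mondays)
```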


While in DE1 the participants could join the Google Group on their own, DE2 included one extra step: those who wanted to participate had to first sign up by filling in a registration form. This helped us collect more information about the participants. We also expected that this additional step could serve as a motivation filter. After sending the form, everyone was added to the DE2 Group. As a result, we had 20 filled forms, but only 14 participants added to the Group; we couldn’t add the rest, because the email addresses they provided were inactive.

Here’s a brief review of everyone who registered, based on the form data. Organisers’ data are not included.


Figures can be found here.

DE1 vs. DE2

It is interesting to compare results of DE2 to the results of DE1. Here is the proportional comparison:


Figures can be found here.

In this chart, we can see that during DE1 more messages (124) were posted on the forum than during DE2 (107). Given that DE1 was only one week long and DE2 took a fortnight, this might look somewhat discouraging. On the other hand, we can also see that more people were involved in cooperation in DE2. This was measured by two parameters: the number of people who left at least one message on the forum (in most cases a self-introduction message, if it was the only one) and the number of people who participated in experience/information exchange (normally expressed in the form of questions and answers, as well as sharing findings). In both cases DE2 shows better results (10 and 6 respectively) than DE1 (6 and 4), even though fewer people had access to the DE2 forum.
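One simple way to put the raw totals on a comparable footing is to normalise by duration. It is a crude measure, since the activity was not evenly distributed, but it makes the difference explicit:

```python
# Message totals and durations of the two expeditions.
de1_messages, de1_weeks = 124, 1
de2_messages, de2_weeks = 107, 2

# Messages per week as a rough intensity measure.
de1_rate = de1_messages / de1_weeks  # 124.0
de2_rate = de2_messages / de2_weeks  # 53.5
print(de1_rate, de2_rate)
```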

Though it might seem somewhat confusing, the likely explanation is that the timing for DE2 was extremely counterproductive, which was actually a huge mistake on our part. Although DE2 was planned to finish a week before the New Year, people were still exceptionally busy trying to meet their deadlines at work, preparing for exams or simply caught up in the fuss of the upcoming holidays. I think this was one of the most important reasons for the message-free days shown in the chart above, as well as for the relatively low number of messages.

Meanwhile, the larger number of people involved in the working and communication process shows that even though the participants were very busy, they were still willing to proceed with the project. This makes me think that DE2 shows some progress compared to DE1, although the timing lesson should be taken into consideration in the future.

I must add that the DE1 report makes no distinction between organisers and participants in its measurements. In the case of DE2, I only counted the participants’ data, leaving the two organisers aside (except where specifically discussed). So when comparing the data on both expeditions in this report, I also used only participants’ data for DE1. This explains the slight differences in figures between the two reports.



  • More participants were involved in the working process.
  • The participants demonstrated a friendly and considerate attitude towards each other (and the feedback provided by 5 of them pointed out that they appreciated the communication and cooperation component).
  • All the respondents in the final survey expressed their intention to participate in future expeditions.
  • The activity, although not very regular, remained persistent throughout the whole expedition period.


  • Timing. This is a mistake that should never be repeated. The best time for an expedition has yet to be determined, but it is already obvious that it should not be scheduled in such busy months as December, May, June and probably November.
  • Promotion strategy. Almost half of those registered (8 out of 20) signed up during the first week of DE2, after the expedition had officially begun (at the beginning of the second week the registration form was shut down). That means that the promotion should be more timely and efficient.
  • Relevance. Although something could be learnt even with the help of the provided data set, next time we hope to come up with a more relevant one.
  • The lack of final results presented. Only one participant published material based on his research during DE2. At least one published story is by all means great, but no one else came up with one. Part of the reason might be that some participants were satisfied with what they had learnt and felt no need to publish a story; another could be the lack of time. Still, I think that a better format for presenting results might motivate a bigger output.

Based on the results of DE2 and taking into account its lessons, we are going to proceed with organising Russian-language DEs. We might also consider launching alternative types of DEs alongside the regular ones, such as:

  • DEs for the participants with a particular level/kind of skills.
  • DEs with the emphasis on real research, rather than learning.
  • DEs aimed at mastering particular tools or working techniques.

All in all, DE2 was quite an experience and a great opportunity for getting in touch with wonderful people. I hope we will soon come up with more DE projects.

And happy New Year to everyone who somehow managed to make their way through this post up to this point.

Is it Christmas already?

It’s been quite an intensive period recently. First, I was taking two parallel courses at Coursera – on data analysis and on statistics. Second, Irina Radchenko and I were preparing to launch a new Russian-language data expedition under our project, and then we were actually coordinating it for two weeks (9 – 23 December). Third, I suddenly had a huge task at work with a really tough deadline, which somewhat ruined my plans, but thankfully not all of them. So here’s a brief account of how things turned out:

I had to drop the data analysis course after its sixth week. Due to that sudden workload I couldn’t afford to do the second assignment, which was somewhat upsetting. On the other hand, I think I’ll be able to do it later, either on my own or within the next course iteration (I’m almost sure it’s going to be launched again soon). Anyway, I’m glad I’ve done at least something, because it turned out to be rather helpful, especially in terms of structuring things and my mind. And yes, the previous course, Computing for Data Analysis (on R), was extremely helpful. (For those who might be interested: the next iteration of that course starts on 6 January 2014.)

On the other hand, I triumphantly completed the Statistics One course, and that’s really cool. There are contradictory reviews of this course online. Some of them claim that the course is inconsistent in terms of difficulty: sometimes too easy and even boring, sometimes too complicated. Well, after completing it, I can’t say that I’ve digested all the material provided. But now I have a better vision of what statistics is like and how it approaches data. I can also apply some techniques for data analysis with the help of R, though I wouldn’t claim I completely understand the mechanisms underlying some of these operations. Next, I’m actually going to focus on OpenIntro Statistics, which is a great textbook, and revise the material in order to pack it into my head. To wrap up this segment, I’ll add that the material provided within that course by the middle of the semester was enough to complete the first assignment in the Data Analysis course.

As to the data expedition, it was happily completed yesterday. Its organisation was considerably different from the previous experience and demanded quite a bit of advance preparation, apart from the participation itself. Although I couldn’t participate in it as thoroughly as I would have wanted, I still have to admit that the result somewhat exceeded my expectations. I’ll be writing about it in greater detail after I analyse the whole picture. For now I can say that the timing was horrible. So the lesson is: never launch learning projects right before Christmas or the New Year. But nonetheless, there are some very inspiring results, and the participants were truly great.

Also, here are some links as usual:

And merry Christmas everyone who celebrates it now!

First Data Expedition in Russian: Mission Complete

It’s been a while since I last posted here, and there are actually two reasons for that. First, I’ve got a really heavy workload, and it’s going to remain so for a while. Most upsettingly, I haven’t had enough time for the Python course, but I’m certainly going to make up for it as soon as possible. Second, we were busy organising and then participating in the first Russian-language experimental data expedition, or data-MOOC. And this is the experience I want to share here as well, because it was extremely inspiring and rather instructive. Besides, it’s about p2p-learning, which is one of the subjects of this blog.

While writing this account I was using the model provided in the account of the School of Data/P2PU’s MOOC.

Now, some overview

  • This project was inspired by participating in Data-MOOC organised by School of Data and P2PU in April-May 2013. Also, I must say that the blog of the Python MOOC has been really helpful and instructive.
  • The project was based on p2p-learning principles and a mechanical MOOC model. For the sake of brevity and attractiveness, we used the term ‘data-expedition’ (экспедиция данных, дата-экспедиция) to describe it.
  • It was a week long: from July 22 to July 28.
  • Its declared objective was to learn how to look for datasets online. To focus the task, we suggested a topic, which was collecting data about universities all over the world. So, unlike the School of Data’s Data-MOOC, it wasn’t supposed to reproduce the complete data-processing cycle, but rather to perform its first stage.
  • The project was organised by Irina Radchenko and myself as part of our larger informal project. Within the Expedition, we acted both as the support team and as participants.
  • The goal of this Expedition was twofold. First, we wanted to see if this format works in the local environment. Second, well, I personally wanted to learn more about how to search for data.
  • We announced the upcoming data expedition ten days before the start, and by the beginning 20 people had signed up for participation, which was actually more than we expected.
  • Participation was absolutely free and open and no special skills were required.
  • The participants’ main communication platform was a Google group set as a forum (with a possibility to turn on the mailing option).
  • Our main collaboration tool was Google Docs.
  • This expedition relied heavily on collaboration and p2p initiative. It had no prescribed plan or step-by-step tasks, apart from the initially formulated one. So the organisational messages were first and foremost aimed at facilitating people’s communication and introducing the specifics of the format.


As expected, the results are twofold.

Participants’ results:

  • 3 visualisations
  • 1 data-scraping tutorial for beginners
  • A collective Google Doc with a list of sources

Organisational results:

  • 2 surveys (preliminary and final)
  • The participants’ exchange documented on the Google group’s forum
  • The set of collective Google Docs

As to the participants’ results, here are some links:

But in this post, I’ll focus on some highlights of the process.

1. Speaking of the organisation, our main target was to help people get involved in cooperation and to boost activity. To this end, we started introducing people to the format a few days before the expedition began. Judging by previous experience, the lack of confidence and the uncertainty about where to start and what to do are among the main barriers to overcome. In order to facilitate cooperation, we successively published a number of organisational messages:

  • Introduction to the objective of the expedition (explaining why searching for data is an important skill)
  • Introduction to the format of ‘expedition’ (or a mechanical MOOC) with some tips on what to do and how to react
  • List of tools for online-collaboration (and invitation to contribute participants’ own ideas)
  • Invitation for the participants to introduce themselves (several possible key-points of introduction were suggested)

By the beginning of the expedition the participants knew each other’s names and how to address each other; they also knew each other’s area of expertise. Moreover, they started communicating before the expedition officially began.

2. Some figures:

  • 20 people joined the expedition group
  • 10 people filled in the preliminary survey
  • 6 people actively communicated at the forum during the whole expedition
  • 7 people filled in the final survey

3. During the whole period of the Expedition, including the unofficial preliminary/introductory part (which began on 17 July), 124 messages were sent via the forum. Of course, there were instances of bilateral communication, but we couldn’t register them for obvious reasons. Here’s the distribution of the forum activity (with the peak in the middle of the expedition).


4. The atmosphere of the communication was friendly and relaxed. The participants actively discussed each other’s initiatives and provided encouraging feedback.

Here are some facts about the participants (based on the preliminary survey):


Figures can be found here.



  • People got interested in the project and willingly joined
  • The participants demonstrated a friendly and considerate attitude towards each other
  • The core of the group (6 people) were active during the whole period of the expedition
  • All the respondents in the final survey expressed their intention to participate in future expeditions
  • Most of the respondents in the final survey expressed their intention to complete the projects they started in the course of expedition, but failed to finish due to the lack of time


  • Obviously, the output of the project wasn’t confined to the declared objective. On the one hand, it’s natural: people are free to learn what they want to learn. On the other hand, the absence of a more precise schedule made some participants feel uncomfortable. From this we conclude that a more focused approach is needed.
  • All the respondents in the final survey said they didn’t have enough time to complete what they wanted. At the same time, most of them admitted that the timeframe was adequate for the task.
  • Most of the respondents felt some discomfort because of the lack of a coordinator or instructor and also said they didn’t always understand what they should do and how.

In the final survey, we asked how we could make the process more efficient and here’s the summary of the ideas:

  • A more precise schedule of activity would be good (like breaking the whole expedition period into specific phases)
  • A coordinator is needed
  • Some instruments for encouraging shy and unconfident participants would be helpful
  • It might be better if the output of the whole project were formulated more precisely
  • The topic should probably be narrower
  • Longer expedition terms would make self-organisation easier


We are totally going to continue our experiments with this format. In the future, we are going to try something like:

  • One-day intensive online expeditions with fixed roles and distributed responsibilities
  • Long term (several weeks’ long) expeditions with a coordinator (constant or elected for a certain term)
  • Workshop-expeditions: massive online projects led by a volunteer instructor or mentor willing to share their skills
  • Expeditions based on the Data-MOOC scheme (with pre-planned tasks)

We are also going to develop a method to register participants’ achievements, even small ones, in order to encourage further efforts.

Also, we feel the need to create a way to proudly present major achievements. Here we should consider the experience of creating badge systems.

Well, that’s it for now. It was really cool! And there’s quite a bit of work ahead too!

The Russian version of this account can be found here.

Data Expedition Recap

I can hardly believe it, but my assignment at School of Data seems to be completed. The last step was to produce some output, that is to tell the story. Now I think I should somehow summarize my experience.

Now, first off, what is a Data Expedition at School of Data? It can be very flexible in terms of organisation. Here are the links to the general description and also to the Guide for Guides, which is revealing. In this post, I’ll be talking about this particular expedition. Also, a great account of it can be found on one of my team mates’ blogs. In principle, this expedition was technically very similar to the Python Mechanical MOOC: all the instructions were sent by a robot via our mailing list, and then we had to collaborate with our team mates to find solutions.


(Image CC-By-SA J Brew on Flickr)

First of all, we were given a dataset on CO2 emissions by country and CO2 emissions per capita. Our task was to look at the data and think about what could be done with it. As background, we were also given the Guardian article based on this very dataset, so that we could have a look at a possible approach. Well, I can’t say I was able to do the task right away. Without any experience of working with data or any tools to deal with it, I felt absolutely frustrated by the very look of a spreadsheet. And at that stage, peers could hardly provide any substantial technical support, because we were all newbies.


Then we had tasks to clean and format the data in order to analyze certain angles. Here our cooperation began and became really helpful. Although nobody among us was an expert here, we were all looking for the solutions and shared our experience, even when it was little more than ‘I DON’T UNDERSTAND ANYTHING!!11!!1!’.
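To give a flavour of what that cleaning stage looked like, here is a hedged sketch in Python with pandas (the column names and values are invented, and we actually did most of this in Google Spreadsheets):

```python
import pandas as pd

# Invented sample mimicking typical problems: stray whitespace,
# thousands separators stored as text, and non-numeric placeholders.
df = pd.DataFrame({
    "country": [" China ", "USA", "India", "India"],
    "co2_kt": ["8,000,000", "5,400,000", "n/a", "2,000,000"],
})

df["country"] = df["country"].str.strip()  # trim stray whitespace
# Drop thousands separators and coerce non-numbers ('n/a') to NaN.
df["co2_kt"] = pd.to_numeric(df["co2_kt"].str.replace(",", ""), errors="coerce")
# Keep the later of the duplicated 'India' rows.
df = df.drop_duplicates(subset="country", keep="last")
print(len(df))  # 3
```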

Our chief weapons were:

  • the members’ supportive and encouraging attitude to each other
  • our mailing list
  • Google Docs to record our progress
  • Google Spreadsheets to work with our data and share the results
  • Google Hangout for our weekly meet-ups (really helpful, to my mind)
  • Google Fusion Tables for visualisation (alongside with Google Spreadsheets)

And that is it actually. I’m not mentioning more individual choices, because I’m not sure I even know about them all.

Now some credits.

Irina, you’ve been a source of wonderful links that really broadened my understanding of what’s going on. And above all, you’re extremely encouraging.

Jakes, you’ve contributed a huge amount of effort to get the things going and I think it paid off. You have also always been very supportive, generous and helpful even beyond the immediate team agenda.

Ketty, you were the first among us brave enough to face the spreadsheet as it is, and you proved that it is actually possible to work with. I was really inspired by this and tried to follow suit. The same was true in the case of Google Fusion Tables.

Randah, I wish you had had more time at your disposal to participate in the teamwork. And judging by your brief inputs, you would make a great team mate. You were also the person who coined the term dataphobia and in this way located the problem I resolved to overcome. I hope to get in touch with you again when you have more spare time.

Zoltan, you were also an upsettingly rare contributor, due to your heavy and unpredictable workload. But nevertheless, you managed to provide an example of a very cool approach to overcoming big problems just by mechanically splitting them into smaller and less scary pieces.

Vanessa Gennarelli and Lucy Chambers, thanks for organising this wonderful MOOC!

So, as a result, I

  • seem to have overcome my general dataphobia
  • learnt a number of basic techniques
  • got an idea of what p2p learning is (it’s a cool thing, really)
  • got to know great people and hope to keep collaborating with them in the future

Well, this is kind of more than I expected.

Next, I’m going to learn more about data processing, Python, P2P-learning and other awesome things.

My first data-driven story ever

As this WordPress blog doesn’t want to embed interactive visualisations, I’ll publish the full story at Blogspot. This is actually the final challenge of the Data Expedition at School of Data, in which I was lucky to participate. I had to present the results of my data experiments as a data-driven story.

Any instructive feedback, recommendations and criticisms are welcome, because it’s really hard to assess this stuff from my beginner’s position. Also, if you notice any mistakes, which, I’m sure, are numerous, please let me know.

So, below is actually the story. And here’s the full dataset behind the story.

There was an article by Simon Rogers and Lisa Evans on the Guardian Datablog, which showed that if we compare the pure CO2 emissions data and the data on CO2 emissions per capita, we see strikingly different results. The starting point of that analysis was a “world where established economies have large – but declining – carbon emissions. While the new economic giants are growing rapidly” [in terms of CO2 emissions volume, again]. But if we look at the CO2 per capita data, we can see that those rapidly growing economic giants show very modest results compared to the USA, as well as to some really small economies like Qatar or Bahrain.

I decided to have a closer look at the data on pure CO2 emissions, CO2 emissions per capita, and GDP, in order to see if there are any patterns; namely, whether there is any relationship between GDP growth and CO2/CO2 per capita emissions volume. The general picture can be seen in the interactive visualisation at Blogspot or here. (Honestly, I don’t know why this Google chart prefers to speak Russian when published. The Russian phrase in the chart’s navigation means ‘same size’.) It is based on the data for the top 10 CO2 emitters combined with the top 10 CO2 per capita emitters (only those, though, for which the World Bank data on GDP had some information), using the GDP data for the period from 2005 to 2009, which was the optimal range in terms of data availability. Plus South Africa, for the reasons described below.
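The sample itself comes down to a simple set operation: take the union of the two top-10 lists, keep only the countries for which GDP data is available, and add South Africa. A sketch with truncated, partly invented lists (the real rankings come from the EIA and World Bank data):

```python
# Truncated illustrative lists; the real top-10 rankings are in the dataset.
top_total = ["China", "USA", "India", "Russia", "Japan"]
top_per_capita = ["Qatar", "Bahrain", "USA", "Kuwait", "Singapore"]
gdp_available = {"China", "USA", "India", "Russia", "Japan",
                 "Qatar", "Bahrain", "Singapore"}

# Union of the two lists, restricted to countries with GDP data.
sample = (set(top_total) | set(top_per_capita)) & gdp_available
sample.add("South Africa")  # added separately, as explained in the text
print("Kuwait" in sample)  # False: no GDP data in this toy example
```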

Now, is there any relationship between GDP growth (or decline) and the amount of CO2 emissions? Here are some observations.

During 2005–2008, all of the presented economies were growing; then came a massive decline in economic growth, quite predictably, because the global economic crisis began in 2008. And we can see a corresponding massive decline in the amounts of CO2 emissions. Generally speaking, by 2008 about 30% of the 21 countries had a CO2 emissions growth rate below 100% (that is, emissions lower than in the previous year). After 2008, about 60% of them had a CO2 emissions growth rate below 100%.
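To make the "growth rate below 100%" measure concrete: it's the year-on-year emissions index, so a value below 100% means emissions fell compared to the previous year. A minimal sketch with invented numbers:

```python
# What "growth rate below 100%" means: the year-on-year emissions
# index. Below 100% = emissions fell against the previous year.
# The numbers here are invented for illustration.

emissions = {  # country -> CO2 emissions by year (arbitrary units)
    "A": {2007: 100, 2008: 104, 2009: 98},
    "B": {2007: 50, 2008: 49, 2009: 47},
}

def growth_rate(series, year):
    """Emissions in `year` as a percentage of the previous year."""
    return 100.0 * series[year] / series[year - 1]

# Countries whose emissions declined in 2009, and their share
declining_2009 = [c for c, s in emissions.items() if growth_rate(s, 2009) < 100]
share = len(declining_2009) / len(emissions)
print(declining_2009, share)
```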

Can we really insist that it was only the global economic decline that provoked this drop in CO2 emissions, and not, for example, the results of some green policies? Well, our data doesn't provide enough information to draw that conclusion. But there is one peculiar thing to mention.

After 2008, some economies (again, from our sample list) actually continued to grow, namely China, India, Japan, Singapore, and South Africa. The corresponding CO2 emissions indicators, in terms of growth or decline, are rather different, as can be seen below.


And there are also five economies that had a considerable GDP decline, but nonetheless stable CO2 emissions growth.


Now, if we look at these ten countries together, we shall see that in only three cases (Japan, Singapore and South Africa) is GDP growth accompanied by a decline in CO2 emissions, while in the other cases CO2 emissions keep increasing without any obvious connection to the GDP trends.


The last thing I want to mention is a very general observation. Just for the sake of it, I compared my initial CO2 emissions dataset from the U.S. Energy Information Administration (EIA) with another one, from the Carbon Dioxide Information Analysis Center (CDIAC).

Here are the total values of the two datasets:


And here’s the total world GDP, according to the data from the World Bank and IMF. These look much more similar (as well as up-to-date):


This basically accords with the observation that governments are paying less attention to the information on CO2 concentration in the atmosphere.

Another observation is that although the overall trends in the two CO2 datasets seem to be non-contradictory in general (even if different), that doesn't mean there are no contradictions in particular cases. For instance, if we look at the top-10 CO2 emitters in both the EIA and CDIAC datasets as of 2009, we can see that South Africa takes the tenth position in the CDIAC dataset, while in the EIA dataset it is in the twelfth position. When visualised, this shows contradictory trends: according to CDIAC, the volume of CO2 emissions from South Africa increases, while according to EIA it goes down.
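The ranking comparison itself is simple to reproduce. Here's a sketch of the mechanics with invented values; only the logic mirrors the EIA-vs-CDIAC comparison, not the actual figures.

```python
# Sketch of the ranking comparison between two emissions datasets.
# Values are invented; only the mechanics mirror the comparison.

eia = {"China": 7700, "USA": 5400, "India": 1600, "South Africa": 450}
cdiac = {"China": 7500, "USA": 5300, "India": 1650, "South Africa": 470}

def rank_of(series, country):
    """1-based position of `country` when sorted by emissions, descending."""
    ranked = sorted(series, key=lambda c: -series[c])
    return ranked.index(country) + 1

for name, data in (("EIA", eia), ("CDIAC", cdiac)):
    print(name, rank_of(data, "South Africa"))
```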


Visualisation progress

Done it! By pure chance, but I seem to have done it! An interactive Google visualisation of my data, which shows the correlation between CO2 emissions volume and GDP growth. It could be better and more detailed, I know, but wow, I didn't even realise Google was really capable of this, or that I was really capable of squeezing it out of Google.

Now, some details. First, due to the very complicated relationship between WordPress and embeddable stuff, I can't publish it here; I can only provide a link to where this interactivity is available. So, here's the original spreadsheet with both the data and the chart. And here's my attempt (successful this time) to embed the chart into Blogspot. It was really a happy coincidence that I got this result, because I didn't know how to do it. What I was actually trying to do was shape my data so that it could be processed in Tableau Public. And it wouldn't work.

Then I realised that TP isn't free software (only a 14-day trial version is free), which immediately made it rather unattractive in my eyes.

UPD: A commentator has kindly corrected me. Tableau has both free and paid versions (and the 14-day trial is for the latter). Tableau Public is free.

Today I tried to visualise this chart in Google Spreadsheets, and here's the result. So, our chief weapons were Data Wrangler (free) and Google Spreadsheets (also free).

If somebody has any instructive tips or criticisms, I'll be delighted to hear them.

Struggling with visualisation

I wasn’t going to post anything today, but now I see I’ll have to, just for the sake of saving what I’ve learnt about data visualisation, which now seems to me the most challenging part of my beginner’s data manipulation. My current goal is to make a story based on the CO2 emissions data. I have already played with two CO2 datasets and found out that some values are rather different. For instance, when I compared the top-10 CO2 emitters (in 2009, that is, the latest year for which CO2 emissions data is available) from the two datasets (EIA and UN), I found not only certain differences, but also one obvious contradiction regarding South Africa. I’m not sure it’s really meaningful, but the lines obviously show contradictory trends for this particular country:


I have also noticed, by comparing the IMF and WB data on GDP, that this kind of data is much more accurate than in the case of CO2. By accurate, I actually mean more similar. And more up-to-date, for that matter.

OK, that was the easiest part, in fact. Next, I’ve been trying to do some more visualisation using Tableau Public. With the help of visualisation, I want to find out whether there is any correlation between GDP growth and CO2 emissions volume, and I want to compare this correlation to that of GDP and CO2 per capita (which is strikingly different from CO2 emissions by country).
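For anyone who wants to go beyond eyeballing the charts, the GDP/CO2 relationship can also be quantified with a Pearson correlation coefficient. A self-contained sketch, where the five yearly values are made up purely for illustration:

```python
# One way to quantify the GDP/CO2 relationship: a Pearson
# correlation over the yearly values. Figures are invented.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

gdp = [2.3, 2.7, 3.5, 4.6, 5.1]   # GDP by year, 2005-2009 (invented)
co2 = [5.1, 5.6, 6.0, 6.5, 6.8]   # CO2 emissions by year (invented)

r = pearson(gdp, co2)
print(round(r, 3))
```

A value near +1 would mean the two series rise and fall together; near 0, no linear relationship at all.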

The key problem here is to format the spreadsheet correctly, so that it can be processed in Tableau Public. I haven’t done it yet and I’m not sure I’ll manage to tonight, so I just want to save a couple of links and tips for the future.

First, there’s a cool tool for data cleaning and shaping called Data Wrangler. You don’t have to download it; it works in your browser.
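Much of the shaping such tools do for Tableau boils down to turning a "wide" spreadsheet (one column per year) into a "long" one (one row per country-year). A minimal sketch of that reshape, with invented rows:

```python
# Tableau-style tools generally want "long" data (one row per
# country-year) rather than the "wide" spreadsheet layout (one
# column per year). The rows below are invented.

wide = [
    {"country": "China", "2005": 5.1, "2006": 5.6},
    {"country": "USA",   "2005": 5.9, "2006": 5.8},
]

long_rows = [
    {"country": row["country"], "year": int(year), "co2": value}
    for row in wide
    for year, value in row.items()
    if year != "country"
]
print(long_rows)
```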

Second, the Tableau Public website has a wonderful gallery of brilliant visualisations. They call it a source of inspiration; I’d rather call it a fascinating source of learning materials. You can download any visualisation you like, extract the data from it and see how it’s shaped. Also, some authors explain how they did it. Among others, there’s a complicated interactive visualisation by Alex Kerin, which I downloaded as a sample and which I’m currently trying to analyse.

Data journalism: Learning insights

Today my learning is focused on data journalism (I’ve got to finish my story as a challenge within the Data Expedition). Also, today I decided to look at the product rather than the technique, as I previously did. To this end, I went to read the Guardian Datablog, and it has been quite an enlightening experience.

But first off, I have to give credit to Kevin Graveman, whose post actually prompted me to think in this direction. Kevin gave some tips on learning CSS by looking at both the HTML and CSS sources of a page and comparing them to the way the page looks, in order to better understand how it works.
Now, this approach (quite natural, but not always obvious) can be replicated in many other areas. So today I’m applying it to The Guardian by studying the anatomy of their data-driven materials (just as if I were looking at the source code of their product). And I’m also making notes about my observations along the way.

  1. They ALWAYS provide links to their datasets. Under each piece of visualisation, they post a link to a small particular spreadsheet with the data regarding this piece.
  2. After the article they also provide a link to the full spreadsheet.
  3. A spreadsheet contains not only data, but also notes (on a separate sheet) with sources and some explanations. Like so (for this article).
  4. Guardian Datablog is a great source of datasets. Although somewhat random.
  5. But these datasets are not always very trustworthy.
  6. Their visualisations are normally interactive.
  7. Some entries in the blog are very short in terms of writing, but provide complicated visualisations. Others rely substantially on text.
  8. Most underlying datasets in the materials I’ve seen are organised as single Google spreadsheets with several sheets (or tabs) containing particular tables. A good example is a recent piece by Simon Rogers and Julia Kollewe. The dataset is here.
  9. It seems to be a good idea to place some charts on separate sheets. (To do this, left-click anywhere on the chart to open the quick edit mode, then hit the small triangle in the top right corner and choose ‘move to own sheet’.)


Building a network

Social media are great because they are omnipresent, fast, easy to handle, and good for getting in touch with people, monitoring news and accumulating multiple sources of information. But I genuinely love blogs, precisely because they are slower and more fundamental. And I’m sure they’ve got a huge p2p collaboration and networking potential (alongside other tools, of course, e.g. wikis or Google Docs). That’s why I liked the Webcraft 101 idea to create such a p2p blogging community. It can be built from scratch, of course, and this process can be facilitated by searching for people through specialised places like P2PU or even Getstudyroom. But I thought it could also be a good way of staying in touch and collaborating with already existing peers.

Now, this is exactly my case. For more than a month now, I’ve been learning and working in a team within a Data Expedition. And it so happened that this teamwork has actually been the best thing in the entire process. School of Data is a wonderful project and Data Expeditions are really an awesome idea, but also a very challenging one in terms of implementation. I can’t say everything is perfectly organised (it’s a pilot version, after all), but I was lucky to get into a team that actually made up for all the organisational shortcomings, and that for the first time gave me a sense of a p2p learning process.


The Data Expedition is going to be over soon. But that doesn’t mean the teamwork is going to finish, especially as some team members have expressed their willingness to stay in touch and continue our cooperation. So blogging might become an important part of such a long-term collaboration. I hope it will. Anyway, why not try? I’ve already followed one of our team members’ new-born blog, which I will promote as soon as I find out that this blog’s author doesn’t mind.

Well, that said, I must admit that this week I haven’t learnt as much technical stuff as I’d have wanted to. But instead I’ve learnt quite a bit about building online cooperation communities.

And I’ve also been trying to code every day, that is, to spend at least 15 minutes solving tasks at Codecademy. They say this is kind of important if you want to learn something. I do hope it helps! OK, we’ll see. Anyway, it’s fun.