Visualisation progress

Done it! By pure chance, but I seem to have done it! An interactive Google visualisation of my data, which shows the correlation between CO2 emissions volume and GDP growth. Could be better and more detailed, I know, but wow, I didn’t even realize Google was really capable of it, or that I was really capable of squeezing it out of Google.

Now, some details. First, due to a very complicated relationship between WordPress.com and embeddable stuff, I can’t publish it here. I can only provide a link to where this interactivity is available. So, here’s the original spreadsheet with both the data and chart. And here’s my attempt (successful this time) to embed the chart into blogspot. And it was really a happy coincidence that I got this result, because I didn’t know how to do it. What I was actually trying to do is to shape my data so that it can be processed in Tableau Public. And it wouldn’t work.

Then I realized that TP isn’t free software (only a 14-day trial version is free), which immediately made it rather unattractive in my eyes.

UPD: A commenter has kindly corrected me. Tableau has both free and paid versions (and the 14-day trial is for the latter). Tableau Public is free.

Today I tried to visualise this chart in Google Spreadsheets and here’s the result. So, our chief weapons are the tools used: Data Wrangler (free) and Google Spreadsheets (also free).

If somebody has any instructive tips or criticisms, I’ll be delighted to hear them.

Struggling with visualisation

I wasn’t going to post anything today, but now I see I’ll have to, just for the sake of saving what I’ve learnt about data visualisation, which now seems to me the most challenging part of my beginner’s data manipulation. My target now is to make a story based on the CO2 emissions data. I have already played with two CO2 datasets and found out that some values are rather different. For instance, when I compared the top-10 CO2 emitters (in 2009, the latest year for which CO2 emissions data is available) from two datasets (EIA and UN), I found not only certain differences, but also one obvious contradiction regarding South Africa. I’m not sure it’s really meaningful, but well, the lines obviously show contradictory trends for this particular country:

[Image: SA_chart]

I have also noticed, by comparing IMF and WB data on GDP, that this kind of data is much more accurate than in the case of CO2. By accurate, I actually mean more similar. And more up-to-date, for that matter.
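A contradiction like the South Africa one can also be spotted without a chart. Below is a minimal Python sketch, assuming each source has been loaded as a dictionary of yearly values per country. The country names and numbers are made-up placeholders, not real EIA or UN figures, and the function names are hypothetical.

```python
# Placeholder data for illustration only; NOT real EIA or UN figures.
source_a = {"Country X": [10.0, 11.0, 12.0], "Country Y": [5.0, 6.0, 7.0]}
source_b = {"Country X": [10.0, 10.5, 11.8], "Country Y": [7.0, 6.0, 5.5]}


def trend(series):
    """Return +1, -1 or 0 if the series ends higher, lower or level with its start."""
    first, last = series[0], series[-1]
    return (last > first) - (last < first)


def contradictory_trends(a, b):
    """List countries present in both sources whose trends point in opposite directions."""
    shared = a.keys() & b.keys()
    return sorted(c for c in shared if trend(a[c]) * trend(b[c]) < 0)


print(contradictory_trends(source_a, source_b))
```

Comparing only the first and last value is a deliberately crude notion of "trend"; it is enough to flag countries worth a closer look on a chart.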

OK, that was the easiest part in fact. Next I’ve been trying to do some more visualisation using Tableau Public. With the help of visualisation, I want to find out whether there is any correlation between GDP growth and CO2 emissions volume; and I want to compare this correlation to that of GDP and CO2 per capita (which is strikingly different from CO2 emissions by country).
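One common way to put a number on such a relationship is the Pearson correlation coefficient, which ranges from -1 to 1. A minimal sketch in plain Python follows. The sample lists are invented for illustration and are not taken from the actual spreadsheet.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Invented sample values, one pair per country or per year
gdp_growth = [1.0, 2.0, 3.0, 4.0]
co2_volume = [10.0, 12.0, 15.0, 19.0]
print(round(pearson(gdp_growth, co2_volume), 3))
```

A value near 1 or -1 suggests a strong linear relationship, a value near 0 a weak one. It says nothing about which variable drives the other.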

The key problem here is to format the spreadsheet correctly, so that it can be processed in Tableau Public. I haven’t done it yet and I’m not sure I’ll manage to tonight, so I just want to save a couple of links and tips for the future.

First, there’s a cool tool for data cleaning and shaping. It’s called Data Wrangler. You don’t have to download it, it works in your browser.

Second, Tableau Public website has a wonderful gallery of brilliant visualisations. They call it a source of inspiration. I’d rather call it a fascinating source of learning materials. You can download any visualisation you like and then extract the data from there and see how it’s shaped. And also, some authors tell how they did it. Among others, there’s a complicated interactive visualisation by Alex Kerin, which I downloaded as a sample and which I’m currently trying to analyse.
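As for the formatting problem itself, one frequent cause (an assumption here, since the post doesn’t say what exactly went wrong) is that the spreadsheet is "wide", with one column per year, while visualisation tools tend to be happier with "long" data, one row per country and year. A minimal Python sketch of that reshaping, with hypothetical column names and placeholder values:

```python
def wide_to_long(rows, id_column, value_name):
    """Turn rows with one column per year into one row per (id, year) pair."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the id column is repeated on every output row instead
            long_rows.append(
                {id_column: row[id_column], "year": column, value_name: value}
            )
    return long_rows


# Placeholder values, not real emissions data
wide = [
    {"country": "Country X", "2008": 1.0, "2009": 2.0},
    {"country": "Country Y", "2008": 3.0, "2009": 4.0},
]

for record in wide_to_long(wide, id_column="country", value_name="co2"):
    print(record)
```

Data Wrangler can do this kind of reshaping interactively in the browser; the sketch only shows what the result should look like.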

Data journalism: Learning insights

Today my learning is focused on data journalism (I’ve got to finish my story as a challenge within Data Expedition). And also, today I decided to have a look at the product rather than the technique, as I previously did. To this end, I went to read Guardian Datablog and it seems to be quite an enlightening experience.

But first off, I have to give credit to Kevin Graveman, whose post actually provoked me to think in this direction. Kevin gave some tips on learning CSS by looking at both HTML and CSS sources of a page and also comparing it to the way the page looks in order to better understand how it works.
Now, this approach (quite natural, but not always obvious) can be replicated in many other areas. So today, I’m applying it to The Guardian by learning the anatomy of their data-driven materials (just as if I were looking at the source code of their product). And I’m also making notes about my observations on the way.

  1. They ALWAYS provide links to their datasets. Under each piece of visualisation, they post a link to a small particular spreadsheet with the data regarding this piece.
  2. After the article they also provide a link to the full spreadsheet.
  3. A spreadsheet contains not only data, but also notes (on a separate sheet) with sources and some explanations. Like so (for this article).
  4. Guardian Datablog is a great source of datasets. Although somewhat random.
  5. But these datasets are not always very trustworthy.
  6. Their visualisations are normally interactive.
  7. Some entries to the blog are very short in terms of writing, but provide complicated visualisations. Others rely on text substantially.
  8. Most underlying datasets in the materials I’ve seen are organised as single Google spreadsheets with several sheets (or tabs) containing particular spreadsheets. A good example is a recent piece by Simon Rogers and Julia Kollewe. The dataset is here.
  9. It seems to be a good idea to place some charts on separate sheets. (In order to do this, left-click the chart anywhere to open the quick edit mode, then hit the small triangle in the top corner on the right and choose ‘move to own sheet’.)

[Screenshot: move chart]

Tableau Public: trying out

My first visualisation ever. Just tested a tool. It’s called Tableau Public and it’s free.

Could be better, but practice makes perfect as they usually say in these cases. TP is really cool. But I can’t embed it into this blog, because:

There is also a service called www.WordPress.com which lets you get started with a new and free WordPress-based blog, but it is less flexible than the WordPress you download and install yourself. Blogs hosted on WordPress.com do not take advantage of tools like Tableau that use JavaScript.

(TP FAQ)

OK. Here’s a screenshot preview.

[Screenshot: Workbook TEST]

And here’s its interactive version.

And now I’ll go ’n’ kill myself.

Python: An Upcoming Mechanical MOOC

I’ve just had an astonishing experience. I was kind of looking for a pic for this post and I decided to be trivial and simply use the Python logo. It can’t be a problem to find it online, can it? Just type “python” in Google, switch to images and here you are. Oh wait. There are also snakes called pythons…

*Okay face*

I had totally forgotten about their existence.

I won’t post those pythons here, because I know some people are afraid of snakes and detest the way they look. Although I’d love to actually.

Now, what I was actually going to say is that a cool Python mechanical MOOC is just about to start. I’ve already subscribed. It’s beginning in June and, judging by the archive of its previous round, it lasts 8 weeks. What is special about this course is that there are no instructors there whatsoever. But there are peers with whom you can discuss the learning problems, tasks and whatnot. And well, there’s also a great Q&A Forum at Codecademy. And many other forums and communities online.

By the way, the link to this MOOC was kindly sent to me by my awesome Data Expedition teammate. That’s what I call a p2p community.

Building a network

Social media are great, because they are omnipresent, fast, easy to handle, good for getting in touch with people, monitoring news and accumulating multiple sources of information. But I genuinely love blogs, exactly because they are slower and more fundamental. And I’m sure they’ve got a huge p2p collaboration and networking potential (alongside other tools of course – e.g. Wiki or Google Docs). That’s why I liked the Webcraft 101 idea to create such a p2p blogging community. It can be built from scratch of course and this process can be facilitated by searching for people through specialised places like P2PU or even Getstudyroom. But I thought it could also be a good thing for staying in touch and collaborating with already existing peers.

Now, this is exactly my case. For more than a month now, I’ve been learning and working in a team within a Data Expedition. And as it happened, this teamwork has actually been the best thing in the entire process. School of Data is a wonderful project and Data Expeditions are really an awesome idea, but also a very challenging one in terms of implementation. I can’t say everything is perfectly organised – it’s a pilot version after all. But I was lucky to get into a team that actually made up for all the organisational shortcomings. And also, for the first time, gave me a sense of a p2p learning process.

[Image: Team10]

The Data Expedition is going to be over soon. But it doesn’t mean that the teamwork is going to finish, especially as some team members expressed their willingness to stay in touch and continue our cooperation. So blogging might become an important part of such a long-term collaboration process. I hope it will. Anyway, why not try. I’ve already followed the new-born blog of one of our team members, which I will promote as soon as I find out that this blog’s author doesn’t mind.

Well, that said, I must admit that this week I haven’t learnt as much technical stuff as I’d want to. But instead I’ve learnt quite a bit of new things about building online cooperation communities.

And I’ve also been trying to code every day – that is, to spend at least 15 minutes solving tasks at Codecademy. They say it is kind of important in order to learn something. I do hope it helps! OK, we’ll see. Anyway, it’s fun.

Getting good ideas from peers: Handwritten Python

A really nightmarish thing is learning too many subjects at a time. In my case, it normally results in learning nothing and quitting everything. Which is why, having made the first step in the Webcraft course (which is creating a blog and finding some very interesting people), I have to overcome the urge to continue with HTML and get back to my unfinished stuff with Python and data.

But as I expected, following the P2PU people is very helpful, because it’s a source of great ideas. For instance, as I guess from several blog posts, one of the upcoming tasks in the course is writing HTML code by hand in order to better understand the syntax. Well, at my beginner’s level in Python, syntax is one of the worst problems. Unless I learn it, it always becomes guesswork (often rather successful, because tasks are still very simple). Not long ago I realized that, surprisingly enough, the idea of learning new words and phrases by heart in order to remember them seems to me absolutely obvious and normal when it comes to human languages (I’m currently learning Greek, for instance), but it feels really infuriating when it comes to machine languages. Yet there’s actually not so much difference, only the machine languages must be much easier in terms of vocabulary and syntax.

Now, I really liked the idea of writing code by hand in order to remember the syntax and I decided to try it with Python. To that end, I used a short, simple piece of code from a Codecademy task that I’ve already successfully done. It’s about dictionaries and lists. Well, I tried to reproduce similar, syntactically correct code on paper. It took me 4 attempts to complete it without any mistakes (I hope)! But now I feel a bit more comfortable with it.

Thanks for sharing good ideas, peers.

UPD: Oops. Just noticed one mistake. It should have been “list1” or something like that… not just “list”…
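One possible reason why a name such as “list1” is safer than “list” (an assumption, since the post doesn’t say why) is that list is also the name of a Python built-in, and reusing it as a variable name hides the built-in for the rest of the program. The sketch below is a hypothetical example about dictionaries and lists, not the original Codecademy exercise.

```python
# Hypothetical data, not the original exercise
inventory = {"apples": 3, "pears": 5}

# Naming this variable "list" would shadow the built-in list() function;
# "list1" leaves the built-in untouched.
list1 = ["apples", "pears"]


def total(stock, names):
    """Add up the stock counts for the given item names."""
    return sum(stock[name] for name in names)


print(total(inventory, list1))

# The built-in still works because its name was not reused above
keys_as_list = list(inventory)
print(keys_as_list)
```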