Deep Learning MOOC

As I have already mentioned, it’s not easy being me. In addition to my already formed, nice and balanced ‘curriculum’, I have enrolled in yet another MOOC on Deep Learning, or DLMOOC. It begins in a week’s time, on 20 January, and ends on 21 March. It is another instance of a so-called mechanical MOOC, similar to Python MOOC, and also created by P2PU. This one is for educators. As a person who has already launched two data-expeditions and is totally resolved to keep doing it in the future, I thought it might be a good idea to learn a bit more about education in general. And this seems to be a very nice chance, because this MOOC has already attracted more than 600 participants, that is, educators from all over the world.

To be honest, I don’t think I’ll be able to contribute much in terms of active participation, because I still have to work and to learn pre-calculus and data analysis. And yes, we’ll have to launch our next expedition one day too (in spring, I hope). But I’m sure I’ll still gain lots of valuable experience. I already have. I do like the communication system of DLMOOC, with a G+ community as its central platform, although I’m not sure yet if it is appropriate for data-expeditions. It also has a flexible cooperation mechanism: participants can choose whether to work in ‘offline’ (friend-to-friend) groups or to join virtual groups. It will be very interesting to see how it develops and works. I will try to make notes on the way and share them here.

Second Data Expedition in Russian: Mission Accomplished

Not long ago, we completed the second Russian-language data expedition (DE2, ДЭ2) and here’s how it was.

The Russian-language version of this report can be found here.

Our first data expedition (DE1) was launched in July 2013. While organising DE2 we took into account the previous experience.

Brief overview

  • DE2 was launched on 9 December and finished on 23 December 2013.
  • The idea of a data expedition and its principles are based on the projects developed by P2PU and School of Data, which, as far as I know, actually coined the term ‘data expedition’.
  • Therefore, DE2 was an open P2P-learning project-based initiative, available for everyone, free of charge and based on the idea of mutual help and cooperation.
  • The declared purpose of DE2 was to go through the whole cycle of data processing, with the key emphasis on exploring the structure of the data and patterns within a data set.
  • Unlike DE1, DE2 offered a pre-planned scenario with a sequence of four tasks and instructions aimed at the facilitation of the research process. The tasks of this sequence actually reflected the approach described in the Data Analysis course by Jeff Leek at Coursera.
  • The scenario was based on a specific data set, namely the Online Video survey conducted by PSRAI Omnibus and provided by the PewInternet project.
  • However, the participants were absolutely free to come up with their own alternative projects. So the scenario part was first and foremost a framework for those who have a hard time elaborating their own research pathway.
  • By default, DE2 suggested using Google Spreadsheets and Google/Open Refine as working tools, but the participants were free to use any tools they preferred (which they did).
  • Its communication activity was mainly concentrated in a Google Group, which could be used both as a forum and a mailing list.
  • DE2 required no prerequisites in terms of data processing experience.
  • 20 people signed up for participation.
  • DE2 was organised by Irina Radchenko and myself as part of our larger informal learning Russian-language project Datadrivenjournalism.ru.

Results

Just like in the case of DE1, they were twofold:

Participants’ results:

  • a number of visualisations reflecting the associations and patterns within the data set;
  • some visualisations and spreadsheets reflecting the structure of the data;
  • a number of links to learning resources contributed by participants;
  • one published article (in Russian) based on the research conducted under an alternative (participant-initiated) project within DE2.

(The visualisations and links can be found at the Google Group forum)

Organisational results:

  • the messages at the Google Group forum;
  • two forms filled in by participants (initial and final surveys).

DE2 participants used various tools, including:

Process

1. While DE1 can be considered a relative success in terms of participants’ involvement, the main challenge in DE2 was to keep that involvement strategy and to supplement it with a clearer structure, so that the participants would feel more certain about what they should do, whether or not they had any previous experience of working with data.

To this end, we prepared a number of relatively short introductory/reference texts that explained both the project’s basic principles and particular aspects of building a data-driven story. We began posting these texts on the Group’s forum 5 days before the official start of DE2. Apart from the actual tasks, these texts included:

  • a brief intro to using Google Groups
  • DE2 scenario
  • an intro to the data expedition learning format
  • a description of a possible presentable output structure
  • types of data analysis and possible types of conclusions that can be made based on various analysis procedures
  • an invitation for the participants to introduce themselves
  • an intro to the data set we offered to work with

The four tasks were (briefly):

  1. explore the data description and the data set; find the meaning of the variable names; think about possible questions;
  2. provide a general description of the data set (how many observations, missing data etc.); start exploring possible associations between variables; think about more questions;
  3. continue exploring the associations; build exploratory charts; possibly do some statistical modeling if there are such skills;
  4. create expository charts; write a story; publish it (for those who didn’t have any platform of their own we created a special DE2 blog).
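Task 2 above (a general description of the data set) can be sketched in a few lines of Python 3. This is only an illustration, not what the participants actually ran: the sample rows and the column names (`age`, `watches_video`) are made up, and an empty cell is assumed to mean missing data.

```python
import csv
import io

# Hypothetical sample standing in for the survey file; the real data set
# has different columns and far more rows.
SAMPLE_CSV = """age,watches_video
34,yes
,no
51,
"""

def describe(csv_text):
    """Return the number of observations and the count of empty cells per column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = rows[0].keys() if rows else []
    missing = {col: sum(1 for row in rows if row[col] == "") for col in columns}
    return len(rows), missing

n_rows, missing = describe(SAMPLE_CSV)
print(n_rows, missing)
```

The same counts can of course be obtained in a spreadsheet, which was the default tool suggested for DE2.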

Apparently, some of these texts did a kind of facilitator’s job by simply initiating a space where a discussion could develop.

2. The introduction of the scenario seems to have worked, as most of the participants were trying to follow the tasks and discuss their findings or at least were toying with the provided data in their own way. On the other hand, there was an alternative research initiative, carried out by one of the participants on his own. Although he didn’t have a whole team working on the same project, he managed to receive some informational support and feedback at the forum.

The main objective of the ready-made scenario was twofold:

  • To make sure that those who prefer to work on their own, but have trouble building their own research, still have something to do without necessarily having to get involved in the communication process;
  • To make sure that the participants with no experience have something to rely on, as mentioned above.

3. The choice of the data set for the scenario was a compromise. We did realize that for a Russian-speaking audience data on Russia would be more interesting and probably easier to work with (we actually had to translate the data description provided on the data website into Russian to make sure everyone could understand it). But we also wanted the data set (a) to be rather clean, to spare inexperienced participants from spending much time on cleaning, as they only had two weeks at their disposal; and (b) to contain lots of variables reflecting different parameters of measurement, to provide plenty of opportunities to compare them. The data set we chose in the end satisfied the latter two requirements, but was based on a US survey.

4. However, the alternative project explored Russian material. Namely, it was aimed at measuring the effectiveness of the Russian legislation on blocking websites that are regarded as harmful for children (those that are deemed to be promoting child porn, drug abuse, suicide, etc.). This law was passed in 2012 and was widely considered rather inefficient in terms of its declared objective, but very convenient as a censorship tool for blocking undesirable web resources. This is actually a very interesting direction of research, which could well be continued within our upcoming projects.

5. As to the collaboration activity during the expedition, we can judge it mainly by the forum messages. These messages do not reflect any activity outside the forum, which we cannot measure, but the forum seems reasonably representative as it is. Here is the shape of the activity: the communication was not evenly distributed, but it covers the official DE2 period and actually goes beyond its official boundaries. The figures behind this chart include all forum messages, that is, both participants’ and organisers’ messages.

enActivityDistribution

The red frame marks the official dates of DE2.

And this is a chart showing the activity by day of the week, measured across the whole period of forum activity. The most active exchange apparently happens on the first working days and then gradually declines, almost ceasing at the weekend.

enActivityWeek

Figures can be found here.

It seems that during the working week people didn’t have much time to work on DE2, so most of the work was done at the weekend, and people shared their findings after that, at the start of the following week.
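The day-of-week chart boils down to counting messages per weekday. A minimal Python 3 sketch of that count follows; the dates in it are invented for illustration and are not the real forum figures, and the function name `messages_per_weekday` is hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical message dates; the actual figures are in the linked spreadsheet.
message_dates = [
    date(2013, 12, 9),   # Monday
    date(2013, 12, 9),
    date(2013, 12, 10),  # Tuesday
    date(2013, 12, 14),  # Saturday
    date(2013, 12, 16),  # Monday
]

def messages_per_weekday(dates):
    """Count messages for each weekday name, Monday first."""
    names = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
    counts = Counter(d.weekday() for d in dates)  # weekday(): Monday is 0
    return [(name, counts.get(i, 0)) for i, name in enumerate(names)]

for name, count in messages_per_weekday(message_dates):
    print(name, count)
```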

Participants

While in DE1 the participants could join the Google Group on their own, DE2 included one extra step. Those who wanted to participate first had to sign up by filling in a registration form. This helped us to collect more information about the participants. We also expected that this additional step could serve as a motivation filter. After sending the form, everyone was added to the DE2 Group. As a result, we had 20 completed forms, but only 14 participants added to the Group. We couldn’t add the rest, because the email addresses they provided were inactive.

Here’s a brief review of the whole number of those who registered based on the form data. Organisers’ data are not included.

enDE2Participants

Figures can be found here.

DE1 vs. DE2

It is interesting to compare results of DE2 to the results of DE1. Here is the proportional comparison:

enDE1vsDE2.

Figures can be found here.

In this chart, we can see that more messages were posted on the forum during DE1 (124) than during DE2 (107). Given that DE1 was only one week long and DE2 took a fortnight, this might look somewhat discouraging. On the other hand, we can also see that more people were involved in cooperation in DE2. This was measured by two parameters: the number of people who left at least one message on the forum (in most cases a self-introduction, if it was the only one) and the number of people who participated in the exchange of experience and information (normally in the form of questions and answers, as well as sharing findings). On both counts DE2 shows better results (10 and 6 respectively) than DE1 (6 and 4), even though in DE2 fewer people had access to the forum.
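The ‘somewhat discouraging’ impression comes from simple arithmetic, sketched here in Python 3. The counts are the ones quoted above; dividing by the official length of each expedition (7 and 14 days) is a simplification, since the message totals may also include posts made outside the official dates.

```python
# Figures quoted in the post; "days" is the official duration of each expedition.
expeditions = {
    "DE1": {"messages": 124, "days": 7, "posters": 6, "exchangers": 4},
    "DE2": {"messages": 107, "days": 14, "posters": 10, "exchangers": 6},
}

def messages_per_day(stats):
    """Average number of forum messages per official expedition day."""
    return stats["messages"] / stats["days"]

for name, stats in expeditions.items():
    print(name, round(messages_per_day(stats), 1),
          stats["posters"], stats["exchangers"])
```

On these assumptions DE1 averaged roughly 17.7 messages a day against about 7.6 for DE2, while the number of people who posted at least once went from 6 to 10.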

Though it might seem somewhat confusing, the likely explanation is that the timing of DE2 was extremely counterproductive, which was our huge mistake. Although it was planned to finish a week before the New Year, people were still exceptionally busy trying to meet their deadlines at work, preparing for exams or just rushing about because of the upcoming holidays. I think this was one of the most important reasons for the message-free days shown in the chart above, as well as for the relatively low number of messages.

Meanwhile, the larger number of people involved in the working and communication process shows that even though the participants were very busy they were still willing to proceed with the project. This makes me think that DE2 shows some progress compared to DE1, although the timing lesson should be taken into account in the future.

I must add that the DE1 report makes no distinction between organisers and participants in its measurements. In the case of DE2, I only counted the participants’ data, leaving the two organisers aside (except where specially noted). So when comparing the data on both expeditions in this report I also used only participants’ data for DE1. This explains the slight differences in figures between the two reports.

Conclusions

Success

  • More participants were involved in the working process.
  • The participants demonstrated a friendly and considerate attitude to each other (and the feedback provided by 5 of them showed that they appreciated the communication and cooperation component).
  • All the respondents in the final survey expressed their intention to participate in future expeditions.
  • The activity, although not very regular, persisted through the whole expedition period.

Problems

  • Timing. This is a mistake that should never be repeated. The best time for an expedition has yet to be tested, but it is already obvious that it should not be scheduled in such busy months as December, May, June and probably November.
  • Promotion strategy. A large share of those registered (8 out of 20, or 40%) signed up during the first week of DE2, after the expedition had officially begun (the registration form was closed at the beginning of the second week). That means that the promotion should be more timely and efficient.
  • Relevance. Although something could be learnt even with the provided data set, next time we hope to come up with a more relevant one.
  • The lack of final results presented. Only one participant published a piece based on his research during DE2. One published story is by all means great, but no one else came up with one. Part of the reason might be that some participants were satisfied with what they had learnt and felt no need to publish a story. Another reason could be the lack of time. Still, I think that a better format for presenting results might motivate a bigger output.

Based on the results of DE2 and taking into account its lessons, we are going to proceed with organising Russian-language DEs. We might also consider launching alternative types of DEs alongside the regular ones, such as:

  • DEs for the participants with a particular level/kind of skills.
  • DEs with the emphasis on real research, rather than learning.
  • DEs aimed at mastering particular tools or working techniques.

All in all, DE2 was quite an experience and a great opportunity for getting in touch with wonderful people. I hope we will soon come up with more DE projects.

And happy New Year to everyone who somehow managed to make their way through this post up to this point.

First Data Expedition in Russian: Mission Complete

It’s been a while since I last posted here and there are actually two reasons for that. First, I’ve got a really heavy workload and it’s going to remain so for a while. What is most upsetting, I haven’t got enough time for doing the Python course, but I’m certainly going to make up for it as soon as possible. Second, we were busy organising and then participating in the first Russian-language experimental data-expedition, or data-MOOC. And this is the experience I want to share here as well, because it was extremely inspiring and rather instructive. Besides, it’s about p2p-learning, which is one of the subjects of this blog.

While writing this account I was using the model provided in the account of the School of Data/P2PU’s MOOC.

Now, some overview

  • This project was inspired by participating in Data-MOOC organised by School of Data and P2PU in April-May 2013. Also, I must say that the blog of the Python MOOC has been really helpful and instructive.
  • The project was based on p2p-learning principles and a mechanical MOOC model. For the sake of brevity and attractiveness, we used the term ‘data-expedition’ (экспедиция данных, дата-экспедиция) to describe it.
  • It was a week long: from July 22 to July 28.
  • Its declared objective was to learn how to look for datasets online. To focus the task, we suggested a topic, which was collecting data about universities all over the world. So, unlike the School of Data’s Data-MOOC, it wasn’t supposed to reproduce the complete data-processing cycle, but rather to perform its first stage.
  • The project was organised by Irina Radchenko and myself as part of our larger informal project Datadrivenjournalism.ru. Within the Expedition, we acted both as the support team and as participants.
  • The goal of this Expedition was twofold. First, we wanted to see if this format works in the local environment. Second, well, I personally wanted to learn more about how to search for data.
  • We announced the upcoming data-expedition ten days before the start, and by the time it began 20 people had signed up, which was actually more than we expected.
  • Participation was absolutely free and open and no special skills were required.
  • The participants’ main communication platform was a Google group set as a forum (with a possibility to turn on the mailing option).
  • Our main collaboration tool was Google Docs.
  • This expedition relied heavily on collaboration and p2p initiative. It had no prescribed plan or step-by-step tasks, apart from the initially formulated one. So the organisational messages were first and foremost aimed at facilitating people’s communication and introducing them to the specifics of the format.

Results

As expected, they are twofold.

Participants’ results:

  • 3 visualisations
  • 1 data-scraping tutorial for beginners
  • A collective Google Doc with a list of sources

Organisational results:

  • 2 surveys (preliminary and final)
  • The participants’ exchange documented on the Google group’s forum
  • The set of collective Google Docs

As to the participants’ results, here are some links:

But in this post, I’ll focus on some highlights of the process.

1. Speaking of the organisation, our main target was to help people get involved in cooperation and boost activity. To this end, we started introducing people to the format a few days before the expedition began. Judging by previous experience, the lack of confidence and the uncertainty about where to start and what to do is one of the barriers to be overcome. In order to facilitate cooperation, we published a series of organisational messages, one after another:

  • Introduction to the objective of the expedition (explaining why searching for data is an important skill)
  • Introduction to the format of ‘expedition’ (or a mechanical MOOC) with some tips on what to do and how to react
  • List of tools for online-collaboration (and invitation to contribute participants’ own ideas)
  • Invitation for the participants to introduce themselves (several possible key-points of introduction were suggested)

By the beginning of the expedition the participants knew each other’s names and how to address each other; they also knew each other’s area of expertise. Moreover, they started communicating before the expedition officially began.

2. Some figures:

  • 20 people joined the expedition group
  • 10 people filled in the preliminary survey
  • 6 people actively communicated on the forum during the whole expedition
  • 7 people filled in the final survey

3. During the whole period of the Expedition, including the unofficial preliminary/introductory part (which began on 17 July), 124 messages were sent via the forum. Of course there were instances of bilateral communication, but we couldn’t register them for obvious reasons. Here’s the distribution of the forum activity (with the peak in the middle of expedition).

activity_distribution

4. The atmosphere of the communication was friendly and relaxed. The participants actively discussed each other’s initiatives and provided encouraging feedback.

Here are some facts about the participants (based on the preliminary survey):

data_exp_charts_eng

Figures can be found here.

Conclusions

Success

  • People got interested in the project and willingly joined
  • The participants demonstrated a friendly and considerate attitude to each other
  • The core of the group (6 people) was active during the whole period of the expedition
  • All the respondents in the final survey expressed their intention to participate in future expeditions
  • Most of the respondents in the final survey expressed their intention to complete the projects they started in the course of expedition, but failed to finish due to the lack of time

Problems

  • Obviously, the output of the project wasn’t confined to the declared objective. On the one hand, it’s natural: people are free to learn what they want to learn. On the other hand, the absence of a more precise schedule made some participants feel uncomfortable, from which we conclude that a more focused approach is needed.
  • All the respondents in the final survey said they didn’t have enough time to complete what they wanted. At the same time, most of them admitted that the time frame was adequate to the task.
  • Most of the respondents felt some discomfort because of the lack of a coordinator or instructor and also said they didn’t always understand what they should do and how.

In the final survey, we asked how we could make the process more efficient and here’s the summary of the ideas:

  • More precise schedule of our activity would be good (like breaking the whole expedition period into specific phases)
  • Coordinator is needed
  • Some instruments for encouraging shy and unconfident participants would be helpful
  • It might be better if the output of the whole project is formulated more precisely
  • The topic should probably be narrower
  • Longer expedition terms would make it easier for self-organisation

Prospects

We are totally going to continue our experiments with this format. In the future, we are going to try something like:

  • One-day intensive online expeditions with fixed roles and distributed responsibilities
  • Long-term (several weeks long) expeditions with a coordinator (permanent or elected for a certain term)
  • Workshop-expeditions: massive online projects led by a volunteer instructor or mentor willing to share their skills
  • Expeditions based on the Data-MOOC scheme (with pre-planned tasks)

We are also going to develop a method to register participants’ achievements, even small ones, in order to encourage further efforts.

Also, we feel the need to create a way to proudly present major achievements. Here we should consider the experience of creating badge systems.

Well, that’s it for now. It was really cool! And there’s quite a bit of work ahead too!

The Russian version of this account can be found here.

Still alive

I think this is the busiest summer I’ve ever had in my life. I’m trying hard to follow my schedule, but not always successfully. Thanks to the Python MOOC’s organisers, who have kindly included a week’s break in the middle of the sequence, I now hope to cover week 4 before the next bunch of tasks arrives. I’ll soon post some updates on my findings and experiences.

For now I’ll just save a couple of links here:

This is where MIT OCW hometasks (assignments) can be downloaded. I just keep losing this page. Now I seem to have fixed it.

2013-07-17 04_07_03-Edward Tufte_ Books - The Visual Display of Quantitative Information

And another link, which is not about Python, but I thought it might be interesting for some of my peers. It’s The Visual Display of Quantitative Information by Edward R. Tufte. The shortcoming is that the book is not free. Well, at least it is not supposed to be. Anyway, it was recommended by a person whose judgement I trust here.

Also (just boasting) we’re starting an experimental one-week-long data-MOOC (or data-expedition) in Russian in less than a week’s time. The subject will also be very narrow: we’ll only have to learn different ways of searching for data. I really wonder what it’ll turn out to be like. What I know for sure is that it’s going to be a huge pile of various information in addition to Python and my job. And there’ll have to be some additional analytical work afterwards, because we’ll have to sum up our results and understand what we’ll have to improve in its future iterations. The question is how I’m going to find time for all this. But I’ll have to.

Python MOOC – Week 2

I’m going to sum up the experiences of the past week and share what I managed to find out.

First off, I really like the way the MOOC is organised. Especially the way it encourages team work and p2p-learning process. First the instruction was to sign up for OpenStudy, which is very good in terms of mutual help and revision. But there’s a problem there. You can ask questions there alright, but you can ask only one question at a time. That is, after you asked your question, it appears on the questions wire and everyone can see it and answer it. But if you want to ask another question, you’ll have to mark the current one as ‘closed’ and only then you’ll have an option to ask a new one. ‘Closed’ means that it is removed from the wire shown by default (to the list of closed questions) and if you haven’t received the answer so far, there’s a chance you’ll never have it because nobody will notice the question.

2013-06-30 20_32_53-OpenStudy

Ah yes, also OpenStudy is often down, so you sometimes simply can’t use it.

But there are great options outside. First, the MOOC organisers divided all MOOCsters into teams and provided them with mailing list addresses, so some questions can be asked and answered in small groups, and you have no limitations here.

Finally, there’s one more learning space I discovered only yesterday and haven’t tried yet, but it looks great. I mean Groups at Codecademy (you have to sign in to see the page). Although I’ve been using Codecademy for quite a while now, I didn’t know about their existence. Of course I immediately joined Python for Beginners group. I hope it’ll be a great experience.

Now a couple of words about this week’s homework. This week was rather challenging for me, because I was struggling to understand how loops work, especially the for loop. One of the tasks was to write code that calculates exponentials using a for loop. Thanks to my team mates, who helped me figure out what the task was about in the first place: it had to be done without using the built-in exponentiation operator (**).

Now, I had dealt with for loops at Codecademy and found them rather easy. This is what I basically imagined:

for i in range(1, 10, 2):
    print i  # Python 2 syntax; prints 1, 3, 5, 7, 9, one per line

So it does what you tell it to with all the items in a range.

But in this case a possible resulting code I got after many efforts (and quite a bit of guesswork, I admit) looks like this:

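base = input("Enter base: ")  # Python 2: input() evaluates what is typed, so 5 becomes the integer 5
exp = input("Enter exponent: ")
x = base  # starting from base, so this assumes exp is at least 1
for n in range(1, exp):
    x *= base
print x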

So after I wrote it, I still had a question: how are for n in range(1, exp): and x *= base connected if there are no obvious operations in which n (the items from the range) are mentioned? The answer is obviously that they don’t have to be mentioned. That is, the for loop in this case is used to show the computer how many times the operation must be repeated.
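To make the point concrete, here is a sketch of the same idea as a function, written for Python 3 (the code above is Python 2). The function name `power` and the use of `_` for the unused loop variable are illustrative choices, not part of the course task. Starting from 1 rather than from `base` also lets an exponent of 0 work; a non-negative integer exponent is assumed.

```python
def power(base, exp):
    """Raise base to a non-negative integer exponent without using **."""
    result = 1
    for _ in range(exp):  # the loop variable is ignored; range only sets the repeat count
        result *= base
    return result

print(power(5, 4))
print(power(5, 0))
```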

This is what I realised after reading this awesome article about loops in Python. I also realised that there’s a great way to see what a programme does: adding print statements that show the process step by step. Like so:

base = input("Enter base: ")
exp = input("Enter exponent: ")
x = base
for n in range(1, exp):
    x *= base
    print x # This shows what's going on in the process
print x

So for instance if we have base 5 and exp 4, the output will be:

25
125
625
625
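The same tracing idea can be tried in Python 3, where `print` is a function. This sketch hard-codes base 5 and exponent 4 instead of asking for input, and the helper name `trace_power` is hypothetical; it also returns the intermediate values so they can be inspected as well as printed.

```python
def trace_power(base, exp):
    """Multiply step by step as in the loop above, recording each intermediate value."""
    x = base
    steps = []
    for n in range(1, exp):
        x *= base
        steps.append(x)
        print("after step", n, "x =", x)  # shows what is going on in the process
    return x, steps

final, steps = trace_power(5, 4)
print(final)
```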

Also, one of my team mates kindly recommended that I read Learning Python by Mark Lutz (I found out on the way that there’s a whole site about it).

Finally, I played with PyScripter IDE and explored some code sharing options, which I’m going to describe soon.

Oh, by the way, if some peers want to have a look at my whole homework (with the exception of the optional tasks, which I’ll get back to a bit later), it’s here: https://gist.github.com/ansakoy

Making my presentation – part two

(Part one is here)

After I had my draft structure done I had to distribute the content along a 10-minute timeline, because I had only 10 minutes for my presentation. Actually, I had 10-12 minutes, but I decided to leave 2 minutes in reserve in case something went wrong.

Now the process of distributing all I wanted to say properly was a real challenge. In order to make it easier, I first drafted a timeline in my notebook.

timeline

Then I typed the text word by word and tried to read it with a stopwatch. Although I managed to fit it into the time frame (at the third attempt), I realized that I had to make it even shorter and, most importantly, to get rid of syntactically complicated constructions. So I ended up with a text divided into parts according to my presentation slides, with 3 to 8 sentences for each slide.

Ah yes, regarding slides. Well, I decided not to use the colour scheme I was testing in the previous post, because it seemed boring to me. Instead I took this one by plamenj, which looks like this:

2013-06-23 01_41_53-COLOURlovers.com - Gamebookers

And here’s my presentation at Slideshare (actually longer than it used to be, because I split the animated slides made in PowerPoint into separate slides, as Slideshare doesn’t read animation).

It is originally in Russian of course, but I translated it.

Also, thanks to Jakes from my wonderful Team 10, who sent me very helpful materials on presentation design. And thanks to Irina (from the same team among all) for also sharing materials and generally being extremely encouraging.

By the way, I found out that PowerPoint and SlideShare are poorly compatible, so I had to do some extra editing to make the result look similar to the initial ppt version. Maybe next time I’ll try playing with the Google and Libre Office equivalents, as well as Prezi.com. But hopefully not too soon, because I want to get back to the wonderful Python MOOC, which has already begun. The 1st week’s tasks don’t look too challenging, but there’s a bunch of new information that I want to digest before I receive the next portion of tasks.

Preparing the first presentation in my life

This is supposed to be a complaining post. But I’ll also try to make it somehow useful at least due to the links to helpful resources I find on the way. Now, to the point. As I have already mentioned (more than once, I think), I hate visual stuff. And presentations today are all based on slides, so I’ve got to not only think about the structure and opening and closing and hooks for the audience, but also about making some decent background for my presentation.

So, learning again.

A couple of words about the circumstances. I’ve got to prepare this presentation for a conference on social computing that takes place in Moscow this Friday (on 21 June) and my topic is data journalism. Although I’ve got, say, 3 days ahead, I’m very short of time, because during these days I’ll also have to work and learn etc. So my most immediate target is to make at least a draft presentation to have some back-up in case I’m overwhelmed by work during the week.

So, first thing I did, I went web-hunting to find some tips on what to do. And here’s what I’ve found instructive so far.

Now, to start the process, I decided to create some structure. To do this, I first put down some information blocks so that I could later arrange them more logically.

Here’s what I’ve got in the end:

data_journalism_copy_small

Feel free to see this monster full-size.

And I was actually testing this palette by GlueStudio (which I downloaded from ColourLovers I mentioned above).

2013-06-18 03_02_33-COLOURlovers.com - Terra_

OK, next I’ll have to fit all this into like 5 slides (I’ve got no more than 12 minutes for my presentation).