Second Data Expedition in Russian: Mission Accomplished

We recently completed the second Russian-language data expedition (DE2, ДЭ2). Here is how it went.

The Russian-language version of this report can be found here.

Our first data expedition (DE1) was launched in July 2013. While organising DE2, we took that earlier experience into account.

Brief overview

  • DE2 was launched on 9 December and finished on 23 December 2013.
  • The idea of a data expedition and its principles are based on the projects developed by P2PU and School of Data, which, as far as I know, coined the term data expedition.
  • Therefore, DE2 was an open, project-based P2P-learning initiative, available to everyone, free of charge and based on the idea of mutual help and cooperation.
  • The declared purpose of DE2 was to go through the whole cycle of data processing, with the key emphasis on exploring the structure of the data and patterns within a data set.
  • Unlike DE1, DE2 offered a pre-planned scenario with a sequence of four tasks and instructions aimed at facilitating the research process. The tasks in this sequence reflected the approach described in the Data Analysis course by Jeff Leek at Coursera.
  • The scenario was based on a specific data set, namely the Online Video survey conducted by PSRAI Omnibus and provided by the PewInternet project.
  • However, the participants were absolutely free to come up with their own alternative projects. So the scenario was first and foremost a framework for those who have a hard time working out their own research pathway.
  • By default, DE2 suggested using Google Spreadsheets and Google/Open Refine as working tools, but the participants were free to use any tools they preferred (which they did).
  • Communication was mainly concentrated in a Google Group, which could be used both as a forum and as a mailing list.
  • DE2 required no prerequisites in terms of data processing experience.
  • 20 people signed up for participation.
  • DE2 was organised by Irina Radchenko and myself as part of our larger Russian-language informal learning project Datadrivenjournalism.ru.

Results

Just like in the case of DE1, they were twofold:

Participants’ results:

  • a number of visualisations reflecting the associations and patterns within the data set;
  • some visualisations and spreadsheets reflecting the structure of the data;
  • a number of links to learning resources contributed by participants;
  • a published article (in Russian) based on the research conducted under an alternative (participant-initiated) project within DE2.

(The visualisations and links can be found at the Google Group forum)

Organisational results:

  • the messages at the Google Group forum;
  • two forms filled in by participants (initial and final surveys).

DE2 participants used various tools, including:

Process

1. While DE1 can be considered a relative success in terms of participants’ involvement, the main challenge in DE2 was to keep that involvement strategy and to supplement it with a clearer structure, so that the participants would feel more certain about what they should do, whether or not they had any previous experience of working with data.

To this end, we prepared a number of relatively short introductory and reference texts that explained both the project’s basic principles and particular aspects of building a data-driven story. We began posting these texts on the Group’s forum 5 days before the official start of DE2. Apart from the actual tasks, these texts included:

  • a brief intro to using Google Groups
  • the DE2 scenario
  • an intro to the data expedition learning format
  • a description of a possible structure for a presentable output
  • types of data analysis and the kinds of conclusions that can be drawn from various analysis procedures
  • an invitation for the participants to introduce themselves
  • an intro to the data set we offered to work with

The four tasks were (briefly):

  1. explore the data description and the data set; find the meaning of the variable names; think about possible questions;
  2. provide a general description of the data set (how many observations, missing data etc.); start exploring possible associations between variables; think about more questions;
  3. continue exploring the associations; build exploratory charts; possibly do some statistical modelling for those who have the skills;
  4. create expository charts; write a story; publish it (for those who didn’t have any platform of their own we created a special DE2 blog).
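The general description asked for in task 2 (number of observations, missing data) can be sketched in a few lines. This is only an illustration: the column names and the three sample rows are hypothetical, not taken from the actual survey file, and the markers treated as "missing" are an assumption. Participants in DE2 mostly did this kind of thing in spreadsheets.

```python
import csv
import io

def describe(rows, missing=("", "NA")):
    """Return the number of observations and the count of missing values per column.

    The set of strings treated as missing is an assumption; adjust it to the
    conventions of the data set at hand.
    """
    rows = list(rows)
    counts = {}
    for row in rows:
        for name, value in row.items():
            if name is None:
                # Extra cells beyond the header row; ignored in this sketch.
                continue
            counts.setdefault(name, 0)
            if value is None or value.strip() in missing:
                counts[name] += 1
    return len(rows), counts

# Hypothetical three-row sample; the real survey has its own variables.
sample = io.StringIO("age,sex,watches_video\n34,F,yes\n,M,no\n51,,NA\n")
n_obs, n_missing = describe(csv.DictReader(sample))
print(n_obs, n_missing)
```

Reading a real file would only change the input: pass `csv.DictReader(open(path, newline=""))` instead of the in-memory sample.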

Apparently, some of these texts did a kind of facilitator’s job simply by opening up a space where a discussion could develop.

2. The introduction of the scenario seems to have worked, as most of the participants tried to follow the tasks and discuss their findings, or at least toyed with the provided data in their own way. On the other hand, there was an alternative research initiative, carried out by one of the participants on his own. Although he didn’t have a whole team working on the same project, he managed to receive some informational support and feedback on the forum.

The ready-made scenario had two main objectives:

  • To make sure that those who prefer to work on their own, but have trouble designing their own research, still have something to do without necessarily having to get involved in the communication process;
  • To make sure that the participants with no experience have something to rely on, as mentioned above.

3. The choice of the data set for the scenario was a compromise. We did realise that for a Russian-speaking audience data on Russia would be more interesting and probably easier to work with (we actually had to translate the data description provided on the data website into Russian to make sure everyone could understand it). But we also wanted the data set (a) to be reasonably clean, to spare inexperienced participants from spending much time on cleaning, as they only had two weeks at their disposal; and (b) to contain many variables reflecting different parameters of measurement, in order to provide plenty of opportunities to compare them. The data set we chose in the end satisfied the latter two requirements, but was based on a US survey.

4. The alternative project, however, explored Russian material. It aimed to measure the effectiveness of the Russian legislation on blocking websites that are regarded as harmful to children (those deemed to be promoting child porn, drug abuse, suicide, etc.). This law was passed in 2012 and was widely considered rather inefficient in terms of its declared objective, but very convenient as a censorship tool for blocking undesirable web resources. This is a very interesting direction of research, which could well be continued within our upcoming projects.

5. As for collaboration during the expedition, we can judge it mainly by the forum messages. These messages do not reflect any activity outside the forum, which we cannot measure, but the forum seems reasonably representative as it is. The chart below shows the shape of the activity: communication was not evenly distributed, but it covers the official DE2 period and extends beyond its official start and end dates. The figures behind this chart include all forum messages, both participants’ and organisers’.

[Chart: distribution of forum activity over time]

The red frame marks the official period of DE2.

The next chart shows the activity by day of the week, measured over the whole period of forum activity. The most active exchange apparently happens on the first working days and then gradually declines, almost ceasing at the weekend.

[Chart: forum activity by day of the week]

Figures can be found here.

It seems that during the working week people didn’t have much time to work on DE2, so most of the work was done at the weekend, after which people shared their findings.
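A day-of-the-week breakdown like the one described above can be produced by counting messages per weekday. The following is a minimal sketch under stated assumptions: the dates are made up for illustration and are not the actual forum figures, and `messages_by_weekday` is a hypothetical helper name.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def messages_by_weekday(dates):
    """Count messages per day of the week, keeping days with no messages as 0."""
    counts = Counter(d.weekday() for d in dates)  # Monday is 0, Sunday is 6
    return {name: counts.get(i, 0) for i, name in enumerate(WEEKDAYS)}

# Hypothetical posting dates, one entry per forum message.
sample_dates = [
    date(2013, 12, 9),   # Monday
    date(2013, 12, 9),   # Monday
    date(2013, 12, 10),  # Tuesday
    date(2013, 12, 14),  # Saturday
    date(2013, 12, 16),  # Monday
]
by_day = messages_by_weekday(sample_dates)
print(by_day)
```

Keeping zero-count days in the result matters here, because the message-free days are part of the pattern being discussed.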

Participants

While in DE1 the participants could join the Google Group on their own, DE2 included one extra step. Those who wanted to participate first had to sign up by filling in a registration form. This helped us to collect more information about the participants. We also expected that this additional step could serve as a motivation filter. After submitting the form, everyone was added to the DE2 Group. As a result, we had 20 completed forms, but only 14 participants added to the Group. We couldn’t add the rest, because the email addresses they provided were inactive.

Here’s a brief overview of everyone who registered, based on the form data. Organisers’ data are not included.

[Chart: DE2 participants]

Figures can be found here.

DE1 vs. DE2

It is interesting to compare the results of DE2 with those of DE1. Here is the proportional comparison:

[Chart: DE1 vs. DE2]

Figures can be found here.

In this chart, we can see that more messages were posted on the forum during DE1 (124) than during DE2 (107). Given that DE1 was only one week long and DE2 took a fortnight, this might look somewhat discouraging. On the other hand, we can also see that more people were involved in cooperation in DE2. This was measured by two parameters: the number of people who left at least one message on the forum (in most cases a self-introduction, if it was the only one) and the number of people who participated in the exchange of experience and information (normally in the form of questions and answers, as well as sharing findings). On both counts DE2 shows better results (10 and 6 respectively) than DE1 (6 and 4), although fewer people had access to the forum in DE2.
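To put the message counts above on a comparable footing, they can be divided by the length of each expedition. This is a rough sketch: it assumes 7 days for DE1 ("one week") and 14 days for DE2 ("a fortnight"), and it ignores the fact that forum activity extended beyond the official dates. The posting share uses the 14 participants who were added to the DE2 Group.

```python
def messages_per_day(messages, days):
    """Average number of forum messages per official expedition day."""
    return messages / days

# Message counts from the report; durations are assumptions (see above).
de1_rate = messages_per_day(124, 7)
de2_rate = messages_per_day(107, 14)

# Share of DE2 Group members who posted at least one message (10 of 14).
de2_posting_share = 10 / 14

print(f"DE1: {de1_rate:.1f} messages/day, DE2: {de2_rate:.1f} messages/day")
print(f"DE2 members who posted at least once: {de2_posting_share:.0%}")
```

Under these assumptions DE1 averages about 17.7 messages a day against about 7.6 for DE2, while roughly 71% of DE2 Group members posted at least once.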

Though this might seem somewhat confusing, the likely explanation is that the timing of DE2 was extremely counterproductive and was in fact a huge mistake on our part. Although it was planned to finish a week before the New Year, people were still exceptionally busy trying to meet their deadlines at work, preparing for exams or just caught up in the bustle of the upcoming holidays. I think this was one of the most important reasons for the message-free days shown in the chart above, as well as for the relatively low number of messages.

Meanwhile, the larger number of people involved in the working and communication process shows that even though the participants were very busy, they were still willing to proceed with the project. This makes me think that DE2 shows some progress compared to DE1, although the timing lesson should be taken into consideration in the future.

I must add that the DE1 report does not distinguish between organisers and participants in its measurements. In the case of DE2, I only counted the participants’ data, leaving the two organisers aside (except where specifically noted). So when comparing the data on both expeditions in this report, I also used only participants’ data for DE1. This explains the slight differences in figures between the two reports.

Conclusions

Success

  • More participants were involved in the working process.
  • The participants demonstrated a friendly and considerate attitude to each other (and the feedback provided by 5 of them pointed out that they appreciated the communication and cooperation component).
  • All the respondents in the final survey expressed their intention to participate in future expeditions.
  • The activity, although not very regular, persisted through the whole expedition period.

Problems

  • Timing. This is a mistake that should never be repeated. The best time for an expedition has yet to be determined, but it is already obvious that it should not be scheduled in such busy months as December, May, June and probably November.
  • Promotion strategy. A large share of those who registered (8 out of 20) signed up during the first week of DE2, after the expedition had officially begun (the registration form was closed at the beginning of the second week). That means that the promotion should be more timely and efficient.
  • Relevance. Although something could be learnt even from the provided data set, next time we hope to come up with a more relevant one.
  • The lack of final results presented. Only one participant published material based on his research during DE2. Having at least one story published is by all means great, but no one else came up with a story. Part of the reason might be that some participants were satisfied with what they had learnt and felt no need to publish a story. Another reason could be the lack of time. Still, I think that a better format for presenting results might motivate a bigger output.

Based on the results of DE2 and taking into account its lessons, we are going to proceed with organising Russian-language DEs. We might also consider launching alternative types of DEs alongside the regular ones, such as:

  • DEs for the participants with a particular level/kind of skills.
  • DEs with the emphasis on real research, rather than learning.
  • DEs aimed at mastering particular tools or working techniques.

All in all, DE2 was quite an experience and a great opportunity for getting in touch with wonderful people. I hope we will soon come up with more DE projects.

And happy New Year to everyone who somehow managed to make their way through this post up to this point.
