First completed course at Coursera

A week ago, I completed Computing for Data Analysis by Prof. Roger Peng at Coursera. The course was described as an introduction to the R language. This may have been somewhat misleading: it was indeed an introductory course for those totally new to R, but not for those totally new to programming in general, and the course description didn’t say so directly. Judging by the numerous complaints on the course discussion forum, some people with no programming experience whatsoever really were having a hard time figuring out where to start.

On the other hand, even a very distant familiarity with programming basics in Python made things a bit more tolerable for me than they would have been had I never seen things like an IDE or a for-loop before. So for me the course was rather challenging and even frustrating at times, but to my huge surprise I was able to complete the assignments. This doesn’t mean, of course, that I have perfectly understood, digested and mastered all the material provided. But after the course I really feel much more confident in the R environment. What is even more important, the course helped me to map my skills, so now I know what I need to learn better, where and how I can look for help, and which parts of my knowledge I can rely on. All in all, I’m glad I took this course. Thanks to Dr. Peng and his wonderful teaching assistants, who did a huge amount of work re-explaining the course material so that even total newbies could keep up.

By the way, I think the course is still available as an archive on Coursera. Its video lectures are also available on YouTube.

Also, I must admit, I have developed Stockholm Syndrome... that is, begun to like R.

And I’ve spent almost two notebooks on it, because I really feel more confident when I make notes on the way.

Now, as a follow-up, I played a bit with the dataset used for our last assignment, which focused on regular expressions. We worked with the homicide data from the Baltimore Sun site, which provides an interactive application to navigate these data but doesn’t offer them in a downloadable format. So Dr. Peng simply copied them from the page source and pasted them into a text file. Here it is.

For our assignment we had to write two functions. One had to count the number of victims given the cause of death. The other had to count the number of victims of a given age.
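For readers who want a feel for the task, here is a minimal sketch of those two counting functions in Python (not the R code from the course). The record layout in `SAMPLE` is made up for illustration, as are the function names; the real text file may be laid out differently, so the patterns would need adjusting.

```python
import re

# Hypothetical records for illustration only; the real file's layout may differ.
SAMPLE = [
    "<dd>male, 17 years old</dd><dd>Cause: shooting</dd>",
    "<dd>female, 41 years old</dd><dd>Cause: asphyxiation</dd>",
    "<dd>male, 17 years old</dd><dd>Cause: Stabbing</dd>",
    "<dd>male, 30 years old</dd><dd>Cause: shooting</dd>",
]


def count_by_cause(records, cause):
    """Count records whose 'Cause:' field matches `cause`, ignoring case."""
    pattern = re.compile(r"Cause:\s*" + re.escape(cause) + r"\b", re.IGNORECASE)
    return sum(1 for record in records if pattern.search(record))


def count_by_age(records, age):
    """Count records that mention a victim of exactly `age` years."""
    # The leading \b keeps 7 from matching inside 17.
    pattern = re.compile(r"\b" + str(int(age)) + r" years? old", re.IGNORECASE)
    return sum(1 for record in records if pattern.search(record))
```

With the sample above, `count_by_cause(SAMPLE, "shooting")` counts two records and `count_by_age(SAMPLE, 17)` also counts two.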

I wanted to find out if there are any preferred ways of murder given a gender. I also wanted to visualise my results. To this end, I first wrote a function that sorted victims by gender given a cause and returned the result as a data frame. Then I wrote another function that joined the output of the first one into a general data frame for all the causes presented in the dataset. I realize my code is not exactly neat and nice, but I’m glad that at least it works.
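The two-step idea (tally genders for one cause, then combine the tallies for all causes) can be sketched roughly as follows. This is Python with plain dictionaries standing in for an R data frame, not my actual code; the record layout, the regular expressions and the function names are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical records for illustration only; the real file's layout may differ.
SAMPLE = [
    "<dd>male, 17 years old</dd><dd>Cause: shooting</dd>",
    "<dd>female, 41 years old</dd><dd>Cause: asphyxiation</dd>",
    "<dd>male, 17 years old</dd><dd>Cause: Stabbing</dd>",
    "<dd>male, 30 years old</dd><dd>Cause: shooting</dd>",
]

# Word boundaries stop "male" from matching inside "female".
GENDER_RE = re.compile(r"\b(female|male)\b", re.IGNORECASE)
CAUSE_RE = re.compile(r"Cause:\s*([A-Za-z ]+)", re.IGNORECASE)


def gender_counts_for_cause(records, cause):
    """Return {'female': n, 'male': m} for records with the given cause."""
    counts = Counter({"female": 0, "male": 0})
    for record in records:
        cause_match = CAUSE_RE.search(record)
        gender_match = GENDER_RE.search(record)
        if not (cause_match and gender_match):
            continue  # skip records where either field is missing
        if cause_match.group(1).strip().lower() == cause.lower():
            counts[gender_match.group(1).lower()] += 1
    return dict(counts)


def gender_by_cause_table(records):
    """Combine the per-cause tallies into one table keyed by cause."""
    causes = set()
    for record in records:
        match = CAUSE_RE.search(record)
        if match:
            causes.add(match.group(1).strip().lower())
    return {c: gender_counts_for_cause(records, c) for c in sorted(causes)}
```

Rescanning the records once per cause is wasteful, but it mirrors the "one function per cause, then join" structure described above and is fine for a file of this size.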

And well, I actually found out that the most common cause of violent death in Baltimore in the period from 2007 to 2012 was shooting, and that in 1126 of the 1245 observations (about 90%) the victim was male. It looks like this:

[Figure: bar chart of victims by gender]

Also, the only category in which female victims predominate is asphyxiation. So, speaking about preferred ways of killing given the victim’s gender, this chart might be more instructive.

[Figure: stacked bar chart of victims by cause of death and gender]

Well, for more sophisticated data analysis I’ve yet to learn loads of Statistics. By the way, as to Statistics, I’m still taking Statistics One by Prof. Andrew Conway at Coursera. Although it seemed a bit boring at the beginning, now it’s getting more and more interesting.

Also, I have completed the Python course at Codecademy. And immediately started a course in JavaScript. Because I like Codecademy. And because I don’t have enough time right now to focus on learning APIs with Python there. Never mind that I’m currently doing Introduction to Interactive Programming in Python at Coursera. I promise I’ll quit it as soon as it becomes too challenging to combine with Statistics and with Data Analysis, which starts on October 28th.

All this stuff is supposed to be completed by January. I must say, I now feel a very strong urge to get down to something a bit more fundamental, like maths and computer science basics.

It’s not easy being me

Just to complain a bit. But perhaps somebody will be able to make more use of this than me.

By the end of summer I had a perfectly minimalistic learning plan for the autumn: R and Statistics. Isn’t it sweet? Just that and nothing else, at least till December. Well, and a tiny bit of Python (Codecademy) in the background.

And here are some of the courses that turned up out of the blue as soon as I started implementing my Perfect Plan.

  • Learning from Data at edX, began on September 30
  • Social Network Analysis at Coursera, begins on October 7
  • An iteration of Introduction to Interactive Programming in Python at Coursera. A course I failed to finish last spring because I enrolled in a School of Data mission. Begins on October 7
  • The Future of Storytelling at Iversity (a new MOOC platform, based in Germany as far as I understand). Not sure it’s worth watching, but might be worth having a look at. Begins on October 25. UPD. Has begun. No, definitely not worth spending time on. I don’t mean the course is bad; I don’t know. It’s just not what I think I need now.
  • Data Analysis at Coursera, begins on October 28

Not to mention the upcoming (not sure when exactly, but this autumn) new iteration of School of Data MOOC (Data Explorer Mission).

Feel like Horrid Henry.

[Image: Horrid Henry]

UPD. A new iteration of Python Mechanical MOOC is starting on October 21. Bingo!