A week ago, I completed Computing for Data Analysis by Prof. Roger Peng on Coursera. The course was described as an introduction to the R language. This might have been somewhat confusing: it was indeed an introductory course for those who were totally new to R, but not for those who were total newbies in programming in general, and the course description didn’t say so directly. Judging by numerous complaints on the course discussion forum, some people with no programming experience whatsoever were really having a hard time figuring out where to start.
On the other hand, even a very distant familiarity with programming basics in Python made things a bit more tolerable for me than they would have been had I never seen things like an IDE or a for-loop before. So for me the course was rather challenging and even frustrating at times, but to my huge surprise I was able to complete the assignments. This doesn’t mean, of course, that I have perfectly understood, digested and mastered all the material provided. But after the course I really feel much more confident in the R environment. What is even more important, the course helped me to map my skills, so now I know what I need to learn better, where and how I can look for help, and which parts of my knowledge I can rely on. All in all, I’m glad I took this course. Thanks to Dr. Peng and his wonderful teaching assistants, who did a huge amount of work re-explaining the course material so that even total newbies could keep up.
By the way, I think the course is still available as an archive on Coursera. Its video lectures are also available on YouTube.
Also, I must admit, I have developed Stockholm Syndrome... I mean, began to like R.
And I’ve filled almost two notebooks on it, because I really feel more confident when I take notes along the way.
Now, as a follow-up, I played a bit with the dataset that was used for our last assignment, which focused on regular expressions. We worked with the homicide data from the Baltimore Sun site, which provides an interactive application to navigate these data but doesn’t provide them in a downloadable format. So Dr. Peng simply copied them from the page source and pasted them into a text file. Here it is.
For our assignment we had to write two functions. One had to count the number of victims given the cause of death. The other had to count the number of victims of a given age.
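This is not the code from the assignment (that was in R), just a rough Python sketch of the same idea. The sample records are invented, and the "Cause: ..." and "N years old" layout is my assumption about how each line of the text file looks; the function names `count_by_cause` and `count_by_age` are made up for illustration.

```python
import re

# Invented sample records. The "<dd>...</dd>" layout is an assumption
# about the text file, not a copy of the real data.
SAMPLE = [
    "<dd>black male, 17 years old</dd><dd>Cause: shooting</dd>",
    "<dd>white female, 42 years old</dd><dd>Cause: Asphyxiation</dd>",
    "<dd>black male, 42 years old</dd><dd>Cause: Shooting</dd>",
]


def count_by_cause(lines, cause):
    """Count the records whose 'Cause:' field matches the given cause."""
    # Case-insensitive, because the capitalisation in the sample varies.
    pattern = re.compile(r"Cause:\s*" + re.escape(cause) + r"\b", re.IGNORECASE)
    return sum(1 for line in lines if pattern.search(line))


def count_by_age(lines, age):
    """Count the records whose victim is exactly `age` years old."""
    # The leading \b keeps age 2 from matching inside "42 years old".
    pattern = re.compile(r"\b" + str(int(age)) + r" years? old\b")
    return sum(1 for line in lines if pattern.search(line))


if __name__ == "__main__":
    print(count_by_cause(SAMPLE, "shooting"))
    print(count_by_age(SAMPLE, 42))
```

The word boundaries matter more than they seem to: without them, a search for age 2 would also count every 12-, 22- or 42-year-old.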
I wanted to find out whether there are any preferred methods of murder depending on the victim’s gender. I also wanted to visualise my results. To this end, I first wrote a function that split the victims by gender for a given cause and returned the result as a data frame. Then I wrote another function that combined the output of the first one into a single data frame covering all the causes present in the dataset. I realize my code is not exactly neat and nice, but I’m glad that at least it works.
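Again, this is not my R code, only a minimal Python sketch of the two steps rolled into one: pull the gender and the cause out of each record with regular expressions, then tally them into one table. The records are invented, the line layout is an assumption, and `gender_by_cause` is a hypothetical name.

```python
import re
from collections import defaultdict

# Invented sample records; the layout is an assumption about the text file.
RECORDS = [
    "<dd>black male, 17 years old</dd><dd>Cause: shooting</dd>",
    "<dd>white female, 42 years old</dd><dd>Cause: Asphyxiation</dd>",
    "<dd>black male, 30 years old</dd><dd>Cause: Shooting</dd>",
    "<dd>black female, 25 years old</dd><dd>Cause: stabbing</dd>",
]

# \b stops "male" from matching inside "female".
GENDER_RE = re.compile(r"\b(female|male)\b", re.IGNORECASE)
# Capture the letters and spaces after "Cause:", stopping at the next tag.
CAUSE_RE = re.compile(r"Cause:\s*([A-Za-z ]+)", re.IGNORECASE)


def gender_by_cause(lines):
    """Return {cause: {"male": n, "female": n}} for all causes found."""
    table = defaultdict(lambda: {"male": 0, "female": 0})
    for line in lines:
        gender = GENDER_RE.search(line)
        cause = CAUSE_RE.search(line)
        if gender is None or cause is None:
            continue  # skip records where either field is missing
        cause_name = cause.group(1).strip().lower()
        table[cause_name][gender.group(1).lower()] += 1
    return dict(table)


if __name__ == "__main__":
    for cause_name, counts in sorted(gender_by_cause(RECORDS).items()):
        print(cause_name, counts["male"], counts["female"])
```

Records with no recognisable gender or cause are simply skipped here; whether that is the right call for the real data is something I would check before trusting the totals.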
And well, I actually found out that the most common cause of violent death in Baltimore in the period from 2007 to 2012 was shooting, and that in 1126 out of 1245 observations (about 90%) the victims were male. It looks like this:
Also, the only category in which female victims prevail is asphyxiation. So, as for preferred methods of killing by gender, this chart might be more instructive.
Well, for more sophisticated data analysis I’ve yet to learn loads of statistics. By the way, speaking of statistics, I’m still taking Statistics One by Prof. Andrew Conway on Coursera. Although it seemed a bit boring at the beginning, now it’s getting more and more interesting.
All this stuff is supposed to be completed by January. I must say, I now feel a very strong urge to get down to something a bit more fundamental, like maths and computer science basics.