01 is powered by Vocal creators. You support Zixin Nie by reading, sharing and tipping stories.

A Review and Retrospective on DataCamp’s Data Scientist in R Track

My Experiences Using DataCamp and What I Was Able to Learn

DataCamp

As someone who is trying to upskill from a statistician into a data scientist, I have found the vast array of different courses and resources for learning to be daunting. Almost every university is now offering some kind of data science program (that usually costs several thousand dollars), and online there are innumerable classes, bootcamps, MOOCs, tutorials, demonstrations and projects. Picking the right one to start with is a very difficult task in and of itself!

I decided to start with a subscription service called DataCamp. It offers relatively short, interactive courses made up of brief videos and coding exercises that guide you through the material. Since I already have a background in statistics but need more experience working with computational tools, DataCamp seemed like a very good fit.

DataCamp has quite a large course library (208 courses as of writing this article), so picking out where to begin is yet another challenge! Luckily, they make it a bit easier by gathering courses into career streams and skill streams, so you know which courses to take and the order to take them in. I started off with the “Data Scientist in R” career track, which consists of 23 courses. DataCamp estimates that you will need 94 hours to finish this track. If you already have some background in statistics and data science, you can do it in less time.

Wikipedia Article on R Programming Language

The stream starts off with an introductory course called “Introduction to R.” For those who don’t know, R is a programming language used primarily by academics and statisticians, with many purpose-built libraries for statistics. This course covers the basics: what R looks like, how to write your first programs in R, variable assignment, and the data types you will encounter in R. If you have absolutely no programming experience in any language, this course will show you how high-level programming languages like R work. The skills gained here are fairly transferable, since most other programming languages rest on the same concepts.

“Intermediate R” gives you further tools to work within R, teaching you how to write loops, conditional statements, and functions, and giving you tools that will speed up your programming. Again, many of the skills are transferable, since these are still basic methods that every programmer should know how to use. The one R-specific topic in this course is the “apply” family of functions. Their syntax only applies to R, but other languages like Python have analogous functions.
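To make that comparison with Python a little more concrete, here is a small sketch of my own (it is not taken from the course, which teaches everything in R). In R, `sapply(values, square)` calls a function on every element of a vector; the closest Python equivalents are the built-in `map` and list comprehensions. The function `square` and the sample values are made up for illustration.

```python
# Rough Python analogue of R's sapply(values, square).

def square(x):
    """Return x squared (a hypothetical example function)."""
    return x * x

values = [1, 2, 3, 4]

# An explicit loop with a conditional, the style taught before the apply family.
squares_of_evens = []
for v in values:
    if v % 2 == 0:                      # keep only the even inputs
        squares_of_evens.append(square(v))

# map() applies square to each element, much like sapply does in R.
squares_map = list(map(square, values))

# A list comprehension is the more common Python idiom for the same task.
squares_comp = [square(v) for v in values]

print(squares_of_evens)  # [4, 16]
print(squares_map)       # [1, 4, 9, 16]
print(squares_comp)      # [1, 4, 9, 16]
```

The idea carries over directly: you hand a function to another function instead of writing the loop yourself.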

From this point onwards we start using tools from the Tidyverse. This is a collection of packages, developed largely by Hadley Wickham and his collaborators, that are designed for data manipulation, exploratory data analysis, data visualization, and data cleaning. The course stream goes through most of the functionality in the Tidyverse and teaches what most of the packages do. It starts off with an overview of the packages in the course “Introduction to the Tidyverse,” then moves on to specifics such as importing data of various formats from various sources (Importing Data into R parts 1 and 2), cleaning data (Cleaning Data in R), writing functions, data manipulation, joining data, working with dates and times, and data visualization. The series of courses introduces key packages like dplyr, ggplot2, and lubridate, all of which are heavily used by data scientists to manipulate, explore, process, and visualize data.

Tidyverse Packages via Tidyverse

After covering the basics of the data science tools in R, the stream moves on to teaching what data science is and what data scientists do. “Introduction to Data” gives an overview of where data comes from, what types of data there are, observational studies versus experiments, and sampling design. “Exploratory Data Analysis” teaches the methods and techniques for understanding data when you first receive it, focusing on creating summary statistics and visualizations. These two courses give the basic intuition behind how to understand data, as well as practice using some of the tools from previous courses to gain this intuition.

The course stream also includes two courses on SQL, the standard language for querying databases, which is heavily used in industry. The syntax is not hard to learn, and experience with SQL is an asset for just about every job related to data. The two courses give you most of the tools necessary to start using SQL. You can also run SQL queries from within R to retrieve and manipulate data, which is covered in some of the other courses.
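For readers who have never seen SQL, here is a minimal sketch of what a query looks like. This is my own illustration rather than course material: it uses Python’s built-in `sqlite3` module with an in-memory database, and the `courses` table and its rows are entirely made up.

```python
import sqlite3

# In-memory database holding a made-up table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (title TEXT, language TEXT, hours REAL)")
conn.executemany(
    "INSERT INTO courses VALUES (?, ?, ?)",
    [
        ("Course A", "R", 4.0),
        ("Course B", "R", 6.0),
        ("Course C", "SQL", 4.0),
    ],
)

# A typical query: group the rows by language, then count and sum per group.
query = """
    SELECT language, COUNT(*) AS n_courses, SUM(hours) AS total_hours
    FROM courses
    GROUP BY language
    ORDER BY language
"""
rows = conn.execute(query).fetchall()
conn.close()

for language, n_courses, total_hours in rows:
    print(language, n_courses, total_hours)
```

The query text itself is the part that transfers: the same `SELECT ... GROUP BY` statement can be sent to a database from R or from almost any other tool.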

The final few courses cover important data science techniques for analyzing data. They start off simple with regression, covering how to create linear regression models and how to interpret the output. Then they move on to cluster analysis, teaching k-means clustering and hierarchical clustering. After that come more advanced machine learning methods in both supervised and unsupervised learning, including k-nearest neighbors, naive Bayes, logistic regression, classification trees, and principal component analysis. These are all commonly used methods within industry, and these two courses give an introduction to each of the methods, how they are meant to be used, and what analysis using these methods looks like.
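As a taste of the simplest of these methods, here is a sketch of fitting a one-predictor linear regression by ordinary least squares. It is written by hand in plain Python as my own illustration (the course does this in R), and the data points are invented so that they lie exactly on the line y = 2x + 1.

```python
# Ordinary least squares with a single predictor, written out by hand.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]     # made-up predictor values
ys = [3.0, 5.0, 7.0, 9.0, 11.0]    # made-up responses, exactly y = 2x + 1

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sum of cross-deviations and sum of squared deviations of x.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)

slope = sxy / sxx                      # estimated change in y per unit of x
intercept = mean_y - slope * mean_x    # estimated y when x is 0

def predict(x):
    """Predict y for a new x using the fitted line."""
    return intercept + slope * x

print(slope, intercept)  # 2.0 1.0
print(predict(6.0))      # 13.0
```

Interpreting the output is the important part: the slope says that each one-unit increase in x goes with an estimated two-unit increase in y, and the intercept is the fitted value at x = 0.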

The track ends with a course on R Markdown, a very useful tool that lets users create publishable reports from within the RStudio IDE.

The Certificate DataCamp Presents Upon Completion of the Track

My overall impression after going through this course stream is that it provides a good introduction to data science, exposing students to the tools that data scientists use and giving them some hands-on experience. The first half of the track is very thorough, going through almost all of the commonly used functionality of the important packages for pre-processing data. These tools are probably the most important to master, since data scientists spend most of their time manipulating and wrangling data to suit their analyses. Mastery of these skills, though not glamorous, is a necessary prerequisite for any further analysis of the data.

In the second half of the track, the methods presented become much more of a taster than a thorough overview or introduction. Four courses cover over half a dozen different methods, most of which you only get a single chapter to practice. To understand what the methods are, how to use them properly, and how to interpret the outcomes, you will need to do more in-depth reading and research. Alternatively, DataCamp offers more in-depth courses on each of these methods within its course library, some of which have been organized into skill tracks for a measurable progression.

Once this introductory track has whetted your appetite, I’m sure you will want to learn a lot more about data science and how to further use R. I would recommend moving on to some of the skill tracks, and then taking individual courses in whatever interests you. There are over a hundred different courses using the R language and about as many in Python. In time I will be writing reviews and retrospectives on the other courses that I have taken.

I hope that this article has been useful for you and has piqued your interest in DataCamp as a learning service. If you are looking to break into data science, this is a good place to start. You can sign up for free on DataCamp and get access to a few introductory courses, and if you have a Microsoft account you can use Microsoft Azure DevOps to get a two-month free trial. I was able to finish this career track within one month working daily, so two months should be enough time if you put your mind to it and work hard. You can also subscribe to DataCamp for $29 per month, or $300 for a full year.
