6 Introduction

In this chapter, we’ll work through some typical data science tasks and learn how to do them in R. Later on, we’ll look at how to apply these skills in your own projects, along with some tips for making a project easier to read and share with others.

For this chapter, we’re going to use a Kaggle dataset of video game sales. It records sales (in millions) in North America, Europe, Japan, other regions and globally for the top games each year going back to 1985. Each game is also labelled with its platform, genre, and publisher:

## # A tibble: 16,598 x 11
##    Rank Name  Platform Year  Genre Publisher NA_Sales EU_Sales JP_Sales
##   <dbl> <chr> <chr>    <chr> <chr> <chr>        <dbl>    <dbl>    <dbl>
## 1     1 Wii … Wii      2006  Spor… Nintendo      41.5    29.0      3.77
## 2     2 Supe… NES      1985  Plat… Nintendo      29.1     3.58     6.81
## 3     3 Mari… Wii      2008  Raci… Nintendo      15.8    12.9      3.79
## 4     4 Wii … Wii      2009  Spor… Nintendo      15.8    11.0      3.28
## 5     5 Poke… GB       1996  Role… Nintendo      11.3     8.89    10.2 
## # … with 16,593 more rows, and 2 more variables: Other_Sales <dbl>,
## #   Global_Sales <dbl>
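
If you want to follow along, a tibble like the one above could be loaded roughly as follows. This is only a minimal sketch: the file name `vgsales.csv` is an assumed location for the CSV downloaded from Kaggle, and using the readr package is an assumption too, since the code that produced the printout is not shown here.

```r
library(readr)

# Assumed (hypothetical) location of the CSV downloaded from Kaggle;
# change this to wherever you saved the file.
path <- "vgsales.csv"

if (file.exists(path)) {
  # Read Year as character so it matches the <chr> column in the printout;
  # the remaining column types are left for readr to guess.
  games <- read_csv(path, col_types = cols(Year = col_character()))

  # Printing a tibble shows only the first rows and as many columns as fit.
  print(games)
}
```

The `file.exists()` check simply keeps the snippet from throwing an error when the file is not where the sketch expects it to be.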