6 Introduction

In this chapter, we’ll work through some typical data science tasks and learn how to do them in R. Later on, we’ll look at how to apply these skills in your own projects, along with some tips for making a project easier to read and share with others.

For this chapter, we’ll use a Kaggle dataset that holds information on video game sales. The dataset contains sales (in millions) in North America, Europe, Japan, and globally for the top games each year going back to 1985. The games are broken down by platform, genre, and publisher:

## # A tibble: 16,598 × 11
##    Rank Name         Platform Year  Genre   Publisher NA_Sales EU_Sales JP_Sales
##   <dbl> <chr>        <chr>    <chr> <chr>   <chr>        <dbl>    <dbl>    <dbl>
## 1     1 Wii Sports   Wii      2006  Sports  Nintendo      41.5    29.0      3.77
## 2     2 Super Mario… NES      1985  Platfo… Nintendo      29.1     3.58     6.81
## 3     3 Mario Kart … Wii      2008  Racing  Nintendo      15.8    12.9      3.79
## 4     4 Wii Sports … Wii      2009  Sports  Nintendo      15.8    11.0      3.28
## 5     5 Pokemon Red… GB       1996  Role-P… Nintendo      11.3     8.89    10.2 
## # … with 16,593 more rows, and 2 more variables: Other_Sales <dbl>,
## #   Global_Sales <dbl>
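
A preview like the one above could be produced along the following lines. This is only a minimal sketch: it assumes the Kaggle file has been downloaded and saved locally under the hypothetical name `vgsales.csv`, and it uses `read_csv()` from the readr package, which may not be exactly how the data was loaded for this chapter.

```r
library(readr)

# Hypothetical path: assumes the Kaggle CSV was saved next to this script.
path <- "vgsales.csv"

if (file.exists(path)) {
  # read_csv() returns a tibble and guesses each column's type from the data
  games <- read_csv(path)

  # Print only the first five rows, as in the preview above
  print(games, n = 5)
} else {
  message("Download the dataset from Kaggle and save it as ", path)
}
```

The `file.exists()` check is there only so the sketch runs cleanly before the file has been downloaded; in your own project you would normally call `read_csv()` directly.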