6 Introduction
In this chapter, we’ll work through some typical data science tasks and learn how to do them in R. Later on, we’ll look at how to apply these skills in your own projects, along with some tips for making a project easier to read and share with others.
For this chapter, we’re going to use a Kaggle dataset that holds information on video game sales. The dataset contains sales (in millions) in North America, Europe, Japan, and globally for the top games each year going back to 1985. Each game is also labelled with its platform, genre, and publisher:
## # A tibble: 16,598 × 11
##    Rank Name         Platform Year  Genre   Publisher NA_Sales EU_Sales JP_Sales
##   <dbl> <chr>        <chr>    <chr> <chr>   <chr>        <dbl>    <dbl>    <dbl>
## 1     1 Wii Sports   Wii      2006  Sports  Nintendo      41.5    29.0      3.77
## 2     2 Super Mario… NES      1985  Platfo… Nintendo      29.1     3.58     6.81
## 3     3 Mario Kart … Wii      2008  Racing  Nintendo      15.8    12.9      3.79
## 4     4 Wii Sports … Wii      2009  Sports  Nintendo      15.8    11.0      3.28
## 5     5 Pokemon Red… GB       1996  Role-P… Nintendo      11.3     8.89    10.2
## # … with 16,593 more rows, and 2 more variables: Other_Sales <dbl>,
## #   Global_Sales <dbl>
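To get a feel for the shape of this data before loading the full file, here is a minimal sketch that rebuilds the five rows shown in the printout and summarises them by platform. It assumes the tibble and dplyr packages are installed. The names `games_preview` and `platform_sales` are made up for this illustration, the sales figures are the rounded values from the printout, and the `Name` and `Genre` columns are left out because the printout truncates them.

```r
library(tibble)
library(dplyr)

# Five rows copied from the printout above (rounded values as displayed).
# Name, Genre, Other_Sales and Global_Sales are omitted: the first two are
# truncated in the printout and the last two are not shown at all.
games_preview <- tribble(
  ~Rank, ~Platform, ~Year,  ~Publisher, ~NA_Sales, ~EU_Sales, ~JP_Sales,
  1,     "Wii",     "2006", "Nintendo", 41.5,      29.0,      3.77,
  2,     "NES",     "1985", "Nintendo", 29.1,      3.58,      6.81,
  3,     "Wii",     "2008", "Nintendo", 15.8,      12.9,      3.79,
  4,     "Wii",     "2009", "Nintendo", 15.8,      11.0,      3.28,
  5,     "GB",      "1996", "Nintendo", 11.3,      8.89,      10.2
)

# Year is stored as text (<chr>), so convert it before doing arithmetic on it,
# then count games and total North American sales for each platform.
platform_sales <- games_preview %>%
  mutate(Year = as.integer(Year)) %>%
  group_by(Platform) %>%
  summarise(
    games      = n(),
    first_year = min(Year),
    NA_Sales   = sum(NA_Sales),
    .groups    = "drop"
  ) %>%
  arrange(desc(NA_Sales))

print(platform_sales)
```

Note that `Year` arrives as a character column, so it needs converting before it can be compared or plotted as a number. In this sketch the conversion is safe because all five values are plain four-digit years.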