Answers
“Tidy” data
- If we also had the number of cancer cases in our example dataset, what would be the ‘tidy’ way of adding that data? Would it be A or B below? Why?
- B would be the ‘tidy’ way of representing the data. A does not represent our variable (the number of cases) as a single variable, and so can’t be tidy.
The pipe (%>%)
- What does
1 %>% substr("hello", ., .) return? Why?
- It returns
"h". The . represents the result of the expression on the left of the pipe, so the code is equivalent to substr("hello", 1, 1) which returns "h".
Quasiquotation
- What is the going to be the value in the new column in the A example below? Why?
- The value is going to be
11 (1 + 10). Because the tstval variable is evaluated in the context of the dataframe first, R will use the value of the tstval column. If the tstval column did not exist, then R would keep looking through the environments until it found a matching object, and then would use that object. In that case, the value of the column would be 21 (1 + 20) because it was use the tstval variable we defined in the global environment.
Modelling
- What’s the difference between
y ~ x*w and y ~ x:w?
y ~ x*w is expanded to the effect of x and w and their interaction (i.e. x + w + x:w). x:w just represents the interaction effect between x and w. In other words, it doesn’t include the simple effect of each one.