Lecture 8
Duke University
STA 199 - Fall 2024
September 24, 2024
Prepare for today’s application exercise: ae-08-durham-climate-factors
Go to your ae project in RStudio.
Make sure all of your changes up to this point are committed and pushed, i.e., there’s nothing left in your Git pane.
Click Pull to get today’s application exercise file: ae-08-durham-climate-factors.qmd.
Wait until you’re prompted to work on the application exercise during class before editing the file.
Regrade requests: https://sta199-f24.github.io/course-syllabus.html#regrade-requests
Considered for errors in grade calculation or if a correct answer was mistakenly marked as incorrect
Not a mechanism for:
Due on Gradescope within a week after an assignment is returned
The entire assignment may be regraded, which could result in an adjustment in either direction
No regrade requests after the final exam has been administered
# A tibble: 61 × 3
Timestamp How many classes do you have o…¹ `What year are you?`
<chr> <chr> <chr>
1 9/23/24 19:57 2 Sophomore
2 9/23/24 19:58 3 First-year
3 9/23/24 20:06 2 Sophomore
4 9/23/24 20:09 2 Sophomore
5 9/23/24 21:48 0 Senior
6 9/24/24 9:44 2 First-year
7 9/24/24 10:15 2 Senior
8 9/24/24 10:50 2 Sophomore
9 9/24/24 10:54 3 First-year
10 9/24/24 11:08 2 Senior
# ℹ 51 more rows
# ℹ abbreviated name: ¹`How many classes do you have on Tuesdays?`
rename() variables
To make them easier to work with…
What type of variable is tue_classes?
# A tibble: 61 × 3
Timestamp tue_classes year
<chr> <chr> <chr>
1 9/23/24 19:57 2 Sophomore
2 9/23/24 19:58 3 First-year
3 9/23/24 20:06 2 Sophomore
4 9/23/24 20:09 2 Sophomore
5 9/23/24 21:48 0 Senior
6 9/24/24 9:44 2 First-year
7 9/24/24 10:15 2 Senior
8 9/24/24 10:50 2 Sophomore
9 9/24/24 10:54 3 First-year
10 9/24/24 11:08 2 Senior
# ℹ 51 more rows
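A minimal sketch of the rename() step, assuming the tibble and dplyr packages are installed and R 4.1 or later (for the native |> pipe). The full 61-row survey isn't reproduced here, so the code rebuilds only the first ten rows printed above (Timestamp omitted); the object name survey is a hypothetical placeholder. The final typeof() call answers the question about tue_classes.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for the survey data: the first ten rows printed
# above (Timestamp omitted), keeping the original question text as names
survey <- tibble(
  `How many classes do you have on Tuesdays?` =
    c("2", "3", "2", "2", "0", "2", "2", "2", "3", "2"),
  `What year are you?` = c(
    "Sophomore", "First-year", "Sophomore", "Sophomore", "Senior",
    "First-year", "Senior", "Sophomore", "First-year", "Senior"
  )
)

# rename() takes new_name = old_name pairs; backticks are required
# around names that contain spaces or punctuation
survey <- survey |>
  rename(
    tue_classes = `How many classes do you have on Tuesdays?`,
    year = `What year are you?`
  )

names(survey)
#> [1] "tue_classes" "year"

# The answers look like numbers, but they are stored as text
typeof(survey$tue_classes)
#> [1] "character"
```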
Vectors can be constructed using the c() function.
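A short sketch of c() in action; the object names nums, years, and flags are made up for illustration. Each call produces a vector whose type typeof() reports, and c() can also join existing vectors.

```r
# c() combines ("concatenates") values into a single vector
nums <- c(1, 2, 3)
typeof(nums)
#> [1] "double"

years <- c("First-year", "Sophomore", "Senior")
typeof(years)
#> [1] "character"

flags <- c(TRUE, FALSE, TRUE)
typeof(flags)
#> [1] "logical"

# c() also appends values or whole vectors end to end
c(nums, 4, 5)
#> [1] 1 2 3 4 5
```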
Converting between types with intention…
Converting between types without intention…
R will happily convert between various types without complaint when different types of data are concatenated in a vector, and that’s not always a great thing!
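A sketch of this silent conversion: when c() receives mixed types, everything is coerced to the most flexible type present (logical, then integer, then double, then character). No warning is raised in any of these cases.

```r
typeof(c(1, "Hello"))   # double + character
#> [1] "character"
typeof(c(FALSE, 3L))    # logical + integer
#> [1] "integer"
typeof(c(1.2, 3L))      # double + integer
#> [1] "double"
typeof(c(2L, "two"))    # integer + character
#> [1] "character"

# The logical FALSE quietly became the integer 0
c(FALSE, 3L)
#> [1] 0 3
```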
Explicit coercion:
When you call a function like as.logical(), as.numeric(), as.integer(), as.double(), or as.character().
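A minimal sketch of explicit coercion with the as.*() functions; x and y are illustrative names. Note that values which can't be interpreted as the target type become NA rather than causing an error.

```r
x <- 1:3
typeof(x)
#> [1] "integer"

# Convert the integers to text, then back to numbers
y <- as.character(x)
y
#> [1] "1" "2" "3"
as.numeric(y)
#> [1] 1 2 3

# as.integer() drops the decimal part (truncates toward zero)
as.integer(3.9)
#> [1] 3

# Only recognized spellings such as "TRUE"/"true" and "FALSE"/"false" convert
as.logical(c("TRUE", "false", "yes"))
#> [1]  TRUE FALSE    NA

# Unparseable strings become NA, and R warns: NAs introduced by coercion
as.numeric(c("2", "two"))
```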
Implicit coercion:
Happens when you use a vector in a specific context that expects a certain type of vector.
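A sketch of implicit coercion triggered by context: arithmetic functions expect numbers, so logicals are treated as 1 and 0, and a comparison between a number and a string compares them as text. The vector name attended is hypothetical.

```r
# sum() and mean() expect numbers, so TRUE -> 1 and FALSE -> 0
attended <- c(TRUE, FALSE, TRUE, TRUE)
sum(attended)    # number of TRUE values
#> [1] 3
mean(attended)   # proportion of TRUE values
#> [1] 0.75

# Comparing a string with a number coerces the number to character,
# so this is an alphabetical comparison ("10" sorts before "9")
"10" < 9
#> [1] TRUE
```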
R uses factors to handle categorical variables, variables that have a fixed and known set of possible values
We can think of factors like a character vector (level labels) and an integer vector (level numbers) glued together
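A small sketch of that "glued together" picture, using a few year values like those in the survey output. levels() shows the labels, and as.integer() exposes the level numbers stored underneath.

```r
year <- factor(c("Sophomore", "First-year", "Senior", "Sophomore"))

# Level labels: sorted alphabetically by default
levels(year)
#> [1] "First-year" "Senior"     "Sophomore"

# Level numbers: each value is an integer index into levels(year)
as.integer(year)
#> [1] 3 1 2 3

typeof(year)
#> [1] "integer"
class(year)
#> [1] "factor"
```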
We can think of dates like a number (the number of days since the origin, 1 Jan 1970) and the origin glued together; under the hood, R stores that day count as a double
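A sketch of a Date's underlying number, using this lecture's date as the example. The class is "Date", but typeof() reports a double holding the day count since 1970-01-01.

```r
d <- as.Date("2024-09-24")

class(d)
#> [1] "Date"
typeof(d)
#> [1] "double"

# The underlying value: days since the origin, 1970-01-01
as.numeric(d)
#> [1] 19990

# A day count plus the origin gives back the date
as.Date(19990, origin = "1970-01-01")
#> [1] "2024-09-24"
```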
We can think of data frames like vectors of equal length glued together
Lists are a generic vector container; vectors of any type can go in them
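A sketch connecting the last two ideas: a data frame is a list whose elements are equal-length vectors (its columns), while a general list can hold vectors of any type and length. The names df and l, and the three rows of values, are illustrative.

```r
# A data frame is a list of equal-length column vectors
df <- data.frame(
  tue_classes = c(2, 3, 0),
  year = c("Sophomore", "First-year", "Senior")
)
typeof(df)
#> [1] "list"
length(df)         # number of columns
#> [1] 2
typeof(df$year)    # each column is an ordinary vector
#> [1] "character"

# Columns of incompatible lengths (e.g., 3 and 2) are rejected:
# data.frame(x = 1:3, y = 1:2) stops with an error

# A list can hold vectors of different types and lengths
l <- list(1:3, "a", c(TRUE, FALSE))
lengths(l)
#> [1] 3 1 2
sapply(l, typeof)
#> [1] "integer"   "character" "logical"
```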
With the pull() function, we extract a vector from the data frame
# A tibble: 61 × 3
Timestamp tue_classes year
<chr> <chr> <chr>
1 9/23/24 19:57 2 Sophomore
2 9/23/24 19:58 3 First-year
3 9/23/24 20:06 2 Sophomore
4 9/23/24 20:09 2 Sophomore
5 9/23/24 21:48 0 Senior
6 9/24/24 9:44 2 First-year
7 9/24/24 10:15 2 Senior
8 9/24/24 10:50 2 Sophomore
9 9/24/24 10:54 3 First-year
10 9/24/24 11:08 2 Senior
# ℹ 51 more rows
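A minimal sketch contrasting pull() with select(), assuming tibble and dplyr are installed. It rebuilds only the first ten (already renamed) rows printed above as a hypothetical stand-in for the survey. pull() returns a bare vector, which can then be explicitly coerced to numeric.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in: the first ten rows printed above, already renamed
survey <- tibble(
  tue_classes = c("2", "3", "2", "2", "0", "2", "2", "2", "3", "2"),
  year = c(
    "Sophomore", "First-year", "Sophomore", "Sophomore", "Senior",
    "First-year", "Senior", "Sophomore", "First-year", "Senior"
  )
)

# select() keeps a data frame; pull() extracts a plain vector
is.data.frame(survey |> select(tue_classes))
#> [1] TRUE
tue <- survey |> pull(tue_classes)
is.data.frame(tue)
#> [1] FALSE
typeof(tue)
#> [1] "character"

# Explicit coercion makes the answers usable as numbers
mean(as.numeric(tue))
#> [1] 2
```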
Reordering levels by:
fct_relevel(): hand
fct_infreq(): frequency
fct_reorder(): sorting along another variable
fct_rev(): reversing
…
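A sketch of the four reordering helpers, assuming the forcats package is installed. The year and tue vectors reuse the first ten rows printed earlier (tue holds the Tuesday class counts as numbers). Each call changes only the order of the levels, not the values.

```r
library(forcats)

# Values from the first ten survey rows printed earlier
year <- factor(c(
  "Sophomore", "First-year", "Sophomore", "Sophomore", "Senior",
  "First-year", "Senior", "Sophomore", "First-year", "Senior"
))
tue <- c(2, 3, 2, 2, 0, 2, 2, 2, 3, 2)

levels(year)   # default: alphabetical
#> [1] "First-year" "Senior"     "Sophomore"

# By hand: list the levels in the order you want
levels(fct_relevel(year, "First-year", "Sophomore", "Senior"))
#> [1] "First-year" "Sophomore"  "Senior"

# By frequency: most common first (Sophomore appears 4 times)
levels(fct_infreq(year))[1]
#> [1] "Sophomore"

# Along another variable: by mean Tuesday classes, smallest first
levels(fct_reorder(year, tue, .fun = mean))
#> [1] "Senior"     "Sophomore"  "First-year"

# Reversing: flip the current level order
levels(fct_rev(year))
#> [1] "Sophomore"  "Senior"     "First-year"
```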
Changing level values by:
fct_lump(): lumping uncommon levels together into “other”
fct_other(): manually replacing some levels with “other”
…
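A sketch of changing level values, assuming forcats is installed and reusing the year values from the first ten survey rows. Both helpers label the collapsed levels "Other" by default (their other_level argument). fct_lump() keeps the n most common levels, and fct_other() keeps only the levels you name.

```r
library(forcats)

# year values from the first ten survey rows printed earlier
year <- factor(c(
  "Sophomore", "First-year", "Sophomore", "Sophomore", "Senior",
  "First-year", "Senior", "Sophomore", "First-year", "Senior"
))

# Lump by frequency: keep the single most common level
lumped <- fct_lump(year, n = 1)
nlevels(lumped)
#> [1] 2
sum(lumped == "Other")   # the 3 First-years and 3 Seniors
#> [1] 6

# Replace by hand: keep the named levels, everything else becomes "Other"
other <- fct_other(year, keep = c("First-year", "Sophomore"))
sum(other == "Other")    # the 3 Seniors
#> [1] 3
```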