Midterm review
Suggested answers
b, c, f, g - The blizzard_salary dataset has 409 rows. The percent_incr variable is numerical and continuous. The salary_type variable is categorical.
Figure 1 - A shared x-axis makes it easier to compare summary statistics for the variable on the x-axis.
c - It’s a value higher than the median for hourly but lower than the mean for salaried.
b - There is more variability around the mean compared to the hourly distribution.
a, b, e - Pie charts and waffle charts are for visualizing distributions of categorical data only. Scatterplots are for visualizing the relationship between two numerical variables.
c - mutate() is used to create or modify a variable.
a - "Poor", "Successful", "High", "Top"
b - Option 2. The plot in Option 1 shows the number of employees with a given performance rating for each salary type while the plot in Option 2 gives the proportion of employees with a given performance rating for each salary type. In order to assess the relationship between these variables (e.g., how much more likely is a Top rating among Salaried vs. Hourly workers), we need the proportions, not the counts.
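The difference between counts and proportions can be sketched with dplyr. The data frame toy_salary below is made up for illustration and is not the course data; only the column names salary_type and performance_rating come from the answers. mutate() is used to create the new prop variable.

```r
library(dplyr)

# Hypothetical toy data, not the real blizzard_salary dataset
toy_salary <- data.frame(
  salary_type = c("Hourly", "Hourly", "Hourly", "Hourly", "Salaried", "Salaried"),
  performance_rating = c("Top", "High", "High", "High", "Top", "High")
)

rating_props <- toy_salary |>
  count(salary_type, performance_rating) |>  # counts per combination
  group_by(salary_type) |>
  mutate(prop = n / sum(n)) |>               # proportions within each salary type
  ungroup()

rating_props
```

In this toy data both salary types have one Top rating, so the counts look identical, but the proportion of Top ratings is 0.25 for Hourly and 0.5 for Salaried. In a plot, geom_bar(position = "fill") shows the same within-group proportions.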
There may be some NAs in these two variables that are not visible in the plot.
The proportions under Hourly would go in the Hourly bar, and those under Salaried would go in the Salaried bar.
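One way to check for NAs that a plot does not show is to count them directly. This is a minimal sketch on made-up data; toy_na is hypothetical and only the column names come from the answers.

```r
library(dplyr)

# Hypothetical toy data with one missing value in each variable
toy_na <- data.frame(
  salary_type = c("Hourly", NA, "Salaried"),
  performance_rating = c("Top", "High", NA)
)

# Count the missing values in each of the two variables
na_counts <- toy_na |>
  summarize(
    n_na_salary_type = sum(is.na(salary_type)),
    n_na_performance_rating = sum(is.na(performance_rating))
  )

na_counts
```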
c - filter(salary_type != "Hourly" & performance_rating == "Poor") - There are 5 observations for “not Hourly” “and” Poor.
a - arrange() - The result is arranged in increasing order of annual_salary, which is the default for arrange().
c, d, e, f.
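The filter() and arrange() answers above can be sketched together on a small made-up data frame. toy_salary and its values are hypothetical; the column names and the filter condition come from the answers.

```r
library(dplyr)

# Hypothetical toy data, not the real blizzard_salary dataset
toy_salary <- data.frame(
  salary_type = c("Hourly", "Salaried", "Salaried", "Hourly", "Salaried"),
  performance_rating = c("Poor", "Poor", "Top", "High", "Poor"),
  annual_salary = c(40000, 90000, 120000, 45000, 70000)
)

not_hourly_poor <- toy_salary |>
  # keep rows that are both "not Hourly" and rated "Poor"
  filter(salary_type != "Hourly" & performance_rating == "Poor") |>
  # arrange() sorts in increasing order by default
  arrange(annual_salary)

not_hourly_poor
```

Wrapping the variable in desc(), as in arrange(desc(annual_salary)), would sort in decreasing order instead.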
Part 1: The following should be fixed:
- There should be a | after # before label
- There should be a : after label, not =
- There shouldn’t be a space in the chunk label, it should be plot-blizzard
- There should be spaces after commas in the code
- There should be spaces on both sides of = in the code
- There should be a space before +
- geom_boxplot() should be on the next line and indented
- There should be a + at the end of the geom_boxplot() line
- labs() should be indented
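A sketch of what a chunk might look like once every fix in the list is applied. The original code is not reproduced here, so the data frame blizzard_toy, the aesthetic mappings, and the axis labels are assumptions for illustration only; the chunk label plot-blizzard comes from the answer.

```r
#| label: plot-blizzard

library(ggplot2)

# Hypothetical toy data standing in for blizzard_salary
blizzard_toy <- data.frame(
  salary_type = c("Hourly", "Hourly", "Salaried", "Salaried"),
  annual_salary = c(40000, 45000, 90000, 120000)
)

# Spaces after commas and around =, a space before each +,
# and each layer on its own indented line
p <- ggplot(blizzard_toy, aes(x = salary_type, y = annual_salary)) +
  geom_boxplot() +
  labs(x = "Salary type", y = "Annual salary")

p
```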
Part 2: The warning is caused by NAs in the data. It means that 39 observations were NAs and are not plotted/represented on the plot.
Part 1:
- Render: Run all of the code and render all of the text in the document and produce an output.
- Commit: Take a snapshot of your changes in Git with an appropriate message.
- Push: Send your changes off to GitHub.
Part 2: c - Rendering or committing isn’t sufficient to send your changes to your GitHub repository, a push is needed. A pull is also not needed to view the changes in the browser.