Midterm review

Suggested answers

  1. b, c, f, g -

    • The blizzard_salary dataset has 409 rows.

    • The percent_incr variable is numerical and continuous.

    • The salary_type variable is categorical.

  2. Figure 1 - A shared x-axis makes it easier to compare summary statistics for the variable on the x-axis.

  3. c - It’s a value higher than the median for hourly but lower than the mean for salaried.

  4. b - There is more variability around the mean compared to the hourly distribution.

  5. a, b, e - Pie charts and waffle charts are for visualizing distributions of categorical data only. Scatterplots are for visualizing the relationship between two numerical variables.

  6. c - mutate() is used to create or modify a variable.

  7. a - "Poor", "Successful", "High", "Top"

  8. b - Option 2. The plot in Option 1 shows the number of employees with a given performance rating for each salary type while the plot in Option 2 gives the proportion of employees with a given performance rating for each salary type. In order to assess the relationship between these variables (e.g., how much more likely is a Top rating among Salaried vs. Hourly workers), we need the proportions, not the counts.

  9. There may be some NAs in these two variables that are not visible in the plot.

  10. The proportions under Hourly would go in the Hourly bar, and those under Salaried would go in the Salaried bar.

  11. c - filter(salary_type != "Hourly" & performance_rating == "Poor") - There are 5 observations for “not Hourly” “and” Poor.

  12. a - arrange() - The result is arranged in increasing order of annual_salary, which is the default for arrange().

  13. c, d, e, f.

  14. Part 1: The following should be fixed:

    • There should be a | after # before label

    • There should be a : after label, not =

    • There shouldn’t be a space in the chunk label, it should be plot-blizzard

    • There should be spaces after commas in the code

    • There should be spaces on both sides of = in the code

    • There should be a space before +

    • geom_boxplot() should be on the next line and indented

    • There should be a + at the end of the geom_boxplot() line

    • labs() should be indented

    Part 2: The warning is caused by NA in the data. It means that 39 observations were NAs and are not plotted/represented on the plot.

  15. Part 1:

    1. Render: Run all of the code and render all of the text in the document and produce an output.
    2. Commit: Take a snapshot of your changes in Git with an appropriate message.
    3. Push: Send your changes off to GitHub.

    Part 2: c - Rendering or committing isn’t sufficient to send your changes to your GitHub repository, a push is needed. A pull is also not needed to view the changes in the browser.