AE 07: Sales taxes + data joining

Suggested answers

Application exercise
Important

These are suggested answers. This document should be used as a reference only; it’s not designed to be an exhaustive key.

Getting started

Packages

We’ll use the tidyverse package for this analysis.

Data

The data are available in the data folder.

sales_taxes <- read_csv("data/sales-taxes.csv")
Rows: 51 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): state
dbl (4): state_tax_rate, avg_local_tax_rate, combined_rate, max_local_tax_rate

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
us_regions <- read_csv("data/us-regions.csv")
Rows: 50 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): state_name, region

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

And let’s take a look at the data.

glimpse(sales_taxes)
Rows: 51
Columns: 5
$ state              <chr> "Alabama", "Alaska", "Arizona", "Arkansas", "Califo…
$ state_tax_rate     <dbl> 0.0400, 0.0000, 0.0560, 0.0650, 0.0725, 0.0290, 0.0…
$ avg_local_tax_rate <dbl> 0.0529, 0.0182, 0.0278, 0.0295, 0.0160, 0.0491, 0.0…
$ combined_rate      <dbl> 0.0929, 0.0182, 0.0838, 0.0945, 0.0885, 0.0781, 0.0…
$ max_local_tax_rate <dbl> 0.0750, 0.0785, 0.0530, 0.0613, 0.0475, 0.0830, 0.0…
glimpse(us_regions)
Rows: 50
Columns: 2
$ state_name <chr> "Maine", "New Hampshire", "Vermont", "Massachusetts", "Rhod…
$ region     <chr> "Northeast", "Northeast", "Northeast", "Northeast", "Northe…

Joining

Join the sales tax data with region data and save the joined data frame as a new data frame, not overwriting either data frame that goes into the join.

sales_taxes_regions <- left_join(
  sales_taxes, 
  us_regions, 
  by = join_by(state == state_name)
)

Calculate the average sales tax of states in each region. What is surprising in the output?

sales_taxes_regions |>
  group_by(region) |>
  summarize(mean_state_tax = mean(state_tax_rate))
# A tibble: 5 × 2
  region    mean_state_tax
  <chr>              <dbl>
1 Midwest           0.0569
2 Northeast         0.0530
3 South             0.0523
4 West              0.0416
5 <NA>              0.06  

Identify the state with NA for region.

sales_taxes_regions |>
  filter(is.na(region)) |>
  select(state, region)
# A tibble: 1 × 2
  state                region
  <chr>                <chr> 
1 District of Columbia <NA>  

Apply a fix for the NA in region, and calculate the mean sales taxes for regions again. Display the results in ascending order of mean sales tax.

sales_taxes_regions <- sales_taxes_regions |>
  mutate(
    region = if_else(state == "District of Columbia", "Northeast", region)
  )

sales_taxes_regions |>
  group_by(region) |>
  summarize(mean_state_tax = mean(state_tax_rate)) |>
  arrange(mean_state_tax)
# A tibble: 4 × 2
  region    mean_state_tax
  <chr>              <dbl>
1 West              0.0416
2 South             0.0523
3 Northeast         0.0537
4 Midwest           0.0569

Render, commit, and push

  1. Render your Quarto document.

  2. Go to the Git pane and check the box next to each file listed, i.e., stage your changes. Commit your staged changes using a simple and informative message.

  3. Click on push (the green arrow) to push your changes to your application exercise repo on GitHub.

  4. Go to your repo on GitHub and confirm that you can see the updated files. Once your updated files are in your repo on GitHub, you’re good to go!