AE 11: UN Votes - Revisit + ChatGPT adventures

Application exercise

Part 1: UN Votes - Revisit

You’ve seen this analysis before. It’s time to revisit it and clean it up for code smell, style, and readability. Make the updates needed to improve the code without changing what it computes. Then review the diff before you render, commit (with an appropriate message), and push.

Introduction

How do various countries vote in the United Nations General Assembly, how have their voting patterns evolved over time, and how similarly or differently do they view certain issues? Answering these questions (at a high level) is the focus of this analysis.

Packages

We will use the tidyverse and scales packages for data wrangling and visualization, the ggthemes package for a colorblind-friendly color scale, and the unvotes package for the data. Date handling relies on lubridate, which is attached as part of the core tidyverse.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(scales)


Attaching package: 'scales'

The following object is masked from 'package:purrr':

    discard

The following object is masked from 'package:readr':

    col_factor

library(unvotes)

If you use data from the unvotes package, please cite the following:

Erik Voeten "Data and Analyses of Voting in the UN General Assembly" Routledge Handbook of International Organization, edited by Bob Reinalda (published May 27, 2013)

library(ggthemes)

Data

The data we’re using originally come from the unvotes package. In the chunk below we modify the data by joining the various data frames provided in the package to help you get started with the analysis.

unvotes <- un_votes |>inner_join(un_roll_calls, by = join_by(rcid)) |>
  inner_join(un_roll_call_issues, 
by = join_by(rcid == rcid),relationship = "many-to-many")

print(unvotes)

# A tibble: 857,878 × 14
    rcid country country_code vote  session importantvote date       unres amend
   <dbl> <chr>   <chr>        <fct>   <dbl>         <int> <date>     <chr> <int>
 1     6 United… US           no          1             0 1946-01-04 R/1/…     0
 2     6 Canada  CA           no          1             0 1946-01-04 R/1/…     0
 3     6 Cuba    CU           yes         1             0 1946-01-04 R/1/…     0
 4     6 Domini… DO           abst…       1             0 1946-01-04 R/1/…     0
 5     6 Mexico  MX           yes         1             0 1946-01-04 R/1/…     0
 6     6 Guatem… GT           no          1             0 1946-01-04 R/1/…     0
 7     6 Hondur… HN           yes         1             0 1946-01-04 R/1/…     0
 8     6 El Sal… SV           abst…       1             0 1946-01-04 R/1/…     0
 9     6 Nicara… NI           yes         1             0 1946-01-04 R/1/…     0
10     6 Panama  PA           abst…       1             0 1946-01-04 R/1/…     0
# ℹ 857,868 more rows
# ℹ 5 more variables: para <int>, short <chr>, descr <chr>, short_name <chr>,
#   issue <fct>
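To see why the second join needs relationship = "many-to-many", here is a minimal sketch on toy data. The tibbles toy_votes and toy_issues are hypothetical stand-ins for un_votes and un_roll_call_issues: several countries vote on the same roll call, and one roll call can be tagged with more than one issue, so each rcid can match multiple rows on both sides.

```r
library(dplyr)

# Hypothetical toy data (not from the unvotes package):
# two countries vote on roll call 1, one country votes on roll call 2.
toy_votes <- tibble(
  rcid = c(1, 1, 2),
  country = c("A", "B", "A"),
  vote = c("yes", "no", "yes")
)

# Roll call 1 is tagged with two issues, roll call 2 with one.
toy_issues <- tibble(
  rcid = c(1, 1, 2),
  issue = c("Human rights", "Economic development", "Colonialism")
)

# Each vote row is repeated once per matching issue row.
toy_joined <- toy_votes |>
  inner_join(
    toy_issues,
    by = join_by(rcid),
    relationship = "many-to-many"
  )

toy_joined
```

Roll call 1 contributes 2 votes x 2 issues = 4 rows and roll call 2 contributes 1 row. Declaring the relationship tells dplyr that this row multiplication is expected, which avoids the warning it would otherwise raise.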

UN voting patterns

Let’s create a data visualization that displays how the voting record of the United States, United Kingdom, and Turkey changed over time.

library(ggplot2)
unvotes |>
  filter(country%in%c("United Kingdom", "United States", "Turkey")) |>
    mutate(year = year(date))|>
      group_by(country, year, issue) |>
    summarize(percent_yes = mean(vote=="yes"),.groups = "drop") |>

  
ggplot(aes(x = year, 
           y = percent_yes, 
           color = country)) +
  geom_smooth(method = "loess", se = FALSE) + facet_wrap(~issue) +
              scale_color_colorblind() +
  labs(
    title = "Percentage of 'Yes' votes in the UN General Assembly",
subtitle = "1946 to 2019",y = "% Yes",x = "Year",color = "Country"
  )+
   geom_point(alpha = 0.4)+scale_y_continuous(labels = percent)
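One possible cleanup of the join and plotting code above is sketched below. It is not the only reasonable answer. The intermediate names yes_by_year and votes_plot are introduced here for illustration, and label_percent() is swapped in for percent on the assumption that the installed scales version provides it. The layer order is kept the same as the original so the figure should not change.

```r
library(tidyverse) # dplyr, ggplot2, and lubridate (for year())
library(scales)    # label_percent()
library(ggthemes)  # scale_color_colorblind()
library(unvotes)   # un_votes, un_roll_calls, un_roll_call_issues

# Join votes to roll call details and issues, one step per line.
unvotes <- un_votes |>
  inner_join(un_roll_calls, by = join_by(rcid)) |>
  inner_join(
    un_roll_call_issues,
    by = join_by(rcid),
    relationship = "many-to-many"
  )

# Share of "yes" votes per country, year, and issue.
yes_by_year <- unvotes |>
  filter(country %in% c("United Kingdom", "United States", "Turkey")) |>
  mutate(year = year(date)) |>
  group_by(country, year, issue) |>
  summarize(percent_yes = mean(vote == "yes"), .groups = "drop")

# One layer or scale per line, labels last.
votes_plot <- ggplot(
  yes_by_year,
  aes(x = year, y = percent_yes, color = country)
) +
  geom_smooth(method = "loess", se = FALSE) +
  geom_point(alpha = 0.4) +
  facet_wrap(~issue) +
  scale_y_continuous(labels = label_percent()) +
  scale_color_colorblind() +
  labs(
    title = "Percentage of 'Yes' votes in the UN General Assembly",
    subtitle = "1946 to 2019",
    x = "Year",
    y = "% Yes",
    color = "Country"
  )

votes_plot
```

The redundant library(ggplot2) call is dropped because tidyverse already attaches ggplot2, and by = join_by(rcid == rcid) is shortened to the equivalent join_by(rcid).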

References

Robinson D (2021). unvotes: United Nations General Assembly Voting Data. R package version 0.3.0, https://github.com/dgrtwo/unvotes.
Voeten E (2013). “Data and Analyses of Voting in the UN General Assembly.” In Routledge Handbook of International Organization, edited by Bob Reinalda (published May 27, 2013).
Much of the analysis has been modeled on the examples presented in the unvotes package vignette.

Part 2: ChatGPT adventures

Suppose you gave ChatGPT the following prompt, WHICH YOU SHOULDN’T HAVE DONE IN THE FIRST PLACE, and it gave you the response provided below. Improve the code for correctness, code smell, style, and readability.

SurveyUSA polled 900 NC adults between September 4-7, 2024. Of the 900 NC adults, 771 were identified by SurveyUSA as being registered to vote. The following question was asked to these 771 adults: “Should the use of marijuana for medical use remain against the law in North Carolina? Or be legalized?”

Of the 771 responses, 391 were between the ages of 18-49. Of the individuals that are between 18-49, 59 individuals responded that they think medical marijuana should remain against the law, 292 said it should be made legal, and the remainder were not sure. Of the individuals that are 50+, 67 individuals responded that they think medical marijuana should remain against the law, 245 said it should be made legal, and the remainder were not sure.

Calculate the proportions of 18-49 year olds and 50+ year-olds in this sample. Make sure the result is an ungrouped data frame with a column for relevant counts, a column for relevant proportions, and a column for the groups you’re interested in.

library(dplyr)

# Create the initial data frame with counts
survey_counts <- tibble( 
  age = c("18-49", "18-49", "18-49", "50+", "50+", "50+"),
  opinion = c("Against", "Legal", "Not sure", "Against", "Legal", "Not sure"),
  n = c(59, 292, 40, 67, 245, 59)
)

# Group by age and summarize total counts
age_totals <- survey_counts %>%
  group_by(age) %>%
  summarize(total_count = sum(n)) %>%
  ungroup()

# Calculate the overall total
overall_total <- sum(age_totals$total_count)

# Add the proportion column
age_totals <- age_totals %>%
  mutate(proportion = total_count / overall_total)

# Resulting data frame
print(age_totals)

# A tibble: 2 × 3
  age   total_count proportion
  <chr>       <dbl>      <dbl>
1 18-49         391      0.513
2 50+           371      0.487
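The response above does not match the prompt: it enters 59 "Not sure" responses for the 50+ group, which makes that group 371 people and the overall total 762 instead of 771. A minimal sketch of one corrected version follows. It derives the "Not sure" counts from the stated totals rather than typing them in, and the names n_total, n_18_49, n_50_plus, and age_props are chosen here for illustration.

```r
library(dplyr)

# Totals stated in the prompt.
n_total <- 771
n_18_49 <- 391
n_50_plus <- n_total - n_18_49

# "Not sure" is the remainder within each age group.
survey_counts <- tibble(
  age = rep(c("18-49", "50+"), each = 3),
  opinion = rep(c("Against", "Legal", "Not sure"), times = 2),
  n = c(
    59, 292, n_18_49 - 59 - 292,
    67, 245, n_50_plus - 67 - 245
  )
)

# Per-group counts and proportions; .by leaves the result ungrouped.
age_props <- survey_counts |>
  summarize(n = sum(n), .by = age) |>
  mutate(prop = n / sum(n))

age_props
```

With these inputs the 50+ group has 380 respondents (68 of them not sure), so the proportions come out to about 0.507 for ages 18-49 and 0.493 for ages 50+. Using .by in summarize() also removes the need for the separate group_by(), ungroup(), and overall_total steps.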