Working with Generative AI tools

Lecture 12

Dr. Mine Çetinkaya-Rundel

Duke University
STA 199 - Fall 2024

October 10, 2024

Warm-up

While you wait…

Prepare for today’s application exercise: ae-11-un-votes-revisit-chatgpt

Go to your ae project in RStudio.
Make sure all of your changes up to this point are committed and pushed, i.e., there’s nothing left in your Git pane.
Click Pull to get today’s application exercise file: ae-11-un-votes-revisit-chatgpt.qmd.
Wait until you’re prompted to work on the application exercise during class before editing the file.

Announcements

Midterm take-home due at 5:00 pm tomorrow (Friday, Oct 11)
- Upload to Gradescope and
- Commit and push to your GitHub repo

Midsemester course evaluation due at midnight tonight (optional) – there will be some time at the end of class today to fill them out
Peer evaluation 1 due by 5:00 pm next Friday (Oct 18) – look for an email from TEAMMATES
Readings and videos for next Thursday’s class posted on the course website

Code smell and style

Code smell

One way to look at smells is with respect to principles and quality: “Smells are certain structures in the code that indicate violation of fundamental design principles and negatively impact design quality”. Code smells are usually not bugs; they are not technically incorrect and do not prevent the program from functioning. Instead, they indicate weaknesses in design that may slow down development or increase the risk of bugs or failures in the future.

Code style

Follow the Tidyverse style guide:

Spaces before and line breaks after each + when building a ggplot
Spaces before and line breaks after each |> in a data transformation pipeline
Proper indentation
Spaces around = signs and spaces after commas
Lines should not span more than 80 characters; long lines should be broken up with each argument on its own line
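As a minimal sketch of these rules in practice, the example below uses R's built-in mtcars data (not a course dataset); the object names mpg_by_cyl and mpg_plot are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Pipeline: a space before each |>, a line break after it,
# and the continuation lines indented by two spaces
mpg_by_cyl <- mtcars |>
  group_by(cyl) |>
  summarize(mean_mpg = mean(mpg), n = n())

# Plot: a space before each +, a line break after it, spaces around =
# and after commas; the long labs() call gets one argument per line
mpg_plot <- ggplot(mpg_by_cyl, aes(x = factor(cyl), y = mean_mpg)) +
  geom_col() +
  labs(
    x = "Number of cylinders",
    y = "Mean miles per gallon",
    title = "Fuel efficiency by cylinder count"
  )
```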

ae-11-un-votes-revisit-chatgpt

Go to your ae project in RStudio.
If you haven’t yet done so, make sure all of your changes up to this point are committed and pushed, i.e., there’s nothing left in your Git pane.
If you haven’t yet done so, click Pull to get today’s application exercise file: ae-11-un-votes-revisit-chatgpt.qmd.
Work through Part 1 of the application exercise in class, and render, commit, and push your edits by the end of class.

The 🐘 in the room: Chat GPT (and friends)

Take a pulse

How are you using Chat GPT or similar AI tools for this class?

Example 1 - From Lab 4

SurveyUSA polled 900 NC adults between September 4-7, 2024. Of the 900 NC adults, 771 were identified by SurveyUSA as being registered to vote. The following question was asked to these 771 adults: “Should the use of marijuana for medical use remain against the law in North Carolina? Or be legalized?”

Of the 771 responses, 391 were between the ages of 18-49. Of the individuals that are between 18-49, 59 individuals responded that they think medical marijuana should remain against the law, 292 said it should be made legal, and the remainder were not sure. Of the individuals that are 50+, 67 individuals responded that they think medical marijuana should remain against the law, 245 said it should be made legal, and the remainder were not sure.

Calculate the proportions of 18-49 year olds and 50+ year-olds in this sample. Make sure the result is an ungrouped data frame with a column for relevant counts, a column for relevant proportions, and a column for the groups you’re interested in.
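
One possible approach, sketched under the assumption that the counts are typed in directly from the text above (the lab may expect the data to be built differently): 391 of the 771 respondents are 18-49, so the remaining 771 - 391 = 380 are 50+. The names age_groups, age, n, and p are illustrative.

```r
library(dplyr)

# Counts taken from the problem statement: 391 respondents are 18-49,
# and the rest of the 771 registered voters are 50+
age_groups <- tibble(
  age = c("18-49", "50+"),
  n = c(391, 771 - 391)
) |>
  mutate(p = n / sum(n))

age_groups
```

Because the data frame is never grouped, the result is already ungrouped, with one column each for the group, the count, and the proportion (roughly 0.507 and 0.493).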

Chat GPT - Take 1 prompt

What is wrong with this prompt? How would you correct it?

SurveyUSA polled 900 NC adults between September 4-7, 2024. Of the 900 NC adults, 771 were identified by SurveyUSA as being registered to vote. The following question was asked to these 771 adults: “Should the use of marijuana for medical use remain against the law in North Carolina? Or be legalized?”

Of the 771 responses, 391 were between the ages of 18-49. Of the individuals that are between 18-49, 59 individuals responded that they think medical marijuana should remain against the law, 292 said it should be made legal, and the remainder were not sure. Of the individuals that are 50+, 67 individuals responded that they think medical marijuana should remain against the law, 245 said it should be made legal, and the remainder were not sure.

Calculate the proportions of 18-49 year olds and 50+ year-olds in this sample. Make sure the result is an ungrouped data frame with a column for relevant counts, a column for relevant proportions, and a column for the groups you’re interested in.

Chat GPT - Take 1 response

Identify at least one mistake or code smell/style issue with the code below.

ae-11-un-votes-revisit-chatgpt

Go to your ae project in RStudio.
If you haven’t yet done so, make sure all of your changes up to this point are committed and pushed, i.e., there’s nothing left in your Git pane.
If you haven’t yet done so, click Pull to get today’s application exercise file: ae-11-un-votes-revisit-chatgpt.qmd.
Work through Part 2 of the application exercise in class, and render, commit, and push your edits by the end of class.

Let’s follow along with Chat GPT…

https://chatgpt.com/share/6707d8c3-98c4-8000-9732-e65bc5be916b

Example 2

Each CSI franchise includes a number of series, each with the character string "CSI" (in capital letters) in their title. Using this information, create a new data frame called csi that only has titles from the CSI-verse and all of the columns from tv for these titles. Then, display the first 10 rows of csi. Note: The titles and seasons included in the dataset may not be all CSI titles and titles that have ever aired.

Chat GPT - Take 1 prompt

Each CSI franchise includes a number of series, each with the character string "CSI" (in capital letters) in their title. Using this information, create a new data frame called csi that only has titles from the CSI-verse and all of the columns from tv for these titles. Then, display the first 10 rows of csi. Note: The titles and seasons included in the dataset may not be all CSI titles and titles that have ever aired.

Chat GPT - Take 1 response

Certainly! To filter the tv data frame to include only titles from the CSI-verse, you can use the filter function from the dplyr package. Here’s the code:
library(dplyr)

# Assuming your dataset is named 'tv'
csi <- tv %>%
  filter(str_detect(title, "CSI", case_insensitive = TRUE))

# Display the first 10 rows of the 'csi' data frame
head(csi, 10)
This code uses the filter function to select rows where the title column contains the string “CSI” (case-insensitive). The resulting data frame, csi, will include only titles from the CSI-verse along with all the columns from the original tv data frame. Make sure to replace ‘tv’ with the actual name of your dataset.

Chat GPT - Take 1 code

What is wrong with the following code?

Chat GPT - Take 2 prompt

str_detect doesn’t have a case_insensitive argument, can you try again?

Chat GPT - Take 2 response

You are correct, and I appreciate your understanding. The str_detect function itself doesn’t have a case_insensitive argument. Instead, you can use the str_which function along with the case_insensitive argument in the filter function. Here’s the corrected code:
library(dplyr)

# Assuming your dataset is named 'tv'
csi <- tv %>%
  filter(str_detect(title, "CSI", ignore_case = TRUE))

# Display the first 10 rows of the 'csi' data frame
head(csi, 10)
This code uses str_detect within the filter function and includes the ignore_case = TRUE argument to perform a case-insensitive search. The resulting csi data frame will contain only titles from the CSI-verse along with all the columns from the original tv data frame. Make sure to replace ‘tv’ with the actual name of your dataset.

Chat GPT - Take 2 code

Chat GPT - Take 3 prompt

str_detect doesn’t have an ignore_case argument either, can you try again?

Chat GPT - Take 3 response

I apologize for the oversight. It seems there was a mistake in my responses. To perform a case-insensitive search, you can use the stringi package along with stri_detect_regex. Here’s the corrected code:
library(dplyr)
library(stringi)

# Assuming your dataset is named 'tv'
csi <- tv %>%
  filter(stri_detect_regex(title, "CSI", case_insensitive = TRUE))

# Display the first 10 rows of the 'csi' data frame
head(csi, 10)
This code uses stri_detect_regex from the stringi package within the filter function, and it includes the case_insensitive = TRUE argument for a case-insensitive search. The resulting csi data frame will contain only titles from the CSI-verse along with all the columns from the original tv data frame. Make sure to replace ‘tv’ with the actual name of your dataset.

Let’s go back to Take 1

and edit it a bit:
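
A minimal sketch of one such edit, assuming a small made-up stand-in for tv since the real dataset is not shown here: drop the nonexistent argument, because str_detect() is case-sensitive by default and the task asks for "CSI" in capital letters. The sketch also loads stringr, which the Take 1 response never did, and uses the base pipe.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the tv data frame, for illustration only
tv <- tibble(
  title = c("CSI: Miami", "CSI: NY", "Criminal Minds", "csi fan recap"),
  season = c(1, 1, 1, 1)
)

# Keep rows whose title contains "CSI" in capital letters
csi <- tv |>
  filter(str_detect(title, "CSI"))

# Display (up to) the first 10 rows
csi |>
  slice_head(n = 10)

# Not needed for this task: if a case-insensitive match were wanted,
# stringr's regex() modifier is where ignore_case belongs
csi_any_case <- tv |>
  filter(str_detect(title, regex("CSI", ignore_case = TRUE)))
```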

Guidelines and best practices for using Chat GPT

Do not just copy-paste the prompt – for appropriate academic conduct, for your own learning, and for getting to better results faster
Engineer the prompt until the response starts to look like code you’re learning in the course
If the response is not correct, ask for a correction
If the response doesn’t follow the guidelines, ask for a correction
When you ask for a correction, be ready for it to “correct” (i.e., change) more than you asked for
Do not just copy-paste code from Chat GPT responses, run it line-by-line and edit as needed – Chat GPT does not execute R code so there’s no guarantee any of the code it gives you works
Watch out for clear mistakes in the response and fix them: do not keep loading packages that are already loaded, use the base pipe |>, follow the tidyverse style, do not use unnecessary print() statements, be critical of code smell, etc.

Time permitting

Midsemester course evaluation

Take a few minutes to fill it out on Canvas > Quizzes

Anonymous
Optional
Much appreciated!

And good luck with the rest of the take-home midterm! ♥️