Working with Generative AI tools
Lecture 12
Warm-up
While you wait…
Prepare for today’s application exercise: ae-11-un-votes-revisit-chatgpt
Go to your ae project in RStudio.
Make sure all of your changes up to this point are committed and pushed, i.e., there’s nothing left in your Git pane.
Click Pull to get today’s application exercise file: ae-11-un-votes-revisit-chatgpt.qmd.
Wait until you’re prompted to work on the application exercise during class before editing the file.
Announcements
- Midterm take-home due at 5:00 pm tomorrow (Friday, Oct 11)
  - Upload to Gradescope and
  - Commit and push to your GitHub repo
- Midsemester course evaluation due at midnight tonight (optional) – there will be some time at the end of class today to fill it out
- Peer evaluation 1 due by 5:00 pm next Friday (Oct 18) – look for an email from TEAMMATES
- Readings and videos for next Thursday’s class posted on the course website
Code smell and style
Code smell
One way to look at smells is with respect to principles and quality: “Smells are certain structures in the code that indicate violation of fundamental design principles and negatively impact design quality”. Code smells are usually not bugs; they are not technically incorrect and do not prevent the program from functioning. Instead, they indicate weaknesses in design that may slow down development or increase the risk of bugs or failures in the future.
Source: Code smell on Wikipedia
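To make the idea concrete, here is a minimal sketch of one common smell, copy-pasted logic, next to a cleaner alternative. The data frame df and the helper rescale01() are hypothetical names made up for this illustration; they are not from the course materials. Both versions run and give the same result, which is the point: the smelly version is not a bug, it is just easier to break later.

```r
# Hypothetical toy data for illustration only
df <- data.frame(a = c(1, 5, 10), b = c(2, 4, 6))

# Smelly: the same rescaling logic is copy-pasted for each column,
# so any change has to be made (correctly) in several places
df$a_scaled <- (df$a - min(df$a)) / (max(df$a) - min(df$a))
df$b_scaled <- (df$b - min(df$b)) / (max(df$b) - min(df$b))

# Cleaner: name the logic once and reuse it
rescale01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

df$a_scaled <- rescale01(df$a)
df$b_scaled <- rescale01(df$b)
```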
Code style
Follow the Tidyverse style guide:
- Spaces before and line breaks after each + when building a ggplot
- Spaces before and line breaks after each |> in a data transformation pipeline
- Proper indentation
- Spaces around = signs and spaces after commas
- Lines should not span more than 80 characters; long lines should be broken up with each argument on its own line
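The conventions above might look like the following in practice. This is only a sketch: it uses the mpg dataset that ships with ggplot2, and the dataset, the variable choices, and the name mpg_summary are assumptions made for illustration rather than part of any course exercise.

```r
library(dplyr)
library(ggplot2)

# Pipeline: a space before and a line break after each |>,
# with each summarize() argument on its own line
mpg_summary <- mpg |>
  filter(year == 2008) |>
  group_by(class) |>
  summarize(
    mean_hwy = mean(hwy),
    n = n()
  )

# Plot: a space before and a line break after each +,
# spaces around = and after commas
ggplot(mpg_summary, aes(x = mean_hwy, y = class)) +
  geom_col() +
  labs(
    x = "Mean highway mileage (mpg)",
    y = "Vehicle class"
  )
```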
ae-11-un-votes-revisit-chatgpt
Go to your ae project in RStudio.
If you haven’t yet done so, make sure all of your changes up to this point are committed and pushed, i.e., there’s nothing left in your Git pane.
If you haven’t yet done so, click Pull to get today’s application exercise file: ae-11-un-votes-revisit-chatgpt.qmd.
Work through Part 1 of the application exercise in class, and render, commit, and push your edits by the end of class.
The 🐘 in the room: Chat GPT (and friends)
Take a pulse
How are you using Chat GPT or similar AI tools for this class?
Example 1 - From Lab 4
SurveyUSA polled 900 NC adults between September 4-7, 2024. Of the 900 NC adults, 771 were identified by SurveyUSA as being registered to vote. The following question was asked to these 771 adults: “Should the use of marijuana for medical use remain against the law in North Carolina? Or be legalized?”
Of the 771 responses, 391 were between the ages of 18-49. Of the individuals that are between 18-49, 59 individuals responded that they think medical marijuana should remain against the law, 292 said it should be made legal, and the remainder were not sure. Of the individuals that are 50+, 67 individuals responded that they think medical marijuana should remain against the law, 245 said it should be made legal, and the remainder were not sure.
Calculate the proportions of 18-49 year olds and 50+ year-olds in this sample. Make sure the result is an ungrouped data frame with a column for relevant counts, a column for relevant proportions, and a column for the groups you’re interested in.
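For reference, here is one way the requested result could be built with the tidyverse. It is a sketch, not the lab's official solution: it assumes the counts are entered by hand from the text above (771 respondents, 391 of them aged 18-49), and the names n_total, n_18_49, age_props, and age_group are made up for this illustration.

```r
library(tibble)
library(dplyr)

# Counts taken from the problem statement
n_total <- 771
n_18_49 <- 391

# One row per age group: group label, count, and proportion of the sample
age_props <- tibble(
  age_group = c("18-49", "50+"),
  n = c(n_18_49, n_total - n_18_49)
) |>
  mutate(prop = n / sum(n))

age_props
```

Because no group_by() is used, the result is already an ungrouped data frame, as the question asks.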
Chat GPT - Take 1 prompt
What is wrong with this prompt? How would you correct it?
SurveyUSA polled 900 NC adults between September 4-7, 2024. Of the 900 NC adults, 771 were identified by SurveyUSA as being registered to vote. The following question was asked to these 771 adults: “Should the use of marijuana for medical use remain against the law in North Carolina? Or be legalized?”
Of the 771 responses, 391 were between the ages of 18-49. Of the individuals that are between 18-49, 59 individuals responded that they think medical marijuana should remain against the law, 292 said it should be made legal, and the remainder were not sure. Of the individuals that are 50+, 67 individuals responded that they think medical marijuana should remain against the law, 245 said it should be made legal, and the remainder were not sure.
Calculate the proportions of 18-49 year olds and 50+ year-olds in this sample. Make sure the result is an ungrouped data frame with a column for relevant counts, a column for relevant proportions, and a column for the groups you’re interested in.
Chat GPT - Take 1 response
Identify at least one mistake or code smell/style issue with the code below.
ae-11-un-votes-revisit-chatgpt
Go to your ae project in RStudio.
If you haven’t yet done so, make sure all of your changes up to this point are committed and pushed, i.e., there’s nothing left in your Git pane.
If you haven’t yet done so, click Pull to get today’s application exercise file: ae-11-un-votes-revisit-chatgpt.qmd.
Work through Part 2 of the application exercise in class, and render, commit, and push your edits by the end of class.
Let’s follow along with Chat GPT…
https://chatgpt.com/share/6707d8c3-98c4-8000-9732-e65bc5be916b
Example 2
Each CSI franchise includes a number of series, each with the character string "CSI" (in capital letters) in their title. Using this information, create a new data frame called csi that only has titles from the CSI-verse and all of the columns from tv for these titles. Then, display the first 10 rows of csi. Note: The titles and seasons included in the dataset may not be all CSI titles and titles that have ever aired.
Chat GPT - Take 1 prompt
Each CSI franchise includes a number of series, each with the character string "CSI" (in capital letters) in their title. Using this information, create a new data frame called csi that only has titles from the CSI-verse and all of the columns from tv for these titles. Then, display the first 10 rows of csi. Note: The titles and seasons included in the dataset may not be all CSI titles and titles that have ever aired.
Chat GPT - Take 1 response
Certainly! To filter the tv data frame to include only titles from the CSI-verse, you can use the filter function from the dplyr package. Here’s the code:

    library(dplyr)

    # Assuming your dataset is named 'tv'
    csi <- tv %>%
      filter(str_detect(title, "CSI", case_insensitive = TRUE))

    # Display the first 10 rows of the 'csi' data frame
    head(csi, 10)

This code uses the filter function to select rows where the title column contains the string “CSI” (case-insensitive). The resulting data frame, csi, will include only titles from the CSI-verse along with all the columns from the original tv data frame. Make sure to replace ‘tv’ with the actual name of your dataset.
Chat GPT - Take 1 code
What is wrong with the following code?
Chat GPT - Take 2 prompt
str_detect doesn’t have a case_insensitive argument, can you try again?
Chat GPT - Take 2 response
You are correct, and I appreciate your understanding. The str_detect function itself doesn’t have a case_insensitive argument. Instead, you can use the str_which function along with the case_insensitive argument in the filter function. Here’s the corrected code:

    library(dplyr)

    # Assuming your dataset is named 'tv'
    csi <- tv %>%
      filter(str_detect(title, "CSI", ignore_case = TRUE))

    # Display the first 10 rows of the 'csi' data frame
    head(csi, 10)

This code uses str_detect within the filter function and includes the ignore_case = TRUE argument to perform a case-insensitive search. The resulting csi data frame will contain only titles from the CSI-verse along with all the columns from the original tv data frame. Make sure to replace ‘tv’ with the actual name of your dataset.
Chat GPT - Take 2 code
Chat GPT - Take 3 prompt
str_detect doesn’t have an ignore_case argument either, can you try again?
Chat GPT - Take 3 response
I apologize for the oversight. It seems there was a mistake in my responses. To perform a case-insensitive search, you can use the stringi package along with stri_detect_regex. Here’s the corrected code:

    library(dplyr)
    library(stringi)

    # Assuming your dataset is named 'tv'
    csi <- tv %>%
      filter(stri_detect_regex(title, "CSI", case_insensitive = TRUE))

    # Display the first 10 rows of the 'csi' data frame
    head(csi, 10)

This code uses stri_detect_regex from the stringi package within the filter function, and it includes the case_insensitive = TRUE argument for a case-insensitive search. The resulting csi data frame will contain only titles from the CSI-verse along with all the columns from the original tv data frame. Make sure to replace ‘tv’ with the actual name of your dataset.
Let’s go back to Take 1
and edit it a bit:
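The edited code from the slide is not reproduced here, so the following is one plausible edit and not necessarily the one shown in class. It drops the non-existent case_insensitive argument (the task asks for "CSI" in capital letters, and str_detect() is case sensitive by default), loads stringr, and uses the base pipe. The small tv tibble is a hypothetical stand-in for the course dataset, with made-up rows, so that the sketch runs on its own.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical stand-in for the course's tv data frame
tv <- tibble(
  title = c("CSI: Miami", "Criminal Minds", "CSI: NY", "Bones"),
  season = c(1, 1, 1, 1)
)

# str_detect() is case sensitive by default, so "CSI" only matches capitals
csi <- tv |>
  filter(str_detect(title, "CSI"))

# Display the first 10 rows (fewer here, since the toy data is small)
slice_head(csi, n = 10)

# If a case-insensitive match were ever needed, the option belongs to the
# pattern, via regex(), rather than to str_detect() itself
tv |>
  filter(str_detect(title, regex("csi", ignore_case = TRUE)))
```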
Guidelines and best practices for using Chat GPT
- Do not just copy-paste the prompt – for appropriate academic conduct, for your own learning, and for getting to better results faster
- Engineer the prompt until the response starts to look like code you’re learning in the course
- If the response is not correct, ask for a correction
- If the response doesn’t follow the guidelines, ask for a correction
- When you ask for a correction, be ready for it to “correct” (i.e., change) more than you asked for
- Do not just copy-paste code from Chat GPT responses, run it line-by-line and edit as needed – Chat GPT does not execute R code so there’s no guarantee any of the code it gives you works
- Watch out for clear mistakes in the response: do not keep loading packages that are already loaded, use the base pipe |>, use the tidyverse style, do not use unnecessary print() statements, be critical of code smell, etc.
Time permitting
Midsemester course evaluation
Take a few minutes to fill it out on Canvas > Quizzes
Anonymous
Optional
Much appreciated!
And good luck with the rest of the take-home midterm! ♥️