Lecture 12
Duke University
STA 199 - Fall 2024
October 10, 2024
Prepare for today’s application exercise: ae-11-un-votes-revisit-chatgpt
Go to your ae project in RStudio.
Make sure all of your changes up to this point are committed and pushed, i.e., there’s nothing left in your Git pane.
Click Pull to get today’s application exercise file: ae-11-un-votes-revisit-chatgpt.qmd.
Wait until you’re prompted to work on the application exercise during class before editing the file.
Midsemester course evaluation due at midnight tonight (optional) – there will be some time at the end of class today to fill them out
Peer evaluation 1 due by 5:00 pm next Friday (Oct 18) – look for an email from TEAMMATES
Readings and videos for next Thursday’s class posted on the course website
One way to look at smells is with respect to principles and quality: “Smells are certain structures in the code that indicate violation of fundamental design principles and negatively impact design quality”. Code smells are usually not bugs; they are not technically incorrect and do not prevent the program from functioning. Instead, they indicate weaknesses in design that may slow down development or increase the risk of bugs or failures in the future.
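As a small illustration of the idea, the sketch below contrasts a "smelly" calculation with a cleaner one. Both versions run and give the same answer; the difference is only in how easy the code is to read and change later. The `scores` vector, the cutoff of 70, and all object names are made up for this example.

```r
# Hypothetical data for illustration only
scores <- c(72, 85, 90, 64, 78)

# Smelly: it works, but the cutoff 70 is an unexplained "magic number"
# repeated in two places, and x1 / x2 say nothing about what they hold.
x1 <- sum(scores >= 70)
x2 <- sum(scores >= 70) / length(scores)

# Cleaner: name the cutoff once and use descriptive object names.
passing_cutoff <- 70
n_passing <- sum(scores >= passing_cutoff)
prop_passing <- mean(scores >= passing_cutoff)
```

If the cutoff ever changes, the cleaner version needs a single edit, while the smelly version needs every copy of 70 to be found and updated.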
Follow the Tidyverse style guide:
Spaces before and line breaks after each + when building a ggplot
Spaces before and line breaks after each |> in a data transformation pipeline
Proper indentation
Spaces around = signs and spaces after commas
Lines should not span more than 80 characters; long lines should be broken up with each argument on its own line
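The sketch below applies these points to a short pipeline and plot. It uses `mtcars`, which ships with R, purely as a stand-in dataset; the object names `cyl_summary` and `mpg_plot` are chosen for this example.

```r
library(dplyr)
library(ggplot2)

# Space before each |>, line break after it, two-space indentation
cyl_summary <- mtcars |>
  group_by(cyl) |>
  summarize(
    mean_mpg = mean(mpg),
    n = n()
  )

# Space before each +, line break after it; long calls are split
# so that each argument sits on its own line
mpg_plot <- ggplot(cyl_summary, aes(x = factor(cyl), y = mean_mpg)) +
  geom_col() +
  labs(
    x = "Number of cylinders",
    y = "Mean miles per gallon",
    title = "Fuel efficiency by cylinder count"
  )
```

Note the spaces around every `=` and after every comma, and that no line runs past 80 characters.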
Go to your ae project in RStudio.
If you haven’t yet done so, make sure all of your changes up to this point are committed and pushed, i.e., there’s nothing left in your Git pane.
If you haven’t yet done so, click Pull to get today’s application exercise file: ae-11-un-votes-revisit-chatgpt.qmd.
Work through Part 1 of the application exercise in class, and render, commit, and push your edits by the end of class.
How are you using ChatGPT or similar AI tools for this class?
SurveyUSA polled 900 NC adults between September 4-7, 2024. Of the 900 NC adults, 771 were identified by SurveyUSA as being registered to vote. The following question was asked to these 771 adults: “Should the use of marijuana for medical use remain against the law in North Carolina? Or be legalized?”
Of the 771 responses, 391 were between the ages of 18-49. Of the individuals that are between 18-49, 59 individuals responded that they think medical marijuana should remain against the law, 292 said it should be made legal, and the remainder were not sure. Of the individuals that are 50+, 67 individuals responded that they think medical marijuana should remain against the law, 245 said it should be made legal, and the remainder were not sure.
Calculate the proportions of 18-49 year olds and 50+ year-olds in this sample. Make sure the result is an ungrouped data frame with a column for relevant counts, a column for relevant proportions, and a column for the groups you’re interested in.
What is wrong with this prompt? How would you correct it?
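For reference, here is a minimal sketch of one possible reading of the task: the proportion of each response within each age group. That reading is an assumption, since the prompt does not say which proportions it wants. The counts come from the prompt text; the "Not sure" counts are the remainders (391 - 59 - 292 = 40 for ages 18-49, and 380 - 67 - 245 = 68 for ages 50+, where 380 = 771 - 391). The object and column names are chosen for this example.

```r
library(dplyr)

# Counts from the prompt; "Not sure" is the remainder in each age group
survey_counts <- data.frame(
  age = rep(c("18-49", "50+"), each = 3),
  opinion = rep(c("Against the law", "Legal", "Not sure"), times = 2),
  n = c(59, 292, 40, 67, 245, 68)
)

# Proportion of each opinion within its age group, then drop the grouping
survey_props <- survey_counts |>
  group_by(age) |>
  mutate(prop = n / sum(n)) |>
  ungroup()

survey_props
```

The final `ungroup()` is what makes the result an ungrouped data frame, with `n` holding the counts, `prop` the proportions, and `age` and `opinion` identifying the groups.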
SurveyUSA polled 900 NC adults between September 4-7, 2024. Of the 900 NC adults, 771 were identified by SurveyUSA as being registered to vote. The following question was asked to these 771 adults: “Should the use of marijuana for medical use remain against the law in North Carolina? Or be legalized?”
Of the 771 responses, 391 were between the ages of 18-49. Of the individuals that are between 18-49, 59 individuals responded that they think medical marijuana should remain against the law, 292 said it should be made legal, and the remainder were not sure. Of the individuals that are 50+, 67 individuals responded that they think medical marijuana should remain against the law, 245 said it should be made legal, and the remainder were not sure.
Calculate the proportions of 18-49 year olds and 50+ year-olds in this sample. Make sure the result is an ungrouped data frame with a column for relevant counts, a column for relevant proportions, and a column for the groups you’re interested in.
Identify at least one mistake or code smell/style issue with the code below.
Go to your ae project in RStudio.
If you haven’t yet done so, make sure all of your changes up to this point are committed and pushed, i.e., there’s nothing left in your Git pane.
If you haven’t yet done so, click Pull to get today’s application exercise file: ae-11-un-votes-revisit-chatgpt.qmd.
Work through Part 2 of the application exercise in class, and render, commit, and push your edits by the end of class.
https://chatgpt.com/share/6707d8c3-98c4-8000-9732-e65bc5be916b
Each CSI franchise includes a number of series, each with the character string "CSI" (in capital letters) in their title. Using this information, create a new data frame called csi that only has titles from the CSI-verse and all of the columns from tv for these titles. Then, display the first 10 rows of csi. Note: The titles and seasons included in the dataset may not be all CSI titles and titles that have ever aired.
Certainly! To filter the tv data frame to include only titles from the CSI-verse, you can use the filter function from the dplyr package. Here’s the code:

    library(dplyr)

    # Assuming your dataset is named 'tv'
    csi <- tv %>%
      filter(str_detect(title, "CSI", case_insensitive = TRUE))

    # Display the first 10 rows of the 'csi' data frame
    head(csi, 10)

This code uses the filter function to select rows where the title column contains the string “CSI” (case-insensitive). The resulting data frame, csi, will include only titles from the CSI-verse along with all the columns from the original tv data frame. Make sure to replace ‘tv’ with the actual name of your dataset.
What is wrong with the following code?
str_detect doesn’t have a case_insensitive argument, can you try again?
You are correct, and I appreciate your understanding. The str_detect function itself doesn’t have a case_insensitive argument. Instead, you can use the str_which function along with the case_insensitive argument in the filter function. Here’s the corrected code:

    library(dplyr)

    # Assuming your dataset is named 'tv'
    csi <- tv %>%
      filter(str_detect(title, "CSI", ignore_case = TRUE))

    # Display the first 10 rows of the 'csi' data frame
    head(csi, 10)

This code uses str_detect within the filter function and includes the ignore_case = TRUE argument to perform a case-insensitive search. The resulting csi data frame will contain only titles from the CSI-verse along with all the columns from the original tv data frame. Make sure to replace ‘tv’ with the actual name of your dataset.
str_detect doesn’t have an ignore_case argument either, can you try again?
I apologize for the oversight. It seems there was a mistake in my responses. To perform a case-insensitive search, you can use the stringi package along with stri_detect_regex. Here’s the corrected code:

    library(dplyr)
    library(stringi)

    # Assuming your dataset is named 'tv'
    csi <- tv %>%
      filter(stri_detect_regex(title, "CSI", case_insensitive = TRUE))

    # Display the first 10 rows of the 'csi' data frame
    head(csi, 10)

This code uses stri_detect_regex from the stringi package within the filter function, and it includes the case_insensitive = TRUE argument for a case-insensitive search. The resulting csi data frame will contain only titles from the CSI-verse along with all the columns from the original tv data frame. Make sure to replace ‘tv’ with the actual name of your dataset.
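For comparison, here is a minimal sketch of how the task could be approached with stringr and the native pipe. Since the task says "CSI" appears in capital letters, a plain case-sensitive str_detect() should be enough, so no case-insensitivity argument is needed. The small tv data frame below is a hypothetical stand-in with made-up rows and only two columns; the real dataset used in class has its own columns.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the tv data frame used in class
tv <- data.frame(
  title = c("CSI: Miami", "Criminal Minds", "CSI: NY", "Law & Order"),
  season = c(1, 1, 2, 3)
)

# Keep rows whose title contains "CSI"; all columns of tv are retained
csi <- tv |>
  filter(str_detect(title, "CSI"))

# Display (up to) the first 10 rows
csi |>
  slice_head(n = 10)
```

If a case-insensitive match were actually needed, stringr handles that through a pattern modifier rather than an argument to str_detect() itself, e.g. `str_detect(title, regex("csi", ignore_case = TRUE))`.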
and edit it a bit: use |>, use the tidyverse style, do not use unnecessary print() statements, be critical of code smell, etc.
Take a few minutes to fill out the midsemester course evaluation on Canvas > Quizzes
Anonymous
Optional
Much appreciated!
And good luck with the rest of the take-home midterm! ♥️