Question
R problem
opiod <- read.csv("https://data.ct.gov/api/views/rybz-nyjw/rows.csv?accessType=DOWNLOAD", stringsAsFactors = FALSE)
The function table builds a contingency table of the counts at each combination of factor levels. Try it out with table(opiod_dem$Sex). You should notice that for 4 of the deaths no value was provided for Sex. In the data set, empty values are stored as “”.
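To see how table reports missing entries, here is a small sketch on a made-up vector. The values are invented for illustration and are not taken from the Connecticut data. Empty strings are counted as a value of their own, which is why they appear in the output.

```r
# Hypothetical toy vector; not taken from the real data set
sex <- c("Male", "Female", "", "Male", "", "Male")

# table() counts each distinct value, including the empty string
counts <- table(sex)
print(counts)

# Number of entries recorded as ""
n_empty <- sum(sex == "")
print(n_empty)
```

Counting with sum(sex == "") is a quick way to confirm how many rows a later filter on "" should remove.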
Subset opiod_dem to remove the rows where the variable Sex has “” as a value. Save the resulting data frame as opiod_dem_filter.
Subset opiod_dem_filter to remove the rows where the variable Race has “” as a value. Save the resulting data frame as opiod_dem_filter.
Use the function table to build a table for the variable Sex from opiod_dem_filter.
Use the function table to create a two way contingency table for the variables Race and Sex from opiod_dem_filter. Save the resulting table as an object named dem.
Turn dem into a data frame with the following code: dem_df <- data.frame(female = dem[, 1], male = dem[, 2]).
Add a column named sums to dem_df that is the row sum.
From 6, calculate which race has the highest percentage of female deaths, and which race has the highest percentage of male deaths.
Repeat the calculation in 6, but only for races with a minimum of 100 deaths.
Explanation / Answer
###Load dplyr, which provides filter()##
library(dplyr)
###Read the data##
opiod_dem <- read.csv("https://data.ct.gov/api/views/rybz-nyjw/rows.csv?accessType=DOWNLOAD", stringsAsFactors = FALSE)
###Create table for variable named Sex##
table(opiod_dem$Sex)
###Keep only the rows where Sex is not equal to ""##
opiod_dem_filter<-filter(opiod_dem,Sex!="")
###Keep only the rows where Race is not equal to ""##
opiod_dem_filter<-filter(opiod_dem_filter,Race!="")
###Create table for variable named Sex in new dataframe##
table(opiod_dem_filter$Sex)
###Create table for Race and sex##
dem<-table(opiod_dem_filter$Race,opiod_dem_filter$Sex)
###Convert table to dataframe##
dem_df <- data.frame(female = dem[, 1], male = dem[, 2])
###Add a column called sums holding the row sum of female and male deaths##
dem_df$sums<-dem_df$female+dem_df$male
###Find max female death percentage##
##which.max returns the row index where female deaths / total deaths is highest##
##Use that index to print the matching row of the data frame##
dem_df[as.numeric(which.max((dem_df$female/dem_df$sums))),]
###Find max male death percentage##
##which.max returns the row index where male deaths / total deaths is highest##
##Use that index to print the matching row of the data frame##
dem_df[as.numeric(which.max((dem_df$male/dem_df$sums))),]
###Repeat the procedure above, but only for races with at least 100 deaths##
##Subset the data frame to the rows where sums is >= 100##
dem_df100<-subset(dem_df,sums>=100)
##Female##
dem_df100[as.numeric(which.max((dem_df100$female/dem_df100$sums))),]
###Same for male##
dem_df100[as.numeric(which.max((dem_df100$male/dem_df100$sums))),]
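As a cross-check, the same row percentages can be sketched in base R with prop.table, which normalizes a table by rows or columns. The toy data frame below is invented for illustration (hypothetical values, not the Connecticut data), and the names toy, dem_toy, pct and min_deaths are assumptions introduced here. The 100-death cutoff is scaled down to 3 so the filter has an effect on such a small sample.

```r
# Hypothetical toy data; values are made up for illustration only
toy <- data.frame(
  Race = c("A", "A", "A", "A", "B", "B", "B", "C", ""),
  Sex  = c("Female", "Male", "Male", "Male",
           "Female", "Female", "Male", "Female", "Male"),
  stringsAsFactors = FALSE
)

# Drop rows with an empty Sex or Race using base R subsetting
toy_filter <- toy[toy$Sex != "" & toy$Race != "", ]

# Two-way table: rows are Race, columns are Sex
dem_toy <- table(toy_filter$Race, toy_filter$Sex)

# Row proportions: each row sums to 1
pct <- prop.table(dem_toy, margin = 1)

# Race with the highest share of female / male deaths
top_female <- rownames(pct)[which.max(pct[, "Female"])]
top_male   <- rownames(pct)[which.max(pct[, "Male"])]

# Same calculation restricted to races with at least min_deaths rows
min_deaths <- 3  # assumption: stands in for the 100 used above
keep <- rowSums(dem_toy) >= min_deaths
pct_keep <- pct[keep, , drop = FALSE]
top_female_keep <- rownames(pct_keep)[which.max(pct_keep[, "Female"])]

print(top_female)
print(top_male)
print(top_female_keep)
```

Selecting columns by name ("Female", "Male") rather than by position avoids picking the wrong column if an extra Sex level is present in the data.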