Question
I am in a class about programming in R for data science. A question on the homework reads:
"Write a function that takes a data frame as an input and for each numeric variable in the data frame the function returns to the user the variables’ mean, median, and inter-quartile range in the form of a list. The list should have the names of the corresponding variables. Name your function df.numeric.summary. Test your function on the flint.lead.filter data frame. You should get the results you see below. It is not required to have any type of stopifnot statement in your function."
I don't know how to write this and I need help.
Explanation / Answer
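First, let us create a dummy data frame with 4 numeric variables, 2 character variables, 2 logical variables, and 2 factor variables. (In the str output below, the character columns A5 and A6 appear as factors because data.frame converted strings to factors by default before R 4.0.)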
DummyData <- data.frame(A1 = 1:10,
                        A2 = 11:20,
                        A3 = 21:30,
                        A4 = 31:40,
                        A5 = letters[1:10],
                        A6 = letters[11:20],
                        A7 = rep(TRUE, 10),
                        A8 = rep(FALSE, 10),
                        A9 = factor(sample(c("Type1", "Type2"), 10, replace = TRUE)),
                        A10 = factor(sample(c("Male", "Female"), 10, replace = TRUE)))
str(DummyData)
'data.frame': 10 obs. of 10 variables:
$ A1 : int 1 2 3 4 5 6 7 8 9 10
$ A2 : int 11 12 13 14 15 16 17 18 19 20
$ A3 : int 21 22 23 24 25 26 27 28 29 30
$ A4 : int 31 32 33 34 35 36 37 38 39 40
$ A5 : Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10
$ A6 : Factor w/ 10 levels "k","l","m","n",..: 1 2 3 4 5 6 7 8 9 10
$ A7 : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
$ A8 : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ A9 : Factor w/ 2 levels "Type1","Type2": 1 1 1 2 1 2 2 2 1 1
$ A10: Factor w/ 2 levels "Female","Male": 1 1 1 1 1 2 2 1 1 1
Next, we can write the function definition. It selects the numeric columns with is.numeric and then applies the built-in mean, median, and IQR functions to each of them.
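df.numeric.summary <- function(df) {
  # Flag the numeric columns (integer and double; logicals and factors are excluded)
  NumericVariables <- sapply(df, is.numeric)
  # drop = FALSE keeps a data frame even when only one column is numeric
  df <- df[, NumericVariables, drop = FALSE]
  # Each sapply call returns a vector named by variable
  Mean <- sapply(df, mean)
  Median <- sapply(df, median)
  # Stored as Iqr so the local name does not shadow the IQR function
  Iqr <- sapply(df, IQR)
  Summary <- list(Mean, Median, Iqr)
  names(Summary) <- c("Mean", "Median", "IQR")
  return(Summary)
}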
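The homework asks for a list that carries the names of the variables, and the expected output is not shown here, so it is not clear which layout the grader wants. If the list is meant to have one element per variable, the following is a minimal sketch of that arrangement. The function name df.numeric.summary.by.var and the data frame toy are made up for this illustration and are not part of the assignment.

```r
# Hypothetical variant: one list element per numeric column.
df.numeric.summary.by.var <- function(df) {
  # Keep only the numeric columns; drop = FALSE preserves the data frame
  # even when a single column is selected.
  numeric_df <- df[, sapply(df, is.numeric), drop = FALSE]
  # lapply over a data frame iterates over its columns and keeps their names.
  lapply(numeric_df, function(x) {
    c(mean = mean(x), median = median(x), IQR = IQR(x))
  })
}

# Small illustrative data frame (made up for this example).
toy <- data.frame(x = 1:10, y = 11:20, g = letters[1:10])
result <- df.numeric.summary.by.var(toy)
names(result)  # "x" "y"
result$x       # mean 5.5, median 5.5, IQR 4.5
```

If the data contains missing values, mean, median, and IQR each accept na.rm = TRUE. Whether that is appropriate for flint.lead.filter depends on the expected results given in the assignment.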