Question

RStudio, Logistic Regression. This is my doctor's question:

Investigate the use of Logistic Regression on a subset of the Kaggle Credit Card Fraud Data set (www.kaggle.com/dalpozz/creditcardfraud). Note that in this data set, the number of fraud records is much smaller than the number of normal records. Your first task is to construct subset data set(s) from the Kaggle data set. Construct three subset data sets of 100K, 20K, and 10K rows, with normal and fraud data included (make sure you maximize the number of fraud records). From each subset, construct a training data set and a testing data set (80% of the data for the former, 20% for the latter) to build and test the logistic regression model.

Tasks: Perform Logistic Regression on the three data subsets (100K, 20K, 10K). Show your results using a cross-table. Discuss your results for each of the data sets. Perform Ridge Logistic Regression and Lasso Logistic Regression on the three data subsets.

Explanation / Answer

What he wants is the following. The Kaggle data set contains both normal and fraud transactions.
First, construct three data sets of 100K, 20K, and 10K rows respectively, keeping as many fraud rows as possible in each.
You can do that with the subset() function in R, or by sampling row indices with sample().
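
As a rough sketch of the subsetting step, the code below keeps every fraud row and fills the rest of each subset with randomly sampled normal rows. It runs on synthetic data so that it is self-contained; with the real file you would load the CSV with read.csv() instead. The label column name Class (1 = fraud, 0 = normal) follows the Kaggle file, while the helper make_subset and the other column names are assumptions made for illustration.

```r
# Synthetic stand-in for the Kaggle data (assumption: the real data would
# come from read.csv() on the downloaded file).
set.seed(42)
n  <- 200000
df <- data.frame(
  V1     = rnorm(n),
  Amount = rexp(n, rate = 1 / 50),
  Class  = rbinom(n, 1, 0.002)   # roughly 0.2% fraud, heavily imbalanced
)

# Hypothetical helper: keep all fraud rows, then top up with normal rows
# until the subset has `size` rows in total.
make_subset <- function(data, size) {
  fraud_idx  <- which(data$Class == 1)
  normal_idx <- which(data$Class == 0)
  n_normal   <- size - length(fraud_idx)
  stopifnot(n_normal > 0, n_normal <= length(normal_idx))
  keep <- c(fraud_idx, sample(normal_idx, n_normal))
  data[sample(keep), ]           # shuffle so fraud rows are not all on top
}

sub100k <- make_subset(df, 100000)
sub20k  <- make_subset(df, 20000)
sub10k  <- make_subset(df, 10000)

# Fraud counts are identical across subsets; only the normal rows shrink.
table(sub10k$Class)
```

Because the number of fraud rows is fixed, the fraud share rises as the subset gets smaller, which is worth keeping in mind when comparing results across the three sizes.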

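After that, create a training set and a test set (80% / 20%) from each of the three data sets.
You can do it like this, where df is one of the subsets and Class is the label column (1 = fraud, 0 = normal):
set.seed(1)  # make the split reproducible
# sample 80% of the fraud rows and 80% of the normal rows for training
s1 <- sample(which(df$Class == 1), floor(0.8 * sum(df$Class == 1)))
s2 <- sample(which(df$Class == 0), floor(0.8 * sum(df$Class == 0)))
train <- df[c(s1, s2), ]
# everything not used for training becomes the test set
test <- df[-c(s1, s2), ]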

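Next, fit a logistic regression model on the training set. Logistic regression is a
classification technique in machine learning: it models the probability that a row belongs to the positive class (here, fraud). In R it is fitted with glm() using family = binomial.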
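
A minimal sketch of the fit-and-predict step is shown below. It uses a small synthetic data set so that it runs on its own; the predictors V1 and V2, the coefficients used to simulate the labels, and the 0.5 cutoff are all assumptions, not values taken from the Kaggle data.

```r
# Synthetic data: the label depends on V1 so the model has signal to find.
set.seed(1)
n     <- 10000
V1    <- rnorm(n)
V2    <- rnorm(n)
Class <- rbinom(n, 1, plogis(-4 + 2 * V1))
df    <- data.frame(V1, V2, Class)

# Stratified 80/20 split: sample 80% of each class for training.
fraud_idx  <- which(df$Class == 1)
normal_idx <- which(df$Class == 0)
train_idx  <- c(sample(fraud_idx,  floor(0.8 * length(fraud_idx))),
                sample(normal_idx, floor(0.8 * length(normal_idx))))
train <- df[train_idx, ]
test  <- df[-train_idx, ]

# Logistic regression: binomial family with the default logit link.
fit <- glm(Class ~ ., data = train, family = binomial)
summary(fit)

# Predicted fraud probabilities on the held-out rows, then hard labels.
prob <- predict(fit, newdata = test, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)   # 0.5 cutoff is an assumption
```

With very few fraud rows, a 0.5 cutoff may label almost everything as normal, so it can be worth trying lower cutoffs and comparing the resulting tables.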

After fitting logistic regression on each of the three data sets, predict on the corresponding test set and put the predicted versus actual classes in a cross-table.
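
One way to build the cross-table is base R's table() function, sketched below on a tiny hand-made example. The vectors actual and predicted are made-up values for illustration; in practice they would be the test labels and the model's predicted labels. If the gmodels package is installed, its CrossTable() function gives a more detailed layout of the same counts.

```r
# Made-up labels for ten test rows (1 = fraud, 0 = normal).
actual    <- c(0, 0, 0, 0, 1, 1, 0, 1, 0, 0)
predicted <- c(0, 0, 1, 0, 1, 0, 0, 1, 0, 0)

# Fixing the factor levels keeps the table 2x2 even if a class is never predicted.
cross_tab <- table(Actual    = factor(actual,    levels = c(0, 1)),
                   Predicted = factor(predicted, levels = c(0, 1)))
print(cross_tab)

# Cells of interest for imbalanced data.
tp <- cross_tab["1", "1"]   # fraud correctly flagged
fn <- cross_tab["1", "0"]   # fraud missed
fp <- cross_tab["0", "1"]   # normal rows wrongly flagged

recall    <- tp / (tp + fn)   # share of fraud rows that were caught
precision <- tp / (tp + fp)   # share of flagged rows that were really fraud
```

For fraud data, overall accuracy is dominated by the normal class, so the discussion for each subset is usually more informative when it focuses on the fraud row of the table (recall and precision).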

Hope this helps.
Let me know if you need further help with the code or the explanation.