Question

A large number of insurance records are to be examined to develop a model for predicting fraudulent claims. Of the claims in the historical database, 1% were judged to be fraudulent (class 1).

A sample database is taken to develop a model, and oversampling is used to provide a balanced sample in light of the very low response rate. When applied to this sample database (total number of records, N = 800), the model correctly classifies 310 frauds and 270 non-frauds. It misses 90 frauds and incorrectly classifies 130 non-fraud records as frauds.

Assume the number of positive (fraudulent) records is held fixed at 400 and the original ratio of fraudulent to non-fraudulent records (positive to negative) is 1:99.

1. What is the adjusted misclassification rate (error rate) that would be expected in the original, non-oversampled database?

2. What is the total number of false positive records that would be expected in the original, non-oversampled database?

Explanation / Answer

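1. In the oversampled sample, 90 + 130 = 220 of the 800 records were misclassified, so the unadjusted misclassification rate is 220/800 = 0.275 = 27.5%.

To adjust for oversampling, keep the 400 frauds fixed and scale the non-frauds to the original 1:99 ratio: 400 × 99 = 39,600 non-frauds, for 40,000 records in total. The model misclassifies 130/400 = 32.5% of non-frauds, which gives 0.325 × 39,600 = 12,870 misclassified non-frauds. The 90 missed frauds are unchanged.

adjusted misclassification rate = (90 + 12,870)/40,000 = 12,960/40,000 = 0.324 = 32.4%

2. False positives are non-frauds incorrectly classified as frauds. There are 130 of them in the sample; the 90 missed frauds are false negatives, not false positives. Scaled to the original ratio:

total number of false positive records = 130 × 99 = 12,870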

Classification results for the oversampled sample (N = 800):

                         Frauds   Non-Frauds   Total
Correctly classified        310          270     580
Incorrectly classified       90          130     220
Total                       400          400     800
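
The adjustment for oversampling can be checked with a short script. This is a minimal sketch, not a definitive method: it assumes, as the question states, that the 400 frauds stay fixed and the original fraud to non-fraud ratio is 1:99. The function name `adjust_for_oversampling` and its parameter names are hypothetical, chosen only for illustration.

```python
def adjust_for_oversampling(true_pos, false_neg, true_neg, false_pos, neg_per_pos):
    """Rescale the non-fraud column of a confusion matrix to the original class ratio.

    Assumption: the fraud (positive) count is held fixed and each fraud
    corresponds to `neg_per_pos` non-frauds in the original database.
    """
    frauds = true_pos + false_neg               # positives, unchanged
    non_frauds_sample = true_neg + false_pos    # negatives in the balanced sample
    non_frauds_original = frauds * neg_per_pos  # negatives at the original ratio

    # The model's error rate on non-frauds is assumed to carry over unchanged.
    false_pos_rate = false_pos / non_frauds_sample
    adj_false_pos = false_pos_rate * non_frauds_original

    total = frauds + non_frauds_original
    adj_error_rate = (false_neg + adj_false_pos) / total
    return adj_false_pos, adj_error_rate


# Counts from the question's balanced sample of 800 records.
adj_fp, adj_err = adjust_for_oversampling(
    true_pos=310, false_neg=90, true_neg=270, false_pos=130, neg_per_pos=99
)
print(f"adjusted false positives: {adj_fp:,.0f}")
print(f"adjusted misclassification rate: {adj_err:.1%}")
```

Under these assumptions the script reports 12,870 false positives and an adjusted misclassification rate of 32.4%, compared with 27.5% in the balanced sample.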