Question
A large number of insurance records are to be examined to develop a model for predicting fraudulent claims. Of the claims in the historical database, 1% were judged to be fraudulent (class 1).
A sample database is taken to develop a model, and oversampling is used to provide a balanced sample in light of the very low response rate. When applied to this sample database (total number of records, N = 800), the model correctly classifies 310 frauds and 270 non-frauds. It misses 90 frauds and incorrectly classifies 130 non-fraud records as frauds.
Assume the number of fraudulent (positive) records stays fixed at 400 and that the original database has a fraudulent-to-non-fraudulent (positive-to-negative) ratio of 1:99.
1. What is the adjusted misclassification rate (error rate) that should be in the original non-oversampled database?
2. What is the total number of false positive records that should be in the original non-oversampled database?
Explanation / Answer
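a) Out of the 800 sample records, 90 + 130 = 220 were misclassified.
misclassification rate in the oversampled sample = 220/800 = 0.275 = 27.5%
This is not yet the adjusted rate, because non-frauds are underrepresented in the balanced sample. With the 400 frauds held fixed and a 1:99 ratio, the original database would contain 400 x 99 = 39,600 non-frauds, for 40,000 records in total.
The model misclassifies 130/400 = 32.5% of non-frauds, so the expected number of misclassified non-frauds is 0.325 x 39,600 = 12,870. The number of missed frauds stays at 90.
adjusted misclassification rate = (90 + 12,870)/40,000 = 12,960/40,000 = 0.324 = 32.4%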
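b) A false positive is a non-fraud record classified as a fraud. The sample has 130 of these among 400 non-frauds (the 90 figure is missed frauds, i.e. false negatives). Scaling the non-frauds up to the 39,600 implied by the 1:99 ratio with 400 frauds:
Total number of false positive records = (130/400) x 39,600 = 12,870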
                         Frauds   Non-Frauds   Total
Correctly classified        310          270     580
Incorrectly classified       90          130     220
Total                       400          400     800
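The adjustment for oversampling can be checked with a few lines of Python. This is only a sketch under the assumptions stated in the question (400 frauds held fixed, a 1:99 fraud to non-fraud ratio in the original database); the variable names are illustrative and do not come from the original post.

```python
# Counts from the oversampled sample (N = 800), taken from the question.
tp, fn = 310, 90     # actual frauds: correctly flagged, missed
tn, fp = 270, 130    # actual non-frauds: correctly cleared, wrongly flagged

frauds = tp + fn                    # 400, assumed to stay fixed
non_frauds_sample = tn + fp         # 400 in the balanced sample
non_frauds_original = frauds * 99   # 1:99 ratio assumed for the original data

# Scale the non-fraud counts up to their original share;
# the fraud counts are left unchanged.
weight = non_frauds_original / non_frauds_sample
fp_adjusted = fp * weight
tn_adjusted = tn * weight

total_original = frauds + non_frauds_original
adjusted_error_rate = (fn + fp_adjusted) / total_original

print(f"Non-frauds in original database: {non_frauds_original}")
print(f"Adjusted false positives:        {fp_adjusted:.0f}")
print(f"Adjusted misclassification rate: {adjusted_error_rate:.1%}")
```

Because only the non-fraud column is reweighted, the false positives dominate the adjusted error rate even though the model misses 90 of the 400 frauds.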