Question
NOTE: I RECOMMEND DOING THIS ASSIGNMENT IN VISUAL STUDIO.
I HAVE NEVER TAKEN C++, SO I MAY HAVE SOME DIFFICULTY WITH THIS ASSIGNMENT. THEREFORE, I HOPE YOU CAN DO IT IN MICROSOFT VISUAL STUDIO.
BELOW IS ONE OF THE SAMPLE FILES TO TEST THE PROGRAM ON. THE FILES ARE NAMED "emailx.txt", WHERE x IS A NUMBER (1, 2, 3, ...).
I AM POSTING ONLY ONE OF THEM, SINCE THERE ARE MORE, TO GIVE YOU AN IDEA OF THE INPUT WHEN WRITING THE CODE.
SAMPLE .TXT
dear all,
on behalf of thrill company, i am glad to invite you for a luncheon
party with all the senior employees, team members, and other staff
members associated with the company. since according to association’s
policies, we have five working days, therefore we have planned to set
a lunch party for saturday, 13th january 2012.
please mark your presence on this party. together, we would get an
opportunity to interact with our boss, expand our contacts, learn more
about our field and of course eat some mouth watering dishes. this
luncheon would be held in a new york cafe situated at park
lane. kindly be present by 12:00 noon so that your taste buds do not
miss any of the tempting dishes being served!
i request you to confirm your presence latest by wednesday, 9th
january 2012 so that we make appropriate bookings.
looking forward to see you on this thrill’s luncheon party!
sincerely,
jacob thomas
hr head
thrill company
PROMPT
Lab 2: Spam Filter

In this lab, you will implement part of a naive Bayes spam classifier. To illustrate how this filter works, consider the following email:

Hey! This is the best link I found. I thought you would want to see it! www.somelink.com/example
Best, Sus

We want to classify this email as either spam or not spam. Typically, the filter will consider the entire email and look for multiple words that are common in spam emails. For our filter, we will consider a single word. For this example, we will classify the email based on the word "best".

Assume the probability that any particular email is spam is 0.25, and the probability that any particular email is not spam is 0.75.

To classify the mystery email (above), we want to compute the probability that this email is spam given that it contains the word "best". Then we want to compute the probability that this email is NOT spam given that it contains the word "best". We then classify based on which probability is higher.

Let's define a couple of variables:
1. $C$: email contains the word "best"
2. $\bar{C}$: email does NOT contain the word "best"
3. $S$: email is spam
4. $\bar{S}$: email is NOT spam

Hence, we want to compute $P(S \mid C)$ and $P(\bar{S} \mid C)$.

Computing the Probability of "best"

First, we need to figure out how common "best" is in spam emails and how common "best" is in emails that are not spam. To do this, we have to use sample emails. This is called training data. For this example, we'll use the following emails.

These are the sample spam emails:
1. you've been selected as a winner! click now to get the best anti-virus scanner!
2. you're a winner! reply immediately to claim your access to the best weight loss system ever.

Explanation / Answer
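For reference, the two quantities the prompt asks for follow from Bayes' rule. Writing $\bar{S}$ for "not spam" and using the priors stated in the prompt ($P(S) = 0.25$, $P(\bar{S}) = 0.75$), they can be written as below. The likelihoods $P(C \mid S)$ and $P(C \mid \bar{S})$ are not given in the prompt and have to be estimated from the training emails.

```latex
\begin{aligned}
P(C) &= P(C \mid S)\,P(S) + P(C \mid \bar{S})\,P(\bar{S}) \\
P(S \mid C) &= \frac{P(C \mid S)\,P(S)}{P(C)} \\
P(\bar{S} \mid C) &= \frac{P(C \mid \bar{S})\,P(\bar{S})}{P(C)} = 1 - P(S \mid C)
\end{aligned}
```

Because both posteriors share the denominator $P(C)$, comparing them is the same as comparing the numerators $P(C \mid S)\,P(S)$ and $P(C \mid \bar{S})\,P(\bar{S})$.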
/* Hi, I have done most of the program. The final spam / not-spam decision remains; you just have to do some tweaking in the main function. Also take a snapshot or PDF. Thanks */
// FileTestPro.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
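// Counts the whitespace-separated words in the given file.
int countwords(string fileName)
{
    ifstream fin(fileName);
    string word; // std::string grows as needed, so a long word cannot overflow a fixed buffer
    int count = 0;
    // Loop on the extraction itself: testing eof() before reading counts one
    // extra word and never terminates if the file failed to open.
    while (fin >> word)
    {
        count++;
    }
    cout << "Number of words in file: " << count << endl;
    fin.close();
    return count;
}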
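// Counts how many words in the given file are exactly equal to token.
int Keyword(string fileName, const char* token)
{
    ifstream fin(fileName);
    string word;
    int count = 0;
    while (fin >> word)
    {
        // Exact, case-sensitive match: a word with attached punctuation
        // such as "best," does not match "best".
        if (word == token)
        {
            count++;
        }
    }
    cout << "Number of matched words in file: " << count << endl;
    fin.close();
    return count;
}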
int main()
{
    int noOfWordsInFile = countwords("email1.txt");
    char token[10] = "best";
    int noOfMatchWordsInFile = Keyword("email1.txt", token);
    cout << "Probability that word occurs given the email is spam P(C|S)::" << noOfMatchWordsInFile << "/" << noOfWordsInFile << endl;
    int noOfWordsInFile2 = countwords("email2.txt");
    char token2[10] = "best";
    int noOfMatchWordsInFile2 = Keyword("email2.txt", token2);
    cout << "Probability that word occurs given the email is spam P(C|S)::" << noOfMatchWordsInFile2 << "/" << noOfWordsInFile2 << endl;
    // cout.precision() only sets the output precision and returns the previous
    // setting, so it cannot compute a ratio. Convert to double before dividing
    // to avoid integer truncation. 0.25 and 0.75 are the priors from the prompt.
    double pc = (static_cast<double>(noOfMatchWordsInFile) / noOfWordsInFile * 0.25) + (static_cast<double>(noOfMatchWordsInFile2) / noOfWordsInFile2 * 0.75);
    cout << "P(C)=" << pc << endl;
    // --------------------------------------------------
    int noOfWordsInFile3 = countwords("email3.txt");
    char token3[10] = "best";
    int noOfMatchWordsInFile3 = Keyword("email3.txt", token3);
    cout << "Probability that word occurs given the email is not spam P(C|not S)::" << noOfMatchWordsInFile3 << "/" << noOfWordsInFile3 << endl;
    int noOfWordsInFile4 = countwords("email4.txt");
    char token4[10] = "friday";
    int noOfMatchWordsInFile4 = Keyword("email4.txt", token4);
    cout << "Probability that word occurs given the email is not spam P(C|not S)::" << noOfMatchWordsInFile4 << "/" << noOfWordsInFile4 << endl;
    double pc1 = (static_cast<double>(noOfMatchWordsInFile3) / noOfWordsInFile3 * 0.25) + (static_cast<double>(noOfMatchWordsInFile4) / noOfWordsInFile4 * 0.75);
    cout << "P(C)=" << pc1 << endl;
    return 0;
}
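The program above stops before the final spam / not-spam decision. One way that remaining step might look is sketched below, under two assumptions that are not confirmed by the original answer: the likelihood is estimated as the fraction of training emails that contain the word (which matches the prompt's definition of C as "email contains the word", rather than a per-file word frequency), and an email is called spam when its posterior exceeds 0.5. The names containsWord, fractionContaining, posteriorSpam and isSpamGivenWord are hypothetical helpers introduced only for this illustration.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: true if `text` contains `word` as a whole token.
// Punctuation is stripped and letters are lowercased before comparing,
// so "Best," matches "best" but "bestseller" does not.
bool containsWord(const std::string& text, const std::string& word)
{
    std::istringstream in(text);
    std::string token;
    while (in >> token) {
        std::string cleaned;
        for (unsigned char ch : token) {
            if (std::isalnum(ch)) {
                cleaned += static_cast<char>(std::tolower(ch));
            }
        }
        if (cleaned == word) {
            return true;
        }
    }
    return false;
}

// Assumed estimator: fraction of the given emails that contain `word`.
// Applied to spam samples it estimates P(C|S); applied to non-spam
// samples it estimates P(C|not S).
double fractionContaining(const std::vector<std::string>& emails,
                          const std::string& word)
{
    if (emails.empty()) {
        return 0.0;
    }
    int hits = 0;
    for (const std::string& email : emails) {
        if (containsWord(email, word)) {
            ++hits;
        }
    }
    return static_cast<double>(hits) / emails.size();
}

// Bayes' rule: P(S|C) = P(C|S)P(S) / (P(C|S)P(S) + P(C|not S)P(not S)).
double posteriorSpam(double pWordGivenSpam, double pWordGivenNotSpam,
                     double pSpam)
{
    assert(pSpam >= 0.0 && pSpam <= 1.0);
    const double pNotSpam = 1.0 - pSpam;
    const double evidence =
        pWordGivenSpam * pSpam + pWordGivenNotSpam * pNotSpam;
    if (evidence == 0.0) {
        return pSpam; // word never seen in training: fall back to the prior
    }
    return pWordGivenSpam * pSpam / evidence;
}

// Classify as spam when P(S|C) is larger than P(not S|C), i.e. above 0.5.
bool isSpamGivenWord(double pWordGivenSpam, double pWordGivenNotSpam,
                     double pSpam)
{
    return posteriorSpam(pWordGivenSpam, pWordGivenNotSpam, pSpam) > 0.5;
}
```

As a purely illustrative calculation with made-up likelihoods (not taken from the lab data), P(C|S) = 1.0 and P(C|not S) = 0.5 with the prompt's prior of 0.25 give P(S|C) = 0.25 / (0.25 + 0.375) = 0.4, so such an email would be classified as not spam.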