Question
For example:
spam Congrats! 1 year special cinema pass for 2 is yours. call 09061209465 now! C
ham Sorry, I'll call later in meeting.
At the end, your program must print summary information and information about the most frequent words in spam messages and the most frequent words in non-spam (ham) messages.
To accomplish this, your analyzeSMSes function should:
read all the data from the input file
extract individual words from the messages. This should include an effort to get rid of "extras" such as periods, commas, question marks, exclamation marks, and other characters that aren't part of a word. You should probably also ignore capitalization. Thus in the sample spam message above, you probably want to treat "Congrats!" as "congrats" in your frequency analysis.
build two dictionaries, one for frequencies of words appearing in spam messages, one for frequencies of words from ham messages.
print summary information and some word frequency information about the data.
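The word-extraction and dictionary-building steps above can be sketched in C++ roughly as follows. This is a minimal sketch, not the assignment's reference solution: the names normalizeWord, SmsStats and addMessage are hypothetical, and it assumes each input line starts with the label spam or ham followed by whitespace and the message text, as in the two sample lines.

```cpp
#include <cctype>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: lowercase a token and drop every non-letter character,
// so "Congrats!" becomes "congrats" and a pure number such as "1" becomes "".
std::string normalizeWord(const std::string& token)
{
    std::string word;
    for (unsigned char c : token) {
        if (std::isalpha(c)) {
            word += static_cast<char>(std::tolower(c));
        }
    }
    return word;
}

// Hypothetical container: one frequency table per message type, plus message counts.
struct SmsStats {
    std::map<std::string, int> spamWords;
    std::map<std::string, int> hamWords;
    int spamMessages = 0;
    int hamMessages = 0;
};

// Assumption: a line looks like "<label> <message text>", where the label is
// "spam" or "ham". Lines with any other label (or no label) are skipped.
void addMessage(SmsStats& stats, const std::string& line)
{
    std::istringstream in(line);
    std::string label;
    if (!(in >> label)) {
        return;
    }

    std::map<std::string, int>* freq = nullptr;
    if (label == "spam") {
        ++stats.spamMessages;
        freq = &stats.spamWords;
    } else if (label == "ham") {
        ++stats.hamMessages;
        freq = &stats.hamWords;
    } else {
        return;
    }

    std::string token;
    while (in >> token) {
        const std::string word = normalizeWord(token);
        if (!word.empty()) {
            ++(*freq)[word];  // operator[] creates a zero entry for a new word
        }
    }
}
```

Calling addMessage once per line read with std::getline fills both tables. One design trade-off to be aware of: stripping every non-letter turns "I'll" into "ill", so a real solution may want to keep apostrophes inside words.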
Again, it is up to you to decide exactly what to print. Summary information might include the number of spam/non-spam messages, the total number of different words in spam and non-spam messages, the total number of words in each, and anything else that might be interesting (does spam or non-spam have a longer average word length?). Frequency information might be in the form of the top ten most used words in spam and in non-spam, along with a measure of their frequency (is the absolute count of occurrences a good measure, or would a fraction/percentage of all occurrences in that type of message be better?).
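One possible way to produce the "top ten" report and the percentage measure discussed above is sketched below. The helper names totalOccurrences, topWords and percentOf are hypothetical, and the sketch assumes the frequencies are already stored in a std::map from word to count.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: total number of word occurrences in a frequency table.
int totalOccurrences(const std::map<std::string, int>& freq)
{
    int total = 0;
    for (const auto& entry : freq) {
        total += entry.second;
    }
    return total;
}

// Hypothetical helper: the n most frequent words, most frequent first.
// Ties are broken alphabetically so the result is deterministic.
std::vector<std::pair<std::string, int>>
topWords(const std::map<std::string, int>& freq, std::size_t n)
{
    std::vector<std::pair<std::string, int>> words(freq.begin(), freq.end());
    std::sort(words.begin(), words.end(),
              [](const std::pair<std::string, int>& a,
                 const std::pair<std::string, int>& b) {
                  if (a.second != b.second) {
                      return a.second > b.second;
                  }
                  return a.first < b.first;
              });
    if (words.size() > n) {
        words.resize(n);
    }
    return words;
}

// Share of all occurrences, as a percentage (0 when the table is empty).
double percentOf(int count, int total)
{
    return total == 0 ? 0.0 : 100.0 * count / total;
}
```

Printing each entry of topWords(table, 10) next to percentOf(entry.second, totalOccurrences(table)) shows both the absolute count and the relative frequency, which makes spam and ham comparable even when one category has far more words than the other.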
Explanation / Answer
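#include <iostream>
#include <iomanip>
#include <fstream>
#include <sstream>
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
using namespace std;

// Counter that starts at zero, so counter[word]++ works for a word seen for the first time
class WordCounter
{
public:
    int value;
    WordCounter() : value( 0 ) {}
    void operator++ (int) { value++; }
};

ostream& operator<<(ostream& st, const WordCounter& wc )
{
    return st << wc.value;
}

// Returns true for characters to strip from a token (anything that is not a letter)
bool filter(char c)
{
    return isalpha( static_cast<unsigned char>( c ) ) == 0;
}

const string path = "SMSSpamCollection.txt";
//const string path = "/home/andy/NetBeansProjects/WordCount/Hamlet.txt"; //Linux

// Note: this counts every token in the file in a single map, including the
// leading spam/ham labels; it does not separate the two message types.
int main()
{
    map<string, WordCounter> counter;
    ifstream input;
    input.open( path.c_str() );
    if ( !input )
    {
        cerr << "Error in opening file " << path << "\n";
        return 1;
    }
    string tok;
    // Read whitespace-separated tokens until extraction fails (end of file)
    while ( input >> tok )
    {
        // Remove ?, !, commas and other non-letter characters from the token
        tok.erase( remove_if( tok.begin(), tok.end(), filter ), tok.end() );
        if ( !tok.empty() )
        {
            counter[tok]++;
        }
    }
    // The original listing is cut off inside the loop above. The loop ending and
    // the output below are an assumed completion, not the original author's code.
    for ( map<string, WordCounter>::const_iterator it = counter.begin(); it != counter.end(); ++it )
    {
        cout << left << setw( 20 ) << it->first << it->second << "\n";
    }
    return 0;
}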