Question

1. sentenceSplitter. bool sentenceSplitter(string& fname, vector<string>& sentences);

This function converts a text file named fname into a list of sentences. The sentences are stored in the sentences vector in the same order as they appear in the input file. The function returns true if it succeeds and false otherwise.

What counts as a sentence delimiter? Given a paragraph of several sentences, the following punctuation marks are used to split it into individual sentences:

- period: .
- question mark: ?
- period + double quotation mark: ."
- question mark + double quotation mark: ?"

As the example below shows, the delimiter is not included in the stored sentence.
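One way to read these rules is as a longest-match check: at a given position, test the two-character forms `."` and `?"` before `.` and `?`, so that a closing quotation mark is consumed together with the punctuation. The sketch below assumes that reading; `delimiterLength` is a hypothetical helper, not part of the assignment's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the length of the sentence delimiter that
// starts at text[pos], or 0 if no delimiter starts there.
// The two-character forms are checked first so that ." and ?" win over
// the single-character . and ? at the same position.
std::size_t delimiterLength(const std::string& text, std::size_t pos) {
    assert(pos <= text.size());
    static const std::string delimiters[] = {".\"", "?\"", ".", "?"};
    for (const std::string& delim : delimiters) {
        // compare() looks at most delim.size() characters from pos;
        // a shorter tail of text simply does not match.
        if (text.compare(pos, delim.size(), delim) == 0) {
            return delim.size();
        }
    }
    return 0;
}
```

A scanning loop could call it at each index and, on a non-zero result, end the current sentence there and skip that many characters.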

Assume the input file contains the following three paragraphs:

The first story is about connecting the dots. I dropped out of Reed College after the first 6 months, but then stayed around as a drop-in for another 18 months or so before I really quit. So why did I drop out? It started before I was born. My biological mother was a young, unwed college graduate student, and she decided to put me up for adoption. She felt very strongly that I should be adopted by college graduates, so everything was all set for me to be adopted at birth by a lawyer and his wife. Except that when I popped out they decided at the last minute that they really wanted a girl. So my parents, who were on a waiting list, got a call in the middle of the night asking: "We have an unexpected baby boy; do you want him?" They said: "Of course." My biological mother later found out that my mother had never graduated from college and that my father had never graduated from high school. She refused to sign the final adoption papers. She only relented a few months later when my parents promised that I would someday go to college.

The above function will identify a total of 12 sentences as follows:

- The first story is about connecting the dots

- I dropped out of Reed College after the first 6 months, but then stayed around as a drop-in for another 18 months or so before I really quit

- So why did I drop out

- It started before I was born

- My biological mother was a young, unwed college graduate student, and she decided to put me up for adoption

- She felt very strongly that I should be adopted by college graduates, so everything was all set for me to be adopted at birth by a lawyer and his wife

- Except that when I popped out they decided at the last minute that they really wanted a girl

- So my parents, who were on a waiting list, got a call in the middle of the night asking: "We have an unexpected baby boy; do you want him

- They said: "Of course

- My biological mother later found out that my mother had never graduated from college and that my father had never graduated from high school

- She refused to sign the final adoption papers

- She only relented a few months later when my parents promised that I would someday go to college
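To check the expected output above without a file, a minimal in-memory sketch can apply the same four delimiters in a single character-by-character pass. `splitSentences` and `trim` are hypothetical names. The sketch also assumes that line breaks act as spaces, that empty pieces are skipped, and that a trailing fragment with no delimiter is kept as a final sentence; the assignment does not specify the last two points.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: strip leading and trailing spaces and tabs.
std::string trim(const std::string& s) {
    const std::string::size_type first = s.find_first_not_of(" \t");
    if (first == std::string::npos) {
        return "";
    }
    const std::string::size_type last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Hypothetical in-memory splitter using the delimiters . ? ." and ?"
std::vector<std::string> splitSentences(const std::string& text) {
    std::vector<std::string> sentences;
    std::string current;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '.' || c == '?') {
            // A quotation mark right after . or ? belongs to the delimiter.
            if (i + 1 < text.size() && text[i + 1] == '"') {
                ++i;
            }
            const std::string sentence = trim(current);
            if (!sentence.empty()) {
                sentences.push_back(sentence);
            }
            current.clear();
        } else {
            // Line breaks separate words, so treat them as spaces.
            current += (c == '\n' || c == '\r') ? ' ' : c;
        }
    }
    // Assumption: keep a trailing fragment that has no delimiter.
    const std::string rest = trim(current);
    if (!rest.empty()) {
        sentences.push_back(rest);
    }
    return sentences;
}
```

Applied to the sample paragraph, this scan should reproduce the sentence boundaries listed above, including the two sentences that end in `?"` and `."`.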

Explanation / Answer

// Driver program
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>
#include "wordPairs.h"

int main(int argc, char** argv) {
    // Input file name: first command-line argument, or input.txt by default
    std::string file = (argc > 1) ? argv[1] : "input.txt";
    // Output vector that receives one string per sentence
    std::vector<std::string> sentences;
    // Run the sentence splitter
    bool res = sentenceSplitter(file, sentences);
    // Report the results
    if (!res) {
        std::cout << "sentenceSplitter returned false" << std::endl;
        return EXIT_FAILURE;
    }
    std::cout << "sentenceSplitter returned true!" << std::endl;
    for (const std::string& s : sentences) {
        std::cout << "- " << s << std::endl;
    }
    std::cout << "Found " << sentences.size() << " sentences." << std::endl;
    return EXIT_SUCCESS;
}
--------------------------------------------------------------------------------------
//wordPairs.cpp
#include <algorithm>
#include <fstream>
#include <iostream>
#include <map>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

#include "wordPairs.h"

using namespace std;

// Sentence Splitter: reads the file fname and appends each sentence,
// without its delimiter, to sentences in file order.
bool sentenceSplitter (string& fname, vector<string>& sentences) {
    // Open the input file
    ifstream f(fname.c_str());
    // Fail if the file cannot be opened
    if (!f.good()) {
        std::cout << "Unable to open file: '" << fname << "'. Quitting!" << std::endl;
        return false;
    }
    // Read the whole file into a buffer
    stringstream buffer;
    buffer << f.rdbuf();
    string text = buffer.str();
    // Turn line breaks into spaces so words on adjacent lines stay separated
    replace(text.begin(), text.end(), '\n', ' ');
    replace(text.begin(), text.end(), '\r', ' ');
    text = removeWhitespace(text);

    while (!text.empty()) {
        // Map each delimiter's first position in the remaining text to the
        // delimiter. The map is ordered by position, so begin() is the
        // earliest delimiter. The two-character forms are assigned last, so
        // at a shared position (e.g. ." vs .) the longer one overwrites.
        map<size_t, string> delimiters;
        delimiters[text.find(".")] = ".";
        delimiters[text.find("?")] = "?";
        delimiters[text.find(".\"")] = ".\"";
        delimiters[text.find("?\"")] = "?\"";

        // Position and text of the earliest delimiter
        size_t index = delimiters.begin()->first;
        string delim = delimiters.begin()->second;

        if (index == string::npos) {
            // No delimiter left: keep the trailing fragment and stop
            sentences.push_back(text);
            break;
        }
        // Store the sentence without its delimiter
        sentences.push_back(text.substr(0, index));
        // Drop the sentence and its delimiter, then trim leftover spaces
        text = removeWhitespace(text.substr(index + delim.size()));
    }
    return true;
}

// Identify word-pairs and calculate their frequencies (stub)
bool wordpairMapping (vector<string>& sentences, map<pair<string,string>, int> &wordpairFreq_map) {
    return false;
}

// Flip the map into a multimap to order all the word-pairs in ascending order of frequency (stub)
bool freqWordpairMmap (map<pair<string, string>, int> &wordpairFreq_map, multimap<int,pair<string,string> > &freqWordPair_mmap) {
    return false;
}

// Output the most frequent and least frequent word-pairs to a file (stub)
void printWordpairs (multimap<int, pair<string, string> > &freqWordpair_multimap, string outFname, int topCnt, int botCnt) {
}

// Remove leading and trailing spaces and collapse runs of spaces into one
string removeWhitespace(std::string str) {
    return std::regex_replace(str, std::regex("^ +| +$|( ) +"), "$1");
}
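The delimiter lookup inside `sentenceSplitter` relies on two properties of `std::map`: keys are kept in ascending order, and assigning to an existing key replaces its value. The sketch below isolates that lookup as a hypothetical `earliestDelimiter` function (not part of the assignment) so its behavior can be checked on short strings.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical isolation of the lookup used in sentenceSplitter.
// Each delimiter's first position is used as a map key; the map is sorted
// by key, so begin() is the earliest delimiter. Delimiters that are absent
// all share the key std::string::npos, which sorts last. Because ." and ?"
// are assigned after . and ?, they overwrite the shorter delimiter when both
// start at the same position.
std::pair<std::size_t, std::string> earliestDelimiter(const std::string& text) {
    std::map<std::size_t, std::string> delimiters;
    delimiters[text.find(".")] = ".";
    delimiters[text.find("?")] = "?";
    delimiters[text.find(".\"")] = ".\"";
    delimiters[text.find("?\"")] = "?\"";
    assert(!delimiters.empty());
    return *delimiters.begin();
}
```

One consequence: when no delimiter is present, the earliest key is `std::string::npos`, so the caller has to stop there; a loop that ignores this case makes no progress.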
------------------------------------------------------------------------
//wordPairs.h
#ifndef FILEIOS_WORDPAIRS_H
#define FILEIOS_WORDPAIRS_H

#include <map>
#include <string>
#include <utility>
#include <vector>

// Sentence Splitter
bool sentenceSplitter(std::string& fname, std::vector<std::string>& sentences);
// Identify word-pairs and calculate their frequencies
bool wordpairMapping (std::vector<std::string>& sentences, std::map< std::pair<std::string,std::string>, int> &wordpairFreq_map);
// Flip the map into a multimap to order all the word-pairs in ascending order of frequency
bool freqWordpairMmap (std::map< std::pair<std::string, std::string>, int> &wordpairFreq_map, std::multimap<int, std::pair<std::string, std::string> > &freqWordPair_mmap);
// Output the most frequent and least frequent word-pairs to a file
void printWordpairs (std::multimap<int, std::pair<std::string, std::string> > &freqWordpair_multimap, std::string outFname, int topCnt, int botCnt);

// Helper functions
std::string removeWhitespace(std::string str);

#endif /* FILEIOS_WORDPAIRS_H */
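In this answer `freqWordpairMmap` is only a stub. Its comment describes flipping the word-pair frequency map into a multimap so the pairs are ordered by frequency. A minimal sketch of that step follows. It assumes, without support from the source, that the output multimap should be cleared first and that an empty input counts as failure.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using WordPair = std::pair<std::string, std::string>;

// Sketch of freqWordpairMmap: copy every (word-pair, count) entry of the map
// into a multimap keyed by count. std::multimap keeps its keys sorted, so
// iterating it visits the pairs in ascending order of frequency, and pairs
// with equal counts are all kept.
// Assumptions: the output is cleared first; an empty input returns false.
bool freqWordpairMmap(std::map<WordPair, int>& wordpairFreq_map,
                      std::multimap<int, WordPair>& freqWordPair_mmap) {
    freqWordPair_mmap.clear();
    for (const auto& entry : wordpairFreq_map) {
        freqWordPair_mmap.emplace(entry.second, entry.first);
    }
    // Each map entry produces exactly one multimap entry.
    assert(freqWordPair_mmap.size() == wordpairFreq_map.size());
    return !freqWordPair_mmap.empty();
}
```

Iterating the multimap forward then yields the least frequent pairs first, and iterating from `rbegin()` yields the most frequent first. `printWordpairs` could use this ordering for its `botCnt` and `topCnt` counts.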