Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

In this long project, you are going to implement a word search engine using C++

ID: 3845314 • Letter: I

Question

In this long project, you are going to implement a word search engine using C++ and binary tree. The search engine would help users to count number of occurrence of a word among given set of text files and also sort all the words.

The input files are alreay given.

Upon execution, the program should load all the words from the input files in the memory. The program has two modes:

Mode 1: Word search

In this mode the program should run until the user explicitly specifies “exit”. The following functionality should be supported:

1. Count how many times a user-specified word appears in the files.

2. Display the name of input files and corresponding number of occurrences of the specified word.

3. Provide some stochastics such as: • Total number of occurrences of the specified word (wordTotal) • Total number of files that contain the word (fileTotal) • Average number of occurrences of the specified word per file (Average)

Mode 2: Word Sorting

In this mode the program should ask user to input the number of words and then print the existing words in all the input file in alphabetic order up to the input number. Upon execution of you program, you should specify as input parameter the path to a directory containing the input files.

Implementation requirements: First, you need to implement a class for an abstract data type, in which you are going to store files in memory. This step is very specific, depending on the functionality of the program you are developing. For the current implementation you are required to use Binary Tree.

You need to declare a class Word that would store a word, those properties and pointers and another class File that would store a file-name and number of times that the word occurred in this file. The process of “loading files in the memory” consists of (i) creating an object of type Word for each new word that occurs in the set of input files, (ii) appending this object to a binary tree and update stochastics , (iii) creating an object File for each occurrence of the word in a file and (iv) updating the corresponding (blue) linked list with the object File. Once you have the files loaded in such structure, the searching would be as easy as finding the queried word in the “green” binary tree and tracing in which files it occurs by going over the corresponding “blue” linked list.

You are required to split different aspects of this program and implement them in separate files. Then you will have to put them all back together to achieve the program functionality. You will need to create several .h and .cpp files for this project.

First you need to identify the important object-types and methods you will need for your implementation. In this project, your main object types are going to be class Word and class File. You can reuse your classes for File list. As before, create a file called itemtype.h and declare class File in it. Then create another file called itemtype.cpp and define your class File in it. These two files are going to be related to objects from the “blue” lists on the picture.

The next step is to implement the functionality for creating the “blue” lists. For this purpose, you need to create two more files – list.h and list.cpp. In list.h declare all the methods needed for building a “blue” list and in list.cpp define these methods. Now, you need to implement the functionality related to objects like these in the “green” binary tree which is different from the first project.

Create two more files – word.h and word.cpp, declare class Word and all methods to this class in word.h and define them in word.cpp. Now what's left is to put it all together by writing a main function that utilizes both file and word. To do so, create wordsearch.h and wordsearch.cpp, declare your main function in wordsearch.h and define it in wordsearch.cpp.

So all in all you will need eight files: wordsearch.h, wordsearch.cpp, itemtype.h, itemtype.cpp, list.h, list.cpp, word.h, word.cpp.

However I have everything but wordsearch.cpp and wordsearch.h included here.

itemtype.cpp

#ifndef ITEMTYPE
#define ITEMTYPE
#include "itemtype.h"
   file* file::getNext(){
       return this->next;
   }
   int file::getCount(){
       return this->count;
   }
   string file::getfilename(){
       return filename;
   }
   void file::setNext(file* next){
       this->next = next;
   }
   void file::incrementCount(){
       count++;
   }
#endif

itemtype.h

#ifndef FILE
#define FILE
#include <string>
using namespace std;

class file
{
public:
   file(string filename = "", file* next = NULL){
       this->filename = filename;
       this->count = 1;
       this->next = next;
   }
   file* getNext();
   int getCount();

   //Function to add +1 to the counter every time the word is repeated in a document.
   void incrementCount();
   //Function that returns the name of the specified file.
   string getfilename();
   //Function to set the next pointer node.
   void setNext(file* next);
private:
   file* next;
   string filename;
   int count;
};
#endif

list.cpp

#include "list.h"
#include <iostream>
using namespace std;
list::list(file* head_ptr){
   this->head_ptr = head_ptr;
}
void list::print(){
   file* current = head_ptr;
   while(current != NULL){
       int count = current-> getCount();
       string filename = current->getfilename();
       if(current->getNext() == NULL && current != head_ptr){
           cout << "and ";
       }
       if(count == 1){
           cout << count << " time in " << filename;
       }
       else{
           cout << count << " times in " << filename;
       }
       if(current->getNext() != NULL){
           cout<<", ";
       }
       else{
           cout<<".";
       }
       current = current->getNext();
   }
   cout << endl;
}
void list::createFront(string docname){
   file* insert = new file(docname,head_ptr);
   head_ptr = insert;
}

void list::insert(string docname){
   file* current = head_ptr;
   bool found = false;
   while(current != NULL && !found){
       if(current->getfilename() == docname){
           current->incrementCount();
           found = true;
       }
       current = current -> getNext();
   }
   if(!found){
       createFront(docname);
   }

}

list.h

#ifndef LIST
#define LIST
#include "itemtype.h"
class list{
public:
   list(file* head_ptr = NULL);

   //Function to insert a document into linked list.
   void insert(string docname);

   //Function to insert document into linked list if document not already in list.
   void createFront(string docname);

   //Function to print all items in the list
   void print();

   //Function to return the head of the list
   file* returnHead(){
       return head_ptr;
   }
private:
   file* head_ptr;
};
#endif

word.cpp

#include "word.h"
#include <iostream>
#include <cstdlib>
#include <string>
using namespace std;

word::word(string wordparam, word* left_ptr, word* right_ptr){
   this->wordparam = wordparam;
   this->left_ptr = left_ptr;
   this->right_ptr = right_ptr;
   this->files = new list();
}

void word::setRight(word* right_ptr){
   this->right_ptr = right_ptr;
}

void word::setLeft(word* left_ptr){
this->left_ptr = left_ptr;
}

string word::getWord(){
   return wordparam;
}

word* word::getRight(){
   return right_ptr;
}

word* word::getLeft(){
return left_ptr;
}

bool word::hasLeft(){
   return (getLeft() != NULL);
}

bool word::hasRight(){
   return (getRight() != NULL);
}

list* word::get_head(){
   return files;
}

word.h

#ifndef WORD_H
#define WORD_H
#include "list.h"
#include <iostream>
#include <cstdlib>

class word{
   public:
       word(string wordparam = "", word* left_ptr = NULL, word* right_ptr = NULL);

       //Function to set word.
       void setWord(string wordparam);

       //Function to set left_ptr to the left_ptr
       void setLeft(word* left_ptr);

       //Function to set right_ptr to the right_ptr
       void setRight(word* right_ptr);

       //Function to get the pointer to a list containing file nodes.
       list* get_head();

       //Function to access the specified word.
       string getWord();

       //Function to access the pointer for the right node in the list.
       word* getRight();

       //Function to access the pointer for the left node in the list.
       word* getLeft();

       //Function that returns true or false if this node has a left node.
       bool hasLeft();

       //Function that returns true or false if this node has a right node.
       bool hasRight();

   private:
       string wordparam;
       word* left_ptr;
       word* right_ptr;
       list* files;
};
#endif

Please give wordsearch.cpp and wordsearch.h

Now if you actually read through all of that, I congratulate you (and feel free to completely modify the project too).

Explanation / Answer

Here is the complete code for the question. Some files given by you are changed. Anyway giving here the full set of files. I could not get the mode 2 in the question. But the program gives a function to print sorted list of words from tree by travesing in in-order.

Please note, while searching for word, make sure the word is free from punctuation marks, otherwise its treated as a new word. For example, "program" is a different word from "program," . The second word comes with comma. The program is simple, meaning it just gets words as separated by spaces/tabs/newlines.

Please do rate the answer if it helped. Thanks

itemtype.h

----------


#include "itemtype.h"
file* file::getNext(){
return this->next;
}
int file::getCount(){
return this->count;
}
string file::getfilename(){
return filename;
}
void file::setNext(file* next){
this->next = next;
}
void file::incrementCount(){
count++;
}
  

  

itemtype.cpp

#ifndef FILE_H
#define FILE_H
#include <iostream>

using namespace std;
class file
{
public:
file(string filename , file* next = NULL){
this->filename = filename;
this->count = 1;
this->next = next;
}
file* getNext();
int getCount();
//Function to add +1 to the counter every time the word is repeated in a document.
void incrementCount();
//Function that returns the name of the specified file.
string getfilename();
//Function to set the next pointer node.
void setNext(file* next);
  
private:
file* next;
string filename;
int count;

};
#endif

list.cpp

#include "list.h"
#include <iostream>
using namespace std;
list::list(file* head_ptr){
this->head_ptr = head_ptr;
}
void list::print(){
file* current = head_ptr;
while(current != NULL){
int count = current-> getCount();
string filename = current->getfilename();
if(current->getNext() == NULL && current != head_ptr){
cout << "and ";
}
if(count == 1){
cout << count << " time in " << filename;
}
else{
cout << count << " times in " << filename;
}
if(current->getNext() != NULL){
cout<<", ";
}
else{
cout<<".";
}
current = current->getNext();
}
cout << endl;
}
void list::addnew(string docname){
file* insert = new file(docname,head_ptr);
head_ptr = insert;
size++;
}
void list::increment(string docname){
file* current = head_ptr;
bool found = false;
while(current != NULL && !found){
if(current->getfilename() == docname){
current->incrementCount();
found = true;
}
current = current -> getNext();
}
if(!found){
addnew(docname);
}
  
}

int list::getTotalCount()
{
   file* p=head_ptr;
   int total=0;
   while(p!=NULL)
   {
       total+=p->getCount();
       p=p->getNext();
   }  
   return total;
}

int list::getSize()
{
   return size;
}

list.h

#ifndef LIST
#define LIST
#include "itemtype.h"
class list{
public:
list(file* head_ptr = NULL);
//Function to increment the counter for a document in the list. If document was not presen, new one is added
void increment(string docname);
  
//Function to print all items in the list
void print();
//Function to return the head of the list
file* returnHead(){
return head_ptr;
}
  
int getTotalCount(); //return the sum of count of all files in list
int getSize(); //return the number of items in the list
~list()
{
   file* p=head_ptr,*temp;
   while(p!=NULL)
   {
       temp=p->getNext();
       delete p;
       p=temp;
       }
   }
private:
file* head_ptr;
int size;
void addnew(string docname); //function to add a new document in list
};
#endif

word.cpp

#ifndef WORD_H
#define WORD_H
#include "list.h"
#include <iostream>
#include <cstdlib>
class word{
public:
word(string wordparam = "", word* left_ptr = NULL, word* right_ptr = NULL);
//Function to set word.
void setWord(string wordparam);
//Function to set left_ptr to the left_ptr
void setLeft(word* left_ptr);
//Function to set right_ptr to the right_ptr
void setRight(word* right_ptr);
//Function to get the pointer to a list containing file nodes.
list* get_list();
//Function to access the specified word.
string getWord();
//Function to access the pointer for the right node in the list.
word* getRight();
//Function to access the pointer for the left node in the list.
word* getLeft();
//Function that returns true or false if this node has a left node.
bool hasLeft();
//Function that returns true or false if this node has a right node.
bool hasRight();
  
int getTotalCount(); //returns total occurences of this word in all files
int getFileCount() ;//returns the number of files containing this word
void printStats();//displays the word statistics
~word(){
  
   if(left_ptr!=NULL)
       delete left_ptr;
   if(right_ptr!=NULL)
       delete right_ptr;
   delete files;
       }
private:
string wordparam;
word* left_ptr;
word* right_ptr;
list* files;
};
#endif

wordsearch.h

#ifndef WORDSEARCH_H
#define WORDSEARCH_H
#include "word.h"
//class that maintains all words in a binarytree so that searching is faster
class binarywordtree
{
   private:
       word* root;
       void inorder(word*)   ;//in-order traversal of node, i.e. in binary search tree, in order gives sorted list in ascending order
   public:
       binarywordtree()
       {
           root=NULL;
       }
       word* getRoot() ;//return the root of the tree
      
       //increments count for the word in the given filename, creates an object if not present in tree
       void addWord(string word,string filename);
      
       word* search(const string searchWord); //return the node in tree for the word being searched, null if not present
       void display(); //walks the tree in in-order and display statistics for each word
       void loadfile(const string filepath); //loads a file given by filepath
       ~binarywordtree(){delete root;} //destructor
          
};
#endif

wordsearch.cpp


#include <iostream>
#include <fstream>
#include "wordsearch.h"
word* binarywordtree::getRoot()
{
   return root;
}

void binarywordtree::addWord(const string str,const string filename)
{
   word* w=search(str);
   if(w!=NULL) //word already in tree just update count in filename
   {
      
       w->get_list()->increment(filename); //update the file by incrementing the count
     
       return;  
      
   }
   else //word not in tree so far, create a new word object
   {
       w=new word(str,NULL,NULL);
       w->get_list()->increment(filename);      
         
       if(root==NULL) //if root is null, this is the first node in the tree.
       {
           root=w;
           return;
       }
          
       //now travel the tree and find a parent to attach this word. In binary search tree
       //all nodes to left are lesser than or equal to parent and all node on right are greater
       //than parent
      
       word *parent=NULL,*current=root;
       bool inserted=false;
       while(current!=NULL)
       {
           parent=current;
           if(str<=current->getWord()) //check if the word to add is less than the currnt word in tree, then go to left subtree
           {
               current=current->getLeft();
               if(current==NULL) //no left subtree, so this word becomes left child
                   parent->setLeft(w);
                      
           }
           else
           {
               current=current->getRight();
               if(current==NULL) //no right child , so this word becomes right child
                   parent->setRight(w);
           }
          
       }
   }
      
}

word* binarywordtree::search (const string str)
{
     
   word* current=root;
  
   while(current!=NULL)
   {
       if(current->getWord()==str)
           return current; //found the word, return immediately
       if(str<=current->getWord())
           current=current->getLeft();
       else
           current=current->getRight();
   }
   return NULL; //did not find word till this point , so return NULL
}

void binarywordtree::display()
{
   inorder(root);
  
}

void binarywordtree::inorder(word* w)
{
   if(w==NULL)
       return;
   inorder(w->getLeft());
   cout<<" "<<w->getWord();
   inorder(w->getRight());
}
void binarywordtree::loadfile(const string filepath)
{
  
   ifstream infile;
   infile.open(filepath.c_str()); //open the file for reading
   if(!infile.is_open()) //check if error occured wwhile opening
   {
       cout<<" Error: Could not open file "<<filepath;
       exit(1);
   }
   cout<<" Loading words from file "<<filepath;
   string str;
   while(!infile.eof()) //till we reach end of file
   {
       infile>>str; //get a word from file
       addWord(str,filepath);
   }
   infile.close();
      
}

int main(int argc,char *argv[])

{
   int noOfFiles=3; //Please dont forget to change this if more files are listed
   string files[]={"c:/test/file1.txt","c:/test/file2.txt","c:/test/file3.txt"}; //this list of file paths to load
   binarywordtree tree;
  
   //load all files
   for(int i=0;i<noOfFiles;i++)
   {
       tree.loadfile(files[i]);
   }
  
   cout<<" Loaded "<<noOfFiles<<" files. ";
   string response;
   string w;
   word* treeword;
   cout<<" Do you want to see words in tree? y/n : ";
   cin>>response;
   if(response=="y")
   {
  
       cout<<" The words in tree in SORTED ascending order ";
       tree.display();
   }
   do
   {
       cout<<" Enter a word to search : ";
       cin>>w;
       treeword=tree.search(w);
       if(treeword==NULL)
       {
           cout<<" Word ""<<w<<"" not found ";
       }
       else
       {
           treeword->printStats();
       }
       cout<<" Do you wish to continue? (y/n) : ";
       cin>>response;
      
      
   }while(response=="y");
     
     
}

Sample files used:

File1.txt

In this long project, you are going to implement a word search engine using C++ and binary tree

File2.txt

The search engine would help users to count number of occurrence of a word among given set of text files and also sort all the words

File3.txt

print the existing words in all the input file in alphabetic order up to the input number search the word

Output for the sample input

Loading words from file c:/test/file1.txt
Loading words from file c:/test/file2.txt
Loading words from file c:/test/file3.txt

Loaded 3 files.


Do you want to see words in tree? y/n : y

The words in tree in SORTED ascending order
C++ In The a all alphabetic also among and are binary count engine existing file files given going help implement in input long number occurrence of order print project, search set sort text the this to tree up users using word words would you
Enter a word to search : the

-----------------
Word: the
Total occurences: 5
Number of files found: 2
Average occurence: 2
File paths where the file occurs - 4 times in c:/test/file3.txt, and 1 time in c:/test/file2.txt.

-----------------
Do you wish to continue? (y/n) : y

Enter a word to search : engine

-----------------
Word: engine
Total occurences: 2
Number of files found: 2
Average occurence: 1
File paths where the file occurs - 1 time in c:/test/file2.txt, and 1 time in c:/test/file1.txt.

-----------------
Do you wish to continue? (y/n) : y

Enter a word to search : hello

Word "hello" not found
Do you wish to continue? (y/n) : y

Enter a word to search : and

-----------------
Word: and
Total occurences: 2
Number of files found: 2
Average occurence: 1
File paths where the file occurs - 1 time in c:/test/file2.txt, and 1 time in c:/test/file1.txt.

-----------------
Do you wish to continue? (y/n) : n

--------------------------------

Hire Me For All Your Tutoring Needs
Integrity-first tutoring: clear explanations, guidance, and feedback.
Drop an Email at
drjack9650@gmail.com
Chat Now And Get Quote