


Question

1 Marks Question Four: Suppose you are to develop an indexing application with the following conditions:
1- Input: A document (i.e., a web page) with a known number of words.
2- Output: An index corresponding to the words of the document and their frequencies.

An example would be the portion in the figure below, which is extracted from the SEU main page.

[Figure: excerpt from the SEU main page: 28-05-1439 "(ACTFL) experts visit the Saudi Electronic University to review and develop the Arabic Online Program and the Standardized Arabic ..."]

The following are the IDs for the words mentioned in the above portion:

[Table of word IDs; only fragments survive: 14, Experts, Arabic, Standardized, 2]

The output index consists of two integers separated by a colon. The index for the word "Arabic" is 11:2. The first number, before the colon, is the word's ID, and the second number is the word's frequency (that is, how many times the word occurs in the document).

Which data structure would you select for your solution? Give 2 reasons to justify your answer.

Explanation / Answer

I would choose a hash table for this indexing application.

A hash table maps keys to values. A hash function computes an index from the key, and that index determines where the entry is stored in the table. Here, each word of the document is hashed into the table as a key. If the word is already present, we increment its count, which keeps track of the word's frequency. The entry stored for a word can therefore hold both its ID and its frequency, which is exactly the ID:frequency pair the output index requires.
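The counting step described above can be sketched in Python, whose built-in dict is a hash table. This is only a minimal illustration: the names build_index and format_entry are hypothetical, the sample sentence is made up, and the rule for assigning IDs (order of first appearance, starting at 1) is an assumption, since the question does not say how its IDs (such as 11 for "Arabic") were chosen.

```python
def build_index(words):
    """Map each word to [word_id, frequency] using a dict (a hash table)."""
    index = {}
    next_id = 1  # assumption: IDs follow order of first appearance, starting at 1
    for word in words:
        entry = index.get(word)
        if entry is None:
            # First time we see this word: give it a new ID and a count of 1.
            index[word] = [next_id, 1]
            next_id += 1
        else:
            # Word already present: just increment its frequency.
            entry[1] += 1
    return index


def format_entry(index, word):
    """Return the 'ID:frequency' string for a word already in the index."""
    word_id, freq = index[word]
    return f"{word_id}:{freq}"


# Made-up sample text, not the SEU page from the question.
text = "experts review the Arabic program and the standardized Arabic test"
index = build_index(text.split())
print(format_entry(index, "Arabic"))  # prints 4:2
```

Each word costs one lookup and at most one update, so the whole document is indexed in a single pass.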

If collisions occur, there are several collision-resolution techniques (for example, chaining or open addressing) that keep the application's throughput high and its lookup time low.
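As a rough illustration of one way to handle collisions, here is a toy hash table that uses separate chaining: each bucket holds a small list of [word, count] pairs, so words that hash to the same bucket simply share it. The class name ChainedHashTable and its methods are hypothetical, and the table is deliberately tiny so that collisions are guaranteed.

```python
class ChainedHashTable:
    """Toy word counter backed by a hash table with separate chaining."""

    def __init__(self, num_buckets=8):
        # One list (chain) per bucket; colliding keys live in the same chain.
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Python's % always yields a non-negative index for a positive modulus.
        return self.buckets[hash(key) % len(self.buckets)]

    def increment(self, key):
        bucket = self._bucket(key)
        for pair in bucket:
            if pair[0] == key:
                pair[1] += 1  # key already in the chain: bump its count
                return
        bucket.append([key, 1])  # new key: append to the chain

    def get(self, key, default=0):
        for stored_key, count in self._bucket(key):
            if stored_key == key:
                return count
        return default


# Two buckets and three distinct words: at least two words must collide.
table = ChainedHashTable(num_buckets=2)
for w in ["Arabic", "experts", "Arabic", "program", "experts", "Arabic"]:
    table.increment(w)
print(table.get("Arabic"))  # prints 3
```

The counts stay correct even though words share buckets; collisions only cost a short scan of the chain.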

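A hash table is also fast: inserting a word or looking it up takes constant time on average. We can use a hash function that produces few collisions, or, when the set of words is fixed, even a perfect hash function with no collisions. Because the number of words is known in advance, the table can be sized up front, so hashing stays efficient even when the document is large.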

Hope this will help