Write a My Grep utility based on C. This MyGrep utility is similar to grep utili
Question
Write a MyGrep utility in C. MyGrep is similar to the grep utility provided by Unix. It takes options, words, and a text file as arguments.
• $ MyGrep -c "This is a list of words" test.txt
Count the occurrences of the string "This is a list of words" in the content of file test.txt.
• $ MyGrep -c -i "This is a list of words" test.txt
Count the occurrences of the string "This is a list of words" in the content of file test.txt, ignoring case.
• $ MyGrep -o "This is a list of words" test.txt
Output all lines containing "This is a list of words" and highlight the matched string.
• $ MyGrep -s test.txt
Remove all leading spaces in each line and output the result.
• $ MyGrep -n test.txt
In the output, add a line number at the beginning of each line.
• $ MyGrep -n "This is a list of words" test.txt
Output the lines containing the string "This is a list of words" and add a line number at the beginning of each matched line.
• $ MyGrep -k "This" "is" "a" "list" "of" "words" test.txt
Count the occurrences of each word and output the words along with their counts in decreasing order.
For example, the output could be:
a 20
is 8
list 5
words 3
of 2
this 1
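As a starting point for the -c and -c -i options, here is a minimal sketch in C of counting how often a string occurs in a block of text. The function names count_occurrences and equal_n are hypothetical, not part of the assignment, and the sketch assumes the file content has already been read into a NUL-terminated buffer and that matches are counted without overlap.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compare the first n bytes of a and b.
 * When ignore_case is nonzero, letters are compared case-insensitively. */
static int equal_n(const char *a, const char *b, size_t n, int ignore_case)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char ca = (unsigned char)a[i];
        unsigned char cb = (unsigned char)b[i];
        if (ignore_case) {
            ca = (unsigned char)tolower(ca);
            cb = (unsigned char)tolower(cb);
        }
        if (ca != cb)
            return 0;
    }
    return 1;
}

/* Count non-overlapping occurrences of pat in text (hypothetical helper).
 * An empty pattern is treated as matching nothing. */
size_t count_occurrences(const char *text, const char *pat, int ignore_case)
{
    assert(text != NULL && pat != NULL);

    size_t tlen = strlen(text);
    size_t plen = strlen(pat);
    size_t count = 0;

    if (plen == 0 || plen > tlen)
        return 0;

    for (size_t i = 0; i + plen <= tlen; ) {
        if (equal_n(text + i, pat, plen, ignore_case)) {
            count++;
            i += plen;   /* skip past the match so matches do not overlap */
        } else {
            i++;
        }
    }
    return count;
}
```

A main function for MyGrep would presumably parse the options, read test.txt into memory (or line by line), and call count_occurrences with ignore_case set when -i is given. The same matching loop could also serve -o and -n by reporting the lines in which a match is found.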
Explanation / Answer
#!/bin/bash
# wf.sh: Crude word frequency analysis on a text file.
# This is a more efficient version of the "wf2.sh" script.
# Check for input file on command-line.
ARGS=1
E_BADARGS=45
E_NOFILE=66
if [ $# -ne "$ARGS" ]  # Correct number of arguments passed to script?
then
  echo "Usage: $(basename "$0") filename"
  exit $E_BADARGS
fi
if [ ! -f "$1" ]       # Check if file exists.
then
  echo "File \"$1\" does not exist."
  exit $E_NOFILE
fi
# main ()
sed -e 's/\.//g' -e 's/,//g' -e 's/ /\
/g' "$1" | tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr
#
# Frequency of occurrence
# Filter out periods and commas, and
#+ change space between words to linefeed,
#+ then shift characters to lowercase, and
#+ finally prefix occurrence count and sort numerically.
# Arun Giridhar suggests modifying the above to:
# . . . | sort | uniq -c | sort +1 [-f] | sort +0 -nr
# This adds a secondary sort key, so instances of
#+ equal occurrence are sorted alphabetically.
# As he explains it:
# "This is effectively a radix sort, first on the
#+ least significant column
#+ (word or string, optionally case-insensitive)
#+ and last on the most significant column (frequency)."
#
# As Frank Wang explains, the above is equivalent to
#+ . . . | sort | uniq -c | sort +0 -nr
#+ and the following also works:
#+ . . . | sort | uniq -c | sort -k1nr -k
exit 0
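Since the question asks for C, the same pipeline (drop periods and commas, split on whitespace, lowercase, count, sort by count in decreasing order) can be sketched in C roughly as follows. The names word_count, word_frequencies, by_count_desc, and MAX_WORD_LEN are hypothetical, the input is assumed to be a NUL-terminated buffer already read from the file, and ties are broken alphabetically, which is an assumption rather than something the question specifies.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD_LEN 64   /* assumed upper bound; longer words are truncated */

struct word_count {
    char word[MAX_WORD_LEN];
    int count;
};

/* qsort comparator: higher counts first, then alphabetical order. */
static int by_count_desc(const void *a, const void *b)
{
    const struct word_count *x = a;
    const struct word_count *y = b;
    if (x->count != y->count)
        return (y->count > x->count) - (y->count < x->count);
    return strcmp(x->word, y->word);
}

/* Fill out[] with up to max_out distinct lowercase words from text and
 * their counts, sorted by decreasing count. Returns the number stored. */
size_t word_frequencies(const char *text, struct word_count *out, size_t max_out)
{
    assert(text != NULL && out != NULL);

    size_t n = 0;
    const char *p = text;

    while (*p) {
        /* Skip whitespace between words. */
        while (*p && isspace((unsigned char)*p))
            p++;

        /* Copy one word, dropping '.' and ',' and lowercasing letters. */
        char buf[MAX_WORD_LEN];
        size_t len = 0;
        while (*p && !isspace((unsigned char)*p)) {
            if (*p != '.' && *p != ',' && len + 1 < MAX_WORD_LEN)
                buf[len++] = (char)tolower((unsigned char)*p);
            p++;
        }
        if (len == 0)
            continue;
        buf[len] = '\0';

        /* Linear search is enough for a sketch; a hash table would scale better. */
        size_t i;
        for (i = 0; i < n; i++)
            if (strcmp(out[i].word, buf) == 0)
                break;
        if (i < n) {
            out[i].count++;
        } else if (n < max_out) {
            strcpy(out[n].word, buf);
            out[n].count = 1;
            n++;
        }
    }

    qsort(out, n, sizeof out[0], by_count_desc);
    return n;
}
```

Like the shell script, this counts every word in the text. For the -k option, the result would still need to be restricted to the words given on the command line, for example by looking up each argument in the returned array.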