Write a command that outputs a list of all words in the file, one on each line.

Question

Write a command that outputs a list of all words in the file, one on each line. For the purposes of this command, anything that isn't a letter is considered a word delimiter, and words are case-sensitive. Duplicate words are permitted for now. This is done with shell scripting commands using bash.

Recommended programs: sed

Output for tiny.txt (order doesn't matter)
See
Spot
See
Spot
run
Run
Spot
run
Jane
sees
Spot
run
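
One way to sketch this step is shown below. It uses tr to turn every run of non-letters into a single newline and sed to drop any empty line that results; tr is a substitution for the sed-only approach the assignment recommends. The function name words and the sample sentence are assumptions made for illustration (the sample is a stand-in, not the real tiny.txt).

```shell
# Hypothetical helper: read text on stdin, print one word per line.
# Any character that is not an ASCII letter acts as a delimiter.
words() {
  # -c complements the set (so it matches non-letters) and -s squeezes
  # each run of replaced characters into a single newline.
  # sed then deletes the empty line left by leading non-letters, if any.
  LC_ALL=C tr -cs 'A-Za-z' '\n' | sed '/^$/d'
}

# Stand-in sample text (an assumption, not the contents of tiny.txt).
printf 'See Spot. See Spot run.\nRun, Spot, run! Jane sees Spot run.\n' | words
```

Case is left untouched, so Run and run stay distinct, as the question requires.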

Part 2: Count Words

Modify the previous command to count the number of times each word appears in the file. Sort in descending order of count.

Recommended programs: sed, sort, uniq

Output for tiny.txt:
4 Spot
3 run
2 See
1 sees
1 Run
1 Jane
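
A possible pipeline for this part, as a sketch: sort the words so that duplicates are adjacent, count them with uniq -c, then sort numerically in reverse. The function names words and count_words and the sample text are assumptions, not part of the lab.

```shell
# Hypothetical helper from the previous step: one word per line.
words() {
  LC_ALL=C tr -cs 'A-Za-z' '\n' | sed '/^$/d'
}

# Hypothetical helper: print "count word" lines, largest count first.
count_words() {
  # uniq -c only merges adjacent duplicates, hence the first sort.
  # sort -rn orders by the leading number, descending.
  # The final sed strips the padding uniq -c puts before each count.
  words | LC_ALL=C sort | uniq -c | sort -rn | sed 's/^ *//'
}

# Stand-in sample text (an assumption, not the contents of tiny.txt).
# Words with equal counts may come out in any order.
printf 'See Spot. See Spot run.\nRun, Spot, run! Jane sees Spot run.\n' | count_words
```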

Part 3: Count of Counts

Extend the previous command to count the number of words that have each count.

E.g. there are three words in tiny.txt that have a count of 1 (Jane, Run, sees). And there's only one word that has a count of 2 (See). So the count of counts for 1 is 3, and the count of counts for 2 is 1.

Recommended programs: sed, awk, sort, uniq

Output for tiny.txt (first column is the count of counts, second column is the count; sorted by the second column):
3 1
1 2
1 3
1 4
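
As a sketch of how this could work: keep only the count column from the Part 2 output, sort it numerically, and count the repeats. The function name count_of_counts is an assumption; the input below is the Part 2 output for tiny.txt quoted above.

```shell
# Hypothetical helper: read "count word" lines on stdin and print
# "count-of-counts count" lines, ordered by count.
count_of_counts() {
  # awk keeps the first column, sort -n groups equal counts together
  # in ascending order, uniq -c counts each group, and the final awk
  # strips the padding that uniq -c adds.
  awk '{ print $1 }' | sort -n | uniq -c | awk '{ print $1, $2 }'
}

printf '4 Spot\n3 run\n2 See\n1 sees\n1 Run\n1 Jane\n' | count_of_counts
# prints:
# 3 1
# 1 2
# 1 3
# 1 4
```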

Part 4: Filter Infrequent Words

This one is just like Part 2, except you should only output the words that appear more than once. You may assume that counts.txt contains the output from part 2.

Recommended programs: sed, awk

Output for tiny.txt:
4 Spot
3 run
2 See
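
A minimal sketch with awk, assuming counts.txt holds "count word" lines: compare the first field numerically so that only counts greater than 1 pass. The function name frequent is an assumption, and the sketch writes its sample counts.txt into a temporary directory.

```shell
# Hypothetical helper: print only lines whose count (field 1) exceeds 1.
# With no file argument it reads stdin.
frequent() {
  # A pattern with no action prints every matching line unchanged.
  awk '$1 > 1' "$@"
}

# Sample counts.txt built from the Part 2 output for tiny.txt.
tmp=$(mktemp -d)
printf '4 Spot\n3 run\n2 See\n1 sees\n1 Run\n1 Jane\n' > "$tmp/counts.txt"
frequent "$tmp/counts.txt"
rm -r "$tmp"
```

A numeric comparison also keeps counts such as 10 or 12, which a test on the leading character alone would discard.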

Part 5: Variables

Given a filename as input, output the same name, but without the directory, and
with the suffix changed to .counts.

Recommended programs: sed, basename
Recommended bash commands: (variables and subshells)

Variables in bash are about what you'd expect:

# This is a comment
var='foo42' # sets the variable var equal to 'foo42'. NOTE: you must not have spaces around the =
$var        # gets the value of var

Subshells allow you to run a bash command and capture whatever it printed, using the $(...) syntax. You can use this to set the values of variables from the output of external programs.

For example:
    var='foo42'
    var2=$(echo "$var" | tr 'a-z' 'A-Z')
    echo "$var2" # prints FOO42

Example of use for tiny.txt:
$ echo "data/tiny.txt" | your_command
tiny.counts
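
One possible shape for this command, as a sketch: basename removes the directory and sed swaps the last suffix for .counts. The function name to_counts_name is an assumption.

```shell
# Hypothetical helper: read filenames on stdin, one per line, and print
# each without its directory and with its suffix replaced by .counts.
to_counts_name() {
  while IFS= read -r path; do
    name=$(basename "$path")             # data/tiny.txt -> tiny.txt
    # Replace the final ".something" with ".counts".
    printf '%s\n' "$name" | sed 's/\.[^.]*$/.counts/'
  done
}

echo "data/tiny.txt" | to_counts_name    # prints tiny.counts
```

A name with no suffix passes through unchanged, since the sed pattern requires a dot.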

Part 6: Loops

Loop through every .txt file in the current directory, and run the filename through the command from part 5.

Recommended bash commands: for

Expected Output (order doesn't matter):
chicken.counts
germanium.counts
magic.counts
mel.counts
murphy.counts
passwords.counts
tiny.counts
vim.counts
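
The loop might look like the sketch below. The function name counts_names is an assumption; it works on whatever directory is current when it is called.

```shell
# Hypothetical helper: print the .counts name for every .txt file
# in the current directory.
counts_names() {
  for f in *.txt; do
    # If nothing matches, the glob stays as the literal "*.txt"; skip it.
    [ -e "$f" ] || continue
    basename "$f" | sed 's/\.[^.]*$/.counts/'
  done
}

counts_names
```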

Part 7: Putting everything together

A common use of shell scripts is quickly preprocessing data into a form expected by another program. That's what you'll be doing here. Your script will create two files for each of the datafiles. These will then be passed off to other programs to create a final pdf.

You will be creating lab10.sh. When run (without any arguments), it must do the following:

Loop through each of the .txt files in the data/ directory.

For tiny.txt, create two files:

counts/tiny.count-counts, containing the results from part 3 (count of counts)

counts/tiny.counts-frequent, containing the results from part 4 (filter infrequent words)

For all the other datafiles, create those same files, but with "tiny" replaced with the appropriate prefix.
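
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not a reference solution: the function name build_counts is invented here, the word-splitting uses tr rather than sed, and the output file names are copied from the description above.

```shell
#!/bin/bash
# Sketch of lab10.sh; build_counts is a hypothetical name.

build_counts() {
  mkdir -p counts
  for f in data/*.txt; do
    [ -e "$f" ] || continue            # no .txt files: the glob stays literal
    prefix=$(basename "$f" .txt)       # data/tiny.txt -> tiny

    # Part 2: "count word" lines, most frequent first.
    counts=$(LC_ALL=C tr -cs 'A-Za-z' '\n' < "$f" | sed '/^$/d' |
             LC_ALL=C sort | uniq -c | sort -rn | sed 's/^ *//')

    # Part 3: how many words share each count, ordered by count.
    printf '%s\n' "$counts" | awk '{ print $1 }' | sort -n | uniq -c |
      sed 's/^ *//' > "counts/$prefix.count-counts"

    # Part 4: words that occur more than once.
    printf '%s\n' "$counts" | awk '$1 > 1' > "counts/$prefix.counts-frequent"
  done
}

# Run only when a data/ directory exists in the current directory.
if [ -d data ]; then
  build_counts
fi
```

Holding the Part 2 result in a variable avoids writing an extra intermediate file, so only the two required files per datafile end up in counts/.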

We've provided three files in addition to the input data:

make_figures.m:
This is a MATLAB script that will produce graphs of the counts of frequent words, and the count of counts. MATLAB is useful for numeric calculations (as opposed to symbolic/algebraic), and generating visualizations of data.
   
There's also a free clone called Octave that's available on most Unix systems. (This script uses some functions that aren't available in Octave, though, so it needs actual MATLAB. MATLAB is installed on thor.)

writeup.tex:
This is a LaTeX document. LaTeX (pronounced Lay-Tech) is used for typesetting documents, and is generally the tool of choice for typesetting technical documents, especially if they include any remotely complicated mathematics.

It will automatically include the counts file created by your script and the graphs created by make_figures.m, and create a pdf with one page for each file in the dataset.

makefile:
The default target will run lab10.sh, make_figures.m, and writeup.tex as needed to create everything from scratch. It assumes that lab10.sh exists and creates all the files in the counts directory listed above.

So if lab10.sh works correctly, you will be able to type make and you will end up with a pdf containing a page for every datafile.

Include the complete lab10.sh in the answer box.

Expected Output for tiny.txt:

See tiny-writeup.pdf from lab10.tgz

Explanation / Answer

Per Chegg policy, I have answered the first four parts, using a single shell script and a test file I created myself. The script asks for a file name, and you can give it any file.

I have labeled each part, so you can copy an individual command and substitute your own file name for $f1. If you have any doubts, please leave a comment.

The file contains

cat a1.txt
We are learning, linux.
The fat cat, fight with other fat cat

Shell script

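clear
echo "Enter the file name"
read -r f1
echo "-----PART 1 List of all words in the file-----"
# Delete commas and periods, then turn each space into a newline (one word per line).
sed 's|[,.]||g' "$f1" | tr ' ' '\n'
echo "-----PART 2 Count the number of times each word------"
# sort groups identical words so that uniq -c can count them;
# tr -s and cut strip the padding that uniq -c puts before each count.
sed 's|[,.]||g' "$f1" | tr ' ' '\n' | sort | uniq -c | tr -s " " | cut -d " " -f 2,3
echo "------PART 3 Count of counts--------"
# Keep only the count column, then count how often each count occurs.
sed 's|[,.]||g' "$f1" | tr ' ' '\n' | sort | uniq -c | tr -s " " | cut -d " " -f 2 | sort -n | uniq -c
echo "------PART 4 Display the words more than once-----"
echo "First way by grep"
# '^1 ' matches a count of exactly 1, so counts such as 10 are kept.
sed 's|[,.]||g' "$f1" | tr ' ' '\n' | sort | uniq -c | tr -s " " | cut -d " " -f 2,3 | grep -v '^1 '
echo "2nd way by awk"
# Compare the count numerically instead of by pattern.
sed 's|[,.]||g' "$f1" | tr ' ' '\n' | sort | uniq -c | awk '$1 > 1 { print $1, $2 }'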

Output

[root@master1 ~]# sh a1.sh
Enter the file name
a1.txt
-----PART 1 List of all words in the file-----
We
are
learning
linux
The
fat
cat
fight
with
other
fat
cat
-----PART 2 Count the number of times each word------
1 are
2 cat
2 fat
1 fight
1 learning
1 linux
1 other
1 The
1 We
1 with
------PART 3 Count of counts--------
8 1
2 2
------PART 4 Display the words more than once-----
First way by grep
2 cat
2 fat
2nd way by awk
2 cat
2 fat
