Question
You will create a program that determines the language of a given input file based on the words in that file. The structure of the input files will be one word per line (all lowercase letters). You should read these words in one LETTER at a time.
Each input file will contain 100000 letters, and be in a different language.
The way you are able to tell the languages apart is as follows:
English: The top letter frequencies are e, i, and a respectively.
Danish: The top letter frequencies are e, r, and n respectively.
Italian: The letters j, x, y do not exist in Italian (other than proper nouns, which are not present in the input files).
Your program MUST include 5 functions in addition to your main() function:
temp_char = a[i]; // array a is the array of size 100000, with all letters
b[temp_char - 'a']++; // b is the array of size 26
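The counting hint above, together with the three language rules, could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names count_letters, top_three and guess_language are hypothetical (the assignment's own list of five required functions is not shown), the letters are assumed to be already loaded into a buffer, and the Italian rule is checked first because it does not depend on ranking.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ALPHABET 26

/* Tally how often each lowercase letter occurs in text[0..n-1].
   Anything else (such as the newline after each word) is skipped. */
void count_letters(const char *text, size_t n, long counts[ALPHABET])
{
    assert(text != NULL && counts != NULL);
    memset(counts, 0, ALPHABET * sizeof counts[0]);
    for (size_t i = 0; i < n; i++) {
        char c = text[i];
        if (c >= 'a' && c <= 'z')
            counts[c - 'a']++;      /* same idea as b[temp_char - 'a']++ */
    }
}

/* Store the three most frequent letters in top[], most frequent first.
   Ties go to the letter that comes first in the alphabet. */
void top_three(const long counts[ALPHABET], char top[3])
{
    int used[ALPHABET] = {0};
    for (int k = 0; k < 3; k++) {
        int best = -1;
        for (int j = 0; j < ALPHABET; j++)
            if (!used[j] && (best < 0 || counts[j] > counts[best]))
                best = j;
        used[best] = 1;
        top[k] = (char)('a' + best);
    }
}

/* Apply the three rules from the question (hypothetical helper). */
const char *guess_language(const long counts[ALPHABET])
{
    char top[3];
    top_three(counts, top);
    if (counts['j' - 'a'] == 0 && counts['x' - 'a'] == 0 &&
        counts['y' - 'a'] == 0)
        return "Italian";
    if (top[0] == 'e' && top[1] == 'i' && top[2] == 'a')
        return "English";
    if (top[0] == 'e' && top[1] == 'r' && top[2] == 'n')
        return "Danish";
    return "unknown";
}
```

A main() function would read the file one character at a time (for example with fgetc) into the 100000-element array, call count_letters, and print the result of guess_language.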
Explanation / Answer
The magic tests are used to check for files with data in particular fixed formats. The canonical example of this is a binary executable (compiled program) a.out file, whose format is defined in <a.out.h> and possibly <exec.h> in the standard include directory. These files have a 'magic number' stored in a particular place near the beginning of the file that tells the UNIX operating system that the file is a binary executable, and which of several types thereof. The concept of a 'magic' has been applied by extension to data files. Any file with some invariant identifier at a small fixed offset into the file can usually be described in this way. The information identifying these files is read from the compiled magic file /usr/share/misc/magic.mgc, or the files in the directory /usr/share/misc/magic if the compiled file does not exist. In addition, if $HOME/.magic.mgc or $HOME/.magic exists, it will be used in preference to the system magic files.
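The idea of an invariant identifier at a fixed offset can be illustrated with a tiny lookup table. This is a minimal sketch, not how file itself is implemented: the struct magic_entry layout, the TABLE contents and the match_magic function are hypothetical, and the three signatures used (ELF, gzip, PDF) are well-known ones chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One hypothetical table entry: a byte pattern expected at a fixed offset. */
struct magic_entry {
    size_t offset;
    const unsigned char *bytes;
    size_t length;
    const char *description;
};

static const unsigned char ELF_MAGIC[]  = {0x7F, 'E', 'L', 'F'};
static const unsigned char GZIP_MAGIC[] = {0x1F, 0x8B};
static const unsigned char PDF_MAGIC[]  = {'%', 'P', 'D', 'F', '-'};

static const struct magic_entry TABLE[] = {
    {0, ELF_MAGIC,  sizeof ELF_MAGIC,  "ELF"},
    {0, GZIP_MAGIC, sizeof GZIP_MAGIC, "gzip compressed data"},
    {0, PDF_MAGIC,  sizeof PDF_MAGIC,  "PDF document"},
};

/* Return the description of the first entry whose bytes match the
   buffer at the entry's offset, or NULL if nothing matches. */
const char *match_magic(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < sizeof TABLE / sizeof TABLE[0]; i++) {
        const struct magic_entry *m = &TABLE[i];
        if (m->offset + m->length <= len &&
            memcmp(buf + m->offset, m->bytes, m->length) == 0)
            return m->description;
    }
    return NULL;
}
```

A caller would read the first few bytes of a file with fread and pass them to match_magic; a NULL result corresponds to falling through to the text tests.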
If a file does not match any of the entries in the magic file, it is examined to see if it seems to be a text file. ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets (such as those used on Macintosh and IBM PC systems), UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC character sets can be distinguished by the different ranges and sequences of bytes that constitute printable text in each set. If a file passes any of these tests, its character set is reported. ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified as 'text' because they will be mostly readable on nearly any terminal; UTF-16 and EBCDIC are only 'character data' because, while they contain text, it is text that will require translation before it can be read. In addition, file will attempt to determine other characteristics of text-type files. If the lines of a file are terminated by CR, CRLF, or NEL, instead of the Unix-standard LF, this will be reported. Files that contain embedded escape sequences or overstriking will also be identified.
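Detecting which line terminators occur in a buffer can be sketched as a single pass over the bytes. The function name line_terminators and the EOL_* flags are hypothetical, and NEL is left out to keep the example short; this is an illustration of the idea rather than the logic file actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Bit flags for the terminators seen (hypothetical names). */
enum { EOL_LF = 1, EOL_CRLF = 2, EOL_CR = 4 };

/* Scan buf[0..len-1] and return a bitmask of the line terminators found.
   A CR immediately followed by LF counts once, as CRLF. */
int line_terminators(const unsigned char *buf, size_t len)
{
    int seen = 0;
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            if (i + 1 < len && buf[i + 1] == '\n') {
                seen |= EOL_CRLF;
                i++;                /* skip the LF of this CRLF pair */
            } else {
                seen |= EOL_CR;
            }
        } else if (buf[i] == '\n') {
            seen |= EOL_LF;
        }
    }
    return seen;
}
```

A result equal to EOL_LF would be the Unix-standard case; any other bit set is the kind of detail the description above says is reported.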
Once file has determined the character set used in a text-type file, it will attempt to determine in what language the file is written. The language tests look for particular strings (cf. <names.h>) that can appear anywhere in the first few blocks of a file. For example, the keyword .br indicates that the file is most likely a troff(1) input file, just as the keyword struct indicates a C program. These tests are less reliable than the previous two groups, so they are performed last. The language test routines also test for some miscellany (such as tar(1) archives).
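A keyword test of this kind can be sketched as a bounded substring search. Everything named here is an assumption made for illustration: SCAN_LIMIT stands in for "the first few blocks", and the contains and guess_by_keyword helpers are hypothetical. Only the two keywords mentioned in the text are checked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SCAN_LIMIT 4096   /* hypothetical size of the region examined */

/* Return 1 if the keyword kw occurs anywhere in buf[0..len-1]. */
static int contains(const char *buf, size_t len, const char *kw)
{
    size_t klen = strlen(kw);
    if (klen == 0 || klen > len)
        return 0;
    for (size_t i = 0; i + klen <= len; i++)
        if (memcmp(buf + i, kw, klen) == 0)
            return 1;
    return 0;
}

/* Guess a language from keywords near the start of the buffer.
   Returns NULL when no keyword is found. */
const char *guess_by_keyword(const char *buf, size_t len)
{
    assert(buf != NULL);
    if (len > SCAN_LIMIT)
        len = SCAN_LIMIT;
    if (contains(buf, len, ".br"))
        return "troff input";
    if (contains(buf, len, "struct"))
        return "C program";
    return NULL;
}
```

Because a plain substring match also fires on words such as "construct" in ordinary prose, a sketch like this shows why these tests are described as less reliable and are run last.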
Any file that cannot be identified as having been written in any of the character sets listed above is simply said to be 'data'.
-C, --compile
Write a magic.mgc output file that contains a pre-parsed version of the magic file or directory.
-c, --checking-printout
Cause a checking printout of the parsed form of the magic file. This is usually used in conjunction with the -m flag to debug a new magic file before installing it.
-e, --exclude testname
Exclude the test named in testname from the list of tests made to determine the file type. Valid test names are:
apptype
EMX application type (only on EMX).
text
Various types of text files (this test will try to guess the text encoding, irrespective of the setting of the 'encoding' option).
encoding
Different text encodings for soft magic tests.
tokens
Looks for known tokens inside text files.
cdf
Prints details of Compound Document Files.
compress
Checks for, and looks inside, compressed files.
elf
Prints ELF file details.
soft
Consults magic files.
tar
Examines tar files.
-F, --separator separator
Use the specified string as the separator between the filename and the file result returned. Defaults to ':'.
-f, --files-from namefile
Read the names of the files to be examined from namefile (one per line) before the argument list. Either namefile or at least one filename argument must be present; to test the standard input, use '-' as a filename argument.
-h, --no-dereference
option causes symlinks not to be followed (on systems that support symbolic links). This is the default if the environment variable POSIXLY_CORRECT is not defined.
-i, --mime
Causes the file command to output mime type strings rather than the more traditional human readable ones. Thus it may say 'text/plain; charset=us-ascii' rather than 'ASCII text'. In order for this option to work, file changes the way it handles files recognized by the command itself (such as many of the text file types, directories etc), and makes use of an alternative 'magic' file. (See the FILES section, below).
--mime-type, --mime-encoding
Like -i, but print only the specified element(s).
-k, --keep-going
Don't stop at the first match, keep going. Subsequent matches will have the string '\012- ' prepended. (If you want a newline, see the '-r' option.)
-L, --dereference
option causes symlinks to be followed, as the like-named option in ls(1) (on systems that support symbolic links). This is the default if the environment variable POSIXLY_CORRECT is defined.
-m, --magic-file magicfiles
Specify an alternate list of files and directories containing magic. This can be a single item, or a colon-separated list. If a compiled magic file is found alongside a file or directory, it will be used instead.
-N, --no-pad
Don't pad filenames so that they align in the output.
-n, --no-buffer
Force stdout to be flushed after checking each file. This is only useful if checking a list of files. It is intended to be used by programs that want filetype output from a pipe.
-p, --preserve-date
On systems that support utime(2) or utimes(2), attempt to preserve the access time of files analyzed, to pretend that file never read them.
-r, --raw
Don't translate unprintable characters to \ooo. Normally file translates unprintable characters to their octal representation.
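The octal translation that -r switches off can be illustrated with a small helper. This is a sketch only: escape_unprintable is a hypothetical function, it relies on isprint in the default "C" locale, and it does not claim to reproduce the exact rules file applies (for instance, a literal backslash is copied through unchanged here).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Copy in[0..n-1] to out, replacing each unprintable byte with a
   backslash and three octal digits. Output is truncated to fit cap
   bytes including the terminating NUL. Returns the length written. */
size_t escape_unprintable(const unsigned char *in, size_t n,
                          char *out, size_t cap)
{
    size_t used = 0;
    assert(in != NULL && out != NULL && cap > 0);
    for (size_t i = 0; i < n; i++) {
        if (isprint(in[i])) {
            if (used + 1 >= cap)
                break;              /* no room for char plus NUL */
            out[used++] = (char)in[i];
        } else {
            if (used + 4 >= cap)
                break;              /* no room for \ooo plus NUL */
            snprintf(out + used, 5, "\\%03o", (unsigned)in[i]);
            used += 4;
        }
    }
    out[used] = '\0';
    return used;
}
```

With this helper, a tab byte (octal 011) between the letters a and b would come out as the six characters a, backslash, 0, 1, 1, b.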
-s, --special-files
Normally, file only attempts to read and determine the type of argument files which stat(2) reports are ordinary files. This prevents problems, because reading special files may have peculiar consequences. Specifying the -s option causes file to also read argument files which are block or character special files. This is useful for determining the filesystem types of the data in raw disk partitions, which are block special files. This option also causes file to disregard the file size as reported by stat(2) since on some systems it reports a zero size for raw disk partitions.
-v, --version
Print the version of the program and exit.
-z, --uncompress
Try to look inside compressed files.
-0, --print0
Output a null character '\0' after the end of the filename. Nice to cut(1) the output. This does not affect the separator which is still printed.
--help
Print a help message and exit.
This program is believed to exceed the System V Interface Definition of FILE(CMD), as near as one can determine from the vague language contained therein. Its behavior is mostly compatible with the System V program of the same name. This version knows more magic, however, so it will produce different (albeit more accurate) output in many cases.
The one significant difference between this version and System V is that this version treats any white space as a delimiter, so that spaces in pattern strings must be escaped. For example,