Must be done using fork()! In this project, you will read in a directory name an
ID: 3605452 • Letter: M
Question
Must be done using fork()!
In this project, you will read in a directory name and walk through the directory structure to find all .csv files. There may be multiple levels of directories that you will need to recurse through. You will then fork child processes to sort each of the files and output the results to a different file. You should NOT use exec for this project. You should write one program that, when copied from the parent to the child process, will continue running. You can use the return value of fork() in a conditional to choose different blocks of code to run within your code. You will want to make sure to prevent zombie and orphan child processes. You will also want to make sure you to not create fork bombs that will bring down machines. In all cases of bad input, you should fail gracefully (e.g. no segfaults).
You will output metadata about your processes to STDOUT. This metadata will show the total number of processes created and the pids of all created processes.
Methodology
a. Parameters
Your code will read in a set of parameters via the command line. Records will be stored in .csv files in the provided directory. As mentioned above, directory structures may have multiple levels and you must find all .csv files. Your code should ignore non .csv files and .csv files that do not have the correct format of the movie_metadata csv (e.g. csv files that have other random data in them).
Remember, the first record (line) is the column headings and should not be sorted as data. Your code must take in a command-line parameter to determine which value type (column) to sort on. If that parameter is not present (?-> throw an error, or default behavior). The first argument to your program will be '-c' to indicate sorting by column and the second will be the column name:
./sorter -c food
Be sure to check the arguments are there and that they correspond to a listed value type (column heading) in the CSV.
For this phase you'll extend your flags from one to three. The second parameter to your program will be ‘-d’ indicating the directory the program should search for .csv files. This parameter is optional. The default behavior will search the current directory.
./sorter -c food -d thisdir/thatdir
The third parameter to your program will be ‘-o’ indicating the output directory for the sorted versions of the input file. This parameter is optional. The default behavior will be to output in the same directory as the source file.
./sorter -c movie_title -d thisdir -o thatdir
b. Operation
Your code will be reading in and traversing the entire directory. In order to run your code to test it, you will need to open each CSV and read it for processing:
./sorter -c movie_title -d thisdir -o thatdir
Your code's output will be a series of new CSV files outputted to the file whose name is the name of the CSV file sorted, with "-sorted-<fieldname>" postpended to the name.
e.g: 'movie_metadata.csv' sorted on the field 'movie_title' will result in a file named "movie_metadata-sorted-movie_title.csv".
On each new file in a directory you encounter, you should fork() a child process to do the actual sorting.
On each new directory you encounter, you should fork() a child process to process the directory.
To STDOUT, output the following metadata in the shown order:
Initial PID: XXXXX
PIDS of all child processes: AAA,BBB,CCC,DDD,EEE,FFF, etc
Total number of processes: ZZZZZ
You may assume the total number of files and directories will not exceed 255.
c. Structure
Your code should use Mergesort to do the actual sorting of records. It is a powerful algorithm with an excellent average case.
Explanation / Answer
Merge Sort merges the left- and right-sorted subarrays using two indices i and j to iterate over the left (and right) elements, always copying the smaller of A[i] and A[j] into its proper location result[idx]. There are three cases to consider: The right side is exhausted (j end), in which case the remaining elements are taken from the left side The left side is exhausted (i mid), in which case the remaining elements are taken from from the right side The left and right side have elements; if A[i] < A[j], insert A[i] otherwise insert A[j] Once the for loop completes, result has the merged (and sorted) elements from the original A[start, end). Example 4-10 contains the Python implementation of Merge Sort. Example 4-10. Merge Sort implementation in Python def sort (A): """merge sort A in place.""" copy = list (A) mergesort_array (copy, A, 0, len(A)) def mergesort_array (A, result, start, end): """Mergesort array in memory with given range.""" if end - start < 2: return if end - start == 2: if result[start] > result[start+1]: result[start],result[start+1] = result[start+1],result[start] return mid = (end + start) // 2 mergesort_array (result, A, start, mid) mergesort_array (result, A, mid, end) # merge A left- and right- side i = start j = mid idx = start while idx < end: if j >= end or (i < mid and A[i] < A[j]): result[idx] = A[i] i += 1 else: result[idx] = A[j] j += 1 idx += 1 Analysis Merge Sort completes the “merge” phase in O(n) time after recursively sorting the left- and right-half of the range A[start, end), placing the properly ordered elements in the array referenced as result. Because copy is a true copy of the entire array A, the terminating base cases of the recursion will work because it references the original elements of the array directly at their respective index locations. This observation is a sophisticated one and is key to the algorithm. In addition, the final merge step requires only O(n) operations, which ensures the total performance remains O(n log n). Because copy is the only extra space used by the algorithm, the total space requirement is O(n). Variations Of all the sorting algorithms, Merge Sort is the easiest one to convert to working with external data. Example 4-11 contains a full Java implementation using memory mapping of data to efficiently sort a file containing binary-encoded integers. This sorting algorithm requires the elements to all have the same size, so it can’t easily be adapted to sort arbitrary strings or other variable-length elements. Example 4-11. External Merge Sort implementation in Java public static void mergesort (File A) throws IOException { File copy = File.createTempFile ("Mergesort", ".bin"); copyFile(A, copy); RandomAccessFile src = new RandomAccessFile (A, "rw"); RandomAccessFile dest = new RandomAccessFile (copy, "rw"); FileChannel srcC = src.getChannel(); FileChannel destC = dest.getChannel(); MappedByteBuffer srcMap = srcC.map (FileChannel.MapMode.READ_WRITE, 0, src.length()); MappedByteBuffer destMap = destC.map (FileChannel.MapMode.READ_WRITE, 0, dest.length()); mergesort (destMap, srcMap, 0, (int) A.length()); // The following two invocations are only needed on Windows platform: closeDirectBuffer (srcMap); closeDirectBuffer (destMap); src.close(); dest.close(); copy.deleteOnExit(); } static void mergesort (MappedByteBuffer A, MappedByteBuffer result, int start, int end) throws IOException { if (end - start < 8) { return; } if (end - start == 8) { result.position (start); int left = result.getInt(); int right = result.getInt(); if (left > right) { result.position (start); result.putInt (right); result.putInt (left); } return; } int mid = (end + start)/8*4; mergesort (result, A, start, mid); mergesort (result, A, mid, end); result.position (start); for (int i = start, j = mid, idx=start; idx < end; idx += 4) { int Ai = A.getInt (i); int Aj = 0; if (j < end) { Aj = A.getInt (j); } if (j >= end || (i < mid && Ai < Aj)) { result.putInt (Ai); i += 4; } else { result.putInt (Aj); j += 4; } } }
Related Questions
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.