Question
Short paragraph answers:
Part 1: What is the purpose of the open and close operations for a file that is to be read from disk? For a file that is to be created?
Part 2: Explain why moving a file from one disk to another requires copying all data in the file, while moving it to a different directory on the same disk does not. What does happen when a file is moved from one directory to another on the same disk?
Part 3: A file has filled its initial allocation on disk, and more data must be written. If the organization of the file system is discontiguous and linked, what must happen to allow more data to be written?
Part 4: Given the situation described in Part 3, if the allocation strategy is contiguous allocation and there is no space adjacent to the existing file, what must happen before additional data can be written?
Part 5: Explain in detail what is meant by the working set of a process.
Explanation / Answer
Part 1: What is the purpose of the open and close operations for a file that is to be read from disk? For a file that is to be created?

In short: open locates the file on disk, checks access permissions, and builds the in-memory state (a file descriptor and a file position indicator) that subsequent reads and writes use, so the directory does not have to be searched on every access. For a file that is to be created, open also creates the directory entry and initial metadata. close releases that per-process state and lets the system flush any pending changes.

In POSIX terms, the open function creates and returns a new file descriptor for the file named by filename. Initially, the file position indicator for the file is at the beginning of the file. The mode argument (see Permission Bits) is used only when a file is created, but it does no harm to supply it in any case. The flags argument controls how the file is to be opened. It is a bit mask; you build the value by bitwise OR of the appropriate parameters (using the '|' operator in C). See File Status Flags for the parameters available. The normal return value from open is a non-negative integer file descriptor; on error, -1 is returned instead, and errno is set (in addition to the usual file name errors, several errno conditions are defined specifically for open).

The close function closes the file descriptor filedes. Closing a file has the following consequences:
• The file descriptor is deallocated.
• Any record locks owned by the process on the file are unlocked.
• When all file descriptors associated with a pipe or FIFO have been closed, any unread data is discarded.

close is a cancellation point in multi-threaded programs. This is a problem if the thread holds resources (memory, file descriptors, semaphores, and so on) at the time close is called: if the thread is canceled, those resources stay allocated until the program ends. To avoid this, calls to close should be protected with cancellation handlers.

Reading File Data

The most basic file I/O task is to read the contents of a file. In Common Lisp, you obtain a stream from which you can read a file's contents with the OPEN function. By default OPEN returns a character-based input stream you can pass to a variety of functions that read one or more characters of text: READ-CHAR reads a single character; READ-LINE reads a line of text, returning it as a string with the end-of-line character(s) removed; and READ reads a single s-expression, returning a Lisp object. When you're done with the stream, you can close it with the CLOSE function.

The only required argument to OPEN is the name of the file to read. Common Lisp provides a couple of ways to represent a filename, but the simplest is a string containing the name in the local file-naming syntax. So, assuming /some/file/name.txt exists, you can open it like this:

  (open "/some/file/name.txt")

You can use the object returned as the first argument to any of the read functions. For instance, to print the first line of the file, you can combine OPEN, READ-LINE, and CLOSE as follows:

  (let ((in (open "/some/file/name.txt")))
    (format t "~a~%" (read-line in))
    (close in))

Of course, a number of things can go wrong while trying to open and read from a file: the file may not exist, or you may unexpectedly hit the end of the file while reading. By default OPEN and the READ-* functions signal an error in these situations. There is, however, a lighter-weight alternative to full error handling: each of these functions accepts arguments that modify its behavior in these exceptional situations. If you want to open a possibly nonexistent file without OPEN signaling an error, use the keyword argument :if-does-not-exist to specify a different behavior.
The three possible values are :error, the default; :create, which tells it to create the file and then proceed as if it had already existed; and NIL, which tells it to return NIL instead of a stream. Thus, you can change the previous example to deal with the possibility that the file may not exist:

  (let ((in (open "/some/file/name.txt" :if-does-not-exist nil)))
    (when in
      (format t "~a~%" (read-line in))
      (close in)))

The reading functions READ-CHAR, READ-LINE, and READ all take an optional argument, which defaults to true, that specifies whether they should signal an error if called at the end of the file. If that argument is NIL, they instead return the value of their third argument, which defaults to NIL. Thus, you can print all the lines in a file like this:

  (let ((in (open "/some/file/name.txt" :if-does-not-exist nil)))
    (when in
      (loop for line = (read-line in nil)
            while line do (format t "~a~%" line))
      (close in)))
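To make the read-versus-create distinction concrete, here is a minimal C sketch using the POSIX open and close calls described above. The file paths and the 0644 permission bits are arbitrary choices for illustration, not taken from the question:

  #include <fcntl.h>    /* open, O_RDONLY, O_CREAT, ... */
  #include <stdio.h>    /* perror */
  #include <unistd.h>   /* close */

  int main(void)
  {
      /* Opening an existing file for reading: no mode argument is
         needed, because no file is being created. */
      int in = open("/some/file/name.txt", O_RDONLY);
      if (in == -1) {
          perror("open for reading");
          return 1;
      }

      /* Creating a new file for writing: O_CREAT requires the mode
         (permission bits), here rw-r--r--. */
      int out = open("/tmp/newfile.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
      if (out == -1) {
          perror("open for creation");
          close(in);
          return 1;
      }

      /* ... read from in, write to out ... */

      /* close deallocates the descriptors and releases any locks. */
      close(out);
      close(in);
      return 0;
  }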
Part 2: Explain why moving a file from one disk to another requires copying all data in the file, while moving it to a different directory on the same disk does not. What does happen when a file is moved from one directory to another on the same disk?

A file's data blocks physically reside on a particular disk drive. If you have two drives and the file must end up on the other one, its contents cannot be reached there without copying every data block to the target drive (after which the original can be deleted). If the file is merely moved to another folder on the same drive, the data itself stays put; the only change needed to keep finding the file is an update to the file system's metadata, for example the File Allocation Table (FAT) entries and directory entries that record where the file lives on the disk, so that the new directory entry points at the existing data.

When you move files within the same volume, you are traditionally just rearranging your file system. If the move also changed the file's permissions to match the destination directory, it could lock you out of the file the moment the operation finished. That is undesirable if, for instance, you accidentally moved a file into a system folder, or into a folder with special ownership permissions or other protection: there would be no way to correct the mistake other than taking ownership of the file (if you have the privileges) or logging in with a privileged account. In normal day-to-day use, you could find you had no control over your own file system. For that reason, on most (if not all) operating systems that use ACLs, a file moved within a volume keeps its existing permissions; this guarantees normal file system operations within a volume by users and applications alike.

Conversely, when moving files between volumes you are traditionally handing the file over to the control of something or someone else. It makes sense for the file to then inherit the target folder's permissions, which gives the target side the rights it needs to rearrange its own file system as it sees fit. Naturally this isn't always desirable, which is why move and copy operations can be given special permission-inheritance rules. From the same article:
• To preserve permissions when files and folders are copied or moved, use the Xcopy.exe utility with the /O or the /X switch. The object's original permissions will be added to the inheritable permissions in the new location.
• To add an object's original permissions to the inheritable permissions when you copy or move an object, use the Xcopy.exe utility with the /O and /X switches.
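This behavior is visible directly in the POSIX API: rename succeeds within one mounted filesystem by rewriting directory entries only, but fails with EXDEV when the source and destination are on different filesystems, at which point the caller must fall back to copy-and-delete. A minimal sketch (move_file is a hypothetical helper name; the fallback copy itself is omitted):

  #include <errno.h>
  #include <stdio.h>

  /* Try a cheap, metadata-only move first; report when a full
     copy would be required instead. */
  int move_file(const char *src, const char *dst)
  {
      if (rename(src, dst) == 0)
          return 0;   /* same volume: only directory entries changed */
      if (errno == EXDEV) {
          /* Different volume: every data block must be copied to the
             target device, then the original unlinked. */
          fprintf(stderr, "%s and %s are on different filesystems; "
                          "copy + delete required\n", src, dst);
      }
      return -1;
  }

  int main(int argc, char **argv)
  {
      if (argc == 3)
          return move_file(argv[1], argv[2]) == 0 ? 0 : 1;
      fprintf(stderr, "usage: move SRC DST\n");
      return 2;
  }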
Part 3: A file has filled its initial allocation on disk, and more data must be written. If the organization of the file system is discontiguous and linked, what must happen to allow more data to be written?

With discontiguous, linked allocation the new data does not have to be adjacent to the old. The file system takes any free block from the free-space list, writes the new data into it, and links it onto the end of the file's chain: either the pointer field of the current last block is updated to hold the new block's address, or, in a FAT-style scheme, the table entry for the last block is changed to point to the new block and the new block's entry is marked end-of-file. Only the free list and one link change; no existing data moves.
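To illustrate, here is a small, self-contained sketch of appending one block to a file under a FAT-style linked scheme. The table layout and the FREE/EOFBLK marker values are invented for the example, not taken from any real FAT implementation:

  #include <stdio.h>

  #define NBLOCKS 16
  #define FREE    (-1)   /* hypothetical marker: block unused        */
  #define EOFBLK  (-2)   /* hypothetical marker: last block of chain */

  static int fat[NBLOCKS];   /* fat[i] = block after i, or EOFBLK */

  /* Append one block to the file whose first block is 'head':
     find any free block and hook it onto the end of the chain. */
  int append_block(int head)
  {
      int last = head;
      while (fat[last] != EOFBLK)      /* walk to the end of the chain */
          last = fat[last];

      for (int b = 0; b < NBLOCKS; b++) {
          if (fat[b] == FREE) {        /* any free block will do */
              fat[b] = EOFBLK;         /* new end of file */
              fat[last] = b;           /* link it in */
              return b;                /* caller writes data here */
          }
      }
      return -1;                       /* disk full */
  }

  int main(void)
  {
      for (int b = 0; b < NBLOCKS; b++) fat[b] = FREE;
      fat[3] = EOFBLK;                 /* file currently occupies block 3 only */
      int b = append_block(3);
      printf("appended block %d; chain: 3 -> %d\n", b, fat[3]);
      return 0;
  }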
Part 4: Given the situation described in Part 3, if the allocation strategy is contiguous allocation and there is no space adjacent to the existing file, what must happen before additional data can be written?

Under contiguous allocation, all blocks of a file must be kept together. If the file has outgrown its allocation and no adjacent free space exists, the entire file must be relocated: a larger contiguous free region must be found (or produced by compacting the disk), all of the file's existing data copied there, and the directory metadata updated, before the new data can be appended. If no such region can be found, the write fails or the process is aborted.

For contrast, the File Allocation Table (FAT) used by DOS is a variation of linked allocation in which all the links are stored in a separate table at the beginning of the disk. The benefit of this approach is that the FAT can be cached in memory, greatly improving random access speeds.

An extent is a contiguous area of storage reserved for a file, represented as a range. A file can consist of zero or more extents; one file fragment requires one extent. The direct benefit is that each range is stored compactly as two numbers, instead of storing every block number in the range. To the extent that fragmentation can be avoided, extent-based file systems eliminate most of the metadata overhead of large files that would traditionally be taken up by the block allocation tree. The savings are small compared to the stored data, but for large files the block map makes up a large share of the metadata, so although the gains in storage efficiency and performance are slight, the reduction in metadata is significant and reduces exposure to file system corruption: one bad sector in a block allocation tree causes much greater data loss than one bad data sector. To resist fragmentation, several extent-based file systems allocate on flush; many modern fault-tolerant file systems also do copy-on-write, although that increases fragmentation.

Key points about contiguous allocation (see the sketch after this list for what relocation involves):
• Contiguous allocation requires that all blocks of a file be kept together contiguously.
• Performance is very fast, because reading successive blocks of the same file generally requires no movement of the disk heads, or at most one small step to the next adjacent cylinder.
• Storage allocation involves the same issues discussed earlier for the allocation of contiguous blocks of memory (first fit, best fit, fragmentation, and so on). The difference is that the high time penalty for moving the disk heads from spot to spot may now justify the effort of keeping files contiguous whenever possible.
• Even file systems that do not by default store files contiguously can benefit from utilities that compact the disk and make all files contiguous in the process.
• Problems arise when files grow, or when the exact size of a file is unknown at creation time:
  o Over-estimating the file's final size increases external fragmentation and wastes disk space.
  o Under-estimating may require that the file be moved, or the process aborted, if the file grows beyond its originally allocated space.
  o If a file grows slowly over a long period and the total final space must be allocated up front, a lot of space sits unusable until the file fills it.
• A variation is to allocate file space in large contiguous chunks, called extents. When a file outgrows its original extent, an additional one is allocated. (For example, an extent may be the size of a complete track or even a cylinder, aligned on an appropriate track or cylinder boundary.) The high-performance Veritas file system uses extents to optimize performance.
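Here is a toy sketch of the relocation step, assuming a first-fit search and one byte standing in for each block's contents; all names and sizes are invented for illustration:

  #include <stdio.h>
  #include <string.h>

  #define NBLOCKS 64

  static unsigned char used[NBLOCKS];   /* 1 = block allocated            */
  static unsigned char disk[NBLOCKS];   /* one byte per block (toy model) */

  /* First-fit search for a run of 'len' free blocks; -1 if none. */
  int find_run(int len)
  {
      for (int start = 0; start + len <= NBLOCKS; start++) {
          int ok = 1;
          for (int i = 0; i < len && ok; i++)
              if (used[start + i]) ok = 0;
          if (ok) return start;
      }
      return -1;
  }

  /* Grow a contiguously allocated file from old_len to new_len blocks
     when nothing adjacent is free: relocate the whole file. */
  int grow_file(int start, int old_len, int new_len)
  {
      int dst = find_run(new_len);     /* may only succeed after compaction */
      if (dst < 0) return -1;          /* no room: the write fails */

      memmove(&disk[dst], &disk[start], old_len);  /* copy ALL existing data */
      memset(&used[dst], 1, new_len);              /* claim the new run      */
      memset(&used[start], 0, old_len);            /* free the old run       */
      return dst;                      /* new start block for the directory  */
  }

  int main(void)
  {
      memset(&used[10], 1, 4);         /* a 4-block file at block 10         */
      used[14] = 1;                    /* neighbor in use: no room in place  */
      printf("file relocated to block %d\n", grow_file(10, 4, 6));
      return 0;
  }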
Part 5: Explain in detail what is meant by the working set of a process.

The "working set" is shorthand for "the parts of memory that the current algorithm is using," and it is determined by which parts of memory the CPU happens to access. It is entirely automatic from the programmer's point of view: if you are processing an array and storing the results in a table, the array and the table are your working set. The concept matters because the CPU automatically stores recently accessed memory in cache, close to the processor. The working set is a convenient way to describe the memory you want kept there: if it is small enough, it all fits in the cache and your algorithm runs very fast. At the OS level, the kernel has to tell the CPU where to find the physical memory your application is using (resolving virtual addresses) every time you access a new page (typically 4 KB in size), so you also want to avoid that cost as much as possible. See What Every Programmer Should Know About Memory (PDF) for graphs of algorithm performance versus working-set size (around page 23) and much other interesting information. In short: write your code to access the smallest amount of memory possible (keep classes small, and not too many of them), and try to ensure that tight loops run on a very small subset of that memory.

Definition

Peter Denning (1968) defines "the working set of information W(t, τ) of a process at time t to be the collection of information referenced by the process during the process time interval (t − τ, t)". Typically the units of information in question are memory pages. This set is suggested as an approximation of the pages the process will access in the near future (say, during the next τ time units), and more specifically as an indication of which pages ought to be kept in main memory to allow the process to make the most progress.

Rationale

The choice of which pages to keep in main memory (as distinct from paging them out to auxiliary storage) is important: if too many pages of a process are kept in main memory, fewer other processes can be ready at any one time. If too few pages of a process are kept in main memory, the page-fault frequency rises sharply and the number of active (non-suspended) processes currently executing in the system approaches zero. The working set model states that a process can be in RAM if and only if all of the pages that it is currently using (often approximated by the most recently used pages) can be in RAM. The model is all or nothing: if the set of pages the process needs grows and there is no room in RAM, the process is swapped out entirely to free its memory for other processes. Often a heavily loaded computer has so many processes queued up that, if they were all allowed to run for one scheduling time slice, they would together reference more pages than there is RAM, causing the computer to "thrash". By swapping some processes out of memory, the result is that all processes, even those temporarily removed, finish much sooner than they would if the computer attempted to run them all at once. They also finish much sooner than if the computer ran only one process at a time to completion, since other processes can run and make progress while one process waits on the hard drive or some other global resource. In other words, the working-set strategy prevents thrashing while keeping the degree of multiprogramming as high as possible.
Thus it optimizes CPU utilization and throughput.

Implementation

The main hurdle in implementing the working set model is keeping track of the working set. The working-set window is a moving window: at each memory reference a new reference appears at one end and the oldest reference drops off the other. A page is in the working set if it was referenced within the window. To avoid the overhead of keeping a list of the last k referenced pages, the working set is often implemented by tracking the time of each page's last reference and taking the working set to be all pages referenced within a certain period of time. The working set is not itself a page-replacement algorithm, but page-replacement algorithms can be designed to remove only pages that are not in the working set of any process. One example is a modified version of the clock algorithm called WSClock.

Variants

The working set can be divided into a code working set and a data working set. This distinction is important when code and data are separate at the relevant level of the memory hierarchy: if either working set does not fit in that level, thrashing will occur. In addition to the code and data themselves, on systems with virtual memory the memory-map entries (virtual-to-physical translations) for the pages of the working set must be cached in the translation lookaside buffer (TLB) for the process to progress efficiently. The distinction exists because code and data are cached in small blocks (cache lines), not entire pages, while address lookup is done at the page level. Thus even if the code and data working sets fit into cache, a working set split across many pages may not fit into the TLB, causing TLB thrashing.

Analogs of the working set exist for other limited resources, most significantly processes. If a set of processes requires frequent interaction, it has a process working set that must be coscheduled (scheduled for execution simultaneously) for the parallel program to make progress.[1] If the processes are not scheduled simultaneously, for example if there are two processes but only one core on which to execute them, the processes can advance only at the rate of one interaction per time slice. Other resources include file handles and network sockets. For example, copying one file to another is most simply done with two file handles, one for input and one for output, and thus has a "file handle working set" of size two. If only one file handle is available, copying can still be done, but requires acquiring a handle for the input, reading from it (say, into a buffer), releasing it, then acquiring a handle for the output, writing to it, releasing it, then acquiring the input handle again and repeating. Similarly, a server may require many sockets and, if limited, would need to repeatedly release and re-acquire them. Rather than thrashing, such resources are typically hard requirements: if the program cannot acquire enough of them, it simply fails.
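To make the definition concrete, here is a small sketch that computes the working-set size |W(t, τ)| for a page-reference string using the last-reference-time implementation described above. The reference string and window size τ are made up for the example:

  #include <stdio.h>

  #define NPAGES 8   /* pages numbered 0..7 */

  /* Working set at time t with window tau: all pages whose last
     reference falls within the last tau time steps. */
  int working_set_size(const long last_ref[], long t, long tau)
  {
      int count = 0;
      for (int p = 0; p < NPAGES; p++)
          if (last_ref[p] >= 0 && last_ref[p] > t - tau)
              count++;   /* referenced at all, and recently enough */
      return count;
  }

  int main(void)
  {
      /* Example reference string: which page is accessed at each step. */
      int refs[] = { 1, 2, 1, 3, 1, 2, 7, 7, 7, 1 };
      long last_ref[NPAGES];
      for (int p = 0; p < NPAGES; p++)
          last_ref[p] = -1;                /* -1 = never referenced */

      long tau = 4;
      for (long t = 0; t < 10; t++) {
          last_ref[refs[t]] = t;           /* record the last reference */
          printf("t=%ld page=%d |W|=%d\n",
                 t, refs[t], working_set_size(last_ref, t, tau));
      }
      return 0;
  }

Running it shows the working set growing as new pages are touched and shrinking again once pages fall out of the τ-step window, which is exactly the moving-window behavior described above.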