n) Describe how you would implement a MapReduce job consisting of Map and Reduce
ID: 3054855 • Letter: N
Question
n) Describe how you would implement a MapReduce job for each of the tasks below by giving a description of its Map and Reduce tasks. You do not have to write code or even pseudo-code. Just describe, in your own words, what the Map and Reduce tasks are going to do. The Map task reads the input file and produces (key, value) pairs. The Reduce task takes the list of values collected for each key and combines them.
Please remember that Map operates on individual blocks and Reduce on individual keys with a set of values. Thus, for Mapper you need to state what your code does given a block of data (i.e., for each block, not for the whole file) and for Reduce you need to state what your reducer does for each key (without being able to see other keys).
Find the smallest number in the input file.
For an input data file that contains records (ID, First, Last, Grade) for each student, find how many unique student Last names there are for each Grade (SELECT Grade, COUNT(DISTINCT Last) FROM Student GROUP BY Grade).
For a data file that contains records (ID, First, Last, Grade) for each student, find the GPA (grade point average) of each student.
Explanation / Answer
MapReduce is a framework for parallel computing. Programmers get a simple API and do not have to deal with issues of parallelization, remote execution, data distribution, load balancing, or fault tolerance. The framework makes it easy for one to use thousands of processors to process huge amounts of data (e.g., terabytes and petabytes).
From a user's perspective, there are two basic operations in MapReduce: Map and Reduce.
The Map function reads a stream of data and parses it into intermediate (key, value) pairs. When that is complete, the Reduce function is called once for each unique key that was generated by Map. Each call receives the key and a list of all values that were generated for that key. The keys are presented in sorted order.
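The flow described above (map each block, group the intermediate values by key, then call Reduce once per key in sorted order) can be sketched in a few lines of Python. This is only a single-process toy, not how a real MapReduce framework is implemented. The names run_mapreduce, map_grade_last, and reduce_count_distinct are hypothetical, and the sample records are made up for illustration. The example task is the "unique Last names per Grade" query from the question.

```python
from collections import defaultdict

def run_mapreduce(blocks, map_fn, reduce_fn):
    """Toy driver (assumption: everything fits in one process)."""
    groups = defaultdict(list)
    for block in blocks:                      # each Map call sees one block only
        for key, value in map_fn(block):
            groups[key].append(value)         # group intermediate values by key
    # Reduce is called once per unique key, with keys in sorted order
    return [reduce_fn(key, groups[key]) for key in sorted(groups)]

def map_grade_last(block):
    # block is a list of (ID, First, Last, Grade) records; emit (Grade, Last)
    for _id, _first, last, grade in block:
        yield grade, last

def reduce_count_distinct(grade, lasts):
    # sees only the values for this one key; duplicates collapse in the set
    return grade, len(set(lasts))

# Hypothetical sample data, already split into two blocks
blocks = [
    [(1, "Ann", "Lee", "B"), (2, "Bob", "Lee", "B")],
    [(3, "Cy", "Diaz", "A"), (4, "Di", "Kim", "B")],
]
result = run_mapreduce(blocks, map_grade_last, reduce_count_distinct)
print(result)  # [('A', 1), ('B', 2)]
```

Note that the mapper never looks outside its own block and the reducer never sees another key, which matches the constraint stated in the question.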
As an example of using MapReduce, consider the task of counting the number of occurrences of each word in a large collection of documents. The user-written Map function reads the document data and parses out the words. For each word, it writes the (key, value) pair of (word, 1). That is, the word is treated as the key and the associated value of 1 means that we saw the word once. This intermediate data is then sorted by key by MapReduce, and the user's Reduce function is called for each unique key. Since every value is a count of 1, Reduce is called with a list containing one "1" for each occurrence of the word that was parsed from the documents. Reduce sums that list to get the total count for the word and writes (word, count) as its output.
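A minimal sketch of the word-count example, again as a single-process simulation rather than a real distributed job. The function names (map_words, reduce_counts, word_count) and the two sample documents are assumptions made for illustration. Splitting on whitespace and lowercasing is a simplification of "parsing out the words".

```python
from collections import defaultdict

def map_words(document):
    # emit (word, 1) for every word seen in this document
    for word in document.lower().split():
        yield word, 1

def reduce_counts(word, ones):
    # ones is the list of 1s emitted for this word; the sum is its count
    return word, sum(ones)

def word_count(documents):
    intermediate = defaultdict(list)
    for doc in documents:
        for word, one in map_words(doc):
            intermediate[word].append(one)
    # call Reduce once per unique word, in sorted key order
    return [reduce_counts(w, intermediate[w]) for w in sorted(intermediate)]

counts = word_count(["the cat sat", "the dog sat on the mat"])
print(counts)
# [('cat', 1), ('dog', 1), ('mat', 1), ('on', 1), ('sat', 2), ('the', 3)]
```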