


Question

We have √n machines, each with 10√n memory space. The input is a set of items (i, v_i), for i = 1, 2, ..., n, distributed in an arbitrary order among the √n machines. We want to compute the output (i, Σ_{j≤i} v_j), for i = 1, ..., n. The n output pairs can also be distributed in an arbitrary order on the √n machines. Please design a MapReduce algorithm that solves this problem in a constant number of rounds (the fewer the better). At each round, please specify:
- the Map function, in the format (key, value) → (key1, value1), (key2, value2), ...
- the Reduce function, in the format (key, value1), (key, value2), ... → (key1, value1), (key2, value2), ...

Explanation / Answer

The MapReduce framework automatically sorts the keys generated by the mappers: before the reducers start, all intermediate key-value pairs are sorted by key, not by value. The values passed to each reducer are not sorted at all and may arrive in any order. What if you also want a reducer's values to be sorted? MapReduce/Hadoop and Spark do not sort values for a reducer, so for applications (such as time-series data) in which you need sorted reducer values, the Secondary Sort design pattern lets you achieve this.
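To make the pattern concrete, here is a minimal, framework-free sketch (the class and method names are hypothetical, and the shuffle is simulated in memory rather than performed by Hadoop): the mapper output is sorted by a composite (key, value) pair, but grouped for the reducer by the natural key alone, so each reducer receives its values already in ascending order.

```java
import java.util.*;

public class SecondarySortSketch {
    // Simulated shuffle for the Secondary Sort pattern: sort mapper output
    // by the composite (naturalKey, value) pair, then group by the natural
    // key only. Each "reducer" thus sees all values for its key, and sees
    // them in ascending order.
    public static Map<String, List<Integer>> shuffle(List<String[]> mapped) {
        // Composite-key sort: natural key first, numeric value second.
        mapped.sort(Comparator
                .comparing((String[] kv) -> kv[0])
                .thenComparingInt(kv -> Integer.parseInt(kv[1])));
        // Group by natural key only, preserving the sorted value order.
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (String[] kv : mapped) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(kv[1]));
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String[]> mapped = new ArrayList<>(Arrays.asList(
                new String[]{"k1", "7"}, new String[]{"k2", "3"},
                new String[]{"k1", "2"}, new String[]{"k1", "5"}));
        System.out.println(shuffle(mapped));   // {k1=[2, 5, 7], k2=[3]}
    }
}
```

In real Hadoop the same effect is obtained with a composite WritableComparable key, a partitioner that hashes only the natural key, and a grouping comparator; the in-memory sort-then-group above just mimics that behavior.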

First we'll focus on the MapReduce/Hadoop solution. Let's look at the MapReduce paradigm and then unpack the concept of the secondary sort. If a mapper emits, for some key K, the pairs

(K, V1), (K, V2), ..., (K, Vn)

then all of the values {V1, V2, ..., Vn} will be processed by a single reducer (the one for key = K), but there is no guaranteed order (ascending or descending) among the Vi. Secondary Sort is a design pattern we can use to impose an order (such as an ascending or descending sort) on those values. How do we accomplish this? The usual technique is to fold the value into a composite map-output key so that the framework's key sort does the ordering. For reference, here is the classic WordCount job in the old Hadoop (mapred) API; note that its reducer consumes its values in whatever order they arrive:

package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

  public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      // Emit (word, 1) for every token in the input line.
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);
      }
    }
  }

  public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      // Sum all counts for this word; the order of the values is irrelevant here.
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}
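Returning to the original question: assuming the intended output is the prefix sums (i, v_1 + ... + v_i), one possible two-round algorithm (a sketch, not a verified official solution) works as follows. Round 1's Map sends (i, v_i) → (g, (i, v_i)) with block key g = ⌈i/√n⌉, so reducer g holds the ≈√n items of block g, re-emits them, and also emits its block sum S_g to every block. Round 2's reducer for block g receives its items together with all ≈√n block sums, computes the offset S_1 + ... + S_{g-1}, and outputs (i, Σ_{j≤i} v_j) via a local running sum. Each reducer stores O(√n) values, within the 10√n memory bound. A framework-free simulation of this logic (the class and method names are hypothetical):

```java
import java.util.*;

public class PrefixSumMPC {
    // Simulates a possible two-round MapReduce prefix-sum algorithm on
    // p ≈ sqrt(n) machines.
    // Round 1: map (i, v_i) -> (block g = i/p, (i, v_i)); reducer g computes
    //          its block sum S_g and shares it with every block.
    // Round 2: reducer g adds the offset S_0 + ... + S_{g-1} and a local
    //          running sum, producing (i, v_1 + ... + v_i) for its block.
    public static long[] prefixSums(long[] v) {
        int n = v.length;
        int p = (int) Math.ceil(Math.sqrt(n));   // items per block ≈ sqrt(n)
        int blocks = (n + p - 1) / p;

        // Round 1: each reducer sees only the ~p items of its block.
        long[] blockSum = new long[blocks];
        for (int i = 0; i < n; i++) blockSum[i / p] += v[i];

        // Round 2: each block receives all ~p block sums, computes its
        // offset, then a running sum over its own ~p items.
        long[] out = new long[n];
        for (int g = 0; g < blocks; g++) {
            long running = 0;
            for (int gp = 0; gp < g; gp++) running += blockSum[gp];
            for (int i = g * p; i < Math.min(n, (g + 1) * p); i++) {
                running += v[i];
                out[i] = running;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] v = {3, 1, 4, 1, 5, 9, 2, 6};
        System.out.println(Arrays.toString(prefixSums(v)));
        // [3, 4, 8, 9, 14, 23, 25, 31]
    }
}
```

No reducer ever materializes more than ≈√n items plus ≈√n block sums, which is what keeps the per-machine memory at O(√n); the two rounds are independent of n, satisfying the constant-round requirement.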
