Question
Problem 5. Hadoop and Spark both follow the shared-nothing paradigm, but they do support sharing immutable data structures among all workers (compute nodes) in the cluster. Please enumerate two techniques to achieve this goal and fill them in the following table.
Hadoop: ____________
Spark: ____________
In Spark, Rob needs to create an accumulator with initial value “3.14” of double type. Please tell him how to do that:
Answer:
$
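A minimal sketch in Scala, assuming a live SparkContext named sc and the Spark 2.x+ accumulator API (the accumulator name "piAcc" is illustrative):

    // Create a named double accumulator on the driver, then seed it
    // with the initial value 3.14 (doubleAccumulator starts at 0.0).
    val piAcc = sc.doubleAccumulator("piAcc")
    piAcc.add(3.14)

In the legacy Spark 1.x API this was a single call, sc.accumulator(3.14), but that method was deprecated in Spark 2.0 and later removed.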
Rob wants to increase the value of this accumulator by “1.1” of double type in each executor. Please tell him how to do that.
Answer: In the source code that runs on each executor, add the following two lines:
$
$
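A sketch under the same assumptions as above; someRDD is an illustrative placeholder for whatever RDD the executors are processing:

    // Tasks running on the executors add 1.1 to the shared accumulator.
    someRDD.foreach { _ =>
      piAcc.add(1.1)
    }

Tasks should only add to an accumulator; the meaningful total is read afterwards on the driver via piAcc.value.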
When Rob implements the join algorithm for joining the phone book and the country-code lookup table, each executor needs to be able to access the lookup table. Suppose that the lookup table is stored in an RDD named “LookupTable1”. Please tell Rob how to replicate this RDD's data to all the executors from the driver (master) program.
Answer:
$
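A sketch, assuming LookupTable1 is a pair RDD of (country, code) entries; an RDD itself cannot be broadcast, so its contents are first collected to the driver:

    // Pull the lookup table down to the driver as a local Map,
    // then broadcast that Map once to every executor.
    val lookupBc = sc.broadcast(LookupTable1.collectAsMap())

Broadcasting the collected map ships one read-only copy to each executor instead of re-sending it with every task, which is what a map-side (broadcast) join needs.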
How can each executor get the broadcast data?
Answer:
$
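Continuing the sketch, with phoneBook as an illustrative RDD of (name, country) pairs:

    // Executor-side code reads the broadcast data through .value.
    val joined = phoneBook.map { case (name, country) =>
      (name, lookupBc.value.getOrElse(country, "unknown"))
    }

Each executor dereferences lookupBc.value locally; no further network traffic is needed once the broadcast has been delivered.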
For each executor, the accumulators are ____-only variables; the broadcast variables are ____-only variables. Please choose from “read” or “write” when you fill in the previous two blanks.
Explanation / Answer
Hadoop: Hadoop provides the MapReduce API, which can be used to provide the core functionality for data stored in Hadoop.
Spark: A Resilient Distributed Dataset (RDD) is a collection of immutable objects computed across the nodes of the cluster. There are three ways to create an RDD in Spark (see the sketch after this answer):
1. from data in stable storage;
2. from other RDDs;
3. by parallelizing an already existing collection.
With Hadoop, all records written are immutable because random writes are not supported in Hadoop. This can be a disadvantage, but it scales really well. Some newer languages are also bringing back this concept of immutable objects. Spark RDDs can additionally be cached and partitioned.
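A minimal Scala sketch of the three RDD-creation routes (the path and the sample data are placeholders):

    // 1. From data in stable storage, e.g. a file in HDFS.
    val fromStorage = sc.textFile("hdfs:///data/phonebook.txt")
    // 2. From another RDD, via a transformation.
    val fromOther = fromStorage.map(_.toUpperCase)
    // 3. By parallelizing an already existing local collection.
    val fromCollection = sc.parallelize(Seq(("US", 1), ("UK", 44)))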