Write Scala code on Spark’s community edition for solving the following problems
Question
Write Scala code on Spark’s community edition for solving the following problems. Use the hints given with each problem to guide your solution development.
P.S. If you are not comfortable with writing Scala code, you can use Python or Java. However, I strongly recommend that you at least first try to write the code in Scala.
(10 points) Write a function that computes the Simple Matching Coefficient Similarity metric.
The function should take two arrays (or two vectors) of binary values, compute the similarity, and return the result.
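A minimal Java sketch of this metric follows. It assumes both arrays have the same length and contain only 0/1 values. The class name `SmcExample` and the sample vectors are hypothetical. SMC counts both 1-1 (M11) and 0-0 (M00) agreements and divides by the vector length.

```java
// Minimal sketch (hypothetical class name): Simple Matching Coefficient for two
// equal-length 0/1 vectors. SMC = (M11 + M00) / (M00 + M01 + M10 + M11).
class SmcExample {

    static double smc(int[] a, int[] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("vectors must be non-empty and equal length");
        }
        int matches = 0; // counts both 1-1 (M11) and 0-0 (M00) positions
        for (int i = 0; i < a.length; i++) {
            if (a[i] == b[i]) matches++;
        }
        return (double) matches / a.length; // cast avoids integer division
    }

    public static void main(String[] args) {
        int[] userA = {1, 0, 0, 1};
        int[] userB = {1, 1, 0, 0};
        // positions 0 (1-1) and 2 (0-0) match: 2 / 4
        System.out.println(smc(userA, userB)); // prints 0.5
    }
}
```

Comparing position `i` of one array only with position `i` of the other is the important detail. A nested loop over all index pairs does not measure agreement between the two vectors.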
(10 points) Write a function that computes the Jaccard Coefficient Similarity metric.
As in the previous problem, the function should take two arrays (or two vectors) of binary values, compute the similarity, and return the result.
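Here is a minimal Java sketch of the Jaccard coefficient under the same assumptions (equal-length 0/1 arrays; `JaccardExample` is a hypothetical name). Unlike SMC, it ignores positions where both values are 0. Returning 0 for two all-zero vectors is an assumed convention to avoid dividing by zero, not something the assignment specifies.

```java
// Minimal sketch (hypothetical class name): Jaccard coefficient for two
// equal-length 0/1 vectors. J = M11 / (M01 + M10 + M11); 0-0 positions are ignored.
class JaccardExample {

    static double jaccard(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have equal length");
        }
        int m11 = 0;        // both users watched the movie
        int mismatches = 0; // M01 + M10: exactly one user watched it
        for (int i = 0; i < a.length; i++) {
            if (a[i] == 1 && b[i] == 1) m11++;
            else if (a[i] != b[i]) mismatches++;
        }
        int denominator = m11 + mismatches;
        // Assumed convention: two all-zero vectors get 0 instead of 0/0.
        return denominator == 0 ? 0.0 : (double) m11 / denominator;
    }

    public static void main(String[] args) {
        int[] userA = {1, 0, 0, 1};
        int[] userB = {1, 1, 0, 0};
        // M11 = 1, M01 + M10 = 2 -> 1 / 3; the 0-0 match at index 2 is not counted
        System.out.println(jaccard(userA, userB));
    }
}
```

On the same pair of vectors, SMC gives 0.5 but Jaccard gives 1/3. The shared "not watched" entry raises SMC but does not affect Jaccard.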
(30 points) Write a piece of code that reads a data file of the form:
<User_ID>, <movie_1>, <movie_2>, <movie_3>
then computes the pairwise similarities between every two data points (users). Pairwise similarities mean that you compute the similarity between every pair of users in the dataset. If you have 4 users, you should compute 4*4 = 16 similarities; if you have 6 users, you should compute 6*6 = 36 similarities. You can ignore the fact that about half of the computed similarities are duplicates (the similarity of user A to user B equals that of B to A).
The movies are represented using binary values, with 1 indicating that the user has watched the movie, and 0 indicating that the user has not watched the movie. Use the file attached to this assignment to test your code.
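A minimal Java sketch of the pairwise step follows. It parses lines of the form shown above and fills a full n x n similarity matrix. It assumes numeric user IDs and accepts commas and/or whitespace as separators. The names `PairwiseExample`, `parseLine`, and `pairwise` are hypothetical, and the four sample rows are made up for illustration. They are not the attached data file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch (hypothetical names): parse "<User_ID>, <movie_1>, <movie_2>, <movie_3>" lines
// and build the full n x n Simple Matching Coefficient matrix (n * n comparisons).
class PairwiseExample {

    // Splits on commas and/or whitespace, so "7 , 1 , 0 , 1" and "7,1,0,1" both parse.
    static int[] parseLine(String line) {
        String[] fields = line.trim().split("[,\\s]+");
        int[] row = new int[fields.length];
        for (int k = 0; k < fields.length; k++) {
            row[k] = Integer.parseInt(fields[k]);
        }
        return row; // row[0] is the user ID, row[1..] are the 0/1 movie flags
    }

    static double smc(int[] a, int[] b) {
        int matches = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == b[i]) matches++;
        }
        return (double) matches / a.length;
    }

    static double[][] pairwise(List<String> lines) {
        List<int[]> movies = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) continue; // skip blank lines
            int[] row = parseLine(line);
            movies.add(Arrays.copyOfRange(row, 1, row.length)); // drop the user ID
        }
        int n = movies.size();
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) { // all n * n pairs, duplicates included
                sim[i][j] = smc(movies.get(i), movies.get(j));
            }
        }
        return sim;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows, not the assignment's data file.
        List<String> lines = Arrays.asList("1 , 1 , 0 , 1", "2 , 1 , 1 , 0", "3 , 0 , 0 , 1", "4 , 1 , 0 , 1");
        double[][] sim = pairwise(lines);
        System.out.println(sim.length * sim[0].length); // 4 users -> 16 similarities
        for (double[] row : sim) System.out.println(Arrays.toString(row));
    }
}
```

Keeping the whole matrix, rather than only printing it, makes the symmetry visible: `sim[i][j]` equals `sim[j][i]`, and the diagonal is always 1.0.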
****** Please provide screenshots of the results from the code ******
Explanation / Answer
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

public class Intellij {

    public static void main(String[] args) {
        System.out.println("Enter the File Name");
        Scanner in = new Scanner(System.in);
        String fileName = in.nextLine();
        pairWiseSimilarities(fileName);
    }

    // Simple Matching Coefficient = (M11 + M00) / n:
    // the fraction of positions where the two binary vectors agree.
    static double simpleMatchingCoefficient(int[] arr1, int[] arr2) {
        if (arr1.length != arr2.length || arr1.length == 0)
            throw new IllegalArgumentException("Vectors must be non-empty and of equal length");
        int matches = 0;
        for (int i = 0; i < arr1.length; i++)   // compare position i with position i only
            if (arr1[i] == arr2[i]) matches++;
        return (double) matches / arr1.length;  // cast avoids integer division
    }

    // Jaccard Coefficient = M11 / (M01 + M10 + M11): positions where both are 0 are ignored.
    static double jaccardMatchingCoefficient(int[] arr1, int[] arr2) {
        if (arr1.length != arr2.length)
            throw new IllegalArgumentException("Vectors must be of equal length");
        int m11 = 0, mismatches = 0;
        for (int i = 0; i < arr1.length; i++) {
            if (arr1[i] == 1 && arr2[i] == 1) m11++;
            else if (arr1[i] != arr2[i]) mismatches++;
        }
        int denominator = m11 + mismatches;
        return denominator == 0 ? 0.0 : (double) m11 / denominator; // all-zero vectors -> 0
    }

    static void pairWiseSimilarities(String fileName) {
        List<int[]> rows = new ArrayList<>();
        // try-with-resources closes the file even if parsing fails
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                if (line.trim().isEmpty()) continue;
                // e.g. "123, 1, 0, 1" = userId, movie1, movie2, movie3 (commas and/or spaces)
                String[] fields = line.trim().split("[,\\s]+");
                int[] row = new int[fields.length];
                for (int k = 0; k < fields.length; k++)
                    row[k] = Integer.parseInt(fields[k]);
                rows.add(row);
            }
        } catch (IOException | NumberFormatException e) {
            e.printStackTrace();
            return;
        }
        // n users -> n * n comparisons, including self-pairs and symmetric duplicates
        for (int[] user1 : rows) {
            int[] movies1 = Arrays.copyOfRange(user1, 1, user1.length); // drop the user ID
            for (int[] user2 : rows) {
                int[] movies2 = Arrays.copyOfRange(user2, 1, user2.length);
                System.out.println("User ID: " + user1[0] + " vs User ID: " + user2[0]);
                System.out.println("Simple Matching Coefficient: " + simpleMatchingCoefficient(movies1, movies2));
                System.out.println("Jaccard Coefficient: " + jaccardMatchingCoefficient(movies1, movies2));
            }
        }
    }
}
Please feel free to comment below if you need further assistance or you have any doubts regarding my answer