Question
4. (20 points) Consider the following sample document collection:

D1 = (2, 4, 1, 9, 2, 0)
D2 = (1, 1, 2, 1, 0, 4)
D3 = (7, 2, 5, 0, 1, 0)
D4 = (0, 1, 2, 6, 1, 2)
D5 = (3, 0, 1, 4, 2, 1)
D6 = (1, 6, 0, 2, 6, 2)
D7 = (2, 6, 3, 2, 8, 1)

1) Use the following similarity expression to calculate the similarities between documents:
   SIM(DOC_k, DOC_h) = \sum_{i=1}^{n} TERM_{ik} \times TERM_{ih}
2) Set the threshold to 45 to group the documents into clusters.
3) Calculate the centroid for each group using the following expression:
   CTERM_k = \frac{1}{m} \sum_{i=1}^{m} TERM_{ik}
4) Match the given query Q = (1, 0, 5, 7, 4, 4) to find the document that is most similar to the query.

Explanation / Answer
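The MATLAB code below computes the pairwise similarities, applies the threshold of 45, and matches the query against each document. The centroids are not computed here: m in the centroid formula is the number of documents in a group, so it is only known once the clusters from step 2 have been formed.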
%=====================================================================
d1=[2 4 1 9 2 0];
d2=[1 1 2 1 0 4];
d3=[7 2 5 0 1 0];
d4=[0 1 2 6 1 2];
d5=[3 0 1 4 2 1];
d6=[1 6 0 2 6 2];
d7=[2 6 3 2 8 1];
d=[d1;d2;d3;d4;d5;d6;d7];
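% Pairwise similarity: SIM(Dk, Dh) is the inner product of the two term vectors.
n = size(d, 1);
sim = zeros(n);
for i = 1:n
    for j = 1:n
        if i == j
            sim(i,j) = 1;   % placeholder: self-similarity is not used for grouping
        else
            sim(i,j) = d(i,:) * d(j,:)';
        end
    end
end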
% Thresholding: th(i,j) is 1 when documents i and j are similar enough
% (similarity above 45) to be placed in the same group, and 0 otherwise.
th = sim > 45;
% Match the query against every document using the same inner product
q = [1 0 5 7 4 4];
cs = (d * q')';   % cs(i) is the similarity between document i and the query
% The best match is the document with the largest similarity to the query
[best_score, best] = max(cs);
disp(['The best-matching document is D', num2str(best)]);
%=====================================================================
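The grouping and centroid steps could be sketched as follows. This is only one possible reading: the question does not say how the thresholded similarities should be turned into groups, so the sketch assumes that two documents belong to the same group whenever a chain of above-threshold pairs connects them (connected components of the thresholded similarity graph). The names adj, label, nclusters and centroid are illustrative and do not appear in the code above.

```matlab
% Sketch only: grouping by connected components is an assumption,
% because the question does not name a grouping rule.
d = [2 4 1 9 2 0; 1 1 2 1 0 4; 7 2 5 0 1 0; 0 1 2 6 1 2; ...
     3 0 1 4 2 1; 1 6 0 2 6 2; 2 6 3 2 8 1];
n = size(d, 1);
sim = d * d';                    % inner products for all document pairs
adj = sim > 45;                  % link pairs whose similarity exceeds 45
adj(logical(eye(n))) = false;    % ignore each document's self-similarity

label = zeros(1, n);             % group label per document (0 = unassigned)
nclusters = 0;
for s = 1:n
    if label(s) ~= 0
        continue;                % already placed in a group
    end
    nclusters = nclusters + 1;
    label(s) = nclusters;
    stack = s;                   % depth-first walk over linked documents
    while ~isempty(stack)
        cur = stack(end);
        stack(end) = [];
        nbrs = find(adj(cur, :) & label == 0);
        label(nbrs) = nclusters;
        stack = [stack, nbrs];
    end
end

% Centroid of each group: average term weight over its m member documents.
centroid = zeros(nclusters, size(d, 2));
for c = 1:nclusters
    members = d(label == c, :);
    m = size(members, 1);        % m = number of documents in this group
    centroid(c, :) = sum(members, 1) / m;
end
disp(label);
disp(centroid);
```

Under this assumption, the pairs above the threshold are D1-D4, D1-D5, D1-D6, D1-D7, D3-D7 and D6-D7, so D2 ends up alone and the other six documents form a single group. A stricter rule, for example requiring every pair inside a group to exceed the threshold, would give smaller groups and different centroids.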