
Question

4. (20 points) Consider the following sample document collection:

D1 = (2, 4, 1, 9, 2, 0)
D2 = (1, 1, 2, 1, 0, 4)
D3 = (7, 2, 5, 0, 1, 0)
D4 = (0, 1, 2, 6, 1, 2)
D5 = (3, 0, 1, 4, 2, 1)
D6 = (1, 6, 0, 2, 6, 2)
D7 = (2, 6, 3, 2, 8, 1)

1) Use the following similarity expression to calculate the similarities between documents:

$$\mathrm{SIM}(\mathrm{DOC}_k, \mathrm{DOC}_h) = \sum_{i=1} \mathrm{TERM}_{ik} \times \mathrm{TERM}_{ih}$$

2) Set the threshold to 45 to group the documents into clusters.

3) Calculate the centroid for each group using the following expression:

$$\mathrm{CTERM}_i = \frac{1}{m} \sum_{k=1} \mathrm{TERM}_{ik}$$

4) Match the query Q = (1, 0, 5, 7, 4, 4) against the documents to find the document that is most similar to it.

Explanation / Answer

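The MATLAB code below computes the pairwise similarities, applies the threshold of 45, and matches the query against each document. It does not compute the centroids (part 3). In the centroid expression, m is the number of documents in a group, so the centroids can be calculated once the groups have been formed.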

%=====================================================================

d1=[2 4 1 9 2 0];
d2=[1 1 2 1 0 4];
d3=[7 2 5 0 1 0];
d4=[0 1 2 6 1 2];
d5=[3 0 1 4 2 1];
d6=[1 6 0 2 6 2];
d7=[2 6 3 2 8 1];

d=[d1;d2;d3;d4;d5;d6;d7];

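% Similarity matrix: sim(i,j) is the inner product of documents i and j
sim=zeros(7,7);
for i=1:7
for j=1:7
    sim(i,j)=d(i,:)*d(j,:)';
end
end

% Thresholding: th(i,j)=1 when two different documents have similarity above 45
th=zeros(7,7);
for i=1:7
for j=1:7
    if(i~=j && sim(i,j)>45)
        th(i,j)=1;
    end
end
end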

% Matching the given query
q=[1 0 5 7 4 4];
for i=1:7
cs(i)=d(i,:)*q';
end

% The best-matching document is the one with the largest inner product with q
[m,i]=max(cs);

disp(['The best-matching document is D',num2str(i)]);

%=====================================================================
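The code above builds the similarity and threshold matrices but does not turn them into groups or centroids. The following is a minimal sketch of those two steps. The question does not say how documents are to be grouped, so the sketch assumes a single-link rule: two documents belong to the same group whenever a chain of pairwise similarities above 45 connects them. The variable names (adj, label, stack, nclusters, centroid) are illustrative and not part of the original answer.

```matlab
% Sketch only: the single-link grouping rule is an assumption,
% not something stated in the question.
d = [2 4 1 9 2 0;
     1 1 2 1 0 4;
     7 2 5 0 1 0;
     0 1 2 6 1 2;
     3 0 1 4 2 1;
     1 6 0 2 6 2;
     2 6 3 2 8 1];
n = size(d, 1);

sim = d * d';                 % all pairwise inner products at once
adj = (sim > 45) & ~eye(n);   % links above the threshold, ignoring self-similarity

% Label connected components: documents joined by a chain of links share a label.
label = zeros(1, n);
nclusters = 0;
for s = 1:n
    if label(s) == 0
        nclusters = nclusters + 1;
        label(s) = nclusters;
        stack = s;
        while ~isempty(stack)
            v = stack(end);
            stack(end) = [];
            nb = find(adj(v, :) & label == 0);   % unlabeled neighbours of v
            label(nb) = nclusters;
            stack = [stack nb];
        end
    end
end

% Centroid of each group: the mean of its member vectors
% (m is the number of documents in the group).
centroid = zeros(nclusters, size(d, 2));
for c = 1:nclusters
    centroid(c, :) = mean(d(label == c, :), 1);
end

disp(label);
disp(centroid);
```

Under this single-link reading, D2 ends up in a group of its own and the other six documents form one group. A stricter rule, such as requiring every pair inside a group to exceed 45, would produce different groups and therefore different centroids.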
