Question

Data Mining

Q1. Explain why Clustering is called “Unsupervised Learning” while Classification is called “Supervised Learning”?

Give three applications of Cluster Analysis and give an example of each.

Q2. (a) What are the strengths and weaknesses of the k-Means Clustering Partitioning method?

(b) What clustering methods can be used with numerical, categorical, and mixed data?

Q3. What is the difference between Single level Partition based clustering method vs. Hierarchical Clustering in terms of basic concept, strength and weakness?

Q4. (a) What do we aim for to have a good quality clustering in terms of Cohesiveness and Distinctiveness?

(b) List and briefly describe the three Clustering Measures of Quality.

Q5. Many partitional clustering algorithms that automatically determine the number of clusters claim that this is an advantage. List two situations in which this is not the case.

Q6. Suppose we find K clusters using Ward’s method, bisecting K-means, and ordinary K-means. Which of these solutions represents a local or global minimum? Explain.

Q7. (a) Define the following terms:

i. Geodesic Distance

ii. Eccentricity

iii. Radius

iv. Diameter

v. peripheral vertex

(b) Measurements based on geodesic distance: consider graph G in the given figure and calculate the following terms:

i. Eccentricity

ii. Radius

iii. Diameter

iv. peripheral vertex

Q8. What are the challenges in Graph Clustering?

Explanation / Answer

Q1:
-->Unsupervised Learning:
Unsupervised learning is the process of finding hidden structure in unlabeled data.
-->Clustering:
Clustering is a method that organizes unlabeled data into groups of similar items, called clusters.
Because clustering takes unlabeled data as input and discovers the groups on its own, it is called unsupervised learning. Classification, by contrast, learns from training examples that already carry class labels, and those labels "supervise" the model; that is why it is called supervised learning.
Note that clustering is one method of unsupervised learning, not a synonym for it.
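The difference between the two settings can be illustrated with a minimal sketch in Python. The one-dimensional data, the labels, and the function names (train_nearest_centroid, classify, cluster_two_groups) are all made up for this illustration; the point is only that the supervised routine needs the labels while the clustering routine never sees them.

```python
# Toy 1-D data; values and labels are invented for illustration only.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["low", "low", "low", "high", "high", "high"]  # used ONLY in the supervised case


def train_nearest_centroid(xs, ys):
    """Supervised: learn one centroid per known class label."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def classify(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))


def cluster_two_groups(xs, iterations=10):
    """Unsupervised: 1-D k-means with k=2; no labels are ever seen."""
    c0, c1 = min(xs), max(xs)  # simple deterministic initialisation
    g0, g1 = [], []
    for _ in range(iterations):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return g0, g1


centroids = train_nearest_centroid(points, labels)  # needs labels
group_a, group_b = cluster_two_groups(points)        # needs only the points
```

The classifier can name its output ("low" or "high") because it was given names to learn from. The clustering routine can only report that two groups exist; interpreting them is left to the analyst.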
The applications of cluster analysis are:
a) Bioinformatics
ex: Human genetic clustering
Genetic data are clustered by similarity to infer population structure.
b) Business & Marketing
ex: Shopping item grouping
Cluster analysis can be used to group all the shopping items available on a website into a set of unique products.
c) World Wide Web
ex: Social network analysis
Clustering is used to recognize communities within large groups of people.

Q2:
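a) Strengths and weaknesses of the k-means partitioning method:
Strengths:
->On large datasets k-means is usually faster than hierarchical clustering, because each iteration is linear in the number of data points.
->k-means tends to produce tight, compact clusters, especially when the clusters are globular.
Weaknesses:
->k-means performs poorly when the clusters are not globular (for example elongated shapes, or clusters of very different size or density).
->The value of k must be chosen in advance and is difficult to predict.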
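To make these points concrete, here is a minimal sketch of the standard k-means procedure (Lloyd's algorithm) in plain Python. The function names, the sample data, and the seed parameter are assumptions made for this example, not part of the original answer. It shows why each pass is cheap (one distance computation per point per centroid) and why the caller has to supply k up front.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm. Returns (centroids, assignments, sse)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster went empty
                centroids[j] = mean_point(members)
    sse = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignments))
    return centroids, assignments, sse


# Hypothetical data: two small, well-separated groups.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
for k in (1, 2, 3):
    _, _, sse = kmeans(data, k)
    print(k, sse)
```

The sum of squared errors (SSE) keeps shrinking as k grows, so SSE alone cannot tell you the right k. The result can also depend on the random initial centroids; a common mitigation is to run with several seeds and keep the run with the lowest SSE.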

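b) For numerical, categorical, and mixed data, two-step clustering can be used, although the best choice depends on the type of data. Commonly used alternatives are k-means for purely numerical data, k-modes for purely categorical data, and k-prototypes for mixed data.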
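What makes mixed data hard is defining a distance between records. One widely used idea, the one behind k-prototypes, is to add the squared Euclidean distance on the numerical fields to a weighted count of mismatches on the categorical fields. The sketch below assumes records stored as dictionaries; the function name, field names, and the weight gamma are illustrative choices, and numerical fields are assumed to be scaled to comparable ranges beforehand.

```python
def mixed_dissimilarity(a, b, numeric_keys, categorical_keys, gamma=1.0):
    """Squared Euclidean distance on numeric fields plus gamma times the
    number of mismatching categorical fields (k-prototypes-style measure)."""
    numeric_part = sum((a[k] - b[k]) ** 2 for k in numeric_keys)
    categorical_part = sum(1 for k in categorical_keys if a[k] != b[k])
    return numeric_part + gamma * categorical_part


# Hypothetical customer records; "age" is assumed to be pre-scaled to [0, 1].
x = {"age": 0.0, "city": "A", "plan": "basic"}
y = {"age": 0.0, "city": "B", "plan": "basic"}
z = {"age": 1.0, "city": "B", "plan": "pro"}

d_xy = mixed_dissimilarity(x, y, ["age"], ["city", "plan"])
d_xz = mixed_dissimilarity(x, z, ["age"], ["city", "plan"], gamma=0.5)
```

The weight gamma controls how much a categorical mismatch counts relative to numerical distance, and it usually has to be tuned for the dataset.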

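Q3: Difference between single-level partition-based clustering and hierarchical clustering:
Hierarchical Clustering
Basic concept: builds a tree of nested clusters (a dendrogram) by repeatedly merging the closest clusters (agglomerative) or splitting existing ones (divisive).
Strength:
1) No a priori information about the number of clusters is required.
2) Easy to implement.
Weakness:
1) The algorithm can never undo a merge or split made in an earlier step.
2) Time complexity is typically at least O(n^2 log n), where 'n' is the number of data points, so it scales poorly to large datasets.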
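The points above about hierarchical clustering can be seen in a small sketch of agglomerative clustering with single linkage on one-dimensional data. This is a deliberately naive version written for clarity (it is slower than the optimized algorithms the complexity figure refers to), and the function names single_linkage and cut are hypothetical.

```python
def single_linkage(points):
    """Naive agglomerative clustering on 1-D points with single linkage.
    Returns the merge history as a list of (cluster_a, cluster_b, distance)."""
    clusters = [(p,) for p in points]  # start: every point is its own cluster
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between the closest pair of members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # merges are permanent: the two parts are replaced and never split again
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return history


def cut(points, history, k):
    """Replay merges until k clusters remain, i.e. cut the dendrogram."""
    clusters = [(p,) for p in points]
    for a, b, _ in history:
        if len(clusters) == k:
            break
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return clusters


pts = [1, 2, 6, 7, 20]        # made-up sample values
merges = single_linkage(pts)  # no k needed here
three = cut(pts, merges, 3)   # k is chosen afterwards
two = cut(pts, merges, 2)
```

The number of clusters is chosen only after the merge history exists, so a single run serves every k. The cost is that an early merge is never revisited, even if a later step shows it was a poor choice.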
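Single-level partition-based clustering (k-means):
Basic concept: divides the data into a single flat set of k non-overlapping clusters, with k fixed in advance.
Strength:
->On large datasets k-means is usually faster than hierarchical clustering, because each iteration is linear in the number of data points.
->k-means tends to produce tight, compact clusters, especially when the clusters are globular.
Weakness:
->k-means performs poorly when the clusters are not globular (for example elongated shapes, or clusters of very different size or density).
->The value of k must be chosen in advance and is difficult to predict.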

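Q4: (a) A good clustering is cohesive, meaning objects within a cluster are highly similar to one another, and distinctive, meaning objects in different clusters are clearly dissimilar.
(b) Three clustering measures of quality:
1) Davies-Bouldin Index
It compares the scatter within each cluster to the separation between cluster centroids. Lower values indicate better clustering.
2) Silhouette Index
For each data point, it compares the mean distance to the other points in its own cluster with the mean distance to the points in the nearest other cluster. Values range from -1 to 1, and higher is better.
3) Calinski-Harabasz Index
It is the ratio of between-cluster dispersion to within-cluster dispersion. Higher values are better, and it is often used to choose the number of clusters.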
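As a rough illustration of how such measures work, the sketch below computes the silhouette score and the Davies-Bouldin index in plain Python, following their usual textbook definitions. The function names and the four sample points are assumptions made for this example, and both functions assume at least two clusters and Euclidean distance.

```python
from math import dist  # Euclidean distance, Python 3.8+


def group_by_label(points, labels):
    """Collect points into a dict keyed by cluster label."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    return clusters


def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b) over all points, where
    a = mean distance to the other points in the same cluster and
    b = lowest mean distance to the points of any other cluster."""
    clusters = group_by_label(points, labels)
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes 0
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != l
        )
        total += (b - a) / max(a, b)
    return total / len(points)


def davies_bouldin(points, labels):
    """Average, over clusters, of the worst ratio
    (scatter_i + scatter_j) / distance(centroid_i, centroid_j)."""
    clusters = group_by_label(points, labels)
    centroids = {
        l: tuple(sum(c) / len(m) for c in zip(*m)) for l, m in clusters.items()
    }
    scatter = {
        l: sum(dist(p, centroids[l]) for p in m) / len(m)
        for l, m in clusters.items()
    }
    worst = [
        max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst) / len(worst)


# Made-up points: two tight pairs that are far apart.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
sensible = [0, 0, 1, 1]  # groups the nearby points together
poor = [0, 1, 0, 1]      # pairs each point with a distant one

print(silhouette_score(pts, sensible), silhouette_score(pts, poor))
print(davies_bouldin(pts, sensible), davies_bouldin(pts, poor))
```

On this toy data the sensible labelling gets the higher silhouette and the lower Davies-Bouldin value, which matches the reading of the two indices given above.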