Question
b) Consider a binary data set representing the answers that a class of students gave on a true/false test. The rows are students, the columns are questions, and the entry corresponding to the i-th row (student) and j-th column (question) is a 1 if the student answered true and a 0 if the student answered false.
i. Circle yes or no to indicate which of the following properties an interestingness measure should possess to be useful for evaluating whether a set of two attributes in this data set are strongly related, i.e., to tell if students answer two questions in a similar way.
1. Symmetry Yes No
2. Invariant under inversion Yes No
3. Invariant under null addition Yes No
ii. Based on the answers above, would you prefer confidence, the cosine measure, or correlation for this task?
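The three properties named in part i can be probed numerically. The following is a minimal sketch, assuming the standard contingency-count definitions of confidence, cosine, and correlation (the phi coefficient) for two binary columns. The columns `x` and `y`, the helper names, and the choice of four added null rows are hypothetical and serve only as an illustration. A single example can show that a measure is not invariant, but agreement on one example does not prove invariance in general.

```python
import math

def counts(x, y):
    """Contingency counts (f11, f10, f01, f00) for two binary columns."""
    f11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    f10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    f01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    f00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    return f11, f10, f01, f00

def confidence(x, y):
    """Confidence of the rule x -> y: f11 / (f11 + f10)."""
    f11, f10, _, _ = counts(x, y)
    return f11 / (f11 + f10)

def cosine(x, y):
    """Cosine measure: f11 / sqrt((f11 + f10) * (f11 + f01))."""
    f11, f10, f01, _ = counts(x, y)
    return f11 / math.sqrt((f11 + f10) * (f11 + f01))

def correlation(x, y):
    """Phi coefficient computed from the 2x2 contingency table."""
    f11, f10, f01, f00 = counts(x, y)
    denom = math.sqrt((f11 + f10) * (f11 + f01) * (f01 + f00) * (f10 + f00))
    return (f11 * f00 - f10 * f01) / denom

def invert(col):
    """Flip every 0 to 1 and every 1 to 0."""
    return [1 - v for v in col]

def check(measure, x, y, n_null=4):
    """Compare the measure before and after each transformation."""
    base = measure(x, y)
    pad = [0] * n_null  # extra students who answered false to both questions
    return {
        "symmetric": math.isclose(base, measure(y, x)),
        "inversion": math.isclose(base, measure(invert(x), invert(y))),
        "null_addition": math.isclose(base, measure(x + pad, y + pad)),
    }

# Hypothetical answers of 8 students to two questions.
x = [1, 1, 1, 1, 0, 0, 1, 0]
y = [1, 1, 0, 0, 0, 0, 1, 1]

results = {m.__name__: check(m, x, y) for m in (confidence, cosine, correlation)}
for name, props in results.items():
    print(name, props)
```

On this example, confidence changes when the two questions are swapped, cosine changes under inversion, and correlation is unchanged by swapping and by inversion but changes when null rows are added.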
Explanation / Answer
Solution:
(a)
The support of an item set can be defined as the fraction of transactions in which the item set appears.
For example, suppose the item set {a, b, c, d} is contained in 5 transactions out of 10.
Support of item set is
s({a, b, c, d}) = 5/10 = 1/2
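The support calculation can be sketched in a few lines. The transaction list below is hypothetical, built only so that {a, b, c, d} appears in 5 of 10 transactions as in the example. The function name `support` is likewise an assumption made for illustration.

```python
from fractions import Fraction

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return Fraction(hits, len(transactions))

# Hypothetical data: 10 transactions, 5 of which contain {a, b, c, d}.
transactions = (
    [{"a", "b", "c", "d"}] * 3
    + [{"a", "b", "c", "d", "e"}] * 2
    + [{"a", "b"}] * 2
    + [{"c", "d"}] * 2
    + [{"e"}]
)

s_abcd = support({"a", "b", "c", "d"}, transactions)
print(s_abcd)  # 5 of 10 transactions, i.e. 1/2
```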
From the above example, the frequency of the whole item set never exceeds the frequency of any single item in it, so the support of the item set is at most the support of each of its items.
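The relationship between the support of an item set and the support of its subsets can be checked directly. This is a minimal sketch on a hypothetical five-transaction data set. It uses the standard support definition (fraction of transactions containing the item set) and compares the full item set against every non-empty proper subset.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

# Hypothetical transactions, chosen only for illustration.
transactions = [
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "d"},
    {"b", "c", "d"},
]

full = {"a", "b", "c", "d"}
s_full = support(full, transactions)

# Support of every non-empty proper subset of the full item set.
subset_supports = {
    frozenset(sub): support(sub, transactions)
    for r in range(1, len(full))
    for sub in combinations(sorted(full), r)
}

# Every subset should be at least as frequent as the full item set.
holds = all(s >= s_full for s in subset_supports.values())
print(s_full, holds)
```

Any transaction that contains the full item set also contains each of its subsets, which is why the check passes on every data set and not only on this one.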
The monotone property cannot be verified using the new support measure, since it does not give the full frequency of the item sets.