Wednesday, November 11, 2015

Data Mining: Jaccard Similarity for bags


I was already familiar with the Jaccard similarity index for sets, but not with Jaccard similarity for bags (multisets, where an element can appear more than once).

Source: http://infolab.stanford.edu/~ullman/mmds/ch3.pdf

"If ratings are 1-to-5-stars, put a movie in a customer’s set n times if they rated the movie n-stars. Then, use Jaccard similarity for bags when measuring the similarity of customers. The Jaccard similarity for bags B and C is defined by counting an element n times in the intersection if n is the minimum of the number of times the element appears in B and C. In the union, we count the element the sum of the number of times it appears in B and in C."

"Example 3.2 : The bag-similarity of bags {a, a, a, b} and {a, a, b, b, c} is 1/3. The intersection counts a twice and b once, so its size is 3. The size of the union of two bags is always the sum of the sizes of the two bags, or 9 in this case. Since the highest possible Jaccard similarity for bags is 1/2, the score of 1/3 indicates the two bags are quite similar, as should be apparent from an examination of their contents."
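To check the arithmetic in the quoted example, here is a minimal Python sketch of the definition above. The function name `jaccard_bag_similarity` and the zero result for two empty bags are my own assumptions, not something the book specifies. It relies on `collections.Counter`, whose `&` operator keeps the minimum count of each element, and it uses the sum of the two bag sizes as the union size, as the quote describes.

```python
from collections import Counter
from fractions import Fraction


def jaccard_bag_similarity(b, c):
    """Jaccard similarity for bags, following the quoted definition.

    b and c may be iterables of elements or mappings of element -> count.
    (Hypothetical helper name; not taken from the book.)
    """
    bag_b, bag_c = Counter(b), Counter(c)
    # Intersection: each element counted min(count in b, count in c) times.
    intersection_size = sum((bag_b & bag_c).values())
    # Union: sum of the sizes of the two bags.
    union_size = sum(bag_b.values()) + sum(bag_c.values())
    if union_size == 0:
        # Assumption: treat two empty bags as having similarity 0.
        return Fraction(0)
    return Fraction(intersection_size, union_size)


# Example 3.2 from the book: {a, a, a, b} and {a, a, b, b, c}
print(jaccard_bag_similarity("aaab", "aabbc"))  # 1/3
```

For the star-rating case, a customer's ratings can be passed in directly as a mapping from movie to stars, since that is the same as putting the movie in the bag that many times. With made-up ratings `{"Movie A": 5, "Movie B": 3}` and `{"Movie A": 4, "Movie C": 2}`, the intersection has size 4 and the union has size 14, so the similarity is 2/7. Comparing a bag with itself gives 1/2, which matches the maximum mentioned in the quote.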


