Sørensen-Dice Index = F1 Score

Let’s start with a known fact: the Sørensen-Dice index is the same as the F1 score. I wasn’t aware of it, because I learned about the two on different occasions and for different applications.

The F1 score is usually taught in Information Retrieval courses as a metric for evaluating retrieval systems. Nobody ever mentioned to me that it’s nothing but the Sørensen-Dice index reformulated in terms of recall and precision. I’ll go through the proof quickly.

For any two sets A and B, the Sørensen-Dice index is defined as \frac{2|A \cap B|}{|A| + |B|}
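To make the definition concrete, here is a minimal Python sketch of it. The function name `dice_index` and the example sets are made up for illustration, and returning 1.0 for two empty sets (where the formula is 0/0) is a convention assumed here, not part of the definition.

```python
def dice_index(a: set, b: set) -> float:
    """Sørensen-Dice index 2|A ∩ B| / (|A| + |B|) of two finite sets."""
    total = len(a) + len(b)
    if total == 0:
        # Both sets empty: the formula is 0/0. Treating two empty sets
        # as identical (1.0) is an assumption of this sketch.
        return 1.0
    return 2 * len(a & b) / total


# Hypothetical example: the sets share 2 elements, |A| = 4, |B| = 3.
a = {1, 2, 3, 4}
b = {3, 4, 5}
print(dice_index(a, b))  # 2 * 2 / (4 + 3) = 4/7
```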

On the other hand, F1 = \frac{2}{(1/R) + (1/P)} , where R is recall and P is precision; in other words, F1 is the harmonic mean of the two. Now we need to prove that this formula reduces to the Sørensen-Dice index.

First we need to rewrite the equation in terms of sets. Let A be the set of relevant documents and B the set of retrieved documents. Then R = \frac{|A \cap B|}{|A|} , and P = \frac{|A \cap B|}{|B|} . We assume A \cap B is non-empty, so that 1/R and 1/P are defined. Substituting into the formula for F1:

F1 = \frac{2}{(1/\frac{|A \cap B|}{|A|}) + (1/ \frac{|A \cap B|}{|B|})}

= \frac{2}{\frac{|A|}{|A \cap B|} +  \frac{|B|}{|A \cap B|}}

= \frac{2}{\frac{|A| + |B|}{|A \cap B|}}

= \frac{2|A \cap B|}{|A| + |B|}

Which is exactly the Sørensen-Dice index.
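As a sanity check on the derivation, the sketch below computes F1 from recall and precision and compares it with the Sørensen-Dice index on the same two sets. It uses `fractions.Fraction` so the comparison is exact rather than subject to floating-point rounding. The function names and the document IDs are hypothetical, both sets are assumed to be non-empty, and returning 0 when there is no overlap is a common convention assumed here, since 1/R and 1/P are undefined in that case.

```python
from fractions import Fraction


def recall(relevant: set, retrieved: set) -> Fraction:
    # R = |A ∩ B| / |A|, assuming `relevant` is non-empty
    return Fraction(len(relevant & retrieved), len(relevant))


def precision(relevant: set, retrieved: set) -> Fraction:
    # P = |A ∩ B| / |B|, assuming `retrieved` is non-empty
    return Fraction(len(relevant & retrieved), len(retrieved))


def f1(relevant: set, retrieved: set) -> Fraction:
    r = recall(relevant, retrieved)
    p = precision(relevant, retrieved)
    if r == 0 or p == 0:
        # No overlap: 1/R and 1/P are undefined; 0 is an assumed convention.
        return Fraction(0)
    return 2 / (1 / r + 1 / p)


def dice(a: set, b: set) -> Fraction:
    # 2|A ∩ B| / (|A| + |B|), assuming the sets are not both empty
    return Fraction(2 * len(a & b), len(a) + len(b))


# Hypothetical example: 4 relevant documents, 3 retrieved, 2 in common.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d3", "d4", "d5"}

print(f1(relevant, retrieved))    # 4/7
print(dice(relevant, retrieved))  # 4/7
```

Here R = 1/2 and P = 2/3, so F1 = 2 / (2 + 3/2) = 4/7, which matches 2 · 2 / (4 + 3) from the set formula.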