September | 2017 | The Abyss of CS

Starting with a known fact: the Sørensen-Dice Index is the same as the F1 Score. I wasn't aware of this, since I learned about the two on different occasions and for different applications.

The F1 Score is usually taught in Information Retrieval courses as a metric for evaluating retrieval systems. They never said that it's nothing but a reformulation of the Sørensen-Dice Index in terms of recall and precision. I'll go through the proof quickly.

For any two sets $A$ and $B$ , the Sørensen-Dice Index is defined as $\frac{2|A \cap B|}{|A| + |B|}$ .

On the other hand, $F1 = \frac{2}{(1/R) + (1/P)}$ , the harmonic mean of $R$ and $P$ , where $R$ is recall and $P$ is precision. Now we need to prove that this formula reduces to the Sørensen-Dice Index.

First we need to rewrite the equation in terms of sets. Let $A$ be the set of relevant documents and $B$ the set of retrieved documents. Recall is the fraction of relevant documents that were retrieved, and precision is the fraction of retrieved documents that are relevant, so $R = \frac{|A \cap B|}{|A|}$ and $P = \frac{|A \cap B|}{|B|}$ . Substituting both into the formula for F1 gives:

$F1 = \frac{2}{(1/\frac{|A \cap B|}{|A|}) + (1/ \frac{|A \cap B|}{|B|})}$

$= \frac{2}{\frac{|A|}{|A \cap B|} + \frac{|B|}{|A \cap B|}}$

$= \frac{2}{\frac{|A| + |B|}{|A \cap B|}}$

$= \frac{2|A \cap B|}{|A| + |B|}$

This is exactly the Sørensen-Dice Index.
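As a quick sanity check, here is a minimal Python sketch that computes both quantities from the set definitions used in the proof and compares them. The function names `dice_index` and `f1_score` and the document ids are made up for illustration; exact fractions are used so the comparison isn't affected by floating-point rounding.

```python
from fractions import Fraction


def dice_index(a, b):
    """Sørensen-Dice Index of two sets: 2|A ∩ B| / (|A| + |B|)."""
    return Fraction(2 * len(a & b), len(a) + len(b))


def f1_score(relevant, retrieved):
    """F1 as the harmonic mean of recall and precision.

    Assumes the two sets share at least one element; with an empty
    intersection recall and precision are both 0 and 1/R is undefined.
    """
    overlap = len(relevant & retrieved)
    recall = Fraction(overlap, len(relevant))      # R = |A ∩ B| / |A|
    precision = Fraction(overlap, len(retrieved))  # P = |A ∩ B| / |B|
    return 2 / (1 / recall + 1 / precision)


# Hypothetical example: document ids are arbitrary integers.
relevant = {1, 2, 3, 4, 5}   # A: relevant documents
retrieved = {4, 5, 6}        # B: retrieved documents

print(dice_index(relevant, retrieved))  # 1/2
print(f1_score(relevant, retrieved))    # 1/2
```

Here $|A \cap B| = 2$ , $|A| = 5$ and $|B| = 3$ , so $R = 2/5$ , $P = 2/3$ , and both formulas give $1/2$ . Note that the harmonic-mean form breaks down when the intersection is empty, while the Sørensen-Dice form simply evaluates to 0.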


Sørensen-Dice Index = F1 Score