Support Center

Measurement Methodology - Accuracy, Recall, & Precision

Last Updated: May 03, 2013 01:31PM EDT
What are accuracy, recall, and precision?
For a given dataset, accuracy, recall, and precision are measured by comparing Semantria’s output against labels assigned by human reviewers. The manual comparison can be performed on a sample rather than on the entire dataset.
 
Accuracy is the percentage of documents for which the engine’s result agrees with the human result, out of all documents evaluated. It is measured across the entire dataset, so correct matches include both classifier matches and null matches (documents to which neither the human nor the engine assigned a classifier).

Accuracy=(#correct matches)/(total # possible)
 
Recall is the percentage of the available matches that the engine correctly identifies. Recall is measured for a specific classifier (e.g., baseball or football).

Recall=(#correct matches)/(#correct matches+# missed matches)
 
Precision is the percentage of the matches made by the engine that are correct. Precision is measured for a specific classifier (e.g., baseball or football).

Precision=(#correct matches)/(#correct matches+# wrong matches)
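The three formulas can be sketched as small Python helpers. This is a minimal illustration rather than Semantria's implementation: the function names are hypothetical, and returning 0.0 when a denominator is zero is an assumption made here to avoid a division error.

```python
def accuracy(correct: int, total: int) -> float:
    """Share of all documents (classifier and null matches) labeled correctly."""
    return correct / total if total else 0.0


def recall(correct: int, missed: int) -> float:
    """Share of the available matches that the engine found."""
    available = correct + missed
    return correct / available if available else 0.0


def precision(correct: int, wrong: int) -> float:
    """Share of the matches the engine made that were correct."""
    made = correct + wrong
    return correct / made if made else 0.0
```

Each helper takes raw counts, so the same functions apply to the whole dataset (accuracy) or to a single classifier (recall and precision).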
 
For example, let’s assume that we have 10 long-form content documents. Two queries are created to find the documents in the dataset that are about baseball and football, respectively. Out of the 10 documents, human reviewers identified 5 as being about baseball and another 3 as being about football. The remaining 2 match neither query.

 
| Document | Sample Text | Human Detection | Semantria Detection |
|---|---|---|---|
| 1 | I love baseball | Baseball | Baseball |
| 2 | Baseball is my life | Baseball | (none) |
| 3 | Football is rough | Football | Football |
| 4 | Football is fast | Football | (none) |
| 5 | Baseball is awesome | Baseball | Baseball |
| 6 | I like playing football | Football | Baseball |
| 7 | Soccer is the greatest | (none) | (none) |
| 8 | My grandfather played baseball | Baseball | Baseball |
| 9 | Baseball gives me the chills | Baseball | Football |
| 10 | I don’t have a favorite sport. | (none) | (none) |
 
Therefore for this dataset,

Accuracy=(documents '1,3,5,7,8,10')/(all 10 documents)

Accuracy=6/10=60%
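The accuracy figure can be checked with a short Python sketch. The list transcribes the human and Semantria labels from the table, with `None` standing in for a null match; the variable names are illustrative assumptions.

```python
# (human label, Semantria label) for documents 1-10; None marks a null match.
labels = [
    ("Baseball", "Baseball"),  # 1
    ("Baseball", None),        # 2
    ("Football", "Football"),  # 3
    ("Football", None),        # 4
    ("Baseball", "Baseball"),  # 5
    ("Football", "Baseball"),  # 6
    (None, None),              # 7
    ("Baseball", "Baseball"),  # 8
    ("Baseball", "Football"),  # 9
    (None, None),              # 10
]

# A document counts as correct when both labels agree, including null/null.
correct = sum(1 for human, engine in labels if human == engine)
accuracy = correct / len(labels)
print(f"Accuracy = {correct}/{len(labels)} = {accuracy:.0%}")
# prints: Accuracy = 6/10 = 60%
```

Documents 7 and 10 count toward the numerator because a null result from both the human and the engine is an agreement.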
 
For the ‘baseball’ classifier:

Recall=(#correct matches)/(#correct matches+# missed matches)

Recall=(documents '1,5,8')/(documents '1,5,8'+documents '2,9')

Recall=3/(3+2)=60%

Precision=(#correct matches)/(#correct matches+# wrong matches)

Precision=(documents '1,5,8')/(documents '1,5,8'+document '6')

Precision=3/(3+1)=75%
 

For the ‘football’ classifier:

Recall=(#correct matches)/(#correct matches+# missed matches)

Recall=(documents'3')/(documents'3'+documents'4,6')

Recall=1/(1+2)=33%

Precision=(#correct matches)/(#correct matches+# wrong matches)

Precision=(document '3')/(document '3'+document '9')

Precision=1/(1+1)=50%
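The per-classifier figures can be reproduced with a small Python sketch that tallies correct, missed, and wrong matches from the table. The `docs` mapping and the `score` function are hypothetical names used for illustration, and the code assumes each classifier appears at least once in both the human and the Semantria output, so the denominators are never zero.

```python
from collections import Counter

# document number -> (human label, Semantria label); None marks a null match.
docs = {
    1: ("Baseball", "Baseball"),
    2: ("Baseball", None),
    3: ("Football", "Football"),
    4: ("Football", None),
    5: ("Baseball", "Baseball"),
    6: ("Football", "Baseball"),
    7: (None, None),
    8: ("Baseball", "Baseball"),
    9: ("Baseball", "Football"),
    10: (None, None),
}


def score(classifier):
    """Return (recall, precision) for one classifier."""
    tally = Counter()
    for human, engine in docs.values():
        if human == classifier and engine == classifier:
            tally["correct"] += 1
        elif human == classifier:
            tally["missed"] += 1  # human assigned it, engine did not
        elif engine == classifier:
            tally["wrong"] += 1   # engine assigned it, human did not
    recall = tally["correct"] / (tally["correct"] + tally["missed"])
    precision = tally["correct"] / (tally["correct"] + tally["wrong"])
    return recall, precision


for name in ("Baseball", "Football"):
    r, p = score(name)
    print(f"{name}: recall={r:.0%}, precision={p:.0%}")
# prints:
# Baseball: recall=60%, precision=75%
# Football: recall=33%, precision=50%
```

Note that a single mislabeled document can count twice: document 6 is a missed match for football and a wrong match for baseball, and document 9 is the reverse.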