Quarterly (March, June, September, December)
160 pp. per issue
6 3/4 x 10
ISSN
0891-2017
E-ISSN
1530-9312
2014 Impact factor:
1.23

Computational Linguistics

Paola Merlo, Editor
December 2009, Vol. 35, No. 4, Pages 495-503
(doi: 10.1162/coli.2009.35.4.35402)
© 2009 Association for Computational Linguistics
From Annotator Agreement to Noise Models
Article PDF (86.57 KB)
Abstract

This article discusses the transition from annotated data to a gold standard, that is, a subset that is sufficiently noise-free with high confidence. Unless appropriately reinterpreted, agreement coefficients do not indicate the quality of the data set as a benchmarking resource: High overall agreement is neither sufficient nor necessary to distill some amount of highly reliable data from the annotated material. A mathematical framework is developed that allows estimation of the noise level of the agreed subset of annotated data, which helps promote cautious benchmarking.