
Computational Linguistics

Hwee Tou Ng, Editor
June 2015, Vol. 41, No. 2, Pages 309-317
(doi: 10.1162/COLI_a_00222)
© 2015 Association for Computational Linguistics
Evaluating Human Pairwise Preference Judgments

Human evaluation plays an important role in NLP, often in the form of preference judgments. Classical non-parametric and bespoke approaches have seen some use in evaluating these judgments. However, NLP can also draw on an entire body of work from sensory discrimination testing, where human judgments are central; that work is backed by rigorous statistical theory and freely available software. We investigate one approach, Log-Linear Bradley-Terry models, and apply it to sample NLP data.
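
As a rough illustration of the kind of model named above, the following minimal Python sketch fits the basic Bradley-Terry model, in which the probability that item i is preferred over item j is s_i / (s_i + s_j) for positive strengths s. It is not the article's log-linear formulation or software: it uses a standard iterative (minorization-maximization) update instead, and the function names and the win counts are hypothetical, chosen only for illustration. It assumes every item wins and loses at least once, so that the estimates are finite.

```python
def fit_bradley_terry(wins, n_iter=200, tol=1e-10):
    """Estimate Bradley-Terry strengths from pairwise preference counts.

    wins[i][j] is the number of times item i was preferred over item j.
    Returns strengths normalized to sum to 1.
    Assumption: every item has at least one win and one loss.
    """
    n = len(wins)
    strengths = [1.0 / n] * n
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            # Total number of comparisons won by item i.
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Comparisons involving i, weighted by the current strengths.
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        # Strengths are identified only up to scale, so renormalize.
        scale = sum(updated)
        updated = [s / scale for s in updated]
        change = max(abs(a - b) for a, b in zip(updated, strengths))
        strengths = updated
        if change < tol:
            break
    return strengths


def preference_probability(strengths, i, j):
    """Model probability that item i is preferred over item j."""
    return strengths[i] / (strengths[i] + strengths[j])


if __name__ == "__main__":
    # Hypothetical counts for three systems A, B, C (not data from the article).
    counts = [
        [0, 7, 9],  # A preferred over B 7 times, over C 9 times
        [3, 0, 6],  # B preferred over A 3 times, over C 6 times
        [1, 4, 0],  # C preferred over A once, over B 4 times
    ]
    s = fit_bradley_terry(counts)
    print("strengths:", s)
    print("P(A preferred over B):", preference_probability(s, 0, 1))
```

The log-linear variant investigated in the article expresses the same kind of paired-comparison model as a log-linear model for the observed counts, which allows it to be fitted with standard statistical software.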