What happens if you treat ordinal ratings as interval data? Human evaluations in NLP are even more under-powered than you think (2021)

First Author: Howcroft D

Attributed to: DILiGENt: Domain-Independent Language Generation funded by EPSRC

No abstract provided

Type: Conference/Paper/Proceeding/Abstract