Journal of Logic and Computation Advance Access published online on December 13, 2007

Journal of Logic and Computation, doi:10.1093/logcom/exm072
All versions of this article: 18/3/459 (most recent), exm072v1

© The Author 2007. Published by Oxford University Press on behalf of the Association of Physicians. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Original papers

Testing the Reasoning for Question Answering Validation

Anselmo Peñas, Álvaro Rodrigo, Valentín Sama and Felisa Verdejo

Depto. Lenguajes y Sistemas Informáticos, UNED, Juan del Rosal, 16; 28040 Madrid; Spain
E-mail: anselmo{at}lsi.uned.es, alvarory{at}lsi.uned.es, vsama{at}lsi.uned.es, felisa{at}lsi.uned.es

Received 31 July 2006.


   Abstract

Question answering (QA) is a task that deserves more collaboration between the natural language processing (NLP) and knowledge representation (KR) communities, not only to introduce reasoning when looking for answers or to make use of answer type taxonomies and encyclopaedic knowledge, but also, as discussed here, for answer validation (AV), that is, deciding whether the responses of a QA system are correct. This was one of the motivations for the first Answer Validation Exercise at CLEF 2006 (AVE 2006). The starting point for AVE 2006 was the reformulation of answer validation as a recognizing textual entailment (RTE) problem, under the assumption that a hypothesis can be generated automatically by instantiating a hypothesis pattern with a QA system answer. The test collections that we developed in seven different languages for AVE 2006 are specifically oriented towards the development and evaluation of answer validation systems. In this article we present the methodology followed to develop these collections, which takes advantage of the human assessments already made in the evaluation of QA systems. We also propose an evaluation framework for AV linked to a QA evaluation track. We quantify and discuss the sources of error introduced by reformulating the answer validation problem in terms of textual entailment (around 2%, within the range of inter-annotator disagreement). Finally, we report the evaluation results of AVE 2006, in which 11 groups participated with 38 runs in seven different languages. The most widely used techniques were machine learning and overlap measures, but systems with broader knowledge resources and richer representation formalisms obtained the best results.

Keywords: Textual entailment; test collections; question answering; answer validation; evaluation
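
The reformulation described in the abstract (instantiating a hypothesis pattern with a QA system answer, then treating validation as an entailment decision) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the article or the AVE 2006 collections: the `<answer>` slot syntax, the `AnswerPair` structure, the example sentences and the 0.9 threshold. The token-overlap score is only a crude stand-in for the overlap measures mentioned in the abstract, not any participant's system.

```python
import re
from dataclasses import dataclass


@dataclass
class AnswerPair:
    """One text-hypothesis pair in an RTE-style view of answer validation."""
    text: str        # supporting snippet returned with the answer (assumed input)
    hypothesis: str  # statement built from a hypothesis pattern and the answer


def instantiate(pattern: str, answer: str) -> str:
    """Fill the hypothetical <answer> slot of a hypothesis pattern."""
    return pattern.replace("<answer>", answer)


def tokens(s: str) -> set:
    """Lower-cased word tokens of a string."""
    return set(re.findall(r"\w+", s.lower()))


def overlap(pair: AnswerPair) -> float:
    """Fraction of hypothesis tokens that also occur in the supporting text."""
    hyp = tokens(pair.hypothesis)
    if not hyp:
        return 0.0
    return len(hyp & tokens(pair.text)) / len(hyp)


def validate(pair: AnswerPair, threshold: float = 0.9) -> bool:
    """Accept the answer when overlap reaches an (arbitrary) threshold."""
    return overlap(pair) >= threshold


# Invented example data, for illustration only.
PATTERN = "The capital of Spain is <answer>"
SNIPPET = "Madrid is the capital of Spain and its largest city."

good = AnswerPair(SNIPPET, instantiate(PATTERN, "Madrid"))
bad = AnswerPair(SNIPPET, instantiate(PATTERN, "Barcelona"))

print(good.hypothesis, validate(good))
print(bad.hypothesis, validate(bad))
```

In this toy case the wrong answer still shares five of its six hypothesis tokens with the snippet, so the decision depends entirely on the chosen threshold. That fragility is consistent with the abstract's observation that systems with broader knowledge resources and richer representation formalisms obtained the best results.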

