Journal of Logic and Computation Advance Access originally published online on December 13, 2007
Journal of Logic and Computation 2008 18(3):459-474; doi:10.1093/logcom/exm072
Original Articles
Testing the Reasoning for Question Answering Validation
Depto. Lenguajes y Sistemas Informáticos, UNED, Juan del Rosal, 16; 28040 Madrid; Spain. E-mail: anselmo{at}lsi.uned.es,alvarory{at}lsi.uned.es,vsama{at}lsi.uned.es,felisa{at}lsi.uned.es
Received 31 July 2006.
Question answering (QA) is a task that deserves more collaboration between the natural language processing (NLP) and knowledge representation (KR) communities, not only to introduce reasoning when looking for answers or to make use of answer type taxonomies and encyclopaedic knowledge, but also, as discussed here, for answer validation (AV), that is, deciding whether the responses of a QA system are correct. This was one of the motivations for the first Answer Validation Exercise at CLEF 2006 (AVE 2006). The starting point for AVE 2006 was the reformulation of answer validation as a recognizing textual entailment (RTE) problem, under the assumption that a hypothesis can be generated automatically by instantiating a hypothesis pattern with a QA system answer. The test collections that we developed in seven different languages at AVE 2006 are specifically oriented to the development and evaluation of answer validation systems. In this article we present the methodology followed to develop these collections, which takes advantage of the human assessments already made in the evaluation of QA systems. We also propose an evaluation framework for AV linked to a QA evaluation track. We quantify and discuss the sources of error introduced by reformulating the answer validation problem in terms of textual entailment (around 2%, within the range of inter-annotator disagreement). We also report the evaluation results of the first answer validation exercise at CLEF 2006, in which 11 groups participated with 38 runs in seven different languages. The most widely used techniques were machine learning and overlap measures, but systems with broader knowledge resources and richer representation formalisms obtained the best results.
Keywords: Textual entailment; test collections; question answering; answer validation; evaluation
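The reformulation described in the abstract, in which a hypothesis is produced by instantiating a hypothesis pattern with a QA system answer and then checked against the supporting text, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `<answer>` placeholder syntax, the `EntailmentPair` and `build_pair` names, the example question pattern, and the token-overlap check, which is only a toy stand-in for an entailment decision and is not the method of any AVE 2006 participant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntailmentPair:
    """A text-hypothesis pair to be judged for textual entailment."""
    text: str        # supporting snippet returned by the QA system
    hypothesis: str  # affirmative statement built from pattern + answer


def build_pair(pattern: str, answer: str, supporting_text: str) -> EntailmentPair:
    """Instantiate a hypothesis pattern with a candidate answer.

    The "<answer>" placeholder is a hypothetical convention for this sketch.
    """
    hypothesis = pattern.replace("<answer>", answer.strip())
    return EntailmentPair(text=supporting_text, hypothesis=hypothesis)


def overlap_entails(pair: EntailmentPair, threshold: float = 0.8) -> bool:
    """Toy entailment check: fraction of hypothesis tokens found in the text.

    A real system would use richer knowledge and representations; this
    baseline only makes the validation step concrete.
    """
    text_tokens = set(pair.text.lower().split())
    hyp_tokens = pair.hypothesis.lower().split()
    if not hyp_tokens:
        return False
    covered = sum(1 for token in hyp_tokens if token in text_tokens)
    return covered / len(hyp_tokens) >= threshold


# Hypothetical pattern for the question "Who is the president of Mexico?"
pattern = "<answer> is the president of Mexico"
snippet = "Vicente Fox is the president of Mexico since 2000"

good = build_pair(pattern, "Vicente Fox", snippet)
bad = build_pair(pattern, "Felipe Calderon", snippet)

print(good.hypothesis, "->", overlap_entails(good))
print(bad.hypothesis, "->", overlap_entails(bad))
```

Under this framing, an answer is accepted when the supporting text entails the generated hypothesis, so any RTE system can be plugged in where `overlap_entails` sits.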