Document retrieval for question answering : a quantitative evaluation of text preprocessing

dc.contributor.author: Carvalho, Gracinda
dc.contributor.author: Matos, David Martins de
dc.contributor.author: Rocio, Vitor
dc.date.accessioned: 2017-01-24T10:28:29Z
dc.date.available: 2017-01-24T10:28:29Z
dc.date.issued: 2007
dc.description.abstract: Question Answering (QA) has been an area of interest for researchers, motivated in part by the international QA evaluation forums, namely the Text REtrieval Conference (TREC) and, more recently, the Cross Language Evaluation Forum (CLEF) through QA@CLEF, which has included the Portuguese language since 2004. In these forums, a collection of written documents is provided together with a set of questions to be answered by the participating systems. Each system is evaluated as a whole by its capacity to answer the questions, and relatively few published results focus on the performance of its individual components and their influence on overall system performance. That is the case for the Information Retrieval (IR) component, which is widely used in QA systems. Our work concentrates on the different options for preprocessing Portuguese text before feeding it to the IR component, evaluating their impact on IR performance in the specific context of QA, so that we can make a well-founded choice among them. From this work we conclude that the basic preprocessing techniques, case folding and removal of punctuation marks, offer a clear advantage. Of the other techniques considered, stop word removal improved the performance of the IR system, but stemming and lemmatization did not. [pt_PT]
dc.description.version: info:eu-repo/semantics/publishedVersion [pt_PT]
dc.identifier.doi: 10.1145/1316874.1316894 [pt_PT]
dc.identifier.isbn: 978-1-59593-832-9
dc.identifier.uri: http://hdl.handle.net/10400.2/5966
dc.language.iso: eng [pt_PT]
dc.peerreviewed: yes [pt_PT]
dc.publisher: ACM [pt_PT]
dc.relation.publisherversion: http://dl.acm.org/citation.cfm?id=1316894 [pt_PT]
dc.subject: Information retrieval [pt_PT]
dc.subject: Question answering [pt_PT]
dc.title: Document retrieval for question answering : a quantitative evaluation of text preprocessing [pt_PT]
dc.type: conference object
dspace.entity.type: Publication
oaire.citation.endPage: 130 [pt_PT]
oaire.citation.startPage: 125 [pt_PT]
oaire.citation.title: Proceedings of the ACM first Ph.D. workshop in CIKM, ACM [pt_PT]
person.familyName: Carvalho
person.familyName: Rocio
person.givenName: Gracinda
person.givenName: Vitor
person.identifier: R-000-HKF
person.identifier.ciencia-id: 0418-C5A8-59E2
person.identifier.orcid: 0000-0003-4793-6917
person.identifier.orcid: 0000-0002-3314-898X
person.identifier.scopus-author-id: 35184805000
rcaap.rights: openAccess [pt_PT]
rcaap.type: conferenceObject [pt_PT]
relation.isAuthorOfPublication: d6be5630-7fac-412a-bca3-9b16888ada6f
relation.isAuthorOfPublication: 7cab4248-456c-46bf-a1cf-bbd212928171
relation.isAuthorOfPublication.latestForDiscovery: 7cab4248-456c-46bf-a1cf-bbd212928171

Files

Original bundle
Name: 2007 CIKM p125-carvalho.pdf
Size: 387.55 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.97 KB
Format: Item-specific license agreed upon to submission