Publication

Large language model for querying databases in Portuguese

dc.contributor.author: Figueiredo, Lourenço
dc.contributor.author: Pinheiro, Paulo
dc.contributor.author: Cavique, Luís
dc.contributor.author: Marques, Nuno
dc.date.accessioned: 2025-10-14T10:56:04Z
dc.date.available: 2025-10-14T10:56:04Z
dc.date.issued: 2024
dc.description.abstract: This study introduces a system that helps non-expert users find information without knowing database query languages or asking technicians for help. It explores a specific domain, a subscription-based sports facility, which serves as an open-source version of a real case study. The data available in the database is organized as a star schema so that it can be queried through Portuguese natural language questions. A Large Language Model (LLM) generates SQL queries from the question and the provided star schema. To validate the correctness of the system, we created a dataset of 115 highly challenging questions drawn from real-world usage scenarios. Challenges found during testing, such as attribute value interpretation, out-of-scope questions, and temporal interval adequacy, show that the star schema alone does not give the LLM enough context to generate accurate SQL queries. Addressing these challenges with additional contextual information significantly improves query correctness, with validation results increasing from 57.76% to 88.79%. This study shows the potential and limitations of LLMs in generating SQL queries from Portuguese natural language questions. [eng]
dc.identifier.doi: 10.1007/978-3-031-73503-5_1
dc.identifier.uri: http://hdl.handle.net/10400.2/20360
dc.language.iso: eng
dc.peerreviewed: yes
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Natural language processing
dc.subject: Natural Language to SQL
dc.subject: Large language model
dc.subject: GPT-4 Turbo
dc.subject: Sports facility management
dc.subject: Databases
dc.title: Large language model for querying databases in Portuguese [eng]
dc.type: conference object
dspace.entity.type: Publication
oaire.citation.conferenceDate: 2024
oaire.citation.conferencePlace: Viana do Castelo
oaire.citation.title: 23rd Conference on Artificial Intelligence. EPIA 2024. Progress in Artificial Intelligence
oaire.version: http://purl.org/coar/version/c_970fb48d4fbd8a85
person.familyName: Figueiredo
person.familyName: Pinheiro
person.familyName: Cavique
person.familyName: Marques
person.givenName: Lourenço
person.givenName: Paulo
person.givenName: Luís
person.givenName: Nuno
person.identifier.ciencia-id: 061A-9AE9-D804
person.identifier.ciencia-id: 911E-84AC-3956
person.identifier.orcid: 0009-0009-6906-4495
person.identifier.orcid: 0000-0002-8912-2244
person.identifier.orcid: 0000-0002-5590-1493
person.identifier.orcid: 0000-0002-3019-3304
relation.isAuthorOfPublication: 7129e25b-7de7-41a0-8d79-498088e723ab
relation.isAuthorOfPublication: d222d7a3-c125-4f1f-ab1b-ae2f97606f81
relation.isAuthorOfPublication: 40906a16-46a2-42f1-b26d-7db7012294ee
relation.isAuthorOfPublication: 3133abcb-e445-4ae0-bacc-a69f6a524f48
relation.isAuthorOfPublication.latestForDiscovery: d222d7a3-c125-4f1f-ab1b-ae2f97606f81
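The abstract describes an LLM that receives a Portuguese question together with the star schema, and then extra contextual information covering attribute values, out-of-scope questions, and temporal intervals. The following is a minimal sketch of how such a prompt might be assembled, assuming a plain-text prompt approach. The table names, columns, context notes, and the OUT_OF_SCOPE reply convention are hypothetical illustrations, not the authors' schema or prompt. The actual call to GPT-4 Turbo or any other model is deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Table:
    """One table of a star schema (hypothetical structure for illustration)."""
    name: str
    columns: dict[str, str]  # column name -> SQL type
    role: str  # "fact" or "dimension"


def render_star_schema(tables: list[Table]) -> str:
    """Render the schema as commented CREATE TABLE statements for the prompt."""
    statements = []
    for table in tables:
        cols = ", ".join(f"{col} {sql_type}" for col, sql_type in table.columns.items())
        statements.append(f"-- {table.role} table\nCREATE TABLE {table.name} ({cols});")
    return "\n".join(statements)


def build_prompt(question: str, tables: list[Table], context_notes: list[str]) -> str:
    """Combine instructions, schema, optional context notes, and the question."""
    parts = [
        "Write one SQL query that answers the question using only the schema below.",
        # Hypothetical convention for out-of-scope questions.
        "If the schema cannot answer the question, reply with OUT_OF_SCOPE.",
        "Schema:",
        render_star_schema(tables),
    ]
    # Context notes are appended only when provided, so a schema-only
    # prompt and an enriched prompt can be compared side by side.
    if context_notes:
        parts.append("Additional context:")
        parts.extend(f"- {note}" for note in context_notes)
    parts.append(f"Question: {question}")
    parts.append("SQL:")
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical star schema for a subscription-based sports facility.
    schema = [
        Table("fact_subscription",
              {"member_id": "INTEGER", "date_id": "INTEGER", "fee": "NUMERIC", "status": "TEXT"},
              "fact"),
        Table("dim_member",
              {"member_id": "INTEGER", "name": "TEXT", "birth_date": "DATE"},
              "dimension"),
        Table("dim_date",
              {"date_id": "INTEGER", "day": "DATE", "month": "INTEGER", "year": "INTEGER"},
              "dimension"),
    ]
    # Hypothetical notes for attribute values and temporal intervals.
    notes = [
        "fact_subscription.status uses 'A' for active and 'C' for cancelled.",
        "'este mês' means the current calendar month in dim_date.",
    ]
    # English: "How many members have an active subscription this month?"
    print(build_prompt("Quantos sócios têm subscrição ativa este mês?", schema, notes))
```

Passing an empty `notes` list produces the schema-only variant, which loosely mirrors the abstract's comparison between the star schema alone and the schema enriched with additional context.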

Files

Original bundle
Name: EPIA LLM4DB 4.pdf
Size: 457.34 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.97 KB
Format: Item-specific license agreed to upon submission