Publication

Large language model for querying databases in Portuguese

File: EPIA LLM4DB 4.pdf (457.34 KB, Adobe PDF)

Abstract

This study introduces a system that helps non-expert users find information without knowing database query languages or asking technicians for help. It is explored in a specific domain, a subscription-based sports facility, which serves as an open-source version of a real case study. The data in the database are organized in a star schema so that they can be accessed through natural language queries in Portuguese. A large language model (LLM) generates SQL queries from the user's question and the provided star schema. To validate the correctness of the system, we created a dataset of 115 highly challenging questions drawn from real-world usage scenarios. Challenges found during testing, such as interpreting attribute values, handling out-of-scope questions, and choosing adequate temporal intervals, show that the star schema alone does not give the LLM enough context to generate accurate SQL queries. Addressing these challenges by supplying additional contextual information significantly improves query correctness, with the validation result rising from 57.76% to 88.79%. This study shows both the potential and the limitations of LLMs in generating SQL queries from Portuguese natural language questions.
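The pipeline the abstract describes (star schema plus question in, SQL out, with extra context added to address the observed failure types) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the tables `dim_member`, `dim_date` and `fact_visit`, the context lines in `EXTRA_CONTEXT`, and the helpers `build_prompt` and `results_match` are all hypothetical. No LLM is called (a hand-written query stands in for model output), and execution-based result matching is one common way to check text-to-SQL correctness, not necessarily the validation procedure used in the study.

```python
import sqlite3
from typing import List, Optional

# Hypothetical miniature star schema for a subscription-based sports facility:
# two dimension tables and one fact table. Names are illustrative assumptions.
SCHEMA_DDL = """
CREATE TABLE dim_member (member_id INTEGER PRIMARY KEY, name TEXT, gender TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, full_date TEXT, year INTEGER, month INTEGER);
CREATE TABLE fact_visit (
    visit_id INTEGER PRIMARY KEY,
    member_id INTEGER REFERENCES dim_member(member_id),
    date_id INTEGER REFERENCES dim_date(date_id)
);
"""

# Hypothetical extra context targeting the failure types named in the abstract:
# attribute value interpretation, out-of-scope questions, temporal intervals.
EXTRA_CONTEXT = [
    "dim_member.gender stores 'M' (masculino) or 'F' (feminino).",
    "If the schema cannot answer the question, reply only with OUT_OF_SCOPE.",
    "'último mês' means the most recent complete calendar month in dim_date.",
]


def build_prompt(question: str, schema_ddl: str, context: Optional[List[str]] = None) -> str:
    """Assemble a text-to-SQL prompt from the schema, optional context and the question."""
    parts = ["Translate the Portuguese question into one SQLite query.", "Schema:", schema_ddl.strip()]
    if context:
        parts.append("Additional context:")
        parts.extend(f"- {line}" for line in context)
    parts.append(f"Question: {question}")
    parts.append("SQL:")
    return "\n".join(parts)


def results_match(conn: sqlite3.Connection, candidate_sql: str, reference_sql: str) -> bool:
    """Return True if both queries yield the same rows, ignoring row order.

    A candidate that fails to execute counts as incorrect.
    """
    try:
        got = conn.execute(candidate_sql).fetchall()
    except sqlite3.Error:
        return False
    expected = conn.execute(reference_sql).fetchall()
    return sorted(got, key=repr) == sorted(expected, key=repr)


# Toy data: member 1 is female, member 2 male; three visits over two months.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_DDL)
conn.executemany("INSERT INTO dim_member VALUES (?, ?, ?)", [(1, "Ana", "F"), (2, "Rui", "M")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)",
                 [(1, "2024-01-15", 2024, 1), (2, "2024-02-03", 2024, 2)])
conn.executemany("INSERT INTO fact_visit VALUES (?, ?, ?)", [(1, 1, 1), (2, 1, 2), (3, 2, 2)])

question = "Quantas visitas fizeram as sócias em fevereiro de 2024?"
prompt = build_prompt(question, SCHEMA_DDL, EXTRA_CONTEXT)  # text that would be sent to the LLM

# Hand-written stand-in for the SQL an LLM might return; no model is called here.
candidate = """SELECT COUNT(*) FROM fact_visit v
JOIN dim_member m ON m.member_id = v.member_id
JOIN dim_date d ON d.date_id = v.date_id
WHERE m.gender = 'F' AND d.year = 2024 AND d.month = 2"""
reference = """SELECT COUNT(v.visit_id) FROM fact_visit v, dim_member m, dim_date d
WHERE v.member_id = m.member_id AND v.date_id = d.date_id
  AND m.gender = 'F' AND d.month = 2 AND d.year = 2024"""

print(results_match(conn, candidate, reference))  # True
```

If the candidate instead filtered on `m.gender = 'feminino'`, it would run but return a count of 0, and `results_match` would report a mismatch; this is the kind of attribute-value misinterpretation that the context line about `gender` is meant to prevent. Sorting the rows makes the comparison order-insensitive, so questions whose answer depends on an `ORDER BY` would need an order-sensitive check instead.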

Keywords

Natural language processing; Natural Language to SQL; Large language model; GPT-4 Turbo; Sports facility management; Databases