Name | Description | Size | Format
---|---|---|---
 | | 2.81 MB | Adobe PDF
Authors
Advisor(s)
Abstract(s)
O presente estudo visa investigar as etapas de um sistema construído para o
processo de automatização da recolha de informação genealógica: reconhecimento
de caracteres a partir de fontes físicas e extração de dados da World Wide Web,
recuperação de informações relevantes, extração de relações familiares, inserção
dos dados em ficheiros de formato apropriado e, consequentemente, visualização
gráfica num formato claro e com o menor número possível de distorções.
Campos da informática que evoluíram do estudo do reconhecimento de padrões e
da teoria da aprendizagem computacional em inteligência artificial são atualmente
utilizados para resolver a tarefa de extração de relações de entidades, o que ajuda
muito o processo de investigação genealógica.
Alguns trabalhos já procuraram nos últimos anos medir a capacidade de identificar
texto e extrair informação útil, otimizando a relação entre a fonte de informação e a
sua exibição em diagramas. Uma aplicação promissora é a conversão de texto em
formato livre utilizando técnicas de processamento de linguagem natural, seguida de
treino de um modelo de aprendizagem de máquina. Finalmente, as relações
escolhidas podem ser convertidas em ficheiros GEDCOM que permitem facilmente a
criação de árvores genealógicas.
The present study investigates the stages of a system built to automate the collection of genealogical information: character recognition from physical sources and extraction of data from the World Wide Web, retrieval of relevant information, extraction of family relationships, insertion of the data into files of an appropriate format and, finally, graphical visualization in a clear format with as few distortions as possible. Fields of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence are currently used to solve the task of extracting relationships between entities, which greatly aids genealogical research. In recent years, some studies have sought to measure the ability to identify text and extract useful information, optimizing the relationship between the source of information and its display in diagrams. One approach that has shown good results is the conversion of free-form text using natural language processing techniques, followed by the training of a machine learning model. Finally, the selected relationships can be converted into GEDCOM files, from which family trees can easily be created.
Description
Keywords
Reconhecimento de caracteres; Extração de informação; Reconhecimento de entidade mencionada; Aprendizagem de máquina; GEDCOM; Character recognition; Information extraction; Named entity recognition; Machine learning; Genealogical diagram
Citation
Schatz, Jan Paulo Borges - Automation of the genealogical process [Em linha]: information extraction for GEDCOM files. [S.l.]: [s.n.], [2023], 97 p.