Name | Description | Size | Format |
---|---|---|---|
 | | 1014.36 KB | Adobe PDF |
Authors
Apolónia, João; Cavique, Luís
Advisor(s)
Abstract(s)
The treatment of large datasets is a recurring issue today. One possible approach is to perform a feature selection that considerably reduces the dimension of the data without increasing its inconsistency. Logical Analysis of Inconsistent Data (LAID) is a systematic, robust methodology that is easy to interpret and can handle inconsistent data. The paradigm for handling large volumes of data has been changing. Previously, data was processed on a single computer and accessed after being loaded into memory. The current trend is to access data on disk, in a cloud environment. This work aims to validate the new paradigm, using the HDF5 data system and the remote environment provided by INCD. Because HDF5 is the system adopted by the Python community to handle large datasets, this language was chosen to implement LAID.
The treatment of large datasets is a frequently addressed problem today, and it is not a simple one, given the computational limitations that still exist. One possible approach is to perform a feature selection that considerably reduces the size of the data without increasing its inconsistency. Logical Analysis of Inconsistent Data (LAID) is a systematic, robust methodology that is easy to interpret and can handle inconsistent data. The paradigm for handling large volumes of data has been changing. Previously, data processing was performed on a single computer, with in-memory data access. The current trend is to access data on disk, in a cloud environment. The present work intends to validate this new paradigm, using the HDF5 data system and the remote environment provided by INCD. Because HDF5 is the system adopted by the Python community to handle large datasets, this language was chosen for the implementation of the LAID algorithm.
Description
Keywords
Data Mining, Feature selection (Seleção de atributos), LAID, HDF5, Python, INCD
Citation
Apolónia, João; Cavique, Luís - Seleção de atributos de dados inconsistentes em ambiente HDF5+Python na cloud INCD. "Revista de Ciências da Computação" [Em linha]. ISSN 2182-1801 (Online). Vol. 14 (2019), p. 85-112
Publisher
Universidade Aberta