Research Project
Untitled
Publications
A bi-objective feature selection algorithm for large omics datasets
Publication. Cavique, Luís; Mendes, Armando B.; Martiniano, Hugo F. M. C.; Correia, Luís
Feature selection is one of the most important concepts in data mining when dimensionality reduction is needed. The performance measures of feature selection encompass predictive accuracy and result comprehensibility. Consistency-based methods are a significant category of feature selection research that substantially improves the comprehensibility of the result by applying the parsimony principle. In this work, the bi-objective version of the Logical Analysis of Inconsistent Data (LAID) algorithm is applied to large volumes of data. To deal with hundreds of thousands of attributes, a heuristic decomposition uses parallel processing to solve a set covering problem, combined with a cross-validation technique. Each bi-objective solution pairs the size of the reduced feature set with the corresponding accuracy. The algorithm is applied to omics datasets with genome-like characteristics of patients with rare diseases.
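The bi-objective output can be illustrated with a Pareto filter. The following is a minimal Python sketch, not the authors' implementation: it assumes each candidate solution is a (number of features, accuracy) pair and keeps only the solutions that no other candidate matches or beats on both objectives. The function names and the sample values are hypothetical.

```python
from typing import List, Tuple

# A candidate solution: (number of selected features, accuracy in [0, 1]).
Solution = Tuple[int, float]


def dominates(a: Solution, b: Solution) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one.

    Objectives: minimise the number of features, maximise accuracy.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(solutions: List[Solution]) -> List[Solution]:
    """Return the non-dominated solutions, sorted by number of features."""
    front = [s for s in solutions
             if not any(dominates(other, s) for other in solutions)]
    return sorted(set(front))


if __name__ == "__main__":
    # Hypothetical (features, accuracy) pairs, e.g. from cross-validation runs.
    candidates = [(3, 0.71), (5, 0.80), (5, 0.78), (8, 0.86), (12, 0.85), (20, 0.90)]
    print(pareto_front(candidates))  # [(3, 0.71), (5, 0.8), (8, 0.86), (20, 0.9)]
```

Keeping the whole front, rather than a single "best" subset, leaves the trade-off between parsimony and accuracy to the analyst.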
A data reduction approach using hypergraphs to visualize communities and brokers in social networks
Publication. Cavique, Luís; Marques, Nuno C.; Gonçalves, António
The comprehension of social network phenomena is closely related to data visualization. However, even with only hundreds of nodes, dense networks are usually difficult to visualize. The strategy adopted in this work is data reduction using communities. Community detection, and in particular the detection of overlapping communities, is a very important issue in social network analysis. In this approach, the information extracted from social networks goes beyond cohesive groups, enabling the discovery of brokers that interact across communities. To find admissible solutions to hard problems, relaxed approaches are used: quasi-cliques are generated, and a partition is found using a partial set covering heuristic. The proposed method identifies communities and the actors that link two or more groups. In the visualization process, the user can choose different dimension reduction approaches for the condensed graph. For each condensed structure, a hypergraph can be drawn, identifying communities and brokers.
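The partial set covering step and the broker idea can be sketched as follows. This is a minimal greedy sketch under stated assumptions, not the published heuristic: quasi-clique generation is not shown (candidate groups are given as input), the `coverage` threshold is a hypothetical parameter, and a broker is taken to be any node that lies in two or more selected groups.

```python
from collections import Counter
from typing import FrozenSet, List, Set


def partial_set_cover(universe: Set[str], candidates: List[FrozenSet[str]],
                      coverage: float = 0.9) -> List[FrozenSet[str]]:
    """Greedily pick candidate groups until at least `coverage` of the nodes are covered."""
    target = coverage * len(universe)
    covered: Set[str] = set()
    chosen: List[FrozenSet[str]] = []
    remaining = list(candidates)
    while len(covered) < target and remaining:
        # Pick the group that covers the most still-uncovered nodes.
        best = max(remaining, key=lambda g: len(g - covered))
        if not best - covered:
            break  # no remaining group adds new nodes
        chosen.append(best)
        covered |= best
        remaining.remove(best)
    return chosen


def brokers(communities: List[FrozenSet[str]]) -> Set[str]:
    """Nodes that belong to two or more selected communities."""
    counts = Counter(node for c in communities for node in c)
    return {node for node, k in counts.items() if k >= 2}


if __name__ == "__main__":
    # Hypothetical quasi-cliques over eight actors.
    universe = set("abcdefgh")
    candidates = [frozenset("abcd"), frozenset("def"), frozenset("fgh"), frozenset("bc")]
    chosen = partial_set_cover(universe, candidates, coverage=1.0)
    print([sorted(c) for c in chosen])  # [['a', 'b', 'c', 'd'], ['f', 'g', 'h'], ['d', 'e', 'f']]
    print(sorted(brokers(chosen)))      # ['d', 'f']
```

Because the selected groups may overlap, the nodes they share are natural broker candidates; the hyperedges of the condensed graph would then be the chosen groups.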
A feature selection algorithm based on heuristic decomposition
Publication. Cavique, Luís; Mendes, Armando B.; Martiniano, Hugo F. M. C.
Feature selection is one of the most important concepts in data mining when dimensionality reduction is needed. The performance measures of feature selection encompass predictive accuracy and result comprehensibility. Consistency-based feature selection is a significant category of feature selection research that substantially improves the comprehensibility of the result by applying the parsimony principle. In this work, the feature selection algorithm LAID (Logical Analysis of Inconsistent Data) is applied to large volumes of data. To deal with hundreds of thousands of attributes, a problem decomposition strategy combined with a set covering problem formulation is used. The algorithm is applied to artificial datasets with genome-like characteristics of patients with rare diseases.
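The set covering formulation behind consistency-based selection can be sketched on a small binary dataset. This minimal Python sketch assumes the usual Logical Analysis of Data setup: every pair of observations from different classes should be separated by at least one selected feature, and a greedy heuristic picks features until all separable pairs are covered. Pairs that no feature separates are treated as inconsistent and skipped. The problem decomposition and the exact LAID procedure are not reproduced, and all names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple

Pair = Tuple[int, int]


def disjoint_pairs(y: Sequence[int]) -> List[Pair]:
    """All pairs of observations (i, j), i < j, that belong to different classes."""
    return [(i, j) for i, j in combinations(range(len(y)), 2) if y[i] != y[j]]


def greedy_feature_cover(X: Sequence[Sequence[int]], y: Sequence[int]) -> List[int]:
    """Greedy set covering: choose features until every separable pair is covered."""
    n_features = len(X[0])
    pairs = disjoint_pairs(y)
    # Feature f covers pair (i, j) when observations i and j differ on f.
    covers = [{(i, j) for i, j in pairs if X[i][f] != X[j][f]}
              for f in range(n_features)]
    # Pairs that no feature separates are inconsistent and are left out.
    uncovered: Set[Pair] = set().union(*covers)
    selected: List[int] = []
    while uncovered:
        # Pick the feature that covers the most still-uncovered pairs.
        best = max(range(n_features), key=lambda f: len(covers[f] & uncovered))
        selected.append(best)
        uncovered -= covers[best]
    return sorted(selected)


if __name__ == "__main__":
    # Hypothetical binary data: 5 observations, 3 features.
    X = [[0, 0, 0],
         [0, 1, 0],
         [1, 0, 0],
         [1, 1, 0],
         [0, 0, 0]]  # same values as row 0 but a different class
    y = [0, 1, 1, 0, 1]
    print(greedy_feature_cover(X, y))  # [0, 1]
```

With the sample data, the greedy cover selects features 0 and 1 and discards the constant feature 2; rows 0 and 4 have identical values but different classes, so that pair is left uncovered rather than forcing an impossible cover.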
Funders
Funding agency: Fundação para a Ciência e a Tecnologia
Funding programme: 5876
Funding Award Number: UID/Multi/04046/2013