Publication

Addressing low dimensionality feature subset selection: reliefF(-k) or extended correlation-based feature selection(eCFS)?

Name: soco2019_paper_26.pdf
Size: 216.46 KB
Format: Adobe PDF

Abstract(s)

This paper tackles problems where attribute selection not only chooses very few features but also yields lower classification accuracy than the full attribute set. Correlation-based feature selection (CFS) is taken as the baseline attribute subset selection method because of its popularity and high performance. Around one hundred data sets were collected and submitted to CFS; the problems simultaneously fulfilling two conditions, a) fewer than six selected attributes and b) a percentage of selected attributes below forty per cent, were then tested in two directions. Firstly, in the scope of data selection at the feature level, some options proposed in a prior work as well as an advanced contemporary approach were applied. Secondly, the pre-processed data and the initial problems were tested with some robust classifiers. Moreover, this work introduces a new taxonomy of feature selection according to the solution type and the way it is computed. The test bed comprises seven problems: for three of them CFS reports a single selected attribute, for another one two extracted features, and for the three remaining data sets four or five retained attributes. Additionally, the number of features ranges from six to twenty-nine and the complexity of the problems, in terms of classes, fluctuates between two and twenty-one, giving averages of sixteen and around five for these two properties, respectively. The contribution concludes that the advanced procedure is suitable for problems where only one or two attributes are selected by CFS; for data sets with more than two selected features the baseline method is preferable to the advanced one, although the considered feature ranking method achieved intermediate results.
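The two screening conditions stated in the abstract (fewer than six attributes selected by CFS, and fewer than forty per cent of the original attributes retained) can be sketched as a simple filter. This is only an illustrative sketch: the function name `is_low_dimensional`, its parameters, and the toy data set names and counts are assumptions made for the example, not the authors' code or data.

```python
def is_low_dimensional(n_selected: int, n_total: int,
                       max_selected: int = 6,
                       max_percent: float = 40.0) -> bool:
    """Return True when a CFS result meets both screening conditions.

    a) the number of selected attributes is lower than max_selected
    b) the percentage of selected attributes is lower than max_percent
    """
    if n_total <= 0 or not 0 <= n_selected <= n_total:
        raise ValueError("expected 0 <= n_selected <= n_total and n_total > 0")
    percent_selected = 100.0 * n_selected / n_total
    return n_selected < max_selected and percent_selected < max_percent


# Hypothetical (selected, total) attribute counts; not taken from the paper.
candidates = {
    "toy_a": (1, 6),    # 16.7 % retained, passes both conditions
    "toy_b": (5, 29),   # 17.2 % retained, passes both conditions
    "toy_c": (4, 6),    # 66.7 % retained, fails condition b)
    "toy_d": (7, 40),   # 7 attributes selected, fails condition a)
}

test_bed = [name for name, (k, n) in candidates.items()
            if is_low_dimensional(k, n)]
print(test_bed)
```

Both thresholds are strict inequalities, following the abstract's wording ("lower than six", "lower than forty per cent").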

Keywords

Machine learning; Feature subset selection; Feature ranking; Extended feature subset selection
