An information theoretic approach to quantify the stability of feature selection and ranking algorithms

Alaiz-Rodríguez, Rocío and Parnell, Andrew (2020) An information theoretic approach to quantify the stability of feature selection and ranking algorithms. Knowledge-Based Systems, 195. p. 105745. ISSN 0950-7051

AndrewParnellKnow2022.pdf
Available under License Creative Commons Attribution Non-commercial Share Alike.

Abstract

Feature selection is a key step when dealing with high-dimensional data. These techniques simplify knowledge discovery by selecting the most relevant features from among noisy, redundant and irrelevant ones. In many practical applications, however, the outcome of the feature selection algorithm is not stable: small variations in the data may yield very different feature rankings. Assessing the stability of these methods is therefore an important issue. We propose an information-theoretic approach based on the Jensen–Shannon divergence to quantify this robustness. Unlike other stability measures, this metric is suitable for different algorithm outcomes: full ranked lists, feature subsets, and the lesser-studied partial ranked lists. This generalized metric quantifies the difference among a whole set of lists of the same size, following a probabilistic approach, and can give more importance to disagreements that appear at the top of the list. Moreover, it possesses desirable properties including correction for chance, upper and lower bounds, and conditions for a deterministic selection. We illustrate the use of this stability metric with data generated in a fully controlled way and compare it with popular metrics, including Spearman's rank correlation on feature ranking outcomes and Kuncheva's index on feature selection outcomes. Additionally, experimental validation of the proposed approach is carried out on a real-world problem of food quality assessment, showing its potential to quantify stability from different perspectives.
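To make the idea in the abstract concrete, the following is a minimal sketch of a Jensen–Shannon divergence computed over a whole set of full rankings. It is not the paper's exact formulation: the function names are hypothetical, and the mapping from rank positions to probabilities (weight 1/position, so that top positions carry more mass) is an assumption chosen only for illustration. It also omits the correction for chance and the bounds mentioned above.

```python
import math


def rank_to_distribution(ranking):
    """Map a full ranking (feature ids, best first) to a probability distribution.

    Assumption for illustration: the feature at 0-based position i gets a
    weight of 1 / (i + 1), so disagreements near the top matter more.
    """
    weights = {feature: 1.0 / (pos + 1) for pos, feature in enumerate(ranking)}
    total = sum(weights.values())
    return {feature: w / total for feature, w in weights.items()}


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def js_divergence(rankings):
    """Generalized Jensen-Shannon divergence among several rankings.

    Uses equal weights for all rankings: entropy of the average distribution
    minus the average entropy. All rankings must contain the same features.
    """
    dists = [rank_to_distribution(r) for r in rankings]
    m = len(dists)
    mixture = {f: sum(d[f] for d in dists) / m for f in dists[0]}
    return entropy(mixture) - sum(entropy(d) for d in dists) / m


base = ["a", "b", "c", "d"]
top_swap = ["b", "a", "c", "d"]      # disagreement at the top of the list
bottom_swap = ["a", "b", "d", "c"]   # disagreement at the bottom of the list

identical = js_divergence([base, base, base])
top = js_divergence([base, top_swap])
bottom = js_divergence([base, bottom_swap])
```

Under these assumptions, identical rankings give a divergence of zero, and a swap at the top of the list yields a larger divergence than the same swap at the bottom. A lower divergence corresponds to a more stable algorithm.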

Item Type:	Article
Additional Information:	Cite as: Rocío Alaiz-Rodríguez, Andrew C. Parnell, An information theoretic approach to quantify the stability of feature selection and ranking algorithms, Knowledge-Based Systems, Volume 195, 2020, 105745, ISSN 0950-7051, https://doi.org/10.1016/j.knosys.2020.105745.
Keywords:	Feature selection; Feature ranking; Stability; Robustness; Jensen–Shannon divergence
Academic Unit:	Faculty of Social Sciences > Research Institutes > Irish Climate Analysis and Research Units, ICARUS
Item ID:	16236
Identification Number:	10.1016/j.knosys.2020.105745
Depositing User:	Andrew Parnell
Date Deposited:	05 Jul 2022 15:11
Journal or Publication Title:	Knowledge-Based Systems
Publisher:	Science Direct
Refereed:	Yes
Related URLs:	Publisher
URI:	https://mural.maynoothuniversity.ie/id/eprint/16236
Use Licence:	This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA).

MURAL - Maynooth University Research Archive Library