MURAL - Maynooth University Research Archive Library



    Feature selection and hierarchical modelling in tree-based machine learning models.


    Bruna, Davies Wundervald (2023) Feature selection and hierarchical modelling in tree-based machine learning models. PhD thesis, National University of Ireland Maynooth.

    Text: Thesis__Bruna_-updated.pdf (4MB)
    Available under License Creative Commons Attribution Non-commercial Share Alike.

    Abstract

    Tree-based algorithms are very popular in machine learning due to their many advantages: interpretability, flexibility, high predictive power, and so on. They can be applied to many different classification and regression problems and are in constant development. As a result, many tree-based machine learning algorithms are available, including both standard and Bayesian options. In this thesis, we propose several methodological extensions to tree-based models, including BART, the main Bayesian version. The methods are: extending and generalizing the feature gain penalization idea for tree-based algorithms; extending the BART model into HEBART, to deal with hierarchical data where a grouping variable is present; and, lastly, extending HEBART to deal with more complicated hierarchical data situations. The methods proposed here aim to tackle important deficiencies of these algorithms, which are very popular and in high demand at the moment. The first method develops a new gain penalization idea that provides a general local-global regularization for tree-based models, able to create much more powerful and interpretable generalizations of the gain penalization method. One of the main advantages of this technique is that it can be applied to all (non-Bayesian) tree-based algorithms without loss of generality. The second method is a simple yet powerful extension of Bayesian Additive Regression Trees, which we name Hierarchical Embedded BART (HEBART). This model allows random effects to be included at the terminal node level of the set of regression trees estimated in BART, making it a non-parametric alternative to mixed effects models.
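    The gain penalization idea can be illustrated with a toy sketch: when searching for the best split, each feature's gain is multiplied by a feature-specific penalty, applied only while that feature is still unused in the tree (the "local" part of the local-global scheme). This is a minimal illustration under those assumptions, not the thesis's actual implementation; the function names and the variance-reduction criterion are hypothetical choices for the example.

    ```python
    import numpy as np

    def variance_gain(y, mask):
        """Reduction in variance from splitting y by the boolean mask."""
        n = len(y)
        left, right = y[mask], y[~mask]
        if len(left) == 0 or len(right) == 0:
            return 0.0
        return np.var(y) - (len(left) / n) * np.var(left) - (len(right) / n) * np.var(right)

    def penalized_best_split(X, y, gamma, used):
        """Pick the split maximising gain_j * penalty_j, where penalty_j is
        gamma[j] (in [0, 1]) for features not yet used in the tree and 1
        otherwise, so new features must 'pay' to enter the model."""
        best = (None, None, -np.inf)  # (feature, threshold, penalized gain)
        for j in range(X.shape[1]):
            factor = 1.0 if j in used else gamma[j]
            for t in np.unique(X[:, j])[:-1]:
                g = factor * variance_gain(y, X[:, j] <= t)
                if g > best[2]:
                    best = (j, t, g)
        return best
    ```

    Setting all entries of `gamma` to 1 recovers the unpenalized split search; per-feature values of `gamma` allow the "global" prior weighting across features that the generalization exploits.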
    Finally, we propose further extensions to HEBART, namely: I) Crossed Random Effects HEBART (CHEBART), which allows for multiple grouping variables in the same model; and II) Nested Random Effects HEBART (NHEBART), which accounts for multiple nested grouping variables, where each group level has sub-levels (or sub-groups).
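    HEBART's key structural idea — a group-specific random effect attached to each terminal node — can be sketched with a single toy tree. This is purely illustrative: the numbers, field names, and the single-tree structure are hypothetical, and the real model sums many trees whose node means and group offsets are sampled by MCMC rather than fixed.

    ```python
    # A hypothetical fitted HEBART-style tree: each terminal node carries an
    # overall mean mu plus one offset per group, so observations from
    # different groups get different predictions even in the same node.
    tree = {
        "split": ("x0", 0.5),                 # go left if x0 <= 0.5
        "left":  {"mu": 1.0, "group_offsets": {"A": 0.3, "B": -0.3}},
        "right": {"mu": 2.5, "group_offsets": {"A": -0.1, "B": 0.1}},
    }

    def predict(tree, x, group):
        """Route x to a terminal node, then add that node's random effect
        for the observation's group (0 for groups unseen in training)."""
        feat, thr = tree["split"]
        node = tree["left"] if x[feat] <= thr else tree["right"]
        return node["mu"] + node["group_offsets"].get(group, 0.0)

    print(predict(tree, {"x0": 0.2}, "A"))  # → 1.3
    ```

    The crossed (CHEBART) and nested (NHEBART) variants generalize the `group_offsets` lookup — to a sum over several grouping variables, or to offsets indexed by group and sub-group, respectively.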
    Item Type: Thesis (PhD)
    Keywords: Feature selection; hierarchical modelling; tree-based machine learning models
    Academic Unit: Faculty of Science and Engineering > Research Institutes > Hamilton Institute
    Item ID: 19559
    Depositing User: IR eTheses
    Date Deposited: 06 Mar 2025 12:25
    Funders: Science Foundation Ireland Career Development Award Grant number 17/CDA/4695
    URI: https://mural.maynoothuniversity.ie/id/eprint/19559
    Use Licence: This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA).
