Bruna, Davies Wundervald (2023) Feature selection and hierarchical modelling in tree-based machine learning models. PhD thesis, National University of Ireland Maynooth.
Abstract
Tree-based algorithms are very popular in machine learning thanks to their many advantages: interpretability, flexibility, high predictive power, and so on. They can be applied to many different classification and regression problems, and are in constant development. As a result, many tree-based machine learning algorithms are available, including both standard and Bayesian options.
In this thesis, we propose several methodological extensions to tree-based models, including BART, the main Bayesian version of them. The methods are: extending and generalizing the feature gain penalization idea for tree-based algorithms; extending the BART model into HEBART, to deal with hierarchical data when a grouping variable is present; and, lastly, extending HEBART to handle more complicated hierarchical data settings. The methods proposed here aim to tackle important deficiencies of these algorithms, which are very popular and in high demand at the moment.
The first method develops a new gain penalization idea with a general local-global regularization scheme for tree-based models, which yields more powerful and interpretable generalizations of the gain penalization method. One of the main advantages of this technique is that it can be applied to all (non-Bayesian) tree-based algorithms without loss of generality. The second method switches topics and presents a simple yet powerful extension of Bayesian Additive Regression Trees, which we name Hierarchical Embedded BART (HEBART). This model allows random effects to be included at the terminal-node level of the set of regression trees estimated in BART, making it a non-parametric alternative to mixed-effects models. Finally, we propose a few further extensions to HEBART, namely: I) Crossed Random Effects HEBART (CHEBART), which allows for multiple grouping variables in the same model; and II) Nested Random Effects HEBART (NHEBART), which accounts for multiple nested grouping variables, where each group level has sub-levels (or sub-groups).
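The abstract's description of HEBART — random effects embedded at the terminal-node level of a BART sum of trees — can be written schematically as follows. The notation here is an assumption chosen to mirror standard BART papers (observation $i$ in group $j$, trees $\mathcal{T}_t$ with group-specific terminal-node parameters), not necessarily the thesis's own:

```latex
y_{ij} \sim N\!\Big(\textstyle\sum_{t=1}^{T} g\big(x_{ij};\, \mathcal{T}_t,\, \mathcal{M}_t^{(j)}\big),\; \sigma^2\Big),
\qquad
\mu_{t \ell j} \sim N\!\big(\mu_{t \ell},\, k_1 \sigma^2\big),
```

where each terminal node $\ell$ of tree $t$ carries a group-specific mean $\mu_{t \ell j}$ shrunk towards an overall node mean $\mu_{t \ell}$; this node-level shrinkage is what makes the model a non-parametric analogue of a mixed-effects model.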
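The core of gain penalization is to shrink each candidate split's impurity gain by a feature-specific penalty before choosing the best split, so that less-favoured features must offer a larger raw gain to be selected. The thesis is not available in this record beyond the abstract, so the sketch below is only illustrative: the function name, the depth-based "local" factor, and the mixing weight are assumptions standing in for the local-global scheme the abstract describes, not the thesis's actual formulation.

```python
import numpy as np

def penalized_gain(gain, feature, depth, gamma_global, mix=0.5):
    """Shrink a candidate split's raw gain by a feature-specific penalty.

    gamma_global: per-feature penalty in (0, 1]; smaller values make a
    feature less likely to be chosen. The depth-based 'local' factor
    (a hypothetical choice here) relaxes the penalty deeper in the tree,
    and `mix` blends the global and local components.
    """
    local = 1.0 - 1.0 / (depth + 2)  # weaker penalty at greater depth
    gamma = mix * gamma_global[feature] + (1.0 - mix) * local
    return gamma * gain

# Choose among candidate splits by penalized rather than raw gain.
gains = np.array([0.30, 0.28, 0.10])   # raw impurity reductions per feature
gamma = np.array([0.5, 0.9, 1.0])      # per-feature penalties
pen = [penalized_gain(g, j, depth=0, gamma_global=gamma)
       for j, g in enumerate(gains)]
best = int(np.argmax(pen))             # feature 1 wins despite a lower raw gain
```

Note how the penalization flips the decision: feature 0 has the largest raw gain, but its heavier penalty lets feature 1 win the split, which is exactly the regularizing effect the method relies on.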
| Field | Value |
|---|---|
| Item Type | Thesis (PhD) |
| Keywords | Feature selection; hierarchical modelling; tree-based machine learning models |
| Academic Unit | Faculty of Science and Engineering > Research Institutes > Hamilton Institute |
| Item ID | 19559 |
| Depositing User | IR eTheses |
| Date Deposited | 06 Mar 2025 12:25 |
| Funders | Science Foundation Ireland Career Development Award, grant number 17/CDA/4695 |
| URI | https://mural.maynoothuniversity.ie/id/eprint/19559 |
| Use Licence | Creative Commons Attribution Non-Commercial Share-Alike (CC BY-NC-SA) |