MURAL - Maynooth University Research Archive Library



    Feature selection and hierarchical modelling in tree-based machine learning models.


    Bruna, Davies Wundervald (2023) Feature selection and hierarchical modelling in tree-based machine learning models. PhD thesis, National University of Ireland Maynooth.

    Text: Thesis__Bruna_-updated.pdf (4MB)
    Available under License Creative Commons Attribution Non-commercial Share Alike.

    Abstract

    Tree-based algorithms are very popular in machine learning due to their many advantages: interpretability, flexibility, high predictive power, and so on. They can be applied to many different classification and regression problems and are in constant development. As a result, many tree-based machine learning algorithms are available, including both standard and Bayesian options. In this thesis, we propose several methodological extensions to tree-based models, including BART, the main Bayesian version. The methods are: extending and generalizing the feature gain penalization idea for tree-based algorithms; extending the BART model into HEBART, to deal with hierarchical data where a grouping variable is present; and, lastly, extending HEBART to deal with more complicated hierarchical data situations. The methods proposed here aim to tackle important deficiencies of these algorithms, which are very popular and in high demand at the moment. The first method develops a new gain penalization idea that provides a general local-global regularization for tree-based models, able to create much more powerful and interpretable generalizations of the gain penalization method. One of the main advantages of this technique is that it can be applied to all (non-Bayesian) tree-based algorithms without loss of generality. The second method is a simple yet powerful extension of Bayesian Additive Regression Trees, which we name Hierarchical Embedded BART (HEBART). This model allows random effects to be included at the terminal node level of the set of regression trees estimated in BART, making it a non-parametric alternative to mixed effects models.
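    The gain penalization idea can be illustrated with a toy sketch: when searching for the best split, each feature's gain is multiplied by a feature-specific penalty, applied only while that feature is still unused in the tree (the "local" part of the local-global scheme). This is a minimal illustration under those assumptions, not the thesis's actual implementation; the function names and the variance-reduction criterion are hypothetical choices for the example.

    ```python
    import numpy as np

    def variance_gain(y, mask):
        """Reduction in variance from splitting y by the boolean mask."""
        n = len(y)
        left, right = y[mask], y[~mask]
        if len(left) == 0 or len(right) == 0:
            return 0.0
        return np.var(y) - (len(left) / n) * np.var(left) - (len(right) / n) * np.var(right)

    def penalized_best_split(X, y, gamma, used):
        """Pick the split maximising gain_j * penalty_j, where penalty_j is
        gamma[j] (in [0, 1]) for features not yet used in the tree and 1
        otherwise, so new features must 'pay' to enter the model."""
        best = (None, None, -np.inf)  # (feature, threshold, penalized gain)
        for j in range(X.shape[1]):
            factor = 1.0 if j in used else gamma[j]
            for t in np.unique(X[:, j])[:-1]:
                g = factor * variance_gain(y, X[:, j] <= t)
                if g > best[2]:
                    best = (j, t, g)
        return best
    ```

    Setting all entries of `gamma` to 1 recovers the unpenalized split search; per-feature values of `gamma` allow the "global" prior weighting across features that the generalization exploits.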
    Finally, we propose further extensions to HEBART, namely: I) Crossed Random Effects HEBART (CHEBART), which allows for multiple grouping variables in the same model; and II) Nested Random Effects HEBART (NHEBART), which accounts for multiple nested grouping variables, where each group level has sub-levels (or sub-groups).
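    HEBART's key structural idea — a group-specific random effect attached to each terminal node — can be sketched with a single toy tree. This is purely illustrative: the numbers, field names, and the single-tree structure are hypothetical, and the real model sums many trees whose node means and group offsets are sampled by MCMC rather than fixed.

    ```python
    # A hypothetical fitted HEBART-style tree: each terminal node carries an
    # overall mean mu plus one offset per group, so observations from
    # different groups get different predictions even in the same node.
    tree = {
        "split": ("x0", 0.5),                 # go left if x0 <= 0.5
        "left":  {"mu": 1.0, "group_offsets": {"A": 0.3, "B": -0.3}},
        "right": {"mu": 2.5, "group_offsets": {"A": -0.1, "B": 0.1}},
    }

    def predict(tree, x, group):
        """Route x to a terminal node, then add that node's random effect
        for the observation's group (0 for groups unseen in training)."""
        feat, thr = tree["split"]
        node = tree["left"] if x[feat] <= thr else tree["right"]
        return node["mu"] + node["group_offsets"].get(group, 0.0)

    print(predict(tree, {"x0": 0.2}, "A"))  # → 1.3
    ```

    The crossed (CHEBART) and nested (NHEBART) variants generalize the `group_offsets` lookup — to a sum over several grouping variables, or to offsets indexed by group and sub-group, respectively.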
    Item Type: Thesis (PhD)
    Keywords: Feature selection; hierarchical modelling; tree-based machine learning models
    Academic Unit: Faculty of Science and Engineering > Research Institutes > Hamilton Institute
    Item ID: 19559
    Depositing User: IR eTheses
    Date Deposited: 06 Mar 2025 12:25
    Funders: Science Foundation Ireland Career Development Award Grant number 17/CDA/4695
    URI: https://mural.maynoothuniversity.ie/id/eprint/19559
    Use Licence: This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA).
