MURAL - Maynooth University Research Archive Library



    Bayesian Statistical Machine Learning Models for Predicting Multivariate Data with Non-Ignorable Partial Missingness


    Goh, Yong Chen (2025) Bayesian Statistical Machine Learning Models for Predicting Multivariate Data with Non-Ignorable Partial Missingness. PhD thesis, National University of Ireland Maynooth.

    Abstract

    Missing data is a pervasive challenge in statistical modelling, particularly in multivariate response settings where partial missingness leads to complex, overlapping missingness patterns. Standard methods, typically complete-case analysis or imputation, often rely on strong and frequently unrealistic ignorability assumptions, such as missing completely at random (MCAR) or missing at random (MAR), which leads to inefficiency and bias. This thesis introduces three novel Bayesian joint models that integrate the selection model framework with Bayesian additive regression trees (BART) to provide a flexible, non-parametric solution for handling non-ignorable partial missingness in multivariate data. The motivation for these models arises from the limitations of standard missing data techniques, as exemplified by the global Amax dataset, which exhibits substantial, overlapping missingness in the response variables. The methods originally applied to this dataset implicitly assume ignorability, leading to biased inferences and loss of information. To address this, our novel models jointly estimate the response and missingness processes, enabling the recovery of non-ignorable missing not at random (MNAR) mechanisms in addition to MCAR and MAR. These models also extend to settings with partially observed covariates with ignorable missingness. To exploit BART’s ability to flexibly model complex, non-linear relationships, we adopt a multivariate BART framework that captures dependencies across responses while maintaining predictive flexibility. For the missingness mechanism, we explore both parametric and non-parametric Bayesian approaches. The probit regression model allows prior information on the missingness mechanism to be incorporated, offering greater interpretability when domain knowledge is available. In contrast, the probit extension of BART allows for automatic variable selection and flexibly models complex interactions. Additionally, we adopt a seemingly unrelated framework to model dependencies across responses while allowing dynamic response-covariate relationships. These methods are evaluated through extensive simulations and applied to the global Amax dataset, demonstrating strong performance in identifying non-ignorable missingness structures and recovering unobserved values.
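
The contrast the abstract draws between ignorable and non-ignorable missingness can be illustrated with a small simulation. The sketch below is not taken from the thesis: it assumes a single response, a linear data-generating model, and a probit missingness model P(missing | y) = Phi(alpha0 + alpha1 * y), where the names alpha0 and alpha1 and all numeric values are hypothetical. When alpha1 is non-zero, missingness depends on the possibly unobserved response itself (MNAR), and a complete-case mean is biased; when alpha1 is zero, missingness is MCAR and the complete-case mean stays close to the full-data mean.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF, used as the probit link."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate(n, alpha0, alpha1, seed=0):
    """Draw (y, observed) pairs under a probit selection mechanism.

    Illustrative assumptions (not from the thesis):
      y = 1 + 2*x + noise, with x and noise standard normal
      P(missing | y) = Phi(alpha0 + alpha1 * y)
    """
    rng = random.Random(seed)
    ys, observed = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        p_missing = normal_cdf(alpha0 + alpha1 * y)
        ys.append(y)
        observed.append(rng.random() >= p_missing)
    return ys, observed


def mean(values):
    return sum(values) / len(values)


def complete_case_mean(ys, observed):
    """Mean of y using only the rows where y was observed."""
    return mean([y for y, seen in zip(ys, observed) if seen])


n = 20000

# MNAR: larger responses are more likely to be missing (alpha1 > 0).
ys_mnar, obs_mnar = simulate(n, alpha0=-0.5, alpha1=0.8)
full_mean = mean(ys_mnar)
cc_mean_mnar = complete_case_mean(ys_mnar, obs_mnar)

# MCAR: missingness does not depend on y at all (alpha1 = 0).
ys_mcar, obs_mcar = simulate(n, alpha0=-0.5, alpha1=0.0)
full_mean_mcar = mean(ys_mcar)
cc_mean_mcar = complete_case_mean(ys_mcar, obs_mcar)

print("full-data mean (MNAR run):    ", round(full_mean, 3))
print("complete-case mean under MNAR:", round(cc_mean_mnar, 3))
print("full-data mean (MCAR run):    ", round(full_mean_mcar, 3))
print("complete-case mean under MCAR:", round(cc_mean_mcar, 3))
```

In this toy setting the complete-case mean under MNAR falls well below the full-data mean, which is the kind of bias that jointly modelling the response and the missingness indicator is intended to address. The sketch only generates data under a selection mechanism; it does not implement BART or any of the thesis's estimation procedures.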
    Item Type: Thesis (PhD)
    Keywords: Bayesian; Statistical Machine Learning Models; Predicting Multivariate Data; Non-Ignorable Partial Missingness
    Academic Unit: Faculty of Science and Engineering > Research Institutes > Hamilton Institute
    Item ID: 20815
    Depositing User: IR eTheses
    Date Deposited: 06 Nov 2025 14:33
    Use Licence: This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA).
