Goh, Yong Chen (2025) Bayesian Statistical Machine Learning Models for Predicting Multivariate Data with Non-Ignorable Partial Missingness. PhD thesis, National University of Ireland Maynooth.
Available under License Creative Commons Attribution Non-commercial Share Alike.
Abstract
Missing data is a pervasive challenge in statistical modelling, particularly in multivariate
response settings where partial missingness leads to complex, overlapping missingness
patterns. Standard methods, typically complete-case analysis or imputation, often rely on
strong and frequently unrealistic ignorability assumptions, such as missing completely at
random (MCAR) or missing at random (MAR), which leads to inefficiency and bias. This
thesis introduces three novel Bayesian joint models, integrating the selection model framework
with Bayesian additive regression trees (BART) to provide a flexible, non-parametric
solution for handling non-ignorable partial missingness in multivariate data.
The motivation for these models arises from limitations of standard missing data techniques,
as exemplified by the global Amax dataset, which exhibits substantial, overlapping
missingness in the response variables. The methods originally applied to this dataset
implicitly assume ignorability, leading to biased inferences and loss of information. To address this,
our novel models jointly estimate both the response and missingness processes, enabling
the recovery of non-ignorable missing not at random (MNAR) mechanisms, in addition to
MCAR and MAR. These models also extend to settings with partially observed covariates
with ignorable missingness.
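As a sketch of the selection model factorisation referred to above, the joint density of the responses and the missingness indicators can be written as follows. The notation is assumed here for illustration and is not taken from the thesis: y denotes the responses (split into observed and missing parts), m the missingness indicators, x the covariates, and θ and ψ the parameters of the response and missingness sub-models.

```latex
% Notation assumed for illustration; not taken from the thesis.
\begin{align*}
p(y, m \mid x, \theta, \psi) &= p(y \mid x, \theta)\, p(m \mid y, x, \psi) \\
\text{MCAR:}\quad p(m \mid y, x, \psi) &= p(m \mid \psi) \\
\text{MAR:}\quad p(m \mid y, x, \psi) &= p(m \mid y_{\mathrm{obs}}, x, \psi) \\
\text{MNAR:}\quad & p(m \mid y, x, \psi) \text{ depends on } y_{\mathrm{mis}}
\end{align*}
```

Under MCAR or MAR the missingness factor can be ignored when making inference about θ; under MNAR it cannot, which is why the response and missingness processes have to be modelled jointly.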
By leveraging BART’s ability to flexibly model complex, non-linear relationships, we
adopt a multivariate BART framework to capture dependencies across responses while
maintaining predictive flexibility. For the missingness mechanism, we explore both parametric
and non-parametric Bayesian approaches. The probit regression model allows for
the incorporation of prior information on the missingness mechanism, offering greater
interpretability when domain knowledge is available. In contrast, the probit extension
of BART allows for automatic variable selection and flexibly models complex interactions.
Additionally, we adopt a seemingly unrelated framework to model dependencies
across responses while allowing dynamic response-covariate relationships. These methods
are evaluated through extensive simulations and applied to the global Amax dataset,
demonstrating strong performance in identifying non-ignorable missingness structures and
recovering unobserved values.
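To illustrate why ignoring a non-ignorable mechanism biases inference, the following is a minimal simulation sketch, not the thesis's models or data. It assumes two correlated responses, a single covariate, and a probit missingness mechanism driven by the response values themselves; the function name `simulate_mnar` and the parameter `gamma` are hypothetical.

```python
import numpy as np


def simulate_mnar(n=20000, gamma=1.5, seed=0):
    """Simulate two correlated responses with probit missingness.

    gamma controls how strongly each response drives its own
    missingness: gamma = 0 gives MCAR, gamma != 0 gives MNAR.
    All settings are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)

    # Correlated errors link the two responses.
    cov = np.array([[1.0, 0.5], [0.5, 1.0]])
    noise = rng.multivariate_normal(np.zeros(2), cov, size=n)

    # Non-linear response-covariate relationships.
    y = np.column_stack([np.sin(x), 0.5 * x**2]) + noise

    # Probit mechanism via a latent Gaussian variable:
    # entry (i, j) is missing when gamma * y_ij + e_ij > 0.
    latent = gamma * y + rng.normal(size=(n, 2))
    missing = latent > 0
    return x, y, missing


x, y, missing = simulate_mnar()
y_obs = np.where(missing, np.nan, y)

full_mean = y.mean(axis=0)            # mean of the complete data
cc_mean = np.nanmean(y_obs, axis=0)   # mean of the observed entries only

print("full-data means:     ", full_mean)
print("observed-only means: ", cc_mean)
print("rows with exactly one missing response:",
      int((missing.sum(axis=1) == 1).sum()))
```

Because larger response values are more likely to be missing when `gamma` is positive, the observed-only means fall below the full-data means. Setting `gamma=0` removes the dependence on the responses, and the two sets of means then agree up to sampling noise.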
| Item Type: | Thesis (PhD) |
|---|---|
| Keywords: | Bayesian; Statistical Machine; Learning Models; Predicting Multivariate Data; Non-Ignorable Partial Missingness |
| Academic Unit: | Faculty of Science and Engineering > Research Institutes > Hamilton Institute |
| Item ID: | 20815 |
| Depositing User: | IR eTheses |
| Date Deposited: | 06 Nov 2025 14:33 |
| Use Licence: | This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA). |