MURAL - Maynooth University Research Archive Library



    Extensions of Bayesian Non-Parametric Causal Inference Machine Learning Methods with Applications to Large Scale Educational Studies


    McJames, Nathan (2025) Extensions of Bayesian Non-Parametric Causal Inference Machine Learning Methods with Applications to Large Scale Educational Studies. PhD thesis, National University of Ireland Maynooth.

    Nathan_Final_Thesis.pdf
    Available under License Creative Commons Attribution Non-commercial Share Alike.


    Abstract

    Bayesian non-parametric causal inference machine learning methods based on Bayesian Additive Regression Trees (BART) and Bayesian Causal Forests (BCF) have emerged as leading approaches for exploring how an individual’s unique characteristics lead to variation in their response to treatment. This thesis presents a series of studies that extend these methods and apply them to large scale educational studies. We begin by demonstrating the broad potential of these methods in educational research by applying BART to English data from the Teaching and Learning International Survey (TALIS 2018). By estimating the effects of multiple treatments on teacher job satisfaction, we identify positive factors, such as continual professional development and induction activities, that may be used to improve job satisfaction, thereby encouraging teachers to stay in their jobs and new entrants to join the profession. Our second contribution is a multivariate extension of BCF, designed to estimate the effect of an intervention on multiple outcome variables simultaneously. By allowing the tree structure of BCF to benefit from the information shared across all outcome variables, we demonstrate the performance gains made possible with this approach. Applying this method to Irish data from the Trends in International Mathematics and Science Study (TIMSS 2019), we also investigate the effects on student achievement of a number of home-related factors, such as having access to a study desk at home, often being absent, or often feeling hungry when arriving at school. Later, we augment this multivariate model to investigate the separate effects of homework frequency and homework duration on student achievement in mathematics and science, again using data from TIMSS 2019. We find that while increasing homework frequency can lead to greater benefits from homework, increasing homework duration beyond 15 minutes has no additional effect. Our final contribution is a longitudinal extension of BCF, designed to estimate treatment effects from multiple waves of data using a structure similar to that of the difference-in-differences approach. With the help of simulation studies, we demonstrate the performance gains made possible by our new method. Applying this model to data from the High School Longitudinal Study of 2009 (HSLS), we also reveal the negative effects of participation in intensive part-time work by high school students.
    Item Type: Thesis (PhD)
    Keywords: Bayesian; Non-Parametric; Causal Inference; Machine Learning; Methods; Large Scale Educational Studies
    Academic Unit: Faculty of Science and Engineering > Research Institutes > Hamilton Institute
    Item ID: 20107
    Depositing User: IR eTheses
    Date Deposited: 26 Jun 2025 14:31
    Funders: Science Foundation Ireland
    URI: https://mural.maynoothuniversity.ie/id/eprint/20107
    Use Licence: This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA).
