McJames, Nathan (2025) Extensions of Bayesian Non-Parametric Causal Inference Machine Learning Methods with Applications to Large Scale Educational Studies. PhD thesis, National University of Ireland Maynooth.
Available under License Creative Commons Attribution Non-commercial Share Alike.
Abstract
When exploring how an individual’s unique characteristics can lead to variations
in their response to treatment, Bayesian non-parametric causal inference machine
learning methods based on Bayesian Additive Regression Trees (BART) and
Bayesian Causal Forests (BCF) have emerged as leading approaches. This thesis
presents a series of studies focused on extending and applying these methods to
large scale educational studies.
We begin by demonstrating the broad potential for these methods in educational
studies by applying BART to English data from the Teaching and Learning
International Survey (TALIS 2018). By estimating the effect of multiple treatments
on teacher job satisfaction, we identify positive factors such as continual
professional development and induction activities that may be used to improve
job satisfaction, thus encouraging teachers to stay in their jobs and new entrants
to join the profession.
Our second contribution is a multivariate extension of Bayesian Causal Forests,
designed to estimate the effect of an intervention on multiple outcome variables
simultaneously. By allowing the tree structure of BCF to benefit from the shared
information across all outcome variables, we demonstrate the performance gains
made possible with this approach. Applying this method to Irish data from the
Trends in International Mathematics and Science Study (TIMSS 2019), we also
investigate how student achievement is affected by a number of home-related
factors, such as having access to a study desk at home, often being absent, or
often feeling hungry when arriving at school.
Later, we augment this multivariate model in order to investigate the separate
effects of homework frequency and homework duration on student achievement in
mathematics and science, again using data from TIMSS 2019. We find that while
increasing homework frequency can lead to greater benefits from homework,
increasing homework duration beyond 15 minutes has no additional effect.
Our final contribution is a longitudinal extension of BCF, designed to estimate
treatment effects from multiple waves of data, using a structure similar to that
of the difference-in-differences approach. With the help of simulation studies, we
demonstrate the performance gains made possible with our new method. Applying
this model to data from the High School Longitudinal Study of 2009 (HSLS), we
also reveal the negative effects of participation in intensive part-time work by high
school students.
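For readers unfamiliar with the difference-in-differences structure mentioned above, the classic two-group, two-wave estimator can be sketched as follows. This is only an illustration of the basic idea, not the longitudinal BCF model developed in the thesis; the function name `did_estimate` and all numbers are hypothetical.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Classic two-wave difference-in-differences estimate.

    Subtracts the control group's average change between waves from the
    treated group's average change. Interpreting the result as a treatment
    effect assumes both groups would have followed parallel trends in the
    absence of treatment.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical outcome scores, invented purely for illustration.
treated_pre = [50.0, 52.0, 48.0]
treated_post = [51.0, 53.0, 49.0]
control_pre = [49.0, 51.0, 50.0]
control_post = [53.0, 55.0, 54.0]

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(effect)
```

A negative estimate here would mean that outcomes in the treated group grew less between the two waves than outcomes in the control group.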
Item Type: | Thesis (PhD) |
---|---|
Keywords: | Bayesian; Non-Parametric; Causal Inference; Machine Learning; Methods; Large Scale Educational Studies |
Academic Unit: | Faculty of Science and Engineering > Research Institutes > Hamilton Institute |
Item ID: | 20107 |
Depositing User: | IR eTheses |
Date Deposited: | 26 Jun 2025 14:31 |
Funders: | Science Foundation Ireland |
URI: | https://mural.maynoothuniversity.ie/id/eprint/20107 |
Use Licence: | This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA). |