Murphy, Keefe and Murphy, T. Brendan and Piccarreta, Raffaella and Gormley, I. Claire (2021) Clustering Longitudinal Life-Course Sequences using Mixtures of Exponential-Distance Models. Journal of the Royal Statistical Society Series A: Statistics in Society, 184 (4). pp. 1414-1451. ISSN 0964-1998
Abstract
Sequence analysis is an increasingly popular approach for analysing life courses represented by ordered collections of activities experienced by subjects over time. Here, we analyse a survey data set containing information on the career trajectories of a cohort of Northern Irish youths tracked between the ages of 16 and 22. We propose a novel, model-based clustering approach suited to the analysis of such data from a holistic perspective, with the aims of estimating the number of typical career trajectories, identifying the relevant features of these patterns, and assessing the extent to which such patterns are shaped by background characteristics. Several criteria exist for measuring pairwise dissimilarities among categorical sequences. Typically, dissimilarity matrices are employed as input to heuristic clustering algorithms. The family of methods we develop instead clusters sequences directly using mixtures of exponential-distance models. Basing the models on weighted variants of the Hamming distance metric permits closed-form expressions for parameter estimation. Simultaneously allowing the component membership probabilities to depend on fixed covariates and accommodating sampling weights in the clustering process yields new insights on the Northern Irish data. In particular, we find that school examination performance is the single most important predictor of cluster membership.
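The abstract's central modelling idea can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single mixture component with a central sequence, position-specific weights that act as precision parameters, and a density proportional to the exponential of the negative weighted Hamming distance. The function names, state labels, and weight values are hypothetical. Because the weighted Hamming distance is a sum over positions, the normalising constant in this sketch factorises into a product of per-position terms, which is the kind of closed form the abstract alludes to.

```python
from itertools import product
from math import exp, log


def weighted_hamming(seq, central, weights):
    """Sum the weights of the positions where seq differs from central."""
    if not (len(seq) == len(central) == len(weights)):
        raise ValueError("seq, central and weights must have equal length")
    return sum(w for s, c, w in zip(seq, central, weights) if s != c)


def log_density(seq, central, weights, n_states):
    """Log-density of seq under a single exponential-distance component.

    Assumption: density is proportional to exp(-weighted_hamming(...)).
    Summing exp(-distance) over all possible sequences factorises by
    position, giving prod_t (1 + (n_states - 1) * exp(-weights[t])).
    """
    dist = weighted_hamming(seq, central, weights)
    log_norm = sum(log(1.0 + (n_states - 1) * exp(-w)) for w in weights)
    return -dist - log_norm


# Hypothetical three-state alphabet and three time points.
states = ["school", "training", "employment"]
central = ("school", "school", "employment")
weights = [0.5, 1.0, 2.0]  # illustrative precision values, not estimates

# Check by brute force that the density sums to one over all sequences.
total = sum(
    exp(log_density(seq, central, weights, len(states)))
    for seq in product(states, repeat=len(central))
)
print(abs(total - 1.0) < 1e-12)
```

In a mixture, each component would have its own central sequence and weights, and the component membership probabilities could depend on covariates as the abstract describes; those parts are omitted here.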
Item Type: | Article |
---|---|
Additional Information: | Cite as: Keefe Murphy, T. Brendan Murphy, Raffaella Piccarreta, I. Claire Gormley, Clustering Longitudinal Life-Course Sequences using Mixtures of Exponential-Distance Models, Journal of the Royal Statistical Society Series A: Statistics in Society, Volume 184, Issue 4, October 2021, Pages 1414–1451, https://doi.org/10.1111/rssa.12712 |
Keywords: | exponential-distance models; gating covariates; life-course sequences; model-based clustering; survey sampling weights; weighted Hamming distance |
Academic Unit: | Faculty of Science and Engineering > Mathematics and Statistics; Faculty of Science and Engineering > Research Institutes > Hamilton Institute |
Item ID: | 17954 |
Identification Number: | https://doi.org/10.1111/rssa.12712 |
Depositing User: | Keefe Murphy |
Date Deposited: | 14 Dec 2023 13:31 |
Journal or Publication Title: | Journal of the Royal Statistical Society Series A: Statistics in Society |
Publisher: | Oxford Academic |
Refereed: | Yes |
URI: | |
Use Licence: | This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA). |