MURAL - Maynooth University Research Archive Library



    Clustering Longitudinal Life-Course Sequences using Mixtures of Exponential-Distance Models


    Murphy, Keefe and Murphy, T. Brendan and Piccarreta, Raffaella and Gormley, I. Claire (2021) Clustering Longitudinal Life-Course Sequences using Mixtures of Exponential-Distance Models. Journal of the Royal Statistical Society Series A: Statistics in Society, 184 (4). pp. 1414-1451. ISSN 0964-1998



    Abstract

    Sequence analysis is an increasingly popular approach for analysing life courses represented by ordered collections of activities experienced by subjects over time. Here, we analyse a survey data set containing information on the career trajectories of a cohort of Northern Irish youths tracked between the ages of 16 and 22. We propose a novel, model-based clustering approach suited to the analysis of such data from a holistic perspective, with the aims of estimating the number of typical career trajectories, identifying the relevant features of these patterns, and assessing the extent to which such patterns are shaped by background characteristics. Several criteria exist for measuring pairwise dissimilarities among categorical sequences. Typically, dissimilarity matrices are employed as input to heuristic clustering algorithms. The family of methods we develop instead clusters sequences directly using mixtures of exponential-distance models. Basing the models on weighted variants of the Hamming distance metric permits closed-form expressions for parameter estimation. Simultaneously allowing the component membership probabilities to depend on fixed covariates and accommodating sampling weights in the clustering process yields new insights on the Northern Irish data. In particular, we find that school examination performance is the single most important predictor of cluster membership.
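The closed-form property mentioned in the abstract can be illustrated with a minimal sketch. It assumes a single mixture component whose probability for a sequence is proportional to the exponential of minus a position-weighted Hamming distance to a central sequence. The function names, the state labels, and the toy weights below are hypothetical and are not taken from the paper or its software. Under that assumption the normalising constant factorises over positions, which the brute-force sum at the end checks numerically.

```python
import math
from itertools import product


def weighted_hamming(seq, central, weights):
    """Sum the weights of the positions at which seq differs from central."""
    if not (len(seq) == len(central) == len(weights)):
        raise ValueError("seq, central and weights must have equal length")
    return sum(w for a, b, w in zip(seq, central, weights) if a != b)


def log_density(seq, central, weights, n_states):
    """Log-probability of seq under one exponential-distance component.

    Assumes p(seq) is proportional to
    exp(-weighted_hamming(seq, central, weights)).
    Because the distance is a sum over positions, the normalising constant
    factorises as prod_t (1 + (n_states - 1) * exp(-weights[t])).
    """
    distance = weighted_hamming(seq, central, weights)
    log_norm = sum(math.log1p((n_states - 1) * math.exp(-w)) for w in weights)
    return -distance - log_norm


# Hypothetical toy setting: three states, sequences of length four.
states = "ABC"
central = "AABC"
weights = [0.5, 1.0, 1.5, 2.0]

# Brute-force check: probabilities over all 3**4 sequences should sum to one.
total = sum(
    math.exp(log_density(s, central, weights, len(states)))
    for s in product(states, repeat=len(central))
)
print(round(total, 10))
```

Treating the weights as position-specific precisions is only one possible weighting scheme; a single shared weight is recovered by repeating the same value at every position.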

    Item Type: Article
    Additional Information: Cite as: Keefe Murphy, T. Brendan Murphy, Raffaella Piccarreta, I. Claire Gormley, Clustering Longitudinal Life-Course Sequences using Mixtures of Exponential-Distance Models, Journal of the Royal Statistical Society Series A: Statistics in Society, Volume 184, Issue 4, October 2021, Pages 1414–1451, https://doi.org/10.1111/rssa.12712
    Keywords: exponential-distance models; gating covariates; life-course sequences; model-based clustering; survey sampling weights; weighted Hamming distance
    Academic Unit: Faculty of Science and Engineering > Mathematics and Statistics
    Faculty of Science and Engineering > Research Institutes > Hamilton Institute
    Item ID: 17954
    Identification Number: https://doi.org/10.1111/rssa.12712
    Depositing User: Keefe Murphy
    Date Deposited: 14 Dec 2023 13:31
    Journal or Publication Title: Journal of the Royal Statistical Society Series A: Statistics in Society
    Publisher: Oxford Academic
    Refereed: Yes
    URI:
    Use Licence: This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA).
