MURAL - Maynooth University Research Archive Library



    A synthetic data-driven machine learning approach for athlete performance attenuation prediction


    Cordeiro, Mauricio C., Cathain, Ciaran O., Daly, Lorcan, Kelly, David T. and Rodrigues, Thiago B. (2025) A synthetic data-driven machine learning approach for athlete performance attenuation prediction. Frontiers in Sports and Active Living, 7. ISSN 2624-9367

    Abstract

    Athlete performance monitoring is effective for optimizing training strategies and preventing injuries. However, applying machine learning (ML) frameworks to this domain remains challenging due to data scarcity limitations. This study extends previous research by evaluating Tabular Variational Autoencoders (TVAE) for generating synthetic data to predict performance attenuation in Gaelic football athletes.

    Methods: This study assesses synthetic data quality through a comprehensive evaluation framework combining column shape similarity metrics and Hellinger distance analysis, quantifying distributional fidelity across individual variables. Our ML implementation follows a two-phase approach. In the first phase, we evaluated models trained on hybrid datasets with varying synthetic proportions (10%–100%). In the second phase, we examined models trained exclusively on synthetic data and tested them on real data to analyze the utility of the synthetic data.

    Results: Our results demonstrate that TVAE-generated synthetic data closely replicates original distribution patterns, achieving 85.53% column shape similarity and a Hellinger distance of 0.169. Models trained with additional synthetic data or exclusively on synthetic data outperformed real-data baselines across multiple metrics, particularly for neuromuscular parameters. Our findings emphasize that this approach increased data availability and improved model performance in specific scenarios.

    Discussion: Several limitations remain: (1) there is limited framework transferability to sports with different physiological demands; (2) the Synthetic Data Generation (SDG) does not currently enforce feature constraints, and future implementations must ensure the procedure respects domain-specific feature limits; and (3) TVAE faced data fidelity challenges with certain variables, such as VO₂max. These findings demonstrate the utility of synthetic data for predicting performance attenuation in Gaelic football athletes. They address the challenge of data scarcity and highlight how synthetic data can be effectively integrated across physiological, neuromuscular, and perceptual metrics in athlete monitoring. This opens new possibilities for exploring similar classification tasks in sports performance analysis.
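
    The abstract reports a Hellinger distance between real and synthetic distributions without giving the computation. As a rough illustration only, the sketch below computes a per-variable Hellinger distance from two samples by binning them on a shared range. The bin count, the helper names (`histogram_probs`, `hellinger_distance`), and the example values are assumptions made for this sketch; the authors' binning and how they aggregate across variables are not stated in this record.

```python
import math


def histogram_probs(values, lo, hi, bins):
    """Bin values on [lo, hi] and return the bin probabilities."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # If all values are identical (width == 0), use a single bin.
        idx = int((v - lo) / width) if width > 0 else 0
        # The maximum value falls on the right edge; clamp it into the last bin.
        idx = min(idx, bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def hellinger_distance(real, synthetic, bins=20):
    """Hellinger distance between two samples of one variable.

    0.0 means the binned distributions are identical;
    1.0 means they share no occupied bin.
    The default of 20 bins is an assumption of this sketch.
    """
    lo = min(min(real), min(synthetic))
    hi = max(max(real), max(synthetic))
    p = histogram_probs(real, lo, hi, bins)
    q = histogram_probs(synthetic, lo, hi, bins)
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * squared)


# Hypothetical values for a single variable (not data from the study).
real_sample = [41.2, 43.5, 44.1, 45.0, 46.3, 47.8, 48.4, 50.1]
synthetic_sample = [40.9, 43.0, 44.6, 45.2, 46.0, 47.1, 49.0, 50.5]
distance = hellinger_distance(real_sample, synthetic_sample, bins=5)
```

    Lower values indicate closer agreement between the two distributions. With small samples the result depends heavily on the bin count, so any comparison across variables should keep the binning fixed.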
    Item Type: Article
    Keywords: synthetic data; performance prediction; machine learning; tabular variational autoencoders; athlete monitoring
    Academic Unit: Faculty of Science and Engineering > Sports Science and Nutrition
    Item ID: 21362
    Identification Number: 10.3389/fspor.2025.1607600
    Depositing User: IR Editor
    Date Deposited: 30 Mar 2026 11:50
    Journal or Publication Title: Frontiers in Sports and Active Living
    Publisher: Frontiers Media
    Refereed: Yes
    Use Licence: This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA).
