MURAL - Maynooth University Research Archive Library



    Automatic Speech Recognition Models for Pathological Speech: Challenges and Insights


    Mokgosi, Kesego, Ennis, Cathy and Ross, Robert (2024) Automatic Speech Recognition Models for Pathological Speech: Challenges and Insights. AICS 2024: 32nd Irish Conference on Artificial Intelligence and Cognitive Science, 437. pp. 1-11.

    Abstract

    Conversational avatars provide innovative platforms for enhancing therapist-patient interactions in speech therapy by offering real-time feedback. However, the performance of Automatic Speech Recognition (ASR) models on disordered speech, such as dysarthria and stuttering, remains underexplored. The effectiveness of these systems hinges on the accuracy and processing speed of ASR models when transcribing pathological speech, particularly in real-time scenarios. This study evaluates several pre-trained ASR models, including Whisper-large-v3-turbo, Canary, Distil-Whisper, and NVIDIA’s stt-en-fastconformer-ctc-large, across three datasets: Common Voice (standard speech), TORGO (dysarthric speech), and UCLASS (stuttered speech). We assess the models using Word Error Rate (WER), Real-Time Factor (RTF), and BERTScore to measure transcription accuracy, computational efficiency, and semantic congruence. The stt-en-fastconformer-ctc-large model demonstrates the fastest processing speeds, achieving the lowest WER and highest BERTScores on both the Common Voice and TORGO datasets, making it highly suitable for real-time therapeutic applications. However, all models struggle to accurately transcribe stuttered speech from the UCLASS dataset. These results highlight the need for ASR improvements for disordered speech, focusing on edge deployment to reduce latency and on enhancing accuracy with multimodal inputs.
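    The abstract's headline metric, Word Error Rate, is the word-level Levenshtein edit distance between a reference transcript and an ASR hypothesis, normalised by the reference length. As a minimal sketch (not the paper's evaluation code; the function name and example strings are illustrative only):

    ```python
    def wer(reference: str, hypothesis: str) -> float:
        """WER = (substitutions + deletions + insertions) / reference word count,
        computed via word-level Levenshtein distance with dynamic programming."""
        ref = reference.split()
        hyp = hypothesis.split()
        # dp[i][j] = edit distance between the first i reference words
        # and the first j hypothesis words
        dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dp[i][0] = i  # deleting all i reference words
        for j in range(len(hyp) + 1):
            dp[0][j] = j  # inserting all j hypothesis words
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                               dp[i][j - 1] + 1,        # insertion
                               dp[i - 1][j - 1] + cost) # substitution / match
        return dp[len(ref)][len(hyp)] / len(ref)

    # A hypothetical stuttered utterance illustrates why WER penalises
    # part-word repetitions even when the meaning is fully preserved:
    print(wer("the cat sat on the mat",
              "the the c- cat sat on the mat"))  # insertions inflate WER
    ```

    Real-Time Factor, the paper's efficiency metric, is simpler: processing time divided by audio duration, with RTF < 1 required for real-time use.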
    Item Type: Article
    Keywords: Automatic Speech Recognition; Disordered Speech; Conversational Avatars; Speech Therapy
    Academic Unit: Faculty of Science and Engineering > Computer Science
    Faculty of Science and Engineering > Research Institutes > Hamilton Institute
    Item ID: 20318
    Identification Number: 10.21427/5r8h-px25
    Depositing User: IR Editor
    Date Deposited: 12 Aug 2025 10:16
    Journal or Publication Title: AICS 2024: 32nd Irish Conference on Artificial Intelligence and Cognitive Science
    Publisher: ARROW@TU Dublin
    Refereed: Yes
    Related URLs:
    URI: https://mural.maynoothuniversity.ie/id/eprint/20318
    Use Licence: This item is available under a Creative Commons Attribution Non-Commercial Share Alike Licence (CC BY-NC-SA).

