Kotey, Samantha, Dahyot, Rozenn and Harte, Naomi (2023) Fine Grained Spoken Document Summarization Through Text Segmentation. 2022 IEEE Spoken Language Technology Workshop (SLT). pp. 647-654.
Available under License Creative Commons Attribution Non-commercial Share Alike.
Abstract
Podcast transcripts are long spoken documents of conversational dialogue. Podcasts are challenging to summarize: they cover a diverse range of topics, vary in length, and differ widely in linguistic style. Previous studies in podcast summarization have generated short, concise dialogue summaries. In contrast, we propose a method to generate long, fine-grained summaries that describe the details of sub-topic narratives. Leveraging a readability formula, we curate a data subset to train a long sequence transformer for abstractive summarization. Through text segmentation, we filter the evaluation data and exclude specific segments of text. We apply the model to the segmented data, producing different types of fine-grained summaries. We show that appropriate filtering yields comparable ROUGE results and serves as an alternative to truncation. Experiments show our model outperforms previous studies on the Spotify podcast dataset when tasked with generating longer sequences of text.
Item Type: | Article |
---|---|
Keywords: | spoken document summarization; text segmentation; long sequence transformers; readability formulas; podcast summarization |
Academic Unit: | Faculty of Science and Engineering > Computer Science; Faculty of Science and Engineering > Research Institutes > Hamilton Institute |
Item ID: | 20545 |
Identification Number: | 10.1109/slt54892.2023.10022829 |
Depositing User: | Rozenn Dahyot |
Date Deposited: | 09 Sep 2025 10:18 |
Journal or Publication Title: | 2022 IEEE Spoken Language Technology Workshop (SLT) |
Publisher: | IEEE |
Refereed: | Yes |
URI: | https://mural.maynoothuniversity.ie/id/eprint/20545 |
Use Licence: | This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA). |