MURAL - Maynooth University Research Archive Library



    The SSIX Corpora: Three Gold Standard Corpora for Sentiment Analysis in English, Spanish and German Financial Microblogs


    Gaillat, Thomas and Zarrouk, Manel and Freitas, Andre and Davis, Brian (2018) The SSIX Corpora: Three Gold Standard Corpora for Sentiment Analysis in English, Spanish and German Financial Microblogs. In: LREC 2018, Eleventh International Conference on Language Resources and Evaluation. European Language Resources Association, pp. 2671-2675. ISBN 9791095546009


    Abstract

    This paper introduces the three SSIX corpora for sentiment analysis. These corpora address the need for annotated data for supervised learning methods. They focus on stock-market-related messages extracted from two financial microblog platforms, namely StockTwits and Twitter. In total, they include 2,886 messages with opinion targets. Each message is annotated for polarity on a continuous scale by three or four experts per language. The annotations identify the opinion targets and assign each one a sentiment score. The annotation process consists of manual annotation that is then verified and consolidated by financial experts. To maximize data quality, the corpora were built using principled sampling strategies, and inter-annotator agreement was taken into account before consolidation.

    Item Type: Book Section
    Additional Information: The LREC 2018 Proceedings are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/). We would like to thank all the people involved in the creation of the Gold Standard. This work is funded by the SSIX Horizon 2020 project (Grant agreement No 645425).
    Keywords: Sentiment Analysis; Opinion; Corpus; Finance; Stock-market; Microblogs; Polarity Annotation
    Academic Unit: Faculty of Science and Engineering > Computer Science
    Item ID: 13418
    Depositing User: Brian Davis
    Date Deposited: 07 Oct 2020 15:09
    Publisher: European Language Resources Association
    Refereed: Yes
    Funders: European Union Horizon 2020 programme
    URI:
    Use Licence: This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA).
