MURAL - Maynooth University Research Archive Library



    SemR-11: A Multi-Lingual Gold-Standard for Semantic Similarity and Relatedness for Eleven Languages


    Barzegar, Siamak and Davis, Brian and Zarrouk, Manel and Handschuh, Siegfried and Freitas, Andre (2018) SemR-11: A Multi-Lingual Gold-Standard for Semantic Similarity and Relatedness for Eleven Languages. In: LREC 2018, Eleventh International Conference on Language Resources and Evaluation. European Language Resources Association, pp. 3912-3916. ISBN 9791095546009





    Abstract

    This work describes SemR-11, a multi-lingual dataset for evaluating semantic similarity and relatedness in 11 languages (German, French, Russian, Italian, Dutch, Chinese, Portuguese, Swedish, Spanish, Arabic and Persian). Semantic similarity and relatedness gold standards were initially used to support the evaluation of semantic distance measures in the context of linguistic and knowledge resources and distributional semantic models. SemR-11 builds upon the English gold standards of Miller & Charles (MC), Rubenstein & Goodenough (RG), WordSimilarity 353 (WS-353), and SimLex-999, providing a canonical translation for them. The final dataset consists of 15,917 word pairs and can be used to support the construction and evaluation of semantic similarity/relatedness and distributional semantic models. As a case study, the SemR-11 test collections were used to investigate how distributional semantic models built from corpora in different languages and of different sizes perform on semantic similarity and relatedness tasks.
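    To illustrate how a gold standard of this kind is typically used, the sketch below scores a toy embedding model against rated word pairs by computing cosine similarity for each pair and then the Spearman rank correlation with the human ratings. This is a common evaluation protocol and not necessarily the one followed in the paper. The function names, the (word1, word2, rating) tuple layout, the toy vectors, and the ratings are all assumptions made for illustration; none of them come from SemR-11.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(gold_pairs, vectors):
    """Return (Spearman rho, number of pairs covered by the model).

    Pairs containing a word missing from `vectors` are skipped.
    """
    model_scores, human_scores = [], []
    for w1, w2, rating in gold_pairs:
        if w1 in vectors and w2 in vectors:
            model_scores.append(cosine(vectors[w1], vectors[w2]))
            human_scores.append(rating)
    return spearman(model_scores, human_scores), len(human_scores)


# Toy data, invented for this example (not taken from SemR-11).
toy_vectors = {
    "cat": [1.0, 0.1],
    "dog": [0.9, 0.2],
    "car": [0.1, 1.0],
    "truck": [0.3, 0.9],
}
toy_gold = [
    ("cat", "dog", 8.0),
    ("car", "truck", 7.5),
    ("cat", "car", 1.0),
    ("dog", "truck", 1.5),
    ("cat", "plane", 2.0),  # "plane" has no vector, so this pair is skipped
]

rho, n = evaluate(toy_gold, toy_vectors)
print(f"Spearman rho = {rho:.3f} over {n} covered pairs")
```

    Rank correlation is used here rather than raw agreement because human ratings and cosine scores live on different scales; only the ordering of the pairs is compared. Reporting the number of covered pairs alongside the correlation matters when comparing models across languages, since vocabulary coverage can differ considerably between corpora.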

    Item Type: Book Section
    Additional Information: This publication has emanated from research funded in part from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 645425 SSIX and Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289. The LREC 2018 Proceedings are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License https://creativecommons.org/licenses/by-nc/4.0/
    Keywords: Gold standard; Semantic Similarity; Semantic Relatedness; Multi-linguality; Word-embeddings;
    Academic Unit: Faculty of Science and Engineering > Computer Science
    Item ID: 13416
    Depositing User: Brian Davis
    Date Deposited: 07 Oct 2020 14:48
    Publisher: European Language Resources Association
    Refereed: Yes
    Funders: European Union Horizon 2020 programme, Science Foundation Ireland (SFI)
    URI:
    Use Licence: This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA).
