Barzegar, Siamak and Davis, Brian and Zarrouk, Manel and Handschuh, Siegfried and Freitas, Andre (2018)
SemR-11: A Multi-Lingual Gold-Standard for Semantic Similarity and Relatedness for Eleven Languages.
In: LREC 2018, Eleventh International Conference on Language Resources and Evaluation.
European Language Resources Association, pp. 3912-3916. ISBN 9791095546009
Abstract
This work describes SemR-11, a multi-lingual dataset for evaluating semantic similarity and relatedness for 11 languages (German,
French, Russian, Italian, Dutch, Chinese, Portuguese, Swedish, Spanish, Arabic and Persian). Semantic similarity and relatedness gold
standards have been initially used to support the evaluation of semantic distance measures in the context of linguistic and knowledge
resources and distributional semantic models. SemR-11 builds upon the English gold-standards of Miller & Charles (MC), Rubenstein &
Goodenough (RG), WordSimilarity 353 (WS-353), and Simlex-999, providing a canonical translation for them. The final dataset consists
of 15,917 word pairs and can be used to support the construction and evaluation of semantic similarity/relatedness and distributional
semantic models. As a case study, the SemR-11 test collections were used to investigate how distributional semantic models
built from corpora in different languages and of different sizes perform on semantic similarity and relatedness
tasks.
Item Type: Book Section
Additional Information: This publication has emanated from research funded in part from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 645425 SSIX and Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289. The LREC 2018 Proceedings are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/).
Keywords: Gold standard; Semantic Similarity; Semantic Relatedness; Multi-linguality; Word-embeddings
Academic Unit: Faculty of Science and Engineering > Computer Science
Item ID: 13416
Depositing User: Brian Davis
Date Deposited: 07 Oct 2020 14:48
Publisher: European Language Resources Association
Refereed: Yes
Funders: European Union Horizon 2020 programme, Science Foundation Ireland (SFI)
Use Licence: This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA).