Hasan, Souleiman and Curry, Edward (2017) Word Re-Embedding via Manifold Dimensionality Retention. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (ACL), Stroudsburg, PA, USA, pp. 321-326. ISBN 978-1-945626-83-8
Available under License Creative Commons Attribution Non-commercial Share Alike.
Abstract
Word embeddings seek to recover a Euclidean metric space by mapping words into vectors, starting from word co-occurrences in a corpus. Word embeddings may underestimate the similarity between nearby words, and overestimate it between distant words in the Euclidean metric space. In this paper, we re-embed pre-trained word embeddings with a stage of manifold learning which retains dimensionality. We show that this approach is theoretically founded in the metric recovery paradigm, and empirically show that it can improve on state-of-the-art embeddings in word similarity tasks by 0.5 to 5.0 percentage points, depending on the original space.
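As a rough illustration of what "re-embedding with a stage of manifold learning which retains dimensionality" can look like in practice, the sketch below fits a manifold learner on a sample of vectors and then maps every vector into a new space of the same dimension. It is not the authors' implementation: the choice of Locally Linear Embedding from scikit-learn, the hypothetical helper `re_embed`, the `fit_size` and `n_neighbors` values, and the random vectors standing in for pre-trained embeddings are all assumptions made for illustration.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding


def re_embed(vectors, fit_size=200, n_neighbors=15):
    """Re-embed vectors with LLE while keeping the original dimensionality.

    Hypothetical helper: fit_size and n_neighbors are illustrative defaults,
    not values taken from the paper.
    """
    dim = vectors.shape[1]
    lle = LocallyLinearEmbedding(
        n_neighbors=n_neighbors,
        n_components=dim,      # output dimension equals input dimension
        eigen_solver="dense",  # exact solver; fine for a small fitting sample
    )
    # Learn the manifold on a sample (here simply the first fit_size rows).
    lle.fit(vectors[:fit_size])
    # Map every vector, including those outside the sample, to the new space.
    return lle.transform(vectors)


# Synthetic stand-in for a pre-trained embedding matrix (300 words, 10 dims).
rng = np.random.default_rng(0)
original = rng.normal(size=(300, 10))
re_embedded = re_embed(original)
print(original.shape, re_embedded.shape)
```

With real embeddings, the rows used for fitting would more plausibly be chosen by word frequency than by position, and the re-embedded vectors would be evaluated on word similarity benchmarks against the original ones.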
| Item Type: | Book Section |
|---|---|
| Additional Information: | This paper was presented at EMNLP 2017 The Conference on Empirical Methods in Natural Language Processing, September 9-11, 2017 Copenhagen, Denmark. |
| Keywords: | Embeddings; Set theory; Topology; Vector spaces; Co-occurrence; Euclidean metrics; Manifold learning; On state; Word similarity; Natural language processing systems; |
| Academic Unit: | Faculty of Science and Engineering > Research Institutes > Hamilton Institute; Faculty of Social Sciences > School of Business |
| Item ID: | 11995 |
| Identification Number: | 10.18653/v1/D17-1033 |
| Depositing User: | Souleiman Hasan |
| Date Deposited: | 05 Dec 2019 14:23 |
| Publisher: | Association for Computational Linguistics (ACL) |
| Refereed: | Yes |
| Use Licence: | This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA). |