Cheng, Long and Malik, Avinash and Kotoulas, Spyros and Ward, Tomas E. and Theodoropoulos, Georgios
(2014)
Scalable RDF Data Compression using X10.
Working Paper.
arXiv.
Abstract
The Semantic Web comprises enormous volumes
of semi-structured data elements. For interoperability, these
elements are represented by long strings. Such representations
are not efficient for the purposes of Semantic Web applications
that perform computations over large volumes of information.
A typical method for alleviating the impact of this problem is
through the use of compression methods that produce more
compact representations of the data. The use of dictionary
encoding for this purpose is particularly prevalent in Semantic
Web database systems. However, centralized implementations
present performance bottlenecks, giving rise to the need for
scalable, efficient distributed encoding schemes. In this paper,
we describe an encoding implementation based on the asynchronous
partitioned global address space (APGAS) parallel
programming model. We evaluate performance on a cluster of
up to 384 cores and datasets of up to 11 billion triples (1.9
TB). Compared to the state-of-the-art MapReduce algorithm, we
demonstrate a speedup of 2.6-7.4× and excellent scalability.
These results illustrate the strong potential of the APGAS
model for efficient implementation of dictionary encoding and
contribute to the engineering of larger-scale Semantic Web
applications.
Item Type: Monograph (Working Paper)
Keywords: RDF; Parallel compression; dictionary encoding; X10; HPC
Academic Unit: Faculty of Science and Engineering > Electronic Engineering
Item ID: 6278
Identification Number: arXiv:1403.2404
Depositing User: Dr Tomas Ward
Date Deposited: 21 Jul 2015 14:53
Publisher: arXiv
URI:
Use Licence: This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA). Details of this licence are available here