Wangrungarun, Phattara (2015) Web service for 19th century Irish personal name matching.
Masters thesis, National University of Ireland Maynooth.
Abstract
Before the first Irish civil registration in 1864, census materials were
mostly lost or incomplete. Genealogical research therefore relies on parish
records and on ‘census substitute’ documents, such as land ownership
and tenancy records. However, some of these documents may not
contain enough information to identify individuals. Some of them
contain a name and address, whereas others might contain only a
name.
Record linkage is one method of gathering information scattered across
many documents. It uses a person's name as a reference to link that
person's information between documents. With patience, more
complete information about that person can be obtained.
Linking or matching a person's name is therefore an important step in the
process. Unfortunately, in 19th century Ireland there was no
standard spelling of names, handwriting could be difficult to read,
and contractions or abbreviations were often used. Names with
the same pronunciation, and for the same individual, could be written
in many different ways. Moreover, names in the Irish language which
are equivalent to English names were used; for example, an Irish version
of ‘Smith’ could be ‘Gowan’. A further complication is that historical
and genealogical research often requires large quantities of names to
be matched.
To handle these name variations, various solutions have been created
to match different names that refer to the same person.
However, to the best of our knowledge, there is as yet no public system
which combines those solutions and provides a service for
bulk name matching. We therefore developed a web service using
the Ruby on Rails framework. The system initially
implements four matching algorithms: Levenshtein distance, Soundex,
Irish Soundex, and a lookup table. We also present a web interface that
lets a client use the system from a web browser. It is designed to be
simple and to be extensible through inheritance.
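The idea of several matching algorithms sharing one interface and being extended through inheritance can be sketched in Ruby as follows. This is only an illustrative sketch, not the thesis's code: the class names (`NameMatcher`, `LevenshteinMatcher`, `SoundexMatcher`, `LookupTableMatcher`), the `match?` and `matches` methods, and the distance threshold of 1 are all assumptions made for this example. The Soundex shown is the standard American variant; the Irish Soundex variant used by the system is not reproduced here.

```ruby
# Hypothetical base class: every matcher answers whether two names may
# refer to the same person. New algorithms are added by subclassing.
class NameMatcher
  def match?(_name_a, _name_b)
    raise NotImplementedError, "#{self.class} must implement match?"
  end

  # Bulk matching: keep every candidate that matches the query name.
  def matches(query, candidates)
    candidates.select { |candidate| match?(query, candidate) }
  end
end

# Matches names whose edit distance is within an assumed threshold.
class LevenshteinMatcher < NameMatcher
  def initialize(max_distance = 1)
    @max_distance = max_distance
  end

  def match?(name_a, name_b)
    distance(name_a.downcase, name_b.downcase) <= @max_distance
  end

  # Dynamic-programming edit distance, keeping one row at a time.
  def distance(a, b)
    previous = (0..b.length).to_a
    a.each_char.with_index(1) do |char_a, i|
      current = [i]
      b.each_char.with_index(1) do |char_b, j|
        cost = char_a == char_b ? 0 : 1
        current << [previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost].min
      end
      previous = current
    end
    previous.last
  end
end

# Matches names that share a standard (American) Soundex code.
class SoundexMatcher < NameMatcher
  CODES = {
    "b" => "1", "f" => "1", "p" => "1", "v" => "1",
    "c" => "2", "g" => "2", "j" => "2", "k" => "2",
    "q" => "2", "s" => "2", "x" => "2", "z" => "2",
    "d" => "3", "t" => "3", "l" => "4",
    "m" => "5", "n" => "5", "r" => "6"
  }.freeze

  def match?(name_a, name_b)
    encode(name_a) == encode(name_b)
  end

  def encode(name)
    letters = name.downcase.gsub(/[^a-z]/, "")
    return "" if letters.empty?

    code = letters[0].upcase
    last = CODES[letters[0]]
    letters[1..-1].each_char do |char|
      next if char == "h" || char == "w" # h and w do not break a run of equal codes
      digit = CODES[char]
      code << digit if digit && digit != last
      last = digit # vowels reset this to nil, so a repeated code is kept
    end
    code.ljust(4, "0")[0, 4]
  end
end

# Matches names listed together as known equivalents.
class LookupTableMatcher < NameMatcher
  def initialize(equivalents)
    @groups = {}
    equivalents.each do |group|
      normalised = group.map(&:downcase)
      normalised.each { |name| @groups[name] = normalised }
    end
  end

  def match?(name_a, name_b)
    group = @groups[name_a.downcase]
    !group.nil? && group.include?(name_b.downcase)
  end
end

# Usage: run each matcher over the same list of candidate names.
matchers = [LevenshteinMatcher.new(1), SoundexMatcher.new,
            LookupTableMatcher.new([%w[Smith Gowan]])]
matchers.each do |matcher|
  puts "#{matcher.class}: #{matcher.matches('Smith', %w[Smyth Gowan Murphy]).inspect}"
end
```

In this sketch the spelling-based matchers catch variants such as ‘Smyth’, while only the lookup table can relate ‘Smith’ to ‘Gowan’, which is one reason for combining several algorithms behind one interface.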
The system performs matching on large quantities of names in
a reasonable time. We tested our system with 12,944 name matchings,
which completed in under half a minute (28,786
milliseconds, to be precise). However, the system consumes a large
amount of memory (around 373 megabytes). We believe that, with
proper optimisation, we could reduce both the memory usage and
the processing time. Further matching algorithms could also
be implemented for names in other languages, so that the system can
handle a broader domain of names.
Item Type: |
Thesis (Masters) |
Additional Information: |
Taught Masters Thesis for the Erasmus Mundus MSc in Dependable Software Systems |
Keywords: |
Web service; 19th century; Irish personal name matching; |
Academic Unit: |
Faculty of Science and Engineering > Computer Science |
Item ID: |
7092 |
Depositing User: |
IR eTheses |
Date Deposited: |
04 May 2016 11:18 |
Use Licence: |
This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA). |