MURAL - Maynooth University Research Archive Library



    Clustering Single-Cell Electropherograms by Genotype Through Unsupervised Machine Learning


    O'Donnell, Leah (2021) Clustering Single-Cell Electropherograms by Genotype Through Unsupervised Machine Learning. Masters thesis, National University of Ireland Maynooth.

    Abstract

    Cells can be linked to the person who produced them by examining the information contained within their DNA. A forensic analyst must assess whether a collection of cells obtained from a crime scene supports the hypothesis that a person of interest was present. The primary challenge is that cell samples collected at crime scenes typically contain material from an unknown number of genetic sources in an unknown mixture ratio. The standard genetic measurement protocol used in crime labs produces a single, combined signal for the entire collection of cells. If there are a small number of contributors, the cells are in good condition, and the mixture ratio is not overly imbalanced, this measurement allows informative inference for a trier of fact. If, however, the sample is complex, containing more than three genetic sources, or if the mixture ratio is highly imbalanced, or if genetic information within cells is degraded, the ability to confidently extract meaning from the measured signal is impaired. High-profile work published in the late 1990s demonstrated that genotype information could be extracted from individual cells. In a forensic context, single-cell methods offer a potential solution to the complex mixture problem by providing genetic information per cell rather than solely for the whole collection. Advances in those measurement methods mean that single-cell technologies may soon be practicable in crime labs. Significant challenges in interpreting the resulting signals, however, remain. Instead of having a single high-dimensional signal to assess, the trier of fact now has one for each cell. In the present thesis we take one step towards enabling the resolution of the complex mixture problem by proposing and assessing two methodologies that would facilitate the downstream analysis of genetic signal from a collection of single cells.
Our goal is to determine whether unsupervised machine learning can accurately and efficiently gather single-cell signals into groups by genotype. If so, this grouping would greatly reduce the computational complexity of the evaluation of evidence and improve its accuracy. The results in this thesis suggest that this approach is viable and that it advances the potential of this societally important technology.
    Item Type: Thesis (Masters)
    Keywords: Clustering Single-Cell Electropherograms; Genotype; Unsupervised Machine Learning
    Academic Unit: Faculty of Science and Engineering > Research Institutes > Hamilton Institute
    Item ID: 14919
    Depositing User: IR eTheses
    Date Deposited: 13 Oct 2021 11:20
    Use Licence: This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA).
