Leigh, R.J. and Murphy, R.A. and Walsh, F. (2021) uniForest: an unsupervised machine learning technique to detect outliers and restrict variance in microbiome studies. Cold Spring Harbor perspectives in medicine. ISSN 2157-1422
Abstract
Isolation Forests is an unsupervised machine learning technique for detecting outliers in continuous datasets that does not require an underlying equivariant or Gaussian distribution and is suitable for use on small datasets. While this procedure is widely used across quantitative fields, to our knowledge this is the first study dedicated solely to assessing its use on microbiome datasets. Here we present uniForest, an interactive Python notebook (which can be run from any desktop computer using the Google Colaboratory web service) for the processing of microbiome outliers. We used uniForest to apply Isolation Forests to the Healthy Human Microbiome project dataset, imputed outliers with the mean of the remaining inliers to maintain sample size, and assessed its performance in variance reduction in both community structure and derived ecological statistics (α-diversity). We also assessed its functionality in anatomical site differentiation (pre- and postprocessing) using principal component analysis, dissimilarity matrices, and ANOSIM. We observed a minimum variance reduction of 81.17% across the entire dataset and in alpha diversity at the Phylum level. Application of Isolation Forests also separated the dataset with extremely high specificity, reducing variance within taxa samples by a minimum of 81.33%. It is evident that Isolation Forests are a potent tool in restricting the effect of variance in microbiome analysis and have potential for broad application in studies where high levels of microbiome variance are expected. This software allows for clean analyses of otherwise noisy datasets.
bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444491; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
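The workflow described in the abstract (score each sample with an Isolation Forest, then replace flagged outliers with the mean of the remaining inliers so that sample size is preserved) can be illustrated with a minimal sketch. This is not the uniForest notebook itself. It is a simplified, standard-library-only Isolation Forest for a single feature (for example, one taxon's relative abundance), and the function names, the `contamination` parameter, and the toy data are all assumptions made for illustration. It also assumes the input holds at least a handful of non-identical values.

```python
import math
import random
import statistics


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(values, depth, max_depth, rng):
    """Recursively partition the values at uniformly random split points."""
    if depth >= max_depth or len(values) <= 1 or min(values) == max(values):
        return {"size": len(values)}
    split = rng.uniform(min(values), max(values))
    left = [v for v in values if v < split]
    right = [v for v in values if v >= split]
    return {"split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(value, node, depth=0):
    """Depth at which a value lands in a leaf, adjusted for unsplit leaf size."""
    if "split" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if value < node["split"] else node["right"]
    return path_length(value, child, depth + 1)


def anomaly_scores(values, n_trees=100, sample_size=256, seed=0):
    """Score each value in (0, 1); values that isolate quickly score higher."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(values))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(values, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    return [2.0 ** (-statistics.mean(path_length(v, t) for t in trees) / norm)
            for v in values]


def impute_outliers(values, contamination=0.1, **kwargs):
    """Flag the highest-scoring fraction as outliers and replace them with the inlier mean.

    `contamination` is a hypothetical parameter: the assumed share of outliers.
    """
    scores = anomaly_scores(values, **kwargs)
    n_out = max(1, round(contamination * len(values)))
    cutoff = sorted(scores, reverse=True)[n_out - 1]
    is_outlier = [s >= cutoff for s in scores]
    inlier_mean = statistics.mean(v for v, o in zip(values, is_outlier) if not o)
    cleaned = [inlier_mean if o else v for v, o in zip(values, is_outlier)]
    return cleaned, is_outlier


# Toy relative abundances for one taxon across 12 samples (made-up data).
abundances = [0.21, 0.19, 0.22, 0.20, 0.18, 0.23, 0.21, 0.20, 0.19, 0.22, 0.95, 0.20]
cleaned, flags = impute_outliers(abundances, contamination=1 / 12)
print("variance before:", statistics.pvariance(abundances))
print("variance after: ", statistics.pvariance(cleaned))
```

Because the outlier is replaced rather than dropped, the cleaned list has the same length as the input, which mirrors the abstract's stated aim of maintaining sample size while restricting variance.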
Item Type: | Article |
---|---|
Additional Information: | Cite as: uniForest: an unsupervised machine learning technique to detect outliers and restrict variance in microbiome studies. R.J. Leigh, R.A. Murphy, F. Walsh. bioRxiv 2021.05.17.444491; doi: https://doi.org/10.1101/2021.05.17.444491 |
Keywords: | uniForest; outliers |
Academic Unit: | Faculty of Science and Engineering > Biology; Faculty of Science and Engineering > Research Institutes > Human Health Institute |
Item ID: | 17327 |
Identification Number: | 10.1101/2021.05.17.444491 |
Depositing User: | Dr Robert Leigh |
Date Deposited: | 15 Jun 2023 12:44 |
Journal or Publication Title: | Cold Spring Harbor perspectives in medicine |
Publisher: | CSHL press |
Refereed: | Yes |
URI: | |
Use Licence: | This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA). |