MURAL - Maynooth University Research Archive Library



    Rounding based continuous data discretization for statistical disclosure control


    Senavirathne, Navoda and Torra, Vicenç (2019) Rounding based continuous data discretization for statistical disclosure control. Journal of Ambient Intelligence and Humanized Computing. ISSN 1868-5137





    Abstract

    “Rounding” can be understood as a way to coarsen continuous data. That is, low-level and infrequent values are replaced by high-level and more frequent representative values. This concept is explored as a method for data privacy in the statistical disclosure control literature, with perturbative techniques like rounding and microaggregation, and non-perturbative methods like generalisation. Even though “rounding” is well known as a numerical data protection method, it has not been studied in depth or evaluated empirically to the best of our knowledge. This work is motivated by three objectives: (1) to study the alternative methods of obtaining the rounding values to represent a given continuous variable, (2) to empirically evaluate rounding as a data protection technique based on information loss (IL) and disclosure risk (DR), and (3) to analyse the impact of data rounding on machine learning based models. Here, in order to obtain the rounding values, we consider discretization methods introduced in the unsupervised machine learning literature along with microaggregation and re-sampling based approaches. The results indicate that microaggregation based techniques are preferred over unsupervised discretization methods due to their fair trade-off between IL and DR.
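
    As a rough illustration of the two families of rounding values named in the abstract, the sketch below coarsens one continuous variable in two ways: equal-width binning (a common unsupervised discretization scheme) and fixed-size univariate microaggregation. It is a minimal sketch under stated assumptions, not the procedure evaluated in the article: the function names, the choice of bin midpoints and group means as representative values, and the parameters `n_bins` and `k` are all hypothetical.

```python
from statistics import mean


def round_equal_width(values, n_bins):
    """Replace each value by the midpoint of its equal-width bin (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)  # constant variable: nothing to coarsen
    width = (hi - lo) / n_bins
    rounded = []
    for v in values:
        # The maximum value falls on the upper edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        rounded.append(lo + (idx + 0.5) * width)
    return rounded


def round_microaggregation(values, k):
    """Sort the values, group k consecutive records, replace each by its group mean.

    A leftover tail of fewer than k records is merged into the last group,
    so every group holds at least k records.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    rounded = [0.0] * len(values)
    n = len(order)
    start = 0
    while start < n:
        end = start + k
        if n - end < k:
            end = n  # absorb the short tail into this group
        group = order[start:end]
        centroid = mean(values[i] for i in group)
        for i in group:
            rounded[i] = centroid
        start = end
    return rounded


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 50.0]
    print(round_equal_width(data, 2))
    print(round_microaggregation(data, 3))
```

    In this sketch, equal-width bins depend only on the range of the variable, so an outlier can leave most records in a single bin, whereas microaggregation guarantees that each representative value stands for at least `k` records.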

    Item Type: Article
    Additional Information: Cite as: Senavirathne, N., Torra, V. Rounding based continuous data discretization for statistical disclosure control. J Ambient Intell Human Comput (2019). https://doi.org/10.1007/s12652-019-01489-7
    Keywords: Rounding for micro data; Unsupervised discretization; Micro data protection
    Academic Unit: Faculty of Science and Engineering > Research Institutes > Hamilton Institute
    Item ID: 14065
    Identification Number: https://doi.org/10.1007/s12652-019-01489-7
    Depositing User: Vicenç Torra
    Date Deposited: 24 Feb 2021 15:16
    Journal or Publication Title: Journal of Ambient Intelligence and Humanized Computing
    Publisher: Springer
    Refereed: Yes
    URI:
