Machine Learning Techniques for the Detection of Inappropriate Erotic Content in Text

Parnell, Andrew; González-Castro, Víctor; Alaiz-Rodríguez, Rocío; Barrientos, Gonzalo Molpeceres

Parnell, Andrew, González-Castro, Víctor, Alaiz-Rodríguez, Rocío and Barrientos, Gonzalo Molpeceres (2020) Machine Learning Techniques for the Detection of Inappropriate Erotic Content in Text. International Journal of Computational Intelligence Systems, 13 (1). p. 591. ISSN 1875-6883

Available under License Creative Commons Attribution Non-commercial Share Alike.

Abstract

Nowadays, children have regular access to the Internet. Just as in the real world, the Internet has many unsafe places where children may be exposed to inappropriate content in the form of obscene, aggressive, erotic or rude comments. In this work, we address the problem of detecting erotic/sexual content in text documents using Natural Language Processing (NLP) techniques. Following a Machine Learning approach, we assessed twelve models resulting from the combination of three text encoders (Bag of Words, Term Frequency-Inverse Document Frequency (TF-IDF) and Word2vec) with four classifiers (Support Vector Machines (SVMs), Logistic Regression, k-Nearest Neighbours and Random Forests). We evaluated these alternatives on a newly created dataset extracted from public data on the Reddit website. The best performance was achieved by combining the TF-IDF text encoder with an SVM classifier with a linear kernel, which reached an accuracy of 0.97 and an F-score of 0.96 (precision 0.96, recall 0.95). This study demonstrates that it is possible to detect erotic content in text documents and, therefore, to develop filters for minors or filters based on users' preferences.
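The best-performing combination reported above, a TF-IDF encoder followed by a linear-kernel SVM, can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy corpus and its placeholder tokens (`adult_*`), the tokenizer, the smoothed IDF formula, the hyperparameters (`lam`, `epochs`) and all function names are hypothetical. The SVM is trained with a Pegasos-style subgradient method (no bias term), swapped in only to keep the example dependency-free; this page does not say which solver the paper used. Only one of the twelve encoder/classifier combinations is shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on runs of word characters (deliberately simple).
    return re.findall(r"\w+", text.lower())


class TfidfEncoder:
    """Raw term counts times a smoothed IDF, then L2 normalization.

    Vectors are sparse dicts mapping term -> weight.
    """

    def fit(self, docs):
        tokenized = [tokenize(d) for d in docs]
        n = len(tokenized)
        # Document frequency: number of documents containing each term.
        df = Counter(term for toks in tokenized for term in set(toks))
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, so every known term keeps a positive weight.
        self.idf = {term: math.log((1 + n) / (1 + c)) + 1.0 for term, c in df.items()}
        return self

    def transform(self, doc):
        # Terms not seen during fit() are ignored.
        counts = Counter(t for t in tokenize(doc) if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        return {t: v / norm for t, v in vec.items()} if norm > 0 else {}


def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(xs, ys, lam=0.1, epochs=50):
    """Pegasos-style subgradient descent on the primal linear-SVM objective
    (lam / 2) * ||w||^2 + mean hinge loss, with labels in {-1, +1} and no bias term.
    Examples are visited in a fixed cyclic order, so the run is deterministic."""
    w = {}
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * dot(w, x)  # margin under the current weights
            # Shrink step from the L2 regularizer.
            w = {k: (1.0 - eta * lam) * v for k, v in w.items()}
            if margin < 1.0:
                # Hinge-loss subgradient step for a margin violation.
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def predict(w, x):
    # 1 = flagged as inappropriate, 0 = not flagged.
    return 1 if dot(w, x) > 0 else 0


def scores(y_true, y_pred):
    """Accuracy, precision, recall and F-score for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Hypothetical toy corpus: "adult_*" are placeholder tokens standing in for erotic vocabulary.
TRAIN = [
    ("adult_a adult_b late tonight", 1),
    ("adult_a adult_c", 1),
    ("adult_b adult_a adult_c", 1),
    ("a quick pasta recipe", 0),
    ("recipe for the football party", 0),
    ("my grandmother shared her recipe", 0),
]
texts = [text for text, _ in TRAIN]
labels = [label for _, label in TRAIN]

enc = TfidfEncoder().fit(texts)
xs = [enc.transform(text) for text in texts]
w = train_linear_svm(xs, [1 if label == 1 else -1 for label in labels])
preds = [predict(w, x) for x in xs]
print(scores(labels, preds))  # (1.0, 1.0, 1.0, 1.0)
print(predict(w, enc.transform("adult_a")), predict(w, enc.transform("pasta recipe")))  # 1 0
```

The two toy classes use disjoint vocabularies, so the data are linearly separable and the sketch scores perfectly; these numbers say nothing about performance on a realistic corpus such as the Reddit-derived dataset, where a library SVM implementation would normally be used instead of this hand-rolled trainer.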

Item Type:	Article
Additional Information:	Copyright: This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).
Keywords:	Inappropriate content; Machine learning; Text classification; Natural language processing; Text encoders
Academic Unit:	Faculty of Science and Engineering > Mathematics and Statistics; Faculty of Science and Engineering > Research Institutes > Hamilton Institute; Faculty of Social Sciences > Research Institutes > Irish Climate Analysis and Research Units, ICARUS
Item ID:	16230
Identification Number:	10.2991/ijcis.d.200519.003
Depositing User:	Andrew Parnell
Date Deposited:	05 Jul 2022 14:20
Journal or Publication Title:	International Journal of Computational Intelligence Systems
Publisher:	Atlantis Press
Refereed:	Yes
Use Licence:	This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA).

MURAL - Maynooth University Research Archive Library
