Using Personality Recognition Techniques to Improve Bayesian Spam Filtering

  1. Zurutuza, Urko
  2. Gómez Hidalgo, José María
  3. Ezpeleta, Enaitz
Journal: Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2016

Issue: 57

Pages: 125-132

Type: Article


Abstract

Unsolicited email campaigns affect millions of users every day. In recent years, several spam detection techniques have been developed, with machine learning algorithms achieving especially good results. In this work we provide a baseline for a new spam filtering method and validate our hypothesis that personality recognition techniques can help in Bayesian spam filtering. We add a personality feature to each email using personality recognition techniques, and then compare Bayesian spam filters with and without this feature in terms of accuracy. In a second experiment we combine the personality and polarity features of each message and compare all the results. In the end, the top ten Bayesian filtering classifiers are improved, reaching an accuracy of 99.24% while also reducing the number of false positives.
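
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the multinomial naive Bayes classifier, the `tokens` and `train_filter` helpers, the `__personality=` token encoding, the personality labels, and the four toy messages are all assumptions made for illustration. The idea is only to show how a per-message personality label can be appended as one extra feature before training a Bayesian filter.

```python
import math
from collections import Counter, defaultdict

def tokens(text, personality=None):
    """Bag-of-words tokens; optionally append the personality label as one extra token."""
    words = text.lower().split()
    if personality is not None:
        words.append("__personality=" + personality)  # hypothetical feature encoding
    return words

class NaiveBayes:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, docs, labels):
        for doc, label in zip(docs, labels):
            self.class_counts[label] += 1
            self.token_counts[label].update(doc)
            self.vocab.update(doc)
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total_docs)  # log prior
            denom = sum(self.token_counts[label].values()) + self.alpha * len(self.vocab)
            for tok in doc:
                if tok in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((self.token_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy messages (text, assumed personality label, class); invented for illustration only.
TRAIN = [
    ("win cash now", "extravert", "spam"),
    ("cheap pills offer", "extravert", "spam"),
    ("meeting agenda attached", "introvert", "ham"),
    ("lunch tomorrow", "introvert", "ham"),
]

def train_filter(rows, use_personality):
    """Train a filter with or without the personality token."""
    docs = [tokens(text, p if use_personality else None) for text, p, _ in rows]
    labels = [y for _, _, y in rows]
    return NaiveBayes().fit(docs, labels)

baseline = train_filter(TRAIN, use_personality=False)
augmented = train_filter(TRAIN, use_personality=True)

print(baseline.predict(tokens("offer attached")))
print(augmented.predict(tokens("offer attached", "extravert")))
```

On this toy data the two filters disagree on the ambiguous message purely by construction; the example says nothing about the accuracy figures reported in the article, which come from the authors' own datasets and classifiers.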

Bibliographic References

  • Bai, S., T. Zhu, and L. Cheng. 2012. Big-five personality prediction based on user behaviors at social network sites. CoRR, abs/1204.4809.
  • Briggs Myers, I. and P. B. Myers. 1980. Gifts differing: Understanding personality type.
  • Celli, F. and M. Poesio. 2014. Pr2: A language independent unsupervised tool for personality recognition from text. arXiv preprint arXiv:1402.2796.
  • Cormack, G. V. 2007. Email spam filtering: A systematic review. Foundations and Trends in Information Retrieval, 1(4):335–455.
  • Costa, P. T. and R. R. McCrae. 1992. Normal personality assessment in clinical practice: The NEO Personality Inventory. Psychological Assessment, 4(1):5.
  • Eberhardt, J. J. 2015. Bayesian spam detection. Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal.
  • Echeverria Briones, P. F., Z. V. Altamirano Valarezo, A. B. Pinto Astudillo, and J. D. C. Sanchez Guerrero. 2009. Text mining aplicado a la clasificación y distribución automática de correo electrónico y detección de correo spam.
  • Ezpeleta, E., U. Zurutuza, and J. M. Gómez Hidalgo. 2016a. Does sentiment analysis help in Bayesian spam filtering? In Hybrid Artificial Intelligent Systems: 11th International Conference, HAIS 2016, Sevilla, Spain, April 18-20, 2016. Springer.
  • Ezpeleta, E., U. Zurutuza, and J. M. Gómez Hidalgo. 2016b. Short messages spam filtering using personality recognition. In Proceedings of the 4th Spanish Conference in Information Retrieval.
  • Giyanani, R. and M. Desai. 2013. Spam detection using natural language processing. International Journal of Computer Science Research & Technology, 1:55–58, August.
  • Jensen, G. H. and J. K. DiTiberio. 1989. Personality and the Teaching of Composition, volume 20. Ablex Pub.
  • Lau, R. Y. K., S. Y. Liao, R. C.-W. Kwok, K. Xu, Y. Xia, and Y. Li. 2012. Text mining and probabilistic language modeling for online review spam detection. ACM Trans. Manage. Inf. Syst., 2(4):25:1–25:30, January.
  • Liddy, E. 2001. Natural language processing. Encyclopedia of Library and Information Science, 2nd Ed., NY. Marcel Dekker, Inc.
  • Liu, B. and L. Zhang. 2012. A survey of opinion mining and sentiment analysis. Mining Text Data, pages 415–463.
  • Mairesse, F., M. A. Walker, M. R. Mehl, and R. K. Moore. 2007. Using linguistic cues for the automatic recognition of personality in conversation and text. J. Artif. Int. Res., 30(1):457–500, November.
  • Malarvizhi, R. and K. Saraswathi. 2013. Content-based spam filtering and detection algorithms: an efficient analysis & comparison. International Journal of Engineering Trends and Technology, Vol. 4, Issue 9, September.
  • Nazirova, S. 2011. Survey on spam filtering techniques. Communications and Network, 3(3):153–160.
  • Oberlander, J. and S. Nowson. 2006. Whose thumb is it anyway?: Classifying author personality from weblog text. In Proceedings of the COLING/ACL on Main Conference Poster Sessions, COLING-ACL ’06, pages 627–634, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Rangel, F., F. Celli, P. Rosso, M. Potthast, B. Stein, and W. Daelemans. 2015. Overview of the 3rd Author Profiling Task at PAN 2015. In Working Notes Papers of the CLEF 2015 Evaluation Labs, CEUR Workshop Proceedings. CLEF and CEUR-WS.org, September.
  • Sanz, E. P., J. M. G. Hidalgo, and J. C. Cortizo. 2008. Email spam filtering. Advances in Computers, pages 45–114.
  • Savita Teli, S. B. 2014. Effective spam detection method for email. IOSR Journal of Computer Science, pages 68–72.
  • Shen, J., O. Brdiczka, and J. Liu. 2013. Understanding email writers: Personality prediction from email messages. In User Modeling, Adaptation, and Personalization. Springer, pages 318–330.
  • Tretyakov, K. 2004. Machine learning techniques in spam filtering. In Data Mining Problem-oriented Seminar, MTAT, volume 3, pages 60–79.
  • Vinciarelli, A. and G. Mohammadi. 2014. A survey of personality computing. Affective Computing, IEEE Transactions on, 5(3):273–291.