Deobfuscating Leetspeak With Deep Learning to Improve Spam Filtering

Vélez de Mendizabal, Iñaki; Vidriales, Xabier; Basto-Fernandes, Vitor; Ezpeleta, Enaitz; Méndez, José Ramón; Zurutuza, Urko

doi:10.9781/IJIMAI.2023.07.003


Vélez de Mendizabal, Iñaki ¹
Vidriales, Xabier ¹
Basto-Fernandes, Vitor ²
Ezpeleta, Enaitz ¹
Méndez, José Ramón ³⁴⁵
Zurutuza, Urko ¹

1 Universidad de Mondragón/Mondragon Unibertsitatea, Mondragón, Spain (ROR: https://ror.org/00wvqgd19)
2 Instituto Universitário de Lisboa (ISCTE-IUL), University Institute of Lisbon, ISTAR-IUL
3 Department of Computer Science, ESEI - Escola Superior de Enxeñaría Informática, Universidade de Vigo
4 CINBIO - Biomedical Research Centre, Universidade de Vigo
5 SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), Hospital Álvaro Cunqueiro, Bloque técnico


Journal: International Journal of Interactive Multimedia and Artificial Intelligence
ISSN: 1989-1660
Year of publication: 2023
Volume: 8
Issue: 4
Pages: 46-55
Type: Article
Access: Open access (publisher)


Abstract

The evolution of anti-spam filters has forced spammers to make greater efforts to bypass them in order to distribute content over networks. Distributing content encoded in images and using Leetspeak are concrete, clear examples of techniques currently used to evade filters. Despite the importance of dealing with these problems, few studies have addressed them, and the reported performance is very limited. This study reviews the (still very rudimentary) work done so far on Leetspeak deobfuscation and proposes a new technique that uses neural networks for decoding. In addition, we distribute an image database specifically created for training Leetspeak decoding models. We have also created and made available four different corpora for analysing the performance of Leetspeak decoding schemes. Using these corpora, we have experimentally evaluated our neural network approach to decoding Leetspeak. The results show the usefulness of the proposed model for deobfuscating Leetspeak character sequences.
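To illustrate why character-level Leetspeak decoding is hard enough to motivate a learned model, here is a minimal Python sketch of a naive substitution decoder. It is not the neural network approach proposed in the paper, nor necessarily any specific prior method the authors review. The substitution table `LEET_MAP` and the functions `candidate_decodings` and `naive_decode` are hypothetical and chosen only for illustration. The decoder expands each symbol into its candidate letters and keeps the first reading that appears in a vocabulary.

```python
from itertools import product
from typing import Iterable, Optional

# Hypothetical single-character substitution table (illustrative only):
# each Leetspeak symbol maps to one or more candidate plain-text letters.
LEET_MAP = {
    "4": ["a"], "@": ["a"],
    "3": ["e"],
    "1": ["i", "l"], "!": ["i"],
    "0": ["o"],
    "5": ["s"], "$": ["s"],
    "7": ["t"],
}


def candidate_decodings(token: str) -> list[str]:
    """Enumerate every plain-text reading of `token` under LEET_MAP.

    Characters without an entry are kept as they are (lower-cased).
    The number of readings is the product of the per-character option
    counts, so it can grow exponentially with token length.
    """
    options = [LEET_MAP.get(ch, [ch.lower()]) for ch in token]
    return ["".join(chars) for chars in product(*options)]


def naive_decode(token: str, vocabulary: Iterable[str]) -> Optional[str]:
    """Return the first candidate reading found in `vocabulary`, or None."""
    words = set(vocabulary)
    for candidate in candidate_decodings(token):
        if candidate in words:
            return candidate
    return None


if __name__ == "__main__":
    vocab = {"viagra", "free", "hello"}
    for token in ["V1AGR4", "fr33", "|-|3110"]:
        print(token, "->", naive_decode(token, vocab))
```

Even this toy example exposes two limitations. A single symbol can stand for several letters (`1` as `i` or `l`), so the number of candidates multiplies with each ambiguous symbol. Multi-character glyphs such as `|-|` for `h` are not handled at all, so `naive_decode("|-|3110", {"hello"})` returns `None`.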


Funding information

Vitor Basto Fernandes acknowledges FCT – Fundação para a Ciência e a Tecnologia, I.P., for its support in the context of projects UIDB/04466/2020 and UIDP/04466/2020.
