Deobfuscating Leetspeak With Deep Learning to Improve Spam Filtering

  1. Vélez de Mendizabal, Iñaki 1
  2. Vidriales, Xabier 1
  3. Basto-Fernandes, Vitor 2
  4. Ezpeleta, Enaitz 1
  5. Méndez, José Ramón 3, 4, 5
  6. Zurutuza, Urko 1
  1. 1 Universidad de Mondragón/Mondragon Unibertsitatea, Mondragón, Spain (ROR https://ror.org/00wvqgd19)
  2. 2 Instituto Universitário de Lisboa (ISCTE-IUL), University Institute of Lisbon, ISTAR-IUL
  3. 3 Department of Computer Science, ESEI - Escola Superior de Enxeñaría Informática, Universidade de Vigo
  4. 4 CINBIO - Biomedical Research Centre, Universidade de Vigo
  5. 5 SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), Hospital Álvaro Cunqueiro, Bloque técnico
Journal:
International Journal of Interactive Multimedia and Artificial Intelligence

ISSN: 1989-1660

Year of publication: 2023

Volume: 8

Issue: 4

Pages: 46-55

Type: Article

DOI: 10.9781/IJIMAI.2023.07.003 (open access at publisher)

Abstract

The evolution of anti-spam filters has forced spammers to make greater efforts to bypass filters in order to distribute content over networks. The distribution of content encoded in images or the use of Leetspeak are concrete and clear examples of techniques currently used to bypass filters. Despite the importance of dealing with these problems, the number of studies to solve them is quite small, and the reported performance is very limited. This study reviews the work done so far (very rudimentary) for Leetspeak deobfuscation and proposes a new technique based on using neural networks for decoding purposes. In addition, we distribute an image database specifically created for training Leetspeak decoding models. We have also created and made available four different corpora to analyse the performance of Leetspeak decoding schemes. Using these corpora, we have experimentally evaluated our neural network approach for decoding Leetspeak. The results obtained have shown the usefulness of the proposed model for addressing the deobfuscation of Leetspeak character sequences.

Funding information

Vitor Basto Fernandes acknowledges FCT – Fundação para a Ciência e a Tecnologia, I.P., for its support in the context of project UIDB/04466/2020 and UIDP/04466/2020.