2020
DOI: 10.1109/access.2020.2989126

From Feature Engineering and Topics Models to Enhanced Prediction Rates in Phishing Detection

Abstract: Phishing is a type of fraud attempt in which the attacker, usually by e-mail, pretends to be a trusted person or entity in order to obtain sensitive information from a target. Most recent phishing detection researches have focused on obtaining highly distinctive features from the metadata and text of these e-mails. The obtained attributes are then used to feed classification algorithms in order to determine whether they are phishing or legitimate messages. In this paper, it is proposed an approach based on mac…

Cited by 43 publications (37 citation statements)
References 36 publications
“…Considering our previous proposal [47], the approach presented in this paper are results of an improvement over it, since it employs the new Stanza NLP toolkit as the database in the feature engineering process to promote tokenization, POS tagging, and lemmatization (instead of WordNet, as done in [47]); and proposes methods to select feature based on statistical measures, and methods to extract new features, from that initially presented in DTM, based on PCA and LSA (instead of LDA, as done in [47]).…”
Section: Related Work (citation type: mentioning)
confidence: 98%
“…This strategy for folding was not yet used for phishing detection approaches based on ML algorithms. In this context, our previous work presented in [47] and this approach were the first employment of this plan.…”
Section: J Features Attributes (citation type: mentioning)
confidence: 99%