2011
DOI: 10.1016/j.eswa.2010.07.146

Improving the performance of Naive Bayes multinomial in e-mail foldering by introducing distribution-based balance of datasets

Cited by 64 publications (33 citation statements); references 31 publications.

“…This problem is present in many real-world problems, such as medical diagnosis [8], fraud detection, finances, risk management, network intrusion, e-mail foldering [12], software defect detection [18], and so on. Additionally, the positive (minority) class is the class of interest from the learning point of view and has a great impact when it is not classified properly.…”
Section: A. Imbalanced Data Problem (mentioning)
confidence: 99%
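The statement above argues that errors on the minority class are what matter most. A toy illustration, using hypothetical labels in a 95:5 ratio, shows how plain accuracy can hide that failure: a classifier that always predicts the majority class still reaches 95% accuracy while recalling none of the minority examples. This only illustrates the problem; it is not the distribution-based balancing method proposed in the paper.

```python
# Toy illustration (hypothetical data): on a 95:5 split, always predicting
# the majority class gives high accuracy but zero recall on the minority class.


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of true `positive` examples that were predicted as `positive`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    total = sum(1 for t in y_true if t == positive)
    return hits / total if total else 0.0


# Hypothetical imbalanced labels: 95 majority ("neg") and 5 minority ("pos").
y_true = ["neg"] * 95 + ["pos"] * 5

# A degenerate classifier that ignores its input and outputs the majority label.
y_pred = ["neg"] * len(y_true)

print(f"accuracy        = {accuracy(y_true, y_pred):.2f}")       # 0.95
print(f"minority recall = {recall(y_true, y_pred, 'pos'):.2f}")  # 0.00
```

Reporting per-class recall (or a similar minority-focused metric) alongside accuracy is what exposes the problem the quoted passage describes.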
“…A document is treated as a sequence of words, and each word position is assumed to be generated independently of every other. It is a fast, easy-to-implement, and relatively effective classifier [11]. The Multinomial Naïve Bayes model makes it possible to take into account how often each term appears in the documents; this is important because we can assume that a high frequency of occurrence increases the probability of belonging to a particular class [1].…”
Section: Naïve Bayes Multinomial (unclassified)
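The passage above describes the multinomial event model: each word position is drawn independently given the class, so term frequencies (not just presence) drive the score log P(c) + Σ log P(t | c). A minimal from-scratch sketch follows, assuming Laplace smoothing (alpha = 1), whitespace tokenization, and skipping of words unseen in training. The function names, folder names, and messages are hypothetical; this is not the cited authors' implementation.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes on tokenised documents.

    alpha is the additive smoothing constant (1.0 = Laplace smoothing, an assumption).
    Returns (log prior per class, log P(term | class) per class).
    """
    vocab = {tok for doc in docs for tok in doc}
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)  # class -> frequency of each term in that class
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)  # every occurrence is counted, not just presence

    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        denom = sum(term_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            tok: math.log((term_counts[c][tok] + alpha) / denom) for tok in vocab
        }
    return log_prior, log_likelihood


def predict_mnb(model, doc):
    """Return the class maximising log P(c) + sum of log P(t | c) over the doc's tokens."""
    log_prior, log_likelihood = model
    scores = {}
    for c, prior in log_prior.items():
        # Each token position contributes independently (the naive assumption),
        # so a term seen k times adds its log-probability k times.
        # Tokens never seen in training are skipped.
        scores[c] = prior + sum(
            log_likelihood[c][tok] for tok in doc if tok in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Hypothetical training messages, already tokenised, with their folders.
docs = [
    "meeting agenda project deadline".split(),
    "project budget meeting notes".split(),
    "cheap offer win prize now".split(),
    "win free prize offer".split(),
]
labels = ["work", "work", "promo", "promo"]

model = train_mnb(docs, labels)
print(predict_mnb(model, "project meeting tomorrow".split()))  # work
```

Working in log space avoids numeric underflow when many small per-word probabilities are multiplied, and smoothing keeps a single unseen-in-class word from forcing a class probability to zero.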
“…Support vector machines in Python offer supervised learning methods used for classification, regression, and outlier detection. The Python tool scikit-learn [3] is a machine learning library for Python that provides simple and efficient tools for data mining and sentiment analysis; it contains several vectorizers¹ for translating input documents into feature vectors. ¹ Function used in scikit-learn: http://scikit-learn.org/…”
unclassified
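The quote mentions vectorizers that turn documents into feature vectors. The sketch below shows the underlying bag-of-words idea with the standard library only: it builds a vocabulary, then maps each document to a vector of term counts. It is a simplified, hypothetical stand-in, not scikit-learn's code: scikit-learn's `CountVectorizer` uses a regular-expression tokenizer (by default dropping one-character tokens) and returns a sparse matrix, so its output differs in detail.

```python
from collections import Counter


def build_vocabulary(docs):
    """Map each distinct lower-cased token to a column index, in sorted order."""
    tokens = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(doc, vocabulary):
    """Turn one document into a dense term-count vector over `vocabulary`.

    Tokens outside the vocabulary are ignored, as a fitted vectorizer would do.
    """
    vec = [0] * len(vocabulary)
    for tok, n in Counter(doc.lower().split()).items():
        if tok in vocabulary:
            vec[vocabulary[tok]] = n
    return vec


# Hypothetical e-mail subject lines.
docs = ["Budget meeting today", "Meeting notes and budget budget"]
vocab = build_vocabulary(docs)
print(vocab)  # {'and': 0, 'budget': 1, 'meeting': 2, 'notes': 3, 'today': 4}
print([vectorize(d, vocab) for d in docs])  # [[0, 1, 1, 0, 1], [1, 2, 1, 1, 0]]
```

With scikit-learn installed, `CountVectorizer` from `sklearn.feature_extraction.text` plays this role through `fit_transform`, and the resulting matrix can be passed to a classifier such as `LinearSVC` or `MultinomialNB`.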
“…Another common approach is to consider additional properties of email messages and utilize structural features, e.g., [2,15]. Results for these methods are commonly given for the Enron and SRI datasets [4,6,7,15].…”
Section: Related Work (mentioning)
confidence: 99%
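The statement above mentions structural features of e-mail messages without listing them. As a hedged illustration only, the sketch below assumes a few plausible header-derived features (sender domain, recipient count, a reply marker, subject tokens) and extracts them with Python's standard `email` package; the feature set and the `structural_features` helper are hypothetical, not the features used in [2,15].

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr


def structural_features(raw_message):
    """Extract a few hypothetical header-based features from a raw e-mail message."""
    msg = message_from_string(raw_message)
    _, sender = parseaddr(msg.get("From", ""))
    # get_all returns every occurrence of a header; the [] default covers absent headers.
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    subject = msg.get("Subject", "")
    return {
        "sender_domain": sender.rpartition("@")[2].lower(),
        "n_recipients": len(recipients),
        "is_reply": subject.lower().startswith("re:"),
        "subject_tokens": subject.lower().split(),
    }


# Hypothetical message.
raw = """From: Alice <alice@Example.com>
To: bob@example.org, carol@example.org
Cc: dave@example.net
Subject: Re: Q3 budget review

Looks good to me.
"""
print(structural_features(raw))
```

Features like these would typically be combined with the word-count features of the message body before training a classifier.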
“…The approaches include rule-based systems (e.g., [9,18]), IR methods: k-NN [6] and tf-idf, and machine learning methods: naive Bayes (NB), decision trees, support vector machines (SVM), maximum entropy (MaxEnt) and neural networks (NN) [1,4,10,14]. Another common approach is to consider additional properties of email messages and utilize structural features, e.g., [2,15].…”
Section: Related Work (mentioning)
confidence: 99%
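Among the IR methods listed above, k-NN and tf-idf are often combined: messages are weighted by tf-idf, and a new message is filed into the majority folder of its most similar neighbours by cosine similarity. The sketch below assumes the common weighting tf × log(N / df) and k = 3; the cited works may weight terms differently, and the helper names, messages, and folders are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Weight raw term counts by idf = log(N / df); return sparse dict vectors and idf."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n / d) for tok, d in df.items()}
    vectors = [{tok: tf * idf[tok] for tok, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def knn_folder(query, docs, folders, k=3):
    """Assign `query` the majority folder among its k most similar training messages."""
    vectors, idf = tfidf_vectors(docs)
    # Terms unseen in training carry no idf weight, so they are dropped.
    q = {tok: tf * idf[tok] for tok, tf in Counter(query).items() if tok in idf}
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    votes = Counter(folders[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical tokenised messages and their folders.
docs = [
    "quarterly budget review meeting".split(),
    "budget forecast spreadsheet attached".split(),
    "team lunch friday".split(),
    "friday lunch menu".split(),
    "budget approval needed".split(),
]
folders = ["finance", "finance", "social", "social", "finance"]
print(knn_folder("budget meeting notes".split(), docs, folders))  # finance
```

Because k-NN stores all training messages and compares against each one, it needs no training step but costs more at classification time than a model such as naive Bayes.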