2019
DOI: 10.1007/978-3-030-31372-2_21
Automatic Identification of Economic Activities in Complaints

Cited by 6 publications (8 citation statements)
References 12 publications
“…As such, the test set used in these experiments has been drawn from the last 5 years of data only. This also ensures the results for the task of economic activity prediction reported in this paper can be compared to results from earlier work (Barbosa et al, 2019), which were obtained using the same test set. A total of roughly 25,000 examples make up this test set, 16% of all available data.…”
Section: Methods
confidence: 81%
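The chronological holdout described in this statement can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the record fields and the cutoff date are assumptions.

```python
from datetime import datetime

# Hypothetical sketch of drawing a test set from the most recent years
# of a dated complaint corpus; field names and cutoff are assumptions.
complaints = [
    {"text": "complaint a", "date": datetime(2014, 3, 1)},
    {"text": "complaint b", "date": datetime(2018, 6, 15)},
    {"text": "complaint c", "date": datetime(2019, 1, 10)},
]

cutoff = datetime(2015, 1, 1)  # assumed boundary for the "last 5 years"
train = [c for c in complaints if c["date"] < cutoff]
test = [c for c in complaints if c["date"] >= cutoff]
```

Splitting by date rather than at random avoids leaking future complaints into training, which keeps the evaluation comparable to a deployment setting.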
“…Based on an earlier work that tackled economic activity prediction on a smaller sample of this dataset (Barbosa et al, 2019), the dataset was preprocessed using the Natural Language Toolkit (NLTK) (Bird et al, 2009) to perform tokenization, lemmatization and remove stop words from Portuguese text. Furthermore, from among the different feature-based representations explored by Barbosa et al (2019), a TF-IDF weighted vector was found to be the most effective method of representing each document. TF-IDF outperformed fastText-based (Joulin et al, 2016) and BERT-based (Devlin et al, 2018) representations, using traditional machine learning approaches, specifically Support Vector Machines (SVM) (Cortes and Vapnik, 1995).…”
Section: Methods
confidence: 99%
“…In order to implement machine learning classifiers based on the textual contents of each complaint, and given their user-generated content nature, a previous preprocessing step was necessary. Based on an earlier work that tackled economic activity prediction on a smaller sample of this dataset (Barbosa et al, 2019), the dataset was preprocessed using the Natural Language Toolkit (NLTK) (Bird et al, 2009) to perform tokenization, lemmatization and remove stop words from Portuguese text. Furthermore, from among the different feature-based representations explored by Barbosa et al (2019), a TF-IDF weighted vector was found to be the most effective method of representing each document.…”
Section: Methods
confidence: 99%
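The tokenization and stop-word removal steps this statement describes can be illustrated without NLTK's downloadable resources. This hand-rolled sketch is an assumption-laden stand-in: the cited work used NLTK's Portuguese tokenizer, lemmatizer, and stop-word list, and the tiny stop-word set below is invented for illustration.

```python
import re

# Tiny illustrative Portuguese stop-word set; the real pipeline would
# use NLTK's full stopwords corpus for Portuguese.
STOPWORDS_PT = {"a", "o", "de", "da", "do", "na", "no", "com", "e"}

def preprocess(text: str) -> list[str]:
    """Lowercase, naively tokenize, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOPWORDS_PT]

preprocess("Atraso na entrega da encomenda")
```

Lemmatization (collapsing inflected forms to a base form) would follow the same per-token pattern, applied before or after stop-word filtering.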