2020
DOI: 10.3390/app10072303

Named Entity Recognition for Sensitive Data Discovery in Portuguese

Abstract: The process of protecting sensitive data is continually growing and becoming increasingly important, especially as a result of the directives and laws imposed by the European Union. The effort to create automatic systems is continuous, but, in most cases, the processes behind them are still manual or semi-automatic. In this work, we have developed a component that can extract and classify sensitive data from unstructured text information in European Portuguese. The objective was to create a system that allows…

Citations: cited by 27 publications (16 citation statements)
References: 26 publications
“…They also released an annotated data set to help researchers make studies to improve their results. In 2020, Mariana et al [20] employed a hybrid approach that consists of rule sets, lexical based models, and machine learning algorithms to detect and make notifications about personal data for Portuguese language. This study extracts only 11 types of personal data entities in 3 categories.…”
Section: Personal Data Discovery
Citation type: mentioning
confidence: 99%
“…However, more investigations are required to verify these hypotheses, and the effects of NLP models to recognise entities of different classes deserves more research. A possible research path involves the combination of CRFs with BiLSTMs, which can achieve state-of-the-art performance for specific language resources [10] and be potentially useful in the identification of entities of the ENAMEX group, e.g., [7,31]. In addition, methodological approaches involving latent factor analysis may also contribute to improve computational efficiency and prediction accuracy of trained models for unseen data [32][33][34][35][36].…”
Section: Influence Of Self-training On Model Performance
Citation type: mentioning
confidence: 99%
“…BiLSTM is a new network structure improved on the basis of LSTM [30][31][32]. It consists of the following four layers: input layer, forward LSTM layer, backward LSTM layer and the output layer.…”
Section: Extracting Semantic Information Based On BiLSTM
Citation type: mentioning
confidence: 99%