A Novel Approach: Tokenization Framework based on Sentence Structure in Indonesian Language

Petrus, Johannes; Ermatita, Ermatita; Sukemi; Erwin

doi:10.14569/ijacsa.2023.0140264

Cited by 2 publications

(1 citation statement)

References 29 publications

“…Tokenization was performed after the data normalization process. Tokenization is a crucial stage in RE that involves breaking sentences into word pieces, or tokens, for each line [13]. BERT Tokenizer was used to break a sentence into words (tokens).…”

Section: Preprocessing (mentioning)

confidence: 99%
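
The citation statement above describes tokenization as breaking a sentence into word pieces. The sketch below illustrates the greedy longest-match-first splitting that WordPiece-style tokenizers such as the BERT tokenizer are generally described as using. It is a minimal illustration under stated assumptions, not the cited paper's code: the names `basic_split`, `wordpiece`, `tokenize`, and `TOY_VOCAB` are hypothetical, and the tiny vocabulary is invented for the example, whereas a real BERT tokenizer loads a vocabulary of tens of thousands of entries.

```python
import re

# Hypothetical toy vocabulary; "##" marks a piece that continues a word.
TOY_VOCAB = {"blast", "in", "##fect", "##s", "rice", "leaves", "."}


def basic_split(sentence):
    """Lowercase the sentence and split it into words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence.lower())


def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into sub-word pieces by greedy longest-match-first lookup."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        # Try the longest remaining substring first, then shrink from the right.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No piece fits, so the whole word maps to the unknown token.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


def tokenize(sentence, vocab):
    """Tokenize a sentence into a flat list of word pieces."""
    tokens = []
    for word in basic_split(sentence):
        tokens.extend(wordpiece(word, vocab))
    return tokens


print(tokenize("Blast infects rice leaves.", TOY_VOCAB))
```

With this toy vocabulary, a word that is not stored whole (here "infects") is broken into a leading piece plus `##`-prefixed continuation pieces, while a word with no matching pieces falls back to `[UNK]`.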

Plant-Disease Relation Model through BERT-BiLSTM-CRF Approach

Riyanto, Sitanggang, Djatna et al. 2024, IJEEI

Plant Disease Relations (PDR) extraction is an Information Extraction (IE) subtask that identifies the relationship between plant and disease entities that appear together in a sentence. Previous research proposed a Short Dependency Path-Convolutional Neural Network (SDP-CNN) method to predict these relationships, but that method has limitations when faced with long and complex sentences. To overcome these limitations, this study proposes the BERT-BiLSTM-CRF method to improve model performance in detecting PDR. First, the tokenized data is passed to the BERT Encoder layer. After the BERT Encoder calculates the hidden information, the output enters the linear layer to obtain word embeddings. Calculation results in the bilinear layer are forwarded to the softmax layer to predict the relationship of each pair. Computation results in the softmax layer are sent to the BiLSTM layer. Finally, the CRF layer is applied to improve the prediction. The model was built with an 80:20 split of training and testing data, using the same parameter values over ten runs, and GridSearch hyperparameter tuning was used to improve performance. Experimental results show that the proposed architecture reaches an F1 score of 0.790, higher than the micro F1 score of 0.764 obtained with SDP-CNN. The BERT-BiLSTM-CRF method thus overcame the problem of predicting PDR.
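
The abstract compares models by micro F1 score. As a rough sketch of how such a score can be computed, the function below pools true positives, false positives, and false negatives over a chosen set of relation labels before taking precision and recall. The function name `micro_f1`, the label names, and the gold and predicted sequences are all hypothetical and made up for illustration; they are not the paper's data, and the paper does not state whether a "no relation" class is excluded in this way.

```python
def micro_f1(gold, pred, labels):
    """Micro-averaged precision, recall, and F1 over the given labels.

    Counts are pooled across all labels before computing the ratios.
    Labels not listed (for example a 'no relation' class) are ignored.
    """
    tp = fp = fn = 0
    for label in labels:
        for g, p in zip(gold, pred):
            if p == label and g == label:
                tp += 1  # predicted the label and it was correct
            elif p == label and g != label:
                fp += 1  # predicted the label but the gold label differs
            elif p != label and g == label:
                fn += 1  # missed a gold occurrence of the label
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted relation labels for five entity pairs.
gold = ["cause", "treat", "none", "cause", "none"]
pred = ["cause", "none", "cause", "cause", "none"]

precision, recall, f1 = micro_f1(gold, pred, labels=["cause", "treat"])
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because the counts are pooled before the ratios are taken, frequent relation types weigh more heavily in a micro-averaged score than rare ones.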
