2021
DOI: 10.36227/techrxiv.16444611.v1
Preprint

DIBERT: Dependency Injected Bidirectional Encoder Representations from Transformers

Abstract: In this paper, we propose a new model named DIBERT, which stands for Dependency Injected Bidirectional Encoder Representations from Transformers. DIBERT is a variation of BERT with an additional third objective called Parent Prediction (PP), alongside Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). PP injects the syntactic structure of a dependency tree …
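To make the three-objective setup concrete, here is a minimal PyTorch sketch, not the authors' implementation. It assumes PP can be cast as predicting each token's parent position in the dependency tree; the class name DIBERTSketch, the helper dibert_loss, the equal weighting of the three losses, and all hyperparameters are illustrative assumptions rather than details from the paper.

```python
# Minimal sketch (assumed, not the authors' code): a BERT-style encoder with
# three pre-training heads -- Masked Language Modeling (MLM), Next Sentence
# Prediction (NSP), and a Parent Prediction (PP) head that predicts, for each
# token, the position of its head word in the dependency tree.
import torch
import torch.nn as nn


class DIBERTSketch(nn.Module):
    def __init__(self, vocab_size=30522, hidden=768, layers=12, heads=12, max_len=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.pos = nn.Embedding(max_len, hidden)
        encoder_layer = nn.TransformerEncoderLayer(hidden, heads, 4 * hidden, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, layers)
        self.mlm_head = nn.Linear(hidden, vocab_size)  # token-level MLM logits
        self.nsp_head = nn.Linear(hidden, 2)           # sentence-pair NSP logits from [CLS]
        self.pp_head = nn.Linear(hidden, max_len)      # token-level parent-position logits (assumed form of PP)

    def forward(self, input_ids):
        pos_ids = torch.arange(input_ids.size(1), device=input_ids.device)
        h = self.encoder(self.embed(input_ids) + self.pos(pos_ids))
        return self.mlm_head(h), self.nsp_head(h[:, 0]), self.pp_head(h)


def dibert_loss(mlm_logits, nsp_logits, pp_logits, mlm_labels, nsp_labels, parent_ids):
    # Joint objective: the abstract only says PP is a third objective next to
    # MLM and NSP; summing the three losses with equal weight is an assumption.
    ce = nn.CrossEntropyLoss(ignore_index=-100)  # -100 marks unmasked / padded positions
    loss_mlm = ce(mlm_logits.flatten(0, 1), mlm_labels.flatten())
    loss_nsp = ce(nsp_logits, nsp_labels)
    loss_pp = ce(pp_logits.flatten(0, 1), parent_ids.flatten())
    return loss_mlm + loss_nsp + loss_pp
```

In this sketch the parent labels would come from an external dependency parser run over the pre-training corpus, which is how the "injected" syntactic structure enters training; the exact parser and label encoding used by DIBERT are not specified in the excerpt above.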

Cited by 1 publication (4 citation statements)
References 15 publications
“…Deep learning-based KT, leveraging advancements like self-attention mechanisms and Transformer architectures, has introduced models such as SAKT and SAINT+ for higher performance through sequence prediction and attention to temporal learning dynamics [19,22,8]. BERT-based KT models, though innovative, have not surpassed state-of-the-art KT methods in handling long-sequence, large-scale datasets [11,25].…”
Section: Knowledge Tracing
confidence: 99%
“…BERT's bidirectional training and large pre-training corpus have set new benchmarks in understanding natural language, with applications extending into image processing, recommendation systems, and music generation [7,9,23]. Despite their success, BERT variants in KT have not achieved superior performance on complex, long-sequence datasets [25,13,14,17,12,16].…”
Section: Transformer-based Model and Application
confidence: 99%