A cosine similarity-based labeling technique for vulnerability type detection using source codes

Öztürk, M. Maruf

doi:10.1016/j.cose.2024.104059

Computers & Security

2024

Cited by 1 publication

References 49 publications

A Centrality-Weighted Bidirectional Encoder Representation from Transformers Model for Enhanced Sequence Labeling in Key Phrase Extraction from Scientific Texts

Zengeya, Fonou Dombeu, Gwetu

2024

BDCC

Deep learning approaches that use Bidirectional Encoder Representation from Transformers (BERT) and advanced fine-tuning techniques have achieved state-of-the-art accuracy in term extraction from text. However, BERT is limited in that it primarily captures the semantic context relative to the surrounding text, without considering how relevant or central a token is to the overall document content. Sequence labeling has also been applied to contextualized embeddings, but existing methods often rely solely on local context when extracting key phrases. To address these limitations, this study proposes a centrality-weighted BERT model for key phrase extraction from text using sequence labeling (CenBERT-SEQ). The proposed CenBERT-SEQ model uses BERT to represent terms with various contextual embedding architectures and introduces a centrality-weighting layer that integrates document-level context into BERT. This layer uses document embeddings to adjust the importance of each term according to its relevance to the entire document. Finally, a linear classifier layer is employed to model the dependencies between the outputs, thereby enhancing the accuracy of the CenBERT-SEQ model. The proposed CenBERT-SEQ model was evaluated against the standard BERT base-uncased model on three Computer Science article datasets: SemEval-2010, WWW, and KDD. The experimental results show that, although the CenBERT-SEQ and BERT-base models achieved high and closely comparable accuracy, the proposed CenBERT-SEQ model achieved higher precision, recall, and F1-score than the BERT-base model. Furthermore, a comparison with related studies showed that the proposed CenBERT-SEQ model achieved higher accuracy, precision, recall, and F1-score (95%, 97%, 91%, and 94%, respectively), indicating the superior capabilities of the CenBERT-SEQ model in key phrase extraction from scientific documents.
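The abstract does not say how the centrality-weighting layer is computed. As a rough illustration only, the sketch below assumes that the document embedding is the mean of the token embeddings and that each token's weight is its cosine similarity to that document embedding, clipped at zero. Both choices, and all function names, are hypothetical and are not taken from the CenBERT-SEQ paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pool(token_embeddings: Sequence[Vector]) -> List[float]:
    """Assumed document embedding: the element-wise mean of the token embeddings."""
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(vec[i] for vec in token_embeddings) / n for i in range(dim)]


def centrality_weights(token_embeddings: Sequence[Vector]) -> List[float]:
    """Assumed weighting: similarity of each token to the document, clipped at 0."""
    doc = mean_pool(token_embeddings)
    return [max(0.0, cosine_similarity(tok, doc)) for tok in token_embeddings]


def apply_weights(token_embeddings: Sequence[Vector],
                  weights: Sequence[float]) -> List[List[float]]:
    """Scale each token embedding by its centrality weight."""
    return [[w * x for x in vec] for vec, w in zip(token_embeddings, weights)]


# Toy 2-dimensional "embeddings": two tokens agree with each other, one differs.
tokens = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
weights = centrality_weights(tokens)
weighted = apply_weights(tokens, weights)
```

In this toy input the two tokens that point in the majority direction receive a larger weight than the outlier, which is the kind of document-level signal the abstract attributes to the centrality-weighting layer. A real model would operate on BERT's contextual embeddings and would likely learn the weighting rather than fix it.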
