14th Innovations in Software Engineering Conference (Formerly Known as India Software Engineering Conference) 2021
DOI: 10.1145/3452383.3452401
Crawling Wikipedia Pages to Train Word Embeddings Model for Software Engineering Domain

Cited by 6 publications (4 citation statements)
References 24 publications
“…One of the retrieved papers (Mishra et al. [139]) focused on building a language model for the software requirement domain. This model was based on a domain-specific text corpus collected by crawling the software engineering category on Wikipedia.…”
Section: Others (mentioning, confidence: 99%)
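The excerpts do not describe how the category crawl was implemented; below is a minimal sketch of one way to do it with the public MediaWiki API. The category name, depth limit, and recursion strategy are assumptions, not the paper's actual crawler.

```python
# Sketch: collect page titles under Wikipedia's software engineering
# category by following the MediaWiki categorymembers endpoint.
import requests

API = "https://en.wikipedia.org/w/api.php"

def category_members(category):
    """Yield member titles of a category, following API continuation."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "page|subcat",
        "cmlimit": "500",
        "format": "json",
    }
    while True:
        data = requests.get(API, params=params, timeout=30).json()
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break
        params.update(data["continue"])

def crawl(root="Category:Software engineering", max_depth=2):
    """Collect article titles, recursing into subcategories up to max_depth."""
    seen, stack, pages = set(), [(root, 0)], []
    while stack:
        cat, depth = stack.pop()
        if cat in seen or depth > max_depth:
            continue
        seen.add(cat)
        for title in category_members(cat):
            if title.startswith("Category:"):
                stack.append((title, depth + 1))
            else:
                pages.append(title)
    return pages
```

Page texts for the collected titles could then be fetched (e.g. via the API's extracts endpoint) to build the corpus.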
“…For example, the domain-specific meanings of words like "virus", "cookies", "Python", "fork", etc. cannot be captured by models trained on generic corpora [139]. Building such domain-specific word embedding models is a notoriously challenging task, especially given the small amount of data available in the RE domain.…”
Section: More RE Domain-specific Word Embedding Models (mentioning, confidence: 99%)
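To make the point concrete, the snippet below contrasts the nearest neighbours of such ambiguous terms in a generic model against a domain-specific one, using gensim. This is illustrative only: `glove-wiki-gigaword-100` is a stock generic model from gensim's downloader, and `se_word2vec.model` is a hypothetical file name standing in for an SE-domain model like the paper's.

```python
# Sketch: compare neighbours of ambiguous terms across two models.
import gensim.downloader as api
from gensim.models import Word2Vec

generic = api.load("glove-wiki-gigaword-100")    # generic pretrained vectors
domain = Word2Vec.load("se_word2vec.model").wv   # hypothetical SE-domain model

# Terms are lowercased because the generic model's vocabulary is lowercase.
for term in ["virus", "cookies", "python", "fork"]:
    print(term)
    print("  generic:", [w for w, _ in generic.most_similar(term, topn=5)])
    print("  domain :", [w for w, _ in domain.most_similar(term, topn=5)])
```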
“…Building such domain-specific word embedding models is a notoriously challenging task, especially given the small amount of data available in the RE domain. One of the recent works [139] provided an embedding model trained on 92 MB of text collected from Wikipedia pages related to the software engineering domain. However, more research is needed (1) to train embedding models on more practical industrial texts, and (2) to evaluate the use of these models in various RE tasks.…”
Section: More RE Domain-specific Word Embedding Models (mentioning, confidence: 99%)
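A minimal sketch of training such an embedding model with gensim's Word2Vec follows. The corpus file name, tokenisation, and hyperparameters are assumptions; the excerpts state only that roughly 92 MB of Wikipedia text was used.

```python
# Sketch: train word2vec embeddings on a crawled SE-domain corpus.
import re
from gensim.models import Word2Vec

class Corpus:
    """Stream token lists from a one-sentence-per-line text dump,
    so the full 92 MB corpus never has to sit in memory at once."""
    def __init__(self, path):
        self.path = path
    def __iter__(self):
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                # Keep +, #, ., - so tokens like "c++" and "c#" survive.
                yield re.findall(r"[a-z0-9+#.-]+", line.lower())

model = Word2Vec(
    sentences=Corpus("se_wikipedia_corpus.txt"),  # hypothetical corpus file
    vector_size=300,   # embedding dimensionality (assumed)
    window=5,          # context window size
    min_count=5,       # drop rare tokens
    workers=4,
    epochs=5,
)
model.save("se_word2vec.model")
```

The streamed-corpus class is restartable (gensim iterates over it once per epoch), which is why a generator function alone would not suffice here.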