There is an increasing interest in studying natural language and computer code together, as large corpora of programming texts become readily available on the Internet. For example, StackOverflow currently has over 15 million programming-related questions written by 8.5 million users. Meanwhile, there is still a lack of fundamental NLP techniques for identifying code tokens or software-related named entities that appear within natural language sentences. In this paper, we introduce a new named entity recognition (NER) corpus for the computer programming domain, consisting of 15,372 sentences annotated with 20 fine-grained entity types. We trained in-domain BERT representations (BERTOverflow) on 152 million sentences from StackOverflow, which lead to an absolute increase of +10 F1 score over off-the-shelf BERT. We also present the SoftNER model, which achieves an overall 79.10 F1 score for code and named entity recognition on StackOverflow data. Our SoftNER model incorporates a context-independent code token classifier with corpus-level features to improve the BERT-based tagging model.
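As a rough illustration of how an in-domain encoder can be plugged into a token-level NER tagger, the sketch below loads a BERT checkpoint with a token-classification head via Hugging Face transformers. The checkpoint name "jeniya/BERTOverflow", the label count, and the example sentence are assumptions for illustration; the classification head is freshly initialized and would still need fine-tuning on the annotated corpus, and none of SoftNER's code-token classifier or corpus-level features are reproduced here.

```python
# Minimal sketch: token-level tagging with an in-domain BERT encoder (not the full SoftNER model).
# Assumptions: the checkpoint identifier and NUM_LABELS are illustrative; the classification
# head below is randomly initialized and must be fine-tuned on the annotated NER corpus
# before its predicted tags are meaningful.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "jeniya/BERTOverflow"   # assumed Hugging Face identifier for BERTOverflow
NUM_LABELS = 41                      # 20 entity types in a BIO scheme, plus the "O" tag

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)

sentence = "How do I sort a dict by value in Python 3?"
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, sequence_length, NUM_LABELS)
predictions = logits.argmax(dim=-1).squeeze(0)

# Print one predicted tag id per wordpiece token.
for token, tag_id in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()),
                         predictions.tolist()):
    print(f"{token}\t{tag_id}")
```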
This paper presents the results of the wet lab information extraction task at WNUT 2020. This task consisted of two subtasks: (1) a Named Entity Recognition (NER) task with 13 participants and (2) a Relation Extraction (RE) task with 2 participants. We outline the task, data annotation process, and corpus statistics, and provide a high-level overview of the participating systems for each subtask.
We describe TweeTIME, a temporal tagger for recognizing and normalizing time expressions in Twitter. Most previous work in social media analysis relies on temporal resolvers designed for well-edited text, and therefore suffers reduced performance due to domain mismatch. We present a minimally supervised method that learns from large quantities of unlabeled data and requires no hand-engineered rules or hand-annotated training corpora. TweeTIME achieves a 0.68 F1 score on the end-to-end task of resolving date expressions, outperforming a broad range of state-of-the-art systems.
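To make the end-to-end task concrete, the toy snippet below resolves a few relative date expressions against a tweet's timestamp. It is a hand-written rule stub for illustration only, not TweeTIME's minimally supervised method; the expressions and timestamp are invented.

```python
# Toy illustration of date resolution: map a relative date expression in a tweet
# to a calendar date, anchored on the tweet's timestamp. This rule-based stub is
# NOT the TweeTIME method; it only makes the task's input/output concrete.
from datetime import datetime, timedelta

def resolve(expression: str, tweet_time: datetime) -> datetime:
    """Resolve a handful of relative date expressions against the tweet timestamp."""
    expression = expression.lower().strip()
    if expression == "today":
        return tweet_time
    if expression == "tomorrow":
        return tweet_time + timedelta(days=1)
    if expression == "yesterday":
        return tweet_time - timedelta(days=1)
    raise ValueError(f"unsupported expression: {expression!r}")

tweet_time = datetime(2016, 3, 14, 9, 30)      # time the tweet was posted (invented)
print(resolve("tomorrow", tweet_time).date())  # 2016-03-15
```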
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.