Long Phan scite author profile

Long Phan

4 Publications

64 Citation Statements Received

78 Citation Statements Given

Affiliations

Viet Tri University of Industry, National Cancer Institute, Case Western Reserve University

Publications

CoTexT: Multi-task Learning with Code-Text Transformer

Phan, Tran, Le et al. 2021

We present CoTexT, a pre-trained, Transformer-based encoder-decoder model that learns the representative context between natural language (NL) and programming language (PL). Using self-supervision, CoTexT is pre-trained on large programming language corpora to learn a general understanding of language and code. CoTexT supports downstream NL-PL tasks such as code summarization/documentation, code generation, defect detection, and code debugging. We train CoTexT on different combinations of available PL corpora, including both "bimodal" and "unimodal" data. Here, bimodal data is the combination of text and corresponding code snippets, whereas unimodal data is merely code snippets. We first evaluate CoTexT with multi-task learning: we perform Code Summarization on 6 different programming languages and Code Refinement on both the small and medium datasets featured in CodeXGLUE. We further conduct extensive experiments to investigate CoTexT on other tasks within the CodeXGLUE dataset, including Code Generation and Defect Detection. We consistently achieve state-of-the-art (SOTA) results in these tasks, demonstrating the versatility of our models.

CoTexT: Multi-task Learning with Code-Text Transformer

Phan, Tran, Le et al. 2021

Preprint

Enriching Biomedical Knowledge for Low-resource Language Through Translation

Phan, Dang, Tran et al. 2022

Preprint

Biomedical data and benchmarks are highly valuable yet very limited in low-resource languages such as Vietnamese. In this paper, we use a state-of-the-art English-Vietnamese translation model to translate and produce both pretraining and supervised data in the biomedical domain. Thanks to this large-scale translation, we introduce ViPubmedT5, a pretrained encoder-decoder Transformer model trained on 20 million translated abstracts from the high-quality public PubMed corpus. ViPubmedT5 demonstrates state-of-the-art results on two different biomedical benchmarks in summarization and acronym disambiguation. Further, we release ViMedNLI, a new NLP task in Vietnamese translated from MedNLI using the recently released En-Vi translation model and carefully refined by human experts, and we evaluate existing methods against ViPubmedT5.

ViT5: Pretrained Text-to-Text Transformer for Vietnamese Language Generation

Phan, Tran, Nguyen et al. 2022

We present ViT5, a pretrained Transformer-based encoder-decoder model for the Vietnamese language. With T5-style self-supervised pretraining, ViT5 is trained on a large corpus of high-quality and diverse Vietnamese texts. We benchmark ViT5 on two downstream text generation tasks: Abstractive Text Summarization and Named Entity Recognition. Although Abstractive Text Summarization has been widely studied for English thanks to its rich and large sources of data, there has been minimal research into the same task in Vietnamese, a much lower-resource language. In this work, we perform exhaustive experiments on both Vietnamese Abstractive Summarization and Named Entity Recognition, validating the performance of ViT5 against many other pretrained Transformer-based encoder-decoder models. Our experiments show that ViT5 significantly outperforms existing models and achieves state-of-the-art results on Vietnamese Text Summarization. On Named Entity Recognition, ViT5 is competitive with the previous best results from pretrained encoder-based Transformer models. Further analysis shows the importance of context length during self-supervised pretraining for downstream performance across different settings.

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.