2021
DOI: 10.48550/arxiv.2107.02025
Preprint

Textual Data Distributions: Kullback Leibler Textual Distributions Contrasts on GPT-2 Generated Texts, with Supervised, Unsupervised Learning on Vaccine & Market Topics & Sentiment

Jim Samuel,
Ratnakar Palle,
Eduardo Correa Soares

Abstract: Efficient textual data distributions (TDD) alignment and generation are open research problems in textual analytics and NLP. It is presently difficult to parsimoniously and methodologically confirm that two or more natural language datasets belong to similar distributions, and to identify the extent to which textual data possess alignment. This study focuses on addressing a segment of the broader problem described above by applying multiple supervised and unsupervised machine learning (ML) methods to explore t…

Citations: cited by 1 publication (1 citation statement)
References: 21 publications
“…We are hopeful that such adaptive OCR solutions would be an important part of the rapidly advancing artificial intelligence ecosystems worldwide. Currently, most NLP research and practice use machine-readable typed data and associated textual data distributions [56]. It would be very useful to develop OCR solutions for handwritten documents to create a seamless integration with NLP solutions, such as sentiment analysis and NLP-based socioeconomic modeling [57][58][59].…”
Section: Discussion
Citation type: mentioning
confidence: 99%