Proceedings of the 10th International Conference on Data Science, Technology and Applications 2021
DOI: 10.5220/0010559000600070

textPrep: A Text Preprocessing Toolkit for Topic Modeling on Social Media Data

Cited by 6 publications
(5 citation statements)
References 0 publications
“…Some of the challenges with social data have been looked at in [9], [14], [17], [22]. There also has been a few attempts to provide a quality framework for social media data as in [5], [20]. To the best of our knowledge, this is the first body of work that focuses on data quality challenges of social media analytics for FnB companies and provides a framework to combat it.…”
Section: Challenges Addressed
Citation type: mentioning
confidence: 99%
“…Thus, ensuring the accuracy, and trustworthiness of data is currently a major problem for business. Despite the importance of assessing and governing data quality for social media analytics, there aren't many studies that specifically address this issue [5], [20]. Moreover, the ideal way to assess the impact of poor data quality is to employ some downstream tasks, but in order to do that, we typically need a test set made up of test samples and their ground truth labels.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…The quality of emerging topics relies on the strategies used to preprocess text. However, research suggests that no one configuration of preprocessing rules is optimal across datasets and model types [185][186][187]. Therefore, we will explore various approaches to preprocessing the clinical notes, focusing on different ways to select the terms used to train the topic models (e.g., based on term frequency, term frequency-inverse document frequency weights, and named entity recognition).…”
Section: EMR-based NLP and ML for the Characterization of Longitudina...
Citation type: mentioning
confidence: 99%
“…This method is informative; however, the authors evaluated only one preprocessing method just over newspaper text, and the social media data was not investigated in their study. Churchill and Singh [47] suggested a standardized preprocessing approach for utilizing on-topic modelling over social media data. They showed the influence and usefulness of the proposed approach on topic modelling with various social media data.…”
Section: Preprocessing
Citation type: mentioning
confidence: 99%