Sampling Online Social Networks Using Coupling from the Past

White, K. D.; Li, Guichong; Japkowicz, Nathalie

doi:10.1109/icdmw.2012.126

Cited by 17 publications

(5 citation statements)

References 15 publications


“…We used a population dataset from Advanced Symbolics Inc. (ASI), a market research company in Canada. ASI continuously collects tweets posted by Twitter users using the Conditional Independence Coupler (CIC) sampling algorithm, which is based on Coupling from the Past (CFTP) [24]. The stopping condition is enhanced by measuring the distance between the new node and the seed node, then adjusting the weights of sampling using post-stratification to compensate for the underrepresented groups of the population.…”

Section: B. Population-level Twitter Dataset (mentioning)
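The quote mentions re-weighting the sample by post-stratification but does not give ASI's procedure. As a minimal sketch of the standard technique, each stratum (for example an age group) receives the weight population share / sample share, so under-represented groups count for more in weighted estimates. The function names, the age-group labels and the population shares below are hypothetical illustrations, not ASI's data or code.

```python
from collections import Counter

# Hypothetical helpers illustrating textbook post-stratification weights;
# this is not ASI's implementation.
def post_stratification_weights(sample_strata, population_shares):
    """Weight per stratum = population share / sample share.

    sample_strata: stratum label of each sampled user (e.g. an age group).
    population_shares: dict label -> share of the population (e.g. from
    census tables), assumed to sum to 1 and to cover every sampled label.
    """
    n = len(sample_strata)
    counts = Counter(sample_strata)
    weights = {}
    for stratum, pop_share in population_shares.items():
        sample_share = counts[stratum] / n  # Counter returns 0 for missing labels
        if sample_share == 0:
            raise ValueError(f"no sampled users in stratum {stratum!r}")
        weights[stratum] = pop_share / sample_share
    return weights


def weighted_mean(values, sample_strata, weights):
    """Post-stratified estimate of a population mean from per-user values."""
    total = sum(weights[s] for s in sample_strata)
    return sum(v * weights[s] for v, s in zip(values, sample_strata)) / total


strata = ["18-34", "18-34", "18-34", "35+"]
w = post_stratification_weights(strata, {"18-34": 0.5, "35+": 0.5})
print(w)
print(weighted_mean([1, 1, 1, 0], strata, w))
```

With three of four sampled users aged 18-34 but an assumed 50/50 population split, the under-represented 35+ stratum gets weight 2.0 and the 18-34 stratum about 0.67, so a raw proportion of 0.75 is re-weighted to 0.5.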

Predicting Depression in Canada by Automatic Filling of Beck’s Depression Inventory Questionnaire

Skaik; Inkpen (2022). IEEE Access.

The risk for depression and anxiety increased as people adjusted to a new normal after the COVID-19 pandemic. Early detection and appropriate onset treatment and support can reduce the consequences of depression. Automatic detection of depression in social media has recently become an important area of investigation. However, because of the lack of extensive annotated data, we propose a method for using a model that learns to answer a depression questionnaire and apply it to make population-level predictions. We used the eRisk 2021 Task 3 training dataset to build an automated model to fill the Beck's Depression Inventory (BDI) questionnaire. We selected the best performing model for each group of questions based on predefined metrics and consolidated those models into one model (called the BDI_Multi_Model). The BDI_Multi_Model achieved better performance than the state-of-the-art for this challenging task. Then, we used this model for inference on a Canadian population dataset and compared its predictions with the statistics of the most recent mental health survey conducted by Statistics Canada. The correlation between the inference of the answered questionnaire based on our BDI_Multi_Model and the official statistics showed a strong Pearson correlation of 0.90.
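For reference, the Pearson coefficient reported above is the sum of co-deviations of two series divided by the product of the square roots of their summed squared deviations (equivalently, covariance over the product of standard deviations). The minimal sketch below computes it; the function name and the inputs are made-up illustrations, not the paper's model predictions or Statistics Canada figures.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of co-deviations (n times the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Root sums of squared deviations (sqrt(n) times each standard deviation).
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (sx * sy)


print(pearson_r([1, 2, 3], [1, 3, 2]))
```

For the toy inputs above the result is approximately 0.5; perfectly proportional series give 1 and reversed ones give -1.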


“…In constructing our labeled Twitter dataset we initially randomly collected a sample of 282,201 Twitter users from Canada by using the Conditional Independence Coupling (CIC) method [17]. CIC matches the prior distribution of the population, in this case the Canadian general population, ensuring that the sample is balanced for gender, race and age.…”

Section: Development of Labeled Twitter Covid-19 Dataset (mentioning)
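The quotes describe CIC only at a high level. As a hedged illustration of how a coupler built on coupling from the past returns an exact draw from a target distribution, the sketch below applies CFTP to an independence Metropolis-Hastings sampler over a small finite set of states. This is a textbook perfect-simulation construction, not White et al.'s or ASI's code, and we have not checked that it matches the exact CIC variant cited as [17] and [24]; the function name and the toy distributions are hypothetical. In the social-network setting the states would be users and the target a population distribution, which the sketch abstracts away.

```python
import random

# Hypothetical sketch: coupling from the past (CFTP) for an independence
# Metropolis-Hastings sampler on a finite state space. It illustrates how a
# coupler yields exact draws; it is not the CIC code used by White et al. or ASI.
def cftp_independence_sampler(target, proposal, rng=random):
    """Return one exact draw from `target`.

    target, proposal: dicts state -> probability; `proposal` must be positive
    on every state where `target` is positive.
    """
    states = list(proposal)
    probs = [proposal[s] for s in states]
    # Importance ratios w(x) = target(x) / proposal(x) and their maximum.
    w = {s: target.get(s, 0.0) / proposal[s] for s in states}
    w_max = max(w.values())

    # Draw (proposal, uniform) pairs backwards in time: innovations[0] drives
    # the step ending at time 0, innovations[1] the step before it, and so on.
    innovations = []
    while True:
        y = rng.choices(states, weights=probs)[0]
        u = rng.random()
        innovations.append((y, u))
        # u < w(y)/w_max implies u < w(y)/w(x) for every state x, so chains
        # started from all states accept y at this step and coalesce.
        if u * w_max < w[y]:
            break

    # Every chain is now at the coalesced state; replay the stored steps
    # forward to time 0 with the usual acceptance rule u < w(y)/w(x).
    x = innovations[-1][0]
    for y, u in reversed(innovations[:-1]):
        if u * w[x] < w[y]:
            x = y
    return x


print(cftp_independence_sampler({"a": 0.2, "b": 0.5, "c": 0.3},
                                {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}))
```

Each call returns "a", "b" or "c" with probabilities 0.2, 0.5 and 0.3. The key design point of CFTP is that the random numbers for steps near time 0 are fixed once drawn and reused, and the search extends further into the past rather than forward.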

Independent Component Analysis for Trustworthy Cyberspace during High Impact Events: An Application to Covid-19

Boukouvalas; Mallinson; Crothers; et al. (2020). Preprint (self-citation).

Social media has become an important communication channel during high impact events, such as the COVID-19 pandemic. As misinformation in social media can rapidly spread, creating social unrest, curtailing the spread of misinformation during such events is a significant data challenge. While recent solutions that are based on machine learning have shown promise for the detection of misinformation, most widely used methods include approaches that rely on either handcrafted features that cannot be optimal for all scenarios, or those that are based on deep learning where the interpretation of the prediction results is not directly accessible. In this work, we propose a data-driven solution that is based on the ICA model, such that knowledge discovery and detection of misinformation are achieved jointly. To demonstrate the effectiveness of our method and compare its performance with deep learning methods, we developed a labeled COVID-19 Twitter dataset based on socio-linguistic criteria.


“…Many sampling techniques were studied ranging from topical [11,19] to user-based approaches [12]. The first set of techniques is topic-based sampling, where specific keywords or hashtags are applied to collect tweets through Twitter API [6,20].…”

Section: Related Work (mentioning)

“…However, the major problem with the mentioned techniques is that these techniques are biased toward high-degree nodes, similar to expert sampling. A solution to this problem, based on traditional Markov Chain Monte Carlo (MCMC), was proposed by White et al. [12]. They applied a technique based on MCMC and Coupling From The Past (CFTP) to have better convergence in sampling.…”

mentioning

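The quoted claim that naive crawls over-sample high-degree nodes, and that MCMC corrects this, can be illustrated with the standard Metropolis-Hastings random walk. This is a generic sketch, not White et al.'s CFTP-based sampler: a simple random walk visits nodes in proportion to their degree, while accepting a move from u to v with probability min(1, deg(u)/deg(v)) makes the uniform distribution over nodes stationary. The function name and the toy star graph are hypothetical.

```python
import random
from collections import Counter

# Hypothetical sketch contrasting a simple random walk with a
# Metropolis-Hastings random walk; not White et al.'s CFTP-based method.
def random_walk(adj, start, steps, rng, metropolis=False):
    """Count node visits of a random walk on an undirected graph.

    adj: dict node -> list of neighbours (every node needs >= 1 neighbour).
    metropolis=False: simple random walk; long-run visit share ~ degree.
    metropolis=True: propose a uniform neighbour v of u and accept with
    probability min(1, deg(u)/deg(v)), which makes the uniform distribution
    over nodes stationary.
    """
    visits = Counter()
    node = start
    for _ in range(steps):
        candidate = rng.choice(adj[node])
        if not metropolis or rng.random() < len(adj[node]) / len(adj[candidate]):
            node = candidate
        visits[node] += 1
    return visits


star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
n = 100_000
print(random_walk(star, 0, n, random.Random(1))[0] / n)
print(random_walk(star, 0, n, random.Random(1), metropolis=True)[0] / n)
```

On a star with one hub and four leaves, the simple walk spends exactly half of its steps on the hub (it alternates hub and leaf), whereas the Metropolis-Hastings walk spends close to one fifth, the uniform share of five nodes.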

“…However, the issue of selecting a credible subset of users still remains. Nevertheless, many network-based sampling approaches were studied, which focus on sampling a subset of users from their networks [12] or sampling users based on their popularity [13]. The drawback of network-based sampling is that a set of users is sampled from a static network while ignoring the availability of their posts over time.…”

mentioning


Activity-based Twitter sampling for content-based and user-centric prediction models

Aghababaei; Makrehchi (2017). Hum. Cent. Comput. Inf. Sci. 7:3.

Abstract: Increasingly more applications rely on crowd-sourced data from social media. Some of these applications are concerned with real-time data streams, while others are more focused on acquiring temporal footprints from historical data. Nevertheless, determining the subset of "credible" users is crucial. While the majority of sampling approaches focus on individual static networks, dynamic user activity over time is usually not considered, which may result in activity gaps in the collected data. Models based on noisy and missing data can significantly degrade in performance. In this study, we demonstrate how to sample Twitter users in order to produce more credible data for temporal prediction models. We present an activity-based sampling approach where users are selected based on their historical activities in Twitter. The predictability of the collected content from activity-based and random sampling is compared in a content-based and user-centric temporal model. The results indicate the importance of an activity-oriented sampling method for the acquisition of more credible content for temporal models.

Keywords: Twitter sampling, Temporal prediction models, Historical timelines, User activity, Activity-based sampling

Background: Twitter's public and open nature provides great opportunities for its users to actively participate in sharing their opinions and produce high-quality content that is reflective of their tendencies and preferences in their day-to-day life [1]. This vast amount of publicly available user-generated content is applied to many applications, ranging from tracking human social behavior [2][3][4] and detecting events of interest [5][6][7] to smart business [8], where domain knowledge is collected through social media. These studies are concerned either with pulling tweets from Twitter and aggregating them in bulk, or with tracking historical tweets over time in order to find meaningful patterns for targeted events. The main challenge for the former is that the Twitter API gives access to only 1% of all existing tweets; the latter, despite this limitation, are concerned with retrieving users' historical timelines. To tackle the above issues of retrieving tweets beyond the 1% threshold and obtaining historical timelines, topic-based sampling and the REST API have both been shown to be more effective [9,10]. In topic-based sampling [11], a set of specific keywords or hashtags is applied to collect tweets through the search API...
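The abstract says users are selected by their historical activity but does not state the exact selection rule. A minimal sketch under assumed parameters: split an observation period into fixed windows and keep users who posted in at least a given fraction of them, which filters out users whose timelines would leave activity gaps. The function name, the weekly window, `min_posts` and the 0.8 threshold are all hypothetical, not values from the paper.

```python
from datetime import datetime, timedelta

# Hypothetical sketch of activity-based user selection; the window length and
# thresholds are illustrative assumptions, not values from the paper.
def activity_based_sample(timelines, start, end, window=timedelta(days=7),
                          min_posts=1, min_active_fraction=0.8):
    """Return users active in at least `min_active_fraction` of the windows.

    timelines: dict user_id -> list of post timestamps (datetime objects).
    A window counts as active if the user posted at least `min_posts` times
    in it. Any trailing partial window in [start, end) is ignored.
    """
    n_windows = (end - start) // window  # timedelta // timedelta -> int
    if n_windows <= 0:
        return []
    horizon = start + n_windows * window
    selected = []
    for user, stamps in timelines.items():
        counts = [0] * n_windows
        for t in stamps:
            if start <= t < horizon:
                counts[(t - start) // window] += 1
        active = sum(c >= min_posts for c in counts)
        if active / n_windows >= min_active_fraction:
            selected.append(user)
    return selected


if __name__ == "__main__":
    t0 = datetime(2017, 1, 1)
    demo = {
        "steady": [t0 + timedelta(days=d) for d in (0, 8, 15, 22)],
        "bursty": [t0 + timedelta(days=d) for d in (0, 1, 2)],
    }
    print(activity_based_sample(demo, t0, t0 + timedelta(days=28)))  # ['steady']
```

Over four weekly windows, the "steady" user posts in every week and is kept, while the "bursty" user posts only in the first week (one of four windows) and is dropped, even though both have a similar number of posts.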
