In the real world, various systems can be modeled using entity-relationship graphs. Given such a graph, one may be interested in identifying suspicious or anomalous subgraphs. Specifically, a user may want to identify suspicious subgraphs matching a query template. A subgraph can be defined as anomalous based on the connectivity structure within itself as well as with its neighborhood. For example, in a co-authorship network, given a subgraph containing three authors, one expects all three authors to be, say, data mining authors. Also, one expects the neighborhood to mostly consist of data mining authors. But a 3-author clique of data mining authors with all theory authors in the neighborhood clearly seems interesting. Similarly, having one of the authors in the clique as a theory author when all other authors (both in the clique and neighborhood) are data mining authors is also suspicious. Thus, the existence of low-probability links and the absence of high-probability links can be a good indicator of subgraph outlierness. The probability of an edge can in turn be modeled based on the weighted similarity between the attribute values of the nodes linked by the edge. We claim that the attribute weights must be learned locally for accurate link existence probability computations. In this paper, we design a system that finds subgraph outliers given a graph and a query by modeling the problem as a linear optimization. Experimental results on several synthetic and real datasets show the effectiveness of the proposed approach in computing interesting outliers.
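The idea of scoring a subgraph by its low-probability links that exist and its high-probability links that are missing can be illustrated with a minimal sketch. This is not the paper's linear optimization: the logistic form, the exact-match similarity, the fixed weights, and the names `edge_probability` and `outlierness` are all assumptions made for illustration.

```python
import math

def edge_probability(attrs_u, attrs_v, weights, bias=0.0):
    """Assumed logistic link probability from weighted attribute agreement."""
    score = bias
    for name, weight in weights.items():
        # Similarity is 1 when both nodes carry the attribute with the same value.
        same = name in attrs_u and name in attrs_v and attrs_u[name] == attrs_v[name]
        score += weight * (1.0 if same else 0.0)
    return 1.0 / (1.0 + math.exp(-score))

def outlierness(nodes, edges, weights, bias=0.0):
    """Sum the surprise of each node pair: (1 - p) for a present link, p for an absent one."""
    names = sorted(nodes)
    total = 0.0
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            p = edge_probability(nodes[u], nodes[v], weights, bias)
            present = (u, v) in edges or (v, u) in edges
            total += (1.0 - p) if present else p
    return total

# Hypothetical 3-author cliques: one homogeneous, one with a theory author.
weights = {"area": 4.0}  # assumed weight, not learned
clique = {("a", "b"), ("a", "c"), ("b", "c")}
homogeneous = {x: {"area": "data mining"} for x in "abc"}
mixed = dict(homogeneous, c={"area": "theory"})

score_homogeneous = outlierness(homogeneous, clique, weights, bias=-2.0)
score_mixed = outlierness(mixed, clique, weights, bias=-2.0)
```

Under these assumed weights the clique containing the theory author scores higher than the homogeneous one, matching the intuition in the abstract. Learning the weights locally, as the abstract argues is necessary, is outside this sketch.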
Comparative Effectiveness Research (CER) is defined as the generation and synthesis of evidence that compares the benefits and harms of different prevention and treatment methods. This is becoming an important field in informing health care providers about the best treatment for individual patients. Currently, the two major approaches in conducting CER are observational studies and randomized clinical trials. These approaches, however, often suffer from either scalability or cost issues. In this paper, we propose a third approach of conducting CER by utilizing online personal health messages, e.g., comments on online medical forums. The approach is effective in resolving the scalability and cost issues, enabling rapid deployment of a system to identify treatments of interest, and developing hypotheses for formal CER studies. Moreover, by utilizing the demographic information of the patients, this approach may provide valuable results on the preferences of different demographic groups. Demographic information is extracted using our high-precision automated demographic extraction algorithm. This approach is capable of extracting more than 30% of users' age and gender information. We conducted CER by utilizing personal health messages on breast cancer and heart disease. We were able to generate statistically valid results, many of which have already been validated by clinical trials. Others could become hypotheses to be tested in future CER research.
Web communities such as healthcare web forums serve as popular platforms for users to get their complex medical queries resolved. A typical forum thread contains a query in its first post, and a discussion around it in subsequent posts. However, many users do not receive satisfactory responses from other members in the community, leaving them dissatisfied. We propose to help these users by exploiting an existing collection of discussion threads. Often many users suffer from the same medical condition and start multiple discussion threads on very similar queries. In this paper we develop and evaluate a plethora of specialized search methods that treat an entire unresolved forum post as a query, and retrieve forum threads discussing similar problems to help resolve it. The task is more challenging than a traditional document retrieval problem, since forum posts can contain a lot of irrelevant background information. The discussion threads to be retrieved are also quite different from traditional unstructured text documents. We evaluate our results on a dataset comprising over 350K discussion threads and show that our proposed methods outperform state-of-the-art retrieval methods for the task. In particular, methods based on non-uniform weighting of thread posts and semantic analysis of the query text perform quite well.
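As a rough illustration of non-uniform weighting of thread posts, the sketch below scores a thread by term overlap with the query post, counting the first post more heavily than later replies. The overlap measure, the weight values, and the names `thread_score` and `rank_threads` are assumptions for illustration; the abstract does not specify the actual weighting scheme or retrieval model.

```python
import re
from collections import Counter

def tokens(text):
    """Lowercase alphabetic tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z]+", text.lower())

def overlap(query_tokens, post_tokens):
    """Count query terms that also occur in the post (clipped by frequency)."""
    q, p = Counter(query_tokens), Counter(post_tokens)
    return sum(min(q[t], p[t]) for t in q)

def thread_score(query, posts, first_weight=2.0, decay=0.5):
    """Weight the first post by first_weight and the i-th reply by decay ** i (assumed scheme)."""
    q = tokens(query)
    score = 0.0
    for i, post in enumerate(posts):
        weight = first_weight if i == 0 else decay ** i
        score += weight * overlap(q, tokens(post))
    return score

def rank_threads(query, threads):
    """Return thread names ordered from highest to lowest score."""
    return sorted(threads, key=lambda name: thread_score(query, threads[name]), reverse=True)

# Hypothetical query post and two hypothetical threads.
query = "persistent migraine after new medication"
threads = {
    "A": ["migraine after starting new medication", "see your doctor"],
    "B": ["knee pain when running", "my migraine medication is new"],
}
ranking = rank_threads(query, threads)
```

Thread A ranks first here because the matching terms sit in its opening post, which this sketch treats as the statement of the problem.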
Online health forums provide a convenient way for patients to obtain medical information and connect with physicians and peers outside of clinical settings. However, large quantities of unstructured and diversified content generated on these forums make it difficult for users to digest and extract useful information. Understanding user intents would enable forums to more accurately and efficiently find relevant information by filtering out threads that do not match particular intents. In this paper, we derive a taxonomy of intents to capture user information needs in online health forums, and propose novel pattern-based features for use with a multiclass support vector machine (SVM) classifier to classify original thread posts according to their underlying intents. Since no dataset existed for this task, we employ three annotators to manually label a dataset of 1,200 HealthBoards posts spanning four forum topics. Experimental results show that an SVM with pattern-based features is highly capable of identifying user intents in forum posts, reaching a maximum precision of 75%. Furthermore, comparable classification performance can be achieved by training and testing on posts from different forum topics (e.g., training on allergy posts, testing on depression posts). Finally, we run a trained classifier on a MedHelp dataset to analyze the distribution of intents of posts from different forum topics.
Online health forums provide a convenient way for patients to obtain medical information and connect with physicians and peers outside of clinical settings. However, large quantities of unstructured and diversified content generated on these forums make it difficult for users to digest and extract useful information. Understanding user intents would enable forums to find and recommend relevant information to users by filtering out threads that do not match particular intents. In this paper, we derive a taxonomy of intents to capture user information needs in online health forums and propose novel pattern-based features for use with a multiclass support vector machine (SVM) classifier to classify original thread posts according to their underlying intents. Since no dataset existed for this task, we employ three annotators to manually label a dataset of 1192 HealthBoards posts spanning four forum topics. Experimental results show that an SVM using pattern-based features is highly capable of identifying user intents in forum posts, reaching a maximum precision of 75%, and that an SVM-based hierarchical classifier using both pattern and word features outperforms its SVM counterpart that uses only word features. Furthermore, comparable classification performance can be achieved by training and testing on posts from different forum topics.
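To make the notion of pattern-based features concrete, the sketch below turns a post into a binary vector, one entry per lexical pattern, of the kind that could be passed to an SVM. The patterns and their names are hypothetical; the abstract does not list the actual patterns or the intent taxonomy, and the classifier itself is omitted.

```python
import re

# Hypothetical question patterns; the paper's real pattern set is not given here.
PATTERNS = [
    ("asks_what", re.compile(r"\bwhat (is|are)\b", re.IGNORECASE)),
    ("asks_anyone", re.compile(r"\b(has|does) anyone\b", re.IGNORECASE)),
    ("asks_should", re.compile(r"\bshould i\b", re.IGNORECASE)),
    ("has_question_mark", re.compile(r"\?")),
]

def pattern_features(text):
    """Return a 0/1 vector with one entry per pattern, in PATTERNS order."""
    return [1 if regex.search(text) else 0 for _, regex in PATTERNS]

# Hypothetical forum posts.
features_a = pattern_features("Has anyone tried this? Should I stop taking it?")
features_b = pattern_features("What is the usual dose")
```

Because patterns like these do not depend on topic vocabulary, such features are one plausible reason a classifier could transfer across forum topics, as the abstract reports.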