In this paper, we propose a novel framework, called Semi-supervised Embedding in Attributed Networks with Outliers (SEANO), to learn a low-dimensional vector representation that systematically captures the topological proximity, attribute affinity and label similarity of vertices in a partially labeled attributed network (PLAN). Our method is designed to work in both transductive and inductive settings while explicitly alleviating noise effects from outliers. Experimental results on various datasets drawn from the web, text and image domains demonstrate the advantages of SEANO over state-of-the-art methods in semi-supervised classification under transductive as well as inductive settings. We also show that a subset of parameters in SEANO is interpretable as outlier scores and can significantly outperform baseline methods when applied to detecting network outliers. Finally, we present the use of SEANO in a challenging real-world setting, flood mapping of satellite images, and show that it is able to outperform modern remote sensing algorithms for this task.

... on the output outlier scores from SEANO and demonstrate its advantages over baseline methods specializing in network outlier detection. Finally, we conduct a case study of flood mapping to visually show the power of SEANO.

Related Work

Network Embedding: Network embedding strategies have gained increasing importance in recent years. Early ideas include IsoMap [35] and Locally Linear Embedding (LLE) [29], which exploited the manifold structure of vector data to compute low-dimensional embeddings. More recently, due to the emergence of naturally arising network data, other network embedding methods have been proposed [34,27,9]. In addition to learning embeddings for homogeneous networks, several researchers have proposed ideas for embedding attributed networks [36,13,12,26,37,15,10].
While they incorporate the attributes and/or label information into the embeddings, most are inherently transductive and cannot generate embeddings for vertices unseen during training. The two exceptions, which support inductive learning, are Planetoid [37] and GraphSAGE [10]. However, Planetoid [37] is specialized for semi-supervised classification, and its output network embedding, as a byproduct, does not capture all the information (as can be seen from its model architecture). Therefore, its embeddings might not generalize to other applications such as visualization and clustering. GraphSAGE [10], on the other hand, works only in unsupervised or fully supervised settings and cannot be directly applied in a semi-supervised manner. Finally, none of the existing work on network embedding explicitly accounts for the impact of outliers. We summarize the differences between the proposed SEANO model and some of these recent efforts in Table 1.

Table 1 (columns): Method | Attributes | Labels | Semi-supervised | Inductive | Address Outliers
Table 1 (rows): DeepWalk [27]; Node2Vec [9]; TADW [36]; LANE [13]; TriDNR [26]; Planetoid [37]; GCN [15]; GraphSAGE [10]; SEANO
Table 1: A comparison of SEANO w...
Directed graphs have been widely used in Community Question Answering services (CQAs) to model asymmetric relationships among different types of nodes in CQA graphs, e.g., question, answer, user. Asymmetric transitivity is an essential property of directed graphs, since it can play an important role in downstream graph inference and analysis. Question difficulty and user expertise follow the characteristic of asymmetric transitivity. Maintaining such properties while reducing the graph to a lower-dimensional vector embedding space has been the focus of much recent research. In this paper, we tackle the challenge of directed graph embedding with asymmetric transitivity preservation and then leverage the proposed embedding method to solve a fundamental task in CQAs: how to appropriately route and assign newly posted questions to users with suitable expertise and interest. The technique incorporates graph hierarchy and reachability information naturally by relying on a nonlinear transformation that operates on the core reachability and implicit hierarchy within such graphs. Subsequently, the methodology leverages a factorization-based approach to generate two embedding vectors for each node within the graph, to capture the asymmetric transitivity. Extensive experiments show that our framework consistently and significantly outperforms the state-of-the-art baselines on three diverse real-world tasks: link prediction, question difficulty estimation, and expert finding in online forums like Stack Exchange. In particular, our framework supports inductive embedding learning for newly posted questions (nodes unseen during training), and can therefore properly route and assign such questions to experts in CQAs.
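To illustrate why two vectors per node can encode direction, the following is a minimal sketch, not the method described in the abstract above. It assumes a Katz proximity matrix as a stand-in for the paper's nonlinear reachability and hierarchy transformation, and a truncated SVD as the factorization; the function name `asymmetric_embedding` and the parameter `beta` are hypothetical.

```python
import numpy as np

def asymmetric_embedding(adj, dim, beta=0.1):
    """Return (source, target) embeddings for a directed adjacency matrix.

    Assumption: Katz proximity plus truncated SVD, used here only as an
    illustrative stand-in for a factorization-based directed embedding.
    """
    n = adj.shape[0]
    # Katz proximity sums beta^k * A^k over all path lengths k >= 1.
    # The closed form below requires beta < 1 / spectral_radius(adj).
    katz = np.linalg.inv(np.eye(n) - beta * adj) - np.eye(n)
    u, s, vt = np.linalg.svd(katz)
    root = np.sqrt(s[:dim])
    source = u[:, :dim] * root       # role of a node as the start of a path
    target = vt[:dim, :].T * root    # role of a node as the end of a path
    return source, target

# Directed path 0 -> 1 -> 2 -> 3: node 0 reaches node 3, but not the reverse.
adj = np.zeros((4, 4))
for i in range(3):
    adj[i, i + 1] = 1.0

source, target = asymmetric_embedding(adj, dim=3)
score = source @ target.T  # score[i, j] approximates the proximity from i to j
print(score[0, 3], score[3, 0])
```

Because the score from `i` to `j` is the inner product of `source[i]` with `target[j]`, it need not equal the score from `j` to `i`, which a single vector per node with a symmetric inner product cannot express.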
Recently there has been a surge of interest in designing graph embedding methods. Few, if any, can scale to a large graph with millions of nodes, due to both computational complexity and memory requirements. In this paper, we relax this limitation by introducing the MultI-Level Embedding (MILE) framework, a generic methodology that allows contemporary graph embedding methods to scale to large graphs. MILE repeatedly coarsens the graph into smaller ones using a hybrid matching technique to maintain the backbone structure of the graph. It then applies existing embedding methods on the coarsest graph and refines the embeddings to the original graph through a novel graph convolution neural network that it learns. The proposed MILE framework is agnostic to the underlying graph embedding techniques and can be applied to many existing graph embedding methods without modifying them. We employ our framework on several popular graph embedding techniques and compute embeddings for real-world graphs. Experimental results on five large-scale datasets demonstrate that MILE significantly boosts the speed of graph embedding (by an order of magnitude) while also often generating embeddings of better quality for the task of node classification. MILE can comfortably scale to a graph with 9 million nodes and 40 million edges, on which existing methods run out of memory or take too long to compute on a modern workstation.
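The coarsen, embed, refine pipeline described above can be sketched for a single level as follows. This is a simplified illustration under stated assumptions, not the MILE implementation: it uses plain greedy heavy-edge matching instead of the hybrid matching, an eigenvector embedding as a stand-in base embedder, and one fixed neighbor-smoothing step in place of the learned graph convolution network. All function names are hypothetical.

```python
import numpy as np

def heavy_edge_matching(adj):
    """Greedily pair each unmatched node with its heaviest unmatched neighbor.

    Returns a 0/1 matching matrix of shape (n, n_coarse).
    """
    n = adj.shape[0]
    group = -np.ones(n, dtype=int)
    next_id = 0
    for u in range(n):
        if group[u] >= 0:
            continue
        group[u] = next_id
        # Only still-unmatched nodes are candidates for pairing with u.
        weights = np.where(group < 0, adj[u], 0.0)
        v = int(np.argmax(weights))
        if weights[v] > 0:
            group[v] = next_id
        next_id += 1
    match = np.zeros((n, next_id))
    match[np.arange(n), group] = 1.0
    return match

def base_embed(adj, dim):
    """Stand-in base embedder: top eigenvectors of a symmetric adjacency."""
    _, vecs = np.linalg.eigh(adj)
    return vecs[:, -dim:]

def refine(adj, match, coarse_emb):
    """Copy each supernode's embedding to its members, then smooth once."""
    emb = match @ coarse_emb
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    deg = a_hat.sum(axis=1)
    norm = a_hat / np.sqrt(np.outer(deg, deg))   # symmetric normalization
    return norm @ emb

# Two triangles joined by the edge (2, 3).
adj = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    adj[u, v] = adj[v, u] = 1.0

match = heavy_edge_matching(adj)
coarse_adj = match.T @ adj @ match    # edge weights summed into supernodes
coarse_emb = base_embed(coarse_adj, dim=2)
emb = refine(adj, match, coarse_emb)
print(match.shape, emb.shape)
```

The expensive embedding step runs only on the smaller coarse graph; the refinement step is a cheap matrix product on the original graph, which is where the speedup of a multilevel scheme comes from.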
Outlier detection is a fundamental data science task with applications ranging from data cleaning to network security. Given its fundamental nature, the task has been the subject of much research. Recently, a new class of outlier detection algorithms has emerged, called contextual outlier detection, and has shown improved performance when studying anomalous behavior in a specific context. However, as we point out in this article, such approaches have limited applicability in situations where the context is sparse (i.e., lacking a suitable frame of reference). Moreover, approaches developed to date do not scale to large datasets. To address these problems, we propose a novel and robust alternative to the state of the art, called RObust Contextual Outlier Detection (ROCOD). We utilize a local and a global behavioral model based on the relevant contexts, which are then integrated in a natural and robust fashion. We also present several optimizations to improve the scalability of the approach. We run ROCOD on both synthetic and real-world datasets and demonstrate that it outperforms other competitive baselines on the axes of efficacy and efficiency (40X speedup compared to modern contextual outlier detection methods). We also drill down and perform a fine-grained analysis to shed light on the rationale for the performance gains of ROCOD and reveal its effectiveness when handling objects with sparse contexts.
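One way a local and a global behavioral model might be combined is sketched below. The specifics are assumptions made for illustration and are not taken from the abstract above: the local model is the mean behavior of contextual neighbors within a fixed `radius`, the global model is a linear regression from context to behavior, and the mixing weight `alpha` grows with the number of neighbors so that objects with sparse contexts fall back on the global model.

```python
import numpy as np

def contextual_outlier_scores(context, behavior, radius):
    """Score each object by its deviation from a blended expected behavior.

    Assumptions (illustrative only): radius-based contextual neighbors,
    a linear global model, and alpha = k / (k + 1) for k neighbors.
    """
    n = len(behavior)

    # Global model: least-squares linear fit from context to behavior.
    design = np.column_stack([context, np.ones(n)])
    coef, *_ = np.linalg.lstsq(design, behavior, rcond=None)
    global_pred = design @ coef

    # Local model: mean behavior of contextual neighbors, excluding self.
    dist = np.linalg.norm(context[:, None, :] - context[None, :, :], axis=2)
    neighbors = (dist <= radius) & ~np.eye(n, dtype=bool)
    counts = neighbors.sum(axis=1)
    local_pred = neighbors.astype(float) @ behavior / np.maximum(counts, 1)

    # Objects with no neighbors get alpha = 0 and use the global model only.
    alpha = counts / (counts + 1.0)
    expected = alpha * local_pred + (1.0 - alpha) * global_pred
    return np.abs(behavior - expected)

# Ten objects with behavior = 2 * context, plus one object in a sparse context.
context = np.array([[float(x)] for x in range(10)] + [[50.0]])
behavior = 2.0 * context[:, 0]
behavior[5] += 30.0   # inject a contextual outlier at index 5

scores = contextual_outlier_scores(context, behavior, radius=1.5)
print(int(np.argmax(scores)))
```

In this toy data, the object at context 50 has no neighbors, so a purely local method would have no frame of reference for it; the global fit still gives it an expected behavior.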
Stroke is a major cause of hemiparesis in the United States. Constraint-Induced Movement therapy (CI therapy) is an effective treatment for upper extremity hemiparesis; however, it is inaccessible to most patients. To make it more accessible, we developed a game-based rehabilitation system incorporating the major rehabilitation principles from CI therapy. In this paper, we introduce a data analytics framework for our rehabilitation system that can provide objective measures of motor performance during gameplay. We design techniques for preprocessing the collected data and propose a series of kinematic measurements, which are used to assess motor performance and supplement in-clinic measures of therapeutic effect. We also present contextual filtering techniques to enable comparing movement production under different conditions, e.g., self-paced versus game-paced movement. We apply our data analytics framework to data collected from several participants. Our analysis shows that participants' motor movement improves over the period of treatment, with different participants showing different patterns of improvement, e.g., speed versus range of motion. Results of kinematic measurements during gameplay are highly consistent with in-clinic performance based on the Wolf Motor Function Test. Moreover, our fine-grained trend analysis reveals potential to detect fatigue, which is related to the duration of gameplay.