Performing statistical analyses on collections of graphs is of import to many disciplines, but principled, scalable methods for multisample graph inference are few. In this paper, we describe an omnibus embedding in which multiple graphs on the same vertex set are jointly embedded into a single space with a distinct representation for each graph. We prove a central limit theorem for this omnibus embedding, and we show that this simultaneous embedding into a single common space allows for the comparison of graphs without the requirement that the embedded points associated to each graph undergo cumbersome pairwise alignments. Moreover, the existence of multiple embedded points for each vertex renders possible the resolution of important multiscale graph inference goals, such as the identification of specific subgraphs or vertices as drivers of similarity or difference across large networks. The omnibus embedding achieves near-optimal inference accuracy when graphs arise from a common distribution and yet retains discriminatory power as a test procedure for the comparison of different graphs. We demonstrate the applicability of the omnibus embedding in two analyses of connectomic graphs generated from MRI scans of the brain in human subjects. We show how the omnibus embedding can be used to detect statistically significant differences, at multiple scales, across these networks, with an identification of specific brain regions that are associated with these population-level differences. Finally, we sketch how the omnibus embedding can be used to address pressing open problems, both theoretical and practical, in multisample graph inference.
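The joint embedding described above can be illustrated with a minimal sketch. It assumes the usual omnibus construction, which the abstract itself does not spell out: block (i, j) of the omnibus matrix is the average of adjacency matrices i and j, and the embedding is taken from the d leading eigenpairs. The function name `omnibus_embedding` and the toy graphs are hypothetical, chosen only for illustration.

```python
import numpy as np

def omnibus_embedding(graphs, d):
    """Jointly embed m graphs on the same n vertices (illustrative sketch).

    graphs: list of m symmetric (n x n) adjacency matrices.
    Returns an array of shape (m, n, d): one d-dimensional point
    per vertex per graph, all living in the same space.
    """
    m = len(graphs)
    n = graphs[0].shape[0]
    # Assumed construction: block (i, j) is the average of graphs i and j.
    M = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(m):
            M[i * n:(i + 1) * n, j * n:(j + 1) * n] = (graphs[i] + graphs[j]) / 2.0
    # Symmetric eigendecomposition; keep the d largest eigenvalues.
    vals, vecs = np.linalg.eigh(M)
    top = np.argsort(vals)[::-1][:d]
    # Scale eigenvectors by sqrt of eigenvalues (clipped at 0 for safety).
    Z = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
    # Rows i*n .. (i+1)*n - 1 belong to graph i.
    return Z.reshape(m, n, d)

# Toy input: two copies of a triangle graph (made-up data).
A = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
Z = omnibus_embedding([A, A], d=1)
```

Because every graph's points share one space, `Z[0]` and `Z[1]` can be compared directly, with no pairwise alignment step; for identical input graphs they coincide.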
We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding zero resource (unsupervised) speech technologies and related models of early language acquisition. Focusing on the tasks of phonetic and lexical discovery, we consider unified evaluation metrics, present two new approaches for improving speaker independence in the absence of supervision, and evaluate the application of Bayesian word segmentation algorithms to automatic subword unit tokenizations. Finally, we present two strategies for integrating zero resource techniques into supervised settings, demonstrating the potential of unsupervised methods to improve mainstream technologies.
Query-by-example search often uses dynamic time warping (DTW) for comparing queries and proposed matching segments. Recent work has shown that comparing speech segments by representing them as fixed-dimensional vectors (acoustic word embeddings) and measuring their vector distance (e.g., cosine distance) can discriminate between words more accurately than DTW-based approaches. We consider an approach to query-by-example search that embeds both the query and database segments according to a neural model, followed by nearest-neighbor search to find the matching segments. Earlier work on embedding-based query-by-example, using template-based acoustic word embeddings, achieved competitive performance. We find that our embeddings, based on recurrent neural networks trained to optimize word discrimination, achieve substantial improvements in performance and run-time efficiency over the previous approaches.
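The search step described above (embed the query and the database segments, then rank by vector distance) might look like the following sketch. It covers only the nearest-neighbor part using cosine distance; the embeddings here are made-up toy vectors rather than outputs of the recurrent model, and the function name `cosine_nearest_neighbors` is hypothetical.

```python
import numpy as np

def cosine_nearest_neighbors(query, database, k=3):
    """Return indices and cosine distances of the k database
    embeddings closest to the query embedding."""
    # Normalize to unit length so a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    # Cosine distance = 1 - cosine similarity.
    dist = 1.0 - db @ q
    order = np.argsort(dist)[:k]
    return order, dist[order]

# Toy stand-ins for fixed-dimensional segment embeddings (assumed data).
database = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0],
                     [-1.0, 0.0]])
query = np.array([2.0, 0.1])

idx, dist = cosine_nearest_neighbors(query, database, k=2)
```

A brute-force scan like this is linear in the database size; a real system would likely swap in an approximate nearest-neighbor index for large collections.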