The EGAN software is a functional implementation of a simple yet powerful paradigm for the exploration of large empirical data sets downstream from computational analysis. By focusing on systems-level analysis via enrichment statistics, EGAN enables a human domain expert to transform high-throughput analysis results into hypergraph visualizations: concept maps that leverage the expert's semantic understanding of metadata and relationships to produce insight.

Keywords: visualization, enrichment, metadata, big data, organic intelligence, data integration, multivariate statistics, cloud computing
BACKGROUND

Sets

Humans organize things in their environment into semantically meaningful sets. Natural language is a great example: an adjective is an annotation label that can be associated with one or more nouns; every noun X associated with adjective Y is an element of set Y. Nouns can also be sets themselves: the phrase "X is a Z" can be transformed into the logical concept "noun X is an element of set Z". These natural language principles reflect an aspect of human cognition that has persisted across millennia. In today's computational age, this process of entity-to-set association has exploded into a universe of data.

Consider a social network where entities are people in the network. Potential person-sets could be: hometown, current location, alma mater, current employer, first name, last name, movies or other media people like, product advertisements people have clicked on, games people play, posts people have commented on, hashtags people have used, and social contacts of one or more people, just to scratch the surface. The more broadly one expands the definition of person-sets, the richer the data describing each person-entity. This same concept applies to genomic research, where tens of thousands of genes have been annotated with tens of thousands of Gene Ontology terms hundreds of thousands of times [1]; to media libraries, where media items can be grouped and categorized (e.g. this paper has metadata tags as well as n-grams); to retail products; to companies listed on a stock exchange; to fantasy football results; etc.

The actual data warehouses that store all this information may be arranged into loose, almost unstructured schemata or into complex thousand-table relational database systems. The paradigm explored in this paper transforms all of these models into a simple schema: 1) there are entities that are the focus of domain-specific research (e.g. people, genes, media items), 2) there are potential network connections between those entities (e.g. personal relationships, protein-protein interactions, nearest-neighbor media, hyperlinks), and 3) there are sets of entities, partitioned into set-categories (e.g. San Francisco, California as a set of people-entities is in the location set-category, and UCSF as a set of people-entities is in the alma mater set-category; there may also exist a different set UCSF in the employer set-category).

This schema is essentially a simple form of topic map [2], where entities in this paper are e...
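The three-part schema described above can be sketched as a minimal data model. This is an illustrative sketch with hypothetical class and method names and toy social-network data, not EGAN's actual implementation:

```python
from collections import defaultdict


class EntitySetModel:
    """Simple schema: entities, pairwise connections between entities,
    and named sets of entities partitioned into set-categories."""

    def __init__(self):
        self.entities = set()
        self.connections = set()       # frozenset pairs of entities
        self.sets = defaultdict(dict)  # category -> set name -> member entities

    def add_entity(self, entity):
        self.entities.add(entity)

    def connect(self, a, b):
        # Undirected connection, e.g. a personal relationship or
        # a protein-protein interaction.
        self.connections.add(frozenset((a, b)))

    def annotate(self, category, set_name, entity):
        # The same set name may denote different sets in different
        # categories (e.g. "UCSF" as alma mater vs. "UCSF" as employer).
        self.sets[category].setdefault(set_name, set()).add(entity)

    def members(self, category, set_name):
        return self.sets[category].get(set_name, set())


# Toy social-network example
model = EntitySetModel()
for person in ("alice", "bob", "carol"):
    model.add_entity(person)
model.connect("alice", "bob")
model.annotate("alma mater", "UCSF", "alice")
model.annotate("employer", "UCSF", "bob")

# "UCSF" in the alma-mater category is a distinct set from
# "UCSF" in the employer category:
print(model.members("alma mater", "UCSF"))  # {'alice'}
print(model.members("employer", "UCSF"))    # {'bob'}
```

Keying sets by (category, name) rather than by name alone is what allows the two distinct UCSF sets from the example in the text to coexist.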