We present the tagFlake system, which supports semantically informed navigation within a tag cloud. tagFlake relies on TMine to organize tags extracted from textual content into hierarchies suitable for navigation, visualization, classification, and tracking. TMine extracts the most significant tags/terms from text documents and maps them onto a hierarchy in such a way that descendant terms are contextually dependent on their ancestors within the given corpus of documents. This provides tagFlake with a mechanism for enabling navigation within the tag space and for classifying the text documents based on the contextual structure captured by the created hierarchy. tagFlake is language neutral, since it is unsupervised and does not rely on any natural language processing technique.
Categories and Subject Descriptors
General Terms
Algorithms, Human Factors
MOTIVATION
With the rapid growth of content on the web (e.g., the blogosphere), tag-based searches and visualizations based on tag clouds (sets of tags) have become popular. Tags, whether provided by the user or extracted from the textual content, annotate online documents (such as blogs and news articles) with popular terms, thus providing an easy way to search and index them.
Most visualizations of tag clouds vary font sizes to differentiate the most important tags from less important ones (Figure 1). [2] aims to create visually pleasing tag clouds by presenting tags as seemingly random collections of circles of varying sizes, where the size of each circle denotes the frequency of its tag. While they quickly highlight the most dominant terms in the corpus, these representations fall short in describing the context in which these terms occur in the collection.
The need for contextually informed navigation within the blogosphere has been highlighted in the literature. For example, [16] observes that, for large blog archives, a simple chronological order is not sufficient, and that a navigational hierarchy resembling a table of contents (TOC), depicting how topics develop within the blog archive and how they relate to each other, would be more effective.
* This work is partially supported by NSF Award #0735014, "MAISON: Middleware for Accessible Information Spaces on NSDL".