User-generated Structured Query Language (SQL) queries are a rich source of information for database analysts, information scientists, and the end users of databases. In this study, a group of astronomers and computer and information scientists work together to analyze a large volume of SQL log data generated by users of the Sloan Digital Sky Survey (SDSS) data archive in order to better understand users' data seeking behavior. While statistical analysis of such logs is useful at aggregated levels, efficiently exploring specific patterns of queries is often challenging because of the typically large volume of the data, its multivariate features, and the data requirements specified in SQL queries. To enable effective and efficient exploration of the SDSS log data, we designed an interactive visualization tool, called the SDSS Log Viewer, which integrates time series visualization, text visualization, and dynamic query techniques. We describe two analysis scenarios of visual exploration of SDSS log data: understanding unusually high daily query traffic, and modeling the types of data seeking behavior of massive query generators. The two scenarios demonstrate that the SDSS Log Viewer provides a novel and potentially valuable approach to supporting these targeted tasks.
Keywords: SQL log analysis, visual exploratory analysis, multivariate data, text visualization, multiple views
INTRODUCTION
User-generated Structured Query Language (SQL) query logs are common in science, business, engineering, and many other domains. These logs contain information about users' data seeking behavior and system performance, and are therefore rich information sources for analysts such as database administrators, system designers, and user behavior researchers. Like ordinary transaction logs, SQL logs can be characterized as multivariate, temporal, and categorical event sequences. SQL logs also have a distinct feature of their own: the semi-structured and often complex queries generated by users. While statistical methods, such as the analytical functions offered by database systems, can reveal query patterns at aggregated levels, quickly exploring SQL logs in detail is often challenging because of the typically large volume of log data, its multivariate features, and the text content of the queries. In this paper, we, a group of information scientists collaborating with a group of data scientists from the Sloan Digital Sky Survey (SDSS), design and develop an interactive visualization tool, called the SDSS Log Viewer, which enables and facilitates visual exploration of the SQL log data generated by users of the SDSS data archive. By integrating time series visualization, text visualization, and dynamic query techniques, the tool helps SDSS data analysts quickly identify massive query generators and reveal their data seeking models. Although this is a domain-specific case study, the method and experience of this study can be generalized for similar datasets to discover patterns and relationships between log features and...