Previous studies have examined various aspects of user behavior on the Web, including general information-seeking patterns, search engine use, and revisitation habits. However, little research has examined how users navigate and interact with their Web browsers across different information-seeking tasks. We conducted a field study of 21 participants in which we logged detailed Web usage and asked participants to categorize their Web usage into four task types: Fact Finding, Information Gathering, Browsing, and Transactions. We used implicit measures logged during each task session, such as dwell time, number of pages viewed, and the use of specific browser navigation mechanisms. We also report on differences in how participants interacted with their Web browsers across the range of information-seeking tasks. Within each type of task, we found several distinguishing characteristics. In particular, Information Gathering tasks were the most complex: participants spent more time completing these tasks, viewed more pages, and used browser functions most heavily while doing so. The results of this analysis suggest implications for future support of information seeking on the Web, as well as directions for future research in this area.
While researchers have been studying user activity on the Web since its inception, there remains a lack of understanding of the high-level tasks in which users engage on the Web. We recently conducted a field study in which participants were asked to annotate all of their Web usage with a task description and categorization. Based on our analysis of the tasks participants recorded during the field study, as well as previous research, we have developed a goal-based classification of information tasks that describes user activities on the Web.
In this research, we conduct a systematic study of four dimension reduction techniques for the text clustering problem, using five benchmark data sets. Of the four methods (Independent Component Analysis (ICA), Latent Semantic Indexing (LSI), Document Frequency (DF), and Random Projection (RP)), ICA and LSI are clearly superior when the k-means clustering algorithm is applied, irrespective of the data set. Random Projection consistently returns the worst results, which appears to be due to the noise distribution that characterizes the document clustering task.
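The pipeline evaluated here (reduce the term-document matrix to a low-dimensional representation, then cluster with k-means) can be illustrated with the two simplest of the four methods, DF selection and Random Projection. The toy count matrix, function names, and parameter choices below are illustrative assumptions for a minimal sketch, not the study's actual setup or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy term-document matrix: 6 documents x 10 terms (raw counts).
# In a real experiment this would come from a TF-IDF vectorizer.
X = rng.poisson(1.0, size=(6, 10)).astype(float)

def df_select(X, k):
    """Document Frequency (DF): keep the k terms occurring in the most documents."""
    df = (X > 0).sum(axis=0)           # document frequency per term
    keep = np.argsort(df)[::-1][:k]    # indices of the top-k terms
    return X[:, keep]

def random_project(X, k, rng):
    """Random Projection (RP): project onto k random Gaussian directions."""
    R = rng.normal(size=(X.shape[1], k)) / np.sqrt(k)
    return X @ R

def kmeans(X, n_clusters, rng, n_iter=50):
    """Plain Lloyd's k-means on the reduced matrix."""
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(n_iter):
        # Squared distances of every document to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(n_clusters):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    return labels

labels_df = kmeans(df_select(X, k=4), 2, rng)
labels_rp = kmeans(random_project(X, k=4, rng=rng), 2, rng)
print(labels_df, labels_rp)
```

LSI and ICA would slot into the same place as `df_select`/`random_project` (e.g., a truncated SVD of the matrix for LSI); the clustering step is unchanged regardless of which reduction is used.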