Abstract: Online organizations have recently become interested in tracking users' behavior on their websites to better understand and satisfy those users' needs. In response, web usage mining tools were developed to help them discover usage patterns, or profiles, from web logs. However, because website usage logs are generated continuously, in some cases amounting to a dynamic data stream, most existing tools still cannot handle their changing nature or growing size. This paper proposes a scalable framework that tracks the changing nature of user behavior on a website and represents it as a set of evolving usage profiles. These profiles can offer the best usage representation of user activity at any given time, and they can be used as input to higher-level applications such as a web recommendation system. Our specific aim is to make the hierarchical unsupervised niche clustering (HUNC) algorithm more scalable, and to add integrated profile tracking and cluster-based validation to it. Our experiments on real web log data confirm the validity of our approach for large data sets that previously could not be handled in one shot.