This article contends that, in the booming era of information, analysing users' navigation behaviour is an important task. User identification is one of the most important and challenging tasks in the data preprocessing phase of the Web usage mining process. Reactive user identification methods face three issues that need attention: first, handling shared IP addresses in a proxy server environment; second, distinguishing users from Web robots; and third, processing huge datasets efficiently. In this article, the authors develop a MapReduce-based user identification algorithm that addresses all three issues. Experiments on a real web server log show the effectiveness and efficiency of the developed algorithm.
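To make the three issues concrete, the following is a minimal single-process sketch of how user identification could be phrased as map, shuffle, and reduce steps. It is not the authors' algorithm, which the abstract does not specify. It assumes log lines in the Combined Log Format, assumes the common reactive heuristic that the same IP address with a different user agent indicates a different user behind a proxy, and assumes a simple robot test (a request for `/robots.txt` or a bot-like user agent string). The names `map_record`, `shuffle`, `reduce_user`, `identify_users`, and `ROBOT_HINTS` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed input format: Combined Log Format (IP, timestamp, request, status,
# size, referrer, user agent).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical substrings used to flag robot user agents.
ROBOT_HINTS = ("bot", "crawler", "spider")


def map_record(line):
    """Map step: emit ((ip, agent), url) for each parsable log line."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return []  # skip malformed lines
    return [((match["ip"], match["agent"]), match["url"])]


def shuffle(pairs):
    """Group values by key, as a MapReduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_user(key, urls):
    """Reduce step: one candidate user per (ip, agent); drop likely robots."""
    ip, agent = key
    looks_like_robot = "/robots.txt" in urls or any(
        hint in agent.lower() for hint in ROBOT_HINTS
    )
    if looks_like_robot:
        return None
    return {"ip": ip, "agent": agent, "requests": urls}


def identify_users(lines):
    """Run the map, shuffle, and reduce steps locally over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_record(line)]
    users = []
    for key, urls in shuffle(pairs).items():
        user = reduce_user(key, urls)
        if user is not None:
            users.append(user)
    return users
```

Because each reduce call only sees the records for one key, the same logic could in principle be distributed across many workers on a large log, which is the motivation for using MapReduce here. A fuller reactive method would typically also use the referrer and site topology, which this sketch omits.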
Because the data available on the web is huge, unstructured, and scattered, it is difficult for users to find relevant information quickly. To address this, web site design is improved, content is personalized, and prefetching and caching are tuned according to an analysis of user behavior. Users' activities are captured in a special file called a log file. There are several types of log: server logs, proxy server logs, and client (browser) logs. Web usage mining analyzes these log files to discover useful patterns.