Data preprocessing evaluation for web log mining: reconstruction of activities of a web visitor

Munk, Michal; Kapusta, Jozef; Švec, Peter

doi:10.1016/j.procs.2010.04.255

Cited by 84 publications

(38 citation statements)

References 10 publications


“…Filling the missing data is done by usually adding a tuple or replacing the missing data by a global constant. Applying the data cleansing task in my work the primary step is to remove the stop words from the data which has been fetched by the web crawler [4,8]. Elimination of stop words and stemming [11]: In this phase, data which has less semantic is removed.…”

Section: ) Data Discretization (mentioning)

confidence: 99%

Using Fuzzy Clustering Powered by Weighted Feature Matrix to Establish Hidden Semantics in Web Documents

Patil¹,

Kulkarni²

2018

ijacsa


Abstract: Digital Data is growing exponentially exploding on the 'World Wide Web'. The orthodox clustering algorithms obligate various challenges to tackle, of which the most often faced challenge is the uncertainty. Web documents have become heterogeneous and very complex. There exist multiple relations between one web document and others in the form of entrenched links. This can be imagined as a one to many (1-M) relationships, for example, a particular web document may fit in many cross domains viz. politics, sports, utilities, technology, music, weather forecasting, linked to ecommerce products, etc. Therefore, there is a necessity for efficient, effective and constructive context driven clustering methods. Orthodox or the already well-established clustering algorithms adhere to classify the given data sets as exclusive clusters. Signifies that we can clearly state whether to which cluster an object belongs to. But such a partition is not sufficient for representing in the real time. So, a fuzzy clustering method is presented to build clusters with indeterminate limits and allows that one object belongs to overlying clusters with some membership degree. In supplementary words, the crux of fuzzy clustering is to contemplate the fitting status to the clusters, as well as to cogitate to what degree the object belongs to the cluster. The aim of this study is to devise a fuzzy clustering algorithm which along with the help of feature weighted matrix, increases the probability of multi-domain overlapping of web documents. Overlapping in the sense that one document may fall into multiple domains. The use of features gives an option or a filter on the basis of which the data would be extracted through the document. Matrix allows us to compute a threshold value which in turn helps to calculate the clustering result.


“…Data pre-processing is recognized as a crucial step in WUM analysis (Cooley et al, 1997; Hussain, Asghar, & Masood, 2010; Munk & Drlík, 2011; Munk, Kapusta, & Švec, 2010) and is estimated to take typically between 60% and 80% of the total analysis time (Hussain et al, 2010; Marquardt, Becker, & Ruiz, 2004).…”

Section: Process and Heuristics (mentioning)

confidence: 99%

“…Hence, it can be problematic to define user sessions based on time. Munk et al (2010) adopted 10-minute timeout intervals for session identification and identified path completion pre-processing as an important step for improving the quality of extracted data. Similarly, Raju and Satyanarayana (2008) proposed a complete pre-processing methodology and suggested the use of 30-minute session timeout intervals.…”

Section: Process and Heuristics (mentioning)

confidence: 99%
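The timeout heuristic mentioned in this statement can be illustrated with a minimal sketch. It assumes one visitor's request timestamps are already isolated from the log; the function name `split_sessions` and its parameters are hypothetical and are not taken from Munk et al. (2010) or from Raju and Satyanarayana (2008). A new session starts whenever the gap between two consecutive requests exceeds the timeout.

```python
from datetime import datetime, timedelta


def split_sessions(timestamps, timeout=timedelta(minutes=10)):
    """Group one visitor's request timestamps into sessions.

    A request joins the current session if it arrives no later than
    `timeout` after the previous request; otherwise it opens a new one.
    The 10-minute default mirrors the interval cited above (assumption:
    the gap is measured between consecutive requests).
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)   # gap within the timeout: same session
        else:
            sessions.append([ts])     # first request or gap too long: new session
    return sessions


# Hypothetical requests at 10:00, 10:05, 10:20, 10:25 and 11:00.
start = datetime(2010, 1, 1, 10, 0)
hits = [start + timedelta(minutes=m) for m in (0, 5, 20, 25, 60)]
ten_minute_sessions = split_sessions(hits)
thirty_minute_sessions = split_sessions(hits, timeout=timedelta(minutes=30))
```

With these sample requests the 10-minute timeout yields three sessions and the 30-minute timeout yields two, which shows how strongly the chosen interval shapes the extracted sessions.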

Does Time-on-task Estimation Matter? Implications on Validity of Learning Analytics Findings

et al. 2016


ABSTRACT: With widespread adoption of Learning Management Systems (LMS) and other learning technology, large amounts of data, commonly known as trace data, are readily accessible to researchers. Trace data has been extensively used to calculate time that students spend on different learning activities, typically referred to as time-on-task. These measures are used to build predictive models of student learning in order to understand and improve learning processes. While time-on-task measures have been used in Learning Analytics research, the consequences of their use are not fully described or examined. This paper presents findings from two experiments regarding different time-on-task estimation methods and their influence on research findings. Based on modelling different student performance measures with popular statistical methods in two datasets (one online, one blended), our findings indicate that time-on-task estimation methods play an important role in shaping the final study results, particularly in online settings where the amount of interaction with LMS is typically higher. The primary goal of this paper is to raise awareness and initiate debate on the important issue of time-on-task estimation within the broader learning analytics community. Finally, the paper provides an overview of commonly adopted time-on-task estimation methods in educational and related research fields.


“…Finally, our algorithm finds out that page A contains a hyperlink to page X and after the termination of the backward path analysis the sequence will look like this A→B→C→D→C→B→A→X. It means that the user used the Back button in order to transfer from page D to C, from C to B and from B to A [7], [20].…”

Section: Path Completion (mentioning)

confidence: 99%
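The backward path analysis described in this statement can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the function `complete_path` and the `links` site map (page to set of pages it links to) are hypothetical, and the sketch assumes the visitor only moved backward along the current forward path. When a requested page is not linked from the previous page, the sketch walks back until it finds a page that does link to it and inserts the skipped Back-button steps.

```python
def complete_path(visits, links):
    """Insert Back-button steps that are missing from a logged page sequence.

    visits: pages in the order they appear in the log.
    links:  dict mapping each page to the set of pages it links to
            (hypothetical site map, assumed to be known).
    """
    completed = []
    stack = []  # current forward path, similar to a browser history
    for page in visits:
        if stack and page not in links.get(stack[-1], set()):
            # Search backward for the nearest earlier page that links here.
            for depth in range(len(stack) - 2, -1, -1):
                if page in links.get(stack[depth], set()):
                    # Pages revisited via Back, nearest first.
                    completed.extend(reversed(stack[depth:-1]))
                    del stack[depth + 1:]
                    break
            # If no referrer is found, the gap is left as logged.
        completed.append(page)
        stack.append(page)
    return completed


# Site map matching the example above: only A links to X.
site = {"A": {"B", "X"}, "B": {"C"}, "C": {"D"}, "D": set()}
restored = complete_path(["A", "B", "C", "D", "X"], site)
```

For the logged sequence A, B, C, D, X, `restored` holds A, B, C, D, C, B, A, X, the same sequence given in the statement.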

Influence of ratio of auxiliary pages on the pre-processing phase of web usage mining

Munk¹,

Benko²,

Gangur³

et al. 2015

E+M

Self Cite
