2010
DOI: 10.1016/j.procs.2010.04.255

Data preprocessing evaluation for web log mining: reconstruction of activities of a web visitor

Abstract: Every data analysis presupposes the data themselves, regardless of its focus (visit rate analysis, portal optimization, portal personalization, etc.). The results of an analysis depend heavily on the quality of the analyzed data. For portal usage analysis, these data can be obtained by monitoring the web server log file. From these data we can build data matrices and a web map, which are then used to search for user behaviour patterns. Data preparation from the log file represe…

Cited by 84 publications (38 citation statements)
References 10 publications
“…Filling the missing data is done by usually adding a tuple or replacing the missing data by a global constant. Applying the data cleansing task in my work the primary step is to remove the stop words from the data which has been fetched by the web crawler [4,8]. Elimination of stop words and stemming [11]: In this phase, data which has less semantic is removed.…”
Section: Data Discretization (mentioning)
confidence: 99%
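The two cleansing steps named in the statement above (replacing missing values with a global constant, and removing stop words) can be sketched as follows. This is a minimal illustration, not the citing authors' implementation: the constant `"unknown"`, the tiny stop-word list, and the function names `fill_missing` and `remove_stop_words` are all assumptions made for the example.

```python
# Hypothetical global constant used in place of missing values (assumption).
MISSING = "unknown"

# Tiny illustrative stop-word list (assumption); real lists are much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


def fill_missing(record, fields, constant=MISSING):
    """Return a copy of `record` restricted to `fields`, with absent or
    empty values replaced by a global constant."""
    filled = {}
    for field in fields:
        value = record.get(field)
        filled[field] = constant if value in (None, "") else value
    return filled


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in stop_words]
```

Stemming, which the statement also mentions, is left out here because it normally relies on a dedicated library.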
“…Data pre-processing is recognized as a crucial step in WUM analysis (Cooley et al, 1997; Hussain, Asghar, & Masood, 2010; Munk & Drlík, 2011; Munk, Kapusta, & Švec, 2010) and is estimated to take typically between 60% and 80% of the total analysis time (Hussain et al, 2010; Marquardt, Becker, & Ruiz, 2004).…”
Section: Process and Heuristics (mentioning)
confidence: 99%
“…Hence, it can be problematic to define user sessions based on time. Munk et al (2010) adopted 10-minute timeout intervals for session identification and identified path completion pre-processing as an important step for improving the quality of extracted data. Similarly, Raju and Satyanarayana (2008) proposed a complete pre-processing methodology and suggested the use of 30-minute session timeout intervals.…”
Section: Process and Heuristics (mentioning)
confidence: 99%
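Timeout-based session identification, as described in the statement above, can be sketched in a few lines. The sketch assumes the timeout is applied to the gap between consecutive requests of a single visitor (the statement does not say whether the gap or the total session length is meant), and the function name `split_sessions` is hypothetical.

```python
from datetime import datetime, timedelta


def split_sessions(timestamps, timeout_minutes=10):
    """Group one visitor's request times into sessions.

    A new session starts whenever the gap since the previous request
    exceeds `timeout_minutes` (assumed interpretation of the timeout).
    """
    timeout = timedelta(minutes=timeout_minutes)
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)   # still within the current session
        else:
            sessions.append([ts])     # gap too long: open a new session
    return sessions


# Illustrative request times for one visitor (made-up data).
requests = [
    datetime(2010, 1, 1, 10, 0),
    datetime(2010, 1, 1, 10, 5),
    datetime(2010, 1, 1, 10, 20),
    datetime(2010, 1, 1, 10, 45),
]
ten_minute_sessions = split_sessions(requests, timeout_minutes=10)
thirty_minute_sessions = split_sessions(requests, timeout_minutes=30)
```

On this made-up data the 10-minute timeout yields three sessions and the 30-minute timeout yields one, which shows how sensitive the reconstructed sessions are to the chosen interval.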
“…Finally, our algorithm finds out that page A contains a hyperlink to page X and after the termination of the backward path analysis the sequence will look like this A→B→C→D→C→B→A→X. It means that the user used the Back button in order to transfer from page D to C, from C to B and from B to A [7], [20].…”
Section: Path Completion (mentioning)
confidence: 99%
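The backward path analysis in the statement above can be illustrated with a small sketch. It is one possible reading of the idea, not the cited authors' algorithm: the site map `links` (page to the set of pages it links to) is made-up data, and the function name `complete_path` is hypothetical. When a requested page is not linked from the current page, the sketch walks back through the visitor's history until it finds a page that does link to it, and inserts the skipped pages as Back-button steps.

```python
def complete_path(path, links):
    """Insert presumed Back-button steps into a logged page sequence.

    `links` maps each page to the set of pages it links to
    (a hypothetical site map used only for this example).
    """
    completed = []   # reconstructed sequence, including Back steps
    stack = []       # current navigation history of the visitor
    for page in path:
        if stack and page not in links.get(stack[-1], set()):
            # Search backward for the nearest earlier page linking to `page`.
            for i in range(len(stack) - 2, -1, -1):
                if page in links.get(stack[i], set()):
                    # Pages revisited via Back, most recent first.
                    completed.extend(reversed(stack[i:-1]))
                    del stack[i + 1:]
                    break
        completed.append(page)
        stack.append(page)
    return completed


# Made-up site map: only page A links to X.
links = {"A": {"B", "X"}, "B": {"C"}, "C": {"D"}, "D": set()}
result = complete_path(["A", "B", "C", "D", "X"], links)
print("→".join(result))  # A→B→C→D→C→B→A→X
```

If no earlier page links to the requested one (for example, a typed URL), the sketch leaves the sequence unchanged at that point.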