Proceedings of the 5th Annual International Systems and Storage Conference 2012
DOI: 10.1145/2367589.2367592

On extracting session data from activity logs

Abstract: Activity logs from large-scale systems facilitate the study of user behavior, which can be used to improve and tune the user experience. However, the available data often lacks important elements such as the identification of user sessions. Previous work typically compensated for this by setting a threshold of around 30 minutes, and assuming that breaks in activity longer than the threshold reflect breaks between sessions. We show that using such a global threshold introduces artifacts that may affect the anal…
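
The global-threshold approach that the abstract attributes to previous work can be sketched as follows. This is a minimal illustration, not code from the paper: the function name `split_sessions`, the use of timestamps in seconds, and the sample data are all assumptions made for the example.

```python
from typing import List, Sequence


def split_sessions(timestamps: Sequence[float],
                   threshold: float = 30 * 60) -> List[List[float]]:
    """Group one user's activity timestamps (seconds) into sessions.

    A gap strictly longer than `threshold` between consecutive events
    is treated as a break between sessions. The 30-minute default
    mirrors the value mentioned in the abstract; the rest is a
    hypothetical sketch.
    """
    sessions: List[List[float]] = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= threshold:
            # Gap is within the threshold: same session continues.
            sessions[-1].append(t)
        else:
            # First event, or gap exceeds the threshold: new session.
            sessions.append([t])
    return sessions


# Hypothetical activity log for a single user, in seconds.
activity = [0, 600, 1500, 4000, 4100]
sessions = split_sessions(activity)
```

With the sample data, the 2500-second gap between 1500 and 4000 exceeds the 1800-second threshold, so the events fall into two sessions. Because the same cutoff is applied to every user, no within-session gap can exceed it, which is the kind of global choice the abstract says can introduce artifacts.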

Cited by 17 publications (16 citation statements)
References 23 publications

“…Calculating the exact dwell time is not obvious, since the time spent on the last page of a session is generally not known. Adding to this limitation, there is yet no consensus on how to identify when a session actually ends [15]. Multitasking makes it even more problematic to accurately calculate dwell time because of backpaging.…”
Section: Backpaging and Dwell Time
Type: mentioning
confidence: 99%
“…This technique also has the advantage of tailoring the threshold for each user separately, rather than assuming that a single threshold is suitable for all users [482]. For intuition, consider a graph of jobs as a function of time.…”
Section: End Box
Type: mentioning
confidence: 99%
“…According to the work of Mehrzadi, using the Arrival approach may lead to artifacts in the distribution of session durations [6]. Specifically, he shows that in web search data the distribution of session lengths exhibits a pronounced drop exactly at the threshold value that was used to define the sessions.…”
Section: Artifacts In the Distribution Of Session Lengths With Arrival
Type: mentioning
confidence: 99%