2010
DOI: 10.1145/1734200.1734203

Optimal distance bounds for fast search on compressed time-series query logs

Abstract: Consider a database of time-series, where each datapoint in the series records the total number of users who asked for a specific query at an internet search engine. Storage and analysis of such logs can be very beneficial for a search company from multiple perspectives. First, from a data organization perspective, because query Weblogs capture important trends and statistics, they can help enhance and optimize the search experience (keyword recommendation, discovery of news events). Second, Weblog data can pr…

Cited by 4 publications (8 citation statements)
References 40 publications

“…This is shown in Figure 2a). In [12,11], the authors advocated the use of high-energy coefficients and side information on the discarded coefficients for weblog sequence repositories; in that setting one of the sequences was compressed, whereas the query was uncompressed, i.e., all coefficients were available as illustrated in Figure 2b). This work examines the most general and challenging case when both objects are compressed.…”
Section: Related Work (mentioning)
confidence: 99%
“…The rationale of these methods is to represent the signal in the frequency domain instead of the time domain to capture important signal properties such as periodicity. By capturing predominant patterns, lossy time series compression techniques enable operations such as nearest neighbor searches, and pattern searches [5] directly in the compressed domain, that are not possible with lossless compression techniques unless additional indexes are used. The main drawback of lossy compression techniques is that they rely on specific patterns for providing a good approximation of the given time series.…”
Section: Numeric Databases and Compression (mentioning)
confidence: 99%
“…This is shown in Figure 3a). In [11,12], the authors advocated the use of high-energy coefficients and side-information on the discarded coefficients for weblog sequence repositories; in that setting one of the sequences was compressed, whereas the query was uncompressed, i.e., all coefficients were available as illustrated in Figure 3b). This work examines the most general and challenging case when both series are compressed.…”
Section: Related Work (mentioning)
confidence: 99%
“…a) Both X,Q are compressed by storing the first coefficients. b) Using the highest-energy coefficients for X, whereas Q is uncompressed as in [11,12], and c) the problem we address: both sequences are compressed using the highest-energy coefficients. Note that in general for each object a different set of coefficients is used.…”
Section: Searching Data Using Distance Estimates (mentioning)
confidence: 99%
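
The three settings described in the citation statements above (first coefficients, highest-energy coefficients against an uncompressed query, and highest-energy coefficients on both sides) can be illustrated with a small sketch. It is not the optimal bound from the cited paper: it uses only the simple fact that, under an orthonormal transform, a distance computed over any subset of coefficients never exceeds the true Euclidean distance. All function names (`dft`, `first_k`, `best_k`, `lower_bound`, `lower_bound_both`) and the sample series are hypothetical and chosen for illustration.

```python
import cmath
import math


def dft(x):
    """Orthonormal DFT (scaled by 1/sqrt(n)), so Euclidean distances are preserved."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def first_k(coeffs, k):
    """Keep the first k coefficients, stored as {index: value}."""
    return {i: coeffs[i] for i in range(k)}


def best_k(coeffs, k):
    """Keep the k coefficients with the largest magnitude (highest energy)."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    return {i: coeffs[i] for i in order[:k]}


def lower_bound(compressed_x, full_q):
    """Naive lower bound when X is compressed and Q has all coefficients."""
    return math.sqrt(sum(abs(c - full_q[i]) ** 2 for i, c in compressed_x.items()))


def lower_bound_both(compressed_x, compressed_q):
    """Naive lower bound when both are compressed: use only the shared indices."""
    shared = compressed_x.keys() & compressed_q.keys()
    return math.sqrt(sum(abs(compressed_x[i] - compressed_q[i]) ** 2 for i in shared))


def euclidean(a, b):
    """Exact Euclidean distance in the time domain."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Illustrative data only: two synthetic periodic series of length 32.
x = [math.sin(2 * math.pi * 3 * t / 32) + 0.1 * t for t in range(32)]
q = [math.sin(2 * math.pi * 5 * t / 32) for t in range(32)]
X, Q = dft(x), dft(q)

print("true distance:           ", euclidean(x, q))
print("first-4 bound (Q full):  ", lower_bound(first_k(X, 4), Q))
print("best-4 bound (Q full):   ", lower_bound(best_k(X, 4), Q))
print("best-4 bound (both comp):", lower_bound_both(best_k(X, 4), best_k(Q, 4)))
```

When both series keep different coefficient sets, the shared-index bound can become very loose (it is zero if no indices overlap), which is why the statements above mention storing side information about the discarded coefficients.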