2016
DOI: 10.1109/tkde.2015.2480400

Similarity Group-by Operators for Multi-Dimensional Relational Data

Abstract: The SQL group-by operator plays an important role in summarizing and aggregating large datasets in a data analytics stack. While the standard group-by operator, which is based on equality, is useful in several applications, allowing similarity-aware grouping provides a more realistic view of real-world data that could lead to better insights. The Similarity SQL-based Group-By operator (SGB, for short) extends the semantics of the standard SQL Group-by by grouping data with similar but not necessarily …

citations
Cited by 20 publications
(15 citation statements)
references
References 22 publications
“…The vast majority of them focus on Selection [Silva et al 2013], in which similarity awareness is achieved by means of range queries, nearest-neighbor queries, and their many variants. Recent works also focus on Grouping and Aggregation [Tang et al 2016] and on the set-based operators [Al Marri et al 2016]. However, to the best of our knowledge, no one focuses on Division.…”
Section: Basic Concepts and Related Work
Citation type: mentioning
confidence: 99%
“…However, for some similarity query criteria, such as reverse k-nearest neighbors or queries with diversity, having access to the entire active domain is the only way to guarantee ordering by distance (KORN; MUTHUKRISHNAN, 2000; SILVA et al, 2013). For these queries, two alternatives can be outlined: extending another operator that supports ordering under a new criterion (GARCIA-MOLINA; ULLMAN; WIDOM, 2000; DATE, 2011); or extending the Selection operator so that it performs both the filtering and the ordering of the filtered elements (CARVALHO et al, 2014; TANG et al, 2016). In both alternatives, it is necessary to explore the consequences for the existing relational operators, which algebraic properties remain unchanged, and how this affects logical query optimization, as discussed in Ferreira et al (2011) and Aly, Aref and Ouzzani (2015).…”
Section: Relational Operators and Similarity Queries
Citation type: unclassified
“…Many of these prototypes also define extensions to the SQL language (AMATO; MAINETTO; SAVINO, 1997; BARIONI et al, 2009; BUDIKOVA; BATKO; ZEZULA, 2012) and use index structures designed specifically for similarity queries (CIACCIA; PATELLA; ZEZULA, 1997; TRAINA JR. et al, 2000; SKOPAL; POKORNỲ; SNASEL, 2004; NOVAK; BATKO; ZEZULA, 2011; CHEN et al, 2017b). In parallel, recent works have focused on extending relational algebra to define new relational operators that include distance-based comparators (SILVA et al, 2010; MARRI et al, 2014; TANG et al, 2016). Therefore, among the points highlighted above, the query optimizer is the least developed, since, besides depending on the algebraic definitions of items 1 and 2, it also depends on defining a statistical toolkit different from the one used for the Identity and Order Relations already present in a DBMS.…”
Section: Introduction
Citation type: unclassified
“…In this case, three groups will be formed: elements from 1 to 3 will belong to the first group; elements from 4 to 6 will belong to a second group; and elements from 7 to 10 will belong to a third group. Tang et al (2016) proposed the Similarity Group-By All, which returns groups in which the distance between every pair of elements is less than or equal to a threshold; and the Similarity Group-By Any, which creates groups in which each element must be similar to at least one other element in the group. To deal with overlapping elements, the clause ON OVERLAP takes the following parameters: JOIN-ANY to assign an element randomly to any of the overlapping groups;…”
Section: Similarity Queries In Metric Spaces
Citation type: mentioning
confidence: 99%
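
The two grouping semantics described in the statement above can be illustrated with a minimal Python sketch. This is not the algorithm from Tang et al (2016). The function names `sgb_any` and `sgb_all`, the Euclidean distance, and the greedy arrival-order strategy are all assumptions made for illustration. `sgb_any` computes connected components of the graph that links points within the threshold `eps`. `sgb_all` places each incoming point into an existing group only if it is within `eps` of every member, and applies a JOIN-ANY style random choice when several existing groups qualify.

```python
import random
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def sgb_any(points, eps):
    """Hypothetical SGB-Any sketch: every member is within eps of at least
    one other member (connected components of the eps-neighbor graph)."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the components of every pair of points within eps.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())


def sgb_all(points, eps, rng=None):
    """Hypothetical greedy SGB-All sketch: all pairs in a group are within eps.
    A point that fits several existing groups joins one at random (JOIN-ANY)."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so runs are repeatable
    groups = []
    for p in points:
        # Groups in which p is within eps of every current member.
        candidates = [g for g in groups if all(dist(p, q) <= eps for q in g)]
        if candidates:
            rng.choice(candidates).append(p)
        else:
            groups.append([p])
    return groups


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (2, 0), (10, 0)]
    print(sgb_any(pts, 1.0))
    print(sgb_all(pts, 1.0))
```

On the sample points, `sgb_any` chains the first three points into one group because each is within the threshold of a neighbor, while `sgb_all` keeps `(2, 0)` apart because it is farther than the threshold from `(0, 0)`. The greedy variant depends on arrival order, so it is only one possible way to realize the all-pairs constraint.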