Proceedings of the 2022 International Conference on Management of Data 2022
DOI: 10.1145/3514221.3517897

TASTI: Semantic Indexes for Machine Learning-based Queries over Unstructured Data

Cited by 14 publications (6 citation statements)
References 20 publications
“…The text dataset is publicly available on Kaggle and contains 3M+ tweets between users and customer support Twitter accounts [23]. For each video dataset we generated proxy scores from TASTI embeddings that we created with a pre-trained ResNet-18 model [19,29]. Our oracle labels were computed using a Mask R-CNN model [18].…”
Section: Methods (mentioning)
confidence: 99%
“…The planner wishes to know the per-frame average number of cars that pass through an intersection. The planner could submit the following continuous query to InQuest: where count(car) is computed using an object detection DNN and proxy_count_cars could be computed via an embedding index for unstructured data [29]. In this setting, proxy_count_cars returns an estimate of the car count for every frame.…”
Section: Examples (mentioning)
confidence: 99%
“…Others help ensure reproducibility while iterating on different ideas [30,39,90]. With regard to validating changes in production systems, some researchers have studied CI (Continuous Integration) for ML and proposed preliminary solutions; for example, ease.ml/ci streamlines data management and proposes unit tests for overfitting [1], and some papers introduce tools to perform validation and monitoring in production ML pipelines [8,38,86]. Our work is complementary to existing literature on this tooling; we do not explicitly ask interviewees questions about tools, nor do we propose any tools.…”
Section: Software Engineering For ML (mentioning)
confidence: 99%
“…To do this, prior work has developed many different learning-to-approximate techniques that use lightweight models to approximate the reference model, using the reference model as supervision. This includes work in [12], [11], and more recently [13]. This is quite different than SeeSaw, where the goal is to optimize search queries only, without a reference model at all (i.e., SeeSaw finds new classes of objects for which no reference model is available.).…”
Section: Specifying Queries Over Image Datasets (mentioning)
confidence: 99%