2022
DOI: 10.1007/978-3-030-98355-0_45

Multi-modal Video Retrieval in Virtual Reality with vitrivr-VR

Abstract: In multimedia search, appropriate user interfaces (UIs) are essential to enable effective specification of the user's information needs and the user-friendly presentation of search results. vitrivr-VR addresses these challenges and provides a novel Virtual Reality-based UI on top of the multimedia retrieval system vitrivr. In this paper we present the version of vitrivr-VR participating in the Video Browser Showdown (VBS) 2022. We describe our visual-text co-embedding feature and new query interfaces, namely t…
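
The visual-text co-embedding named in the abstract maps text and visual features into a shared vector space, so that a textual query can be matched against video content by vector similarity. The following is a minimal sketch of that general idea in PyTorch; the layer sizes, feature dimensions, and random inputs are illustrative placeholders and not the architecture actually used in vitrivr.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Projection(nn.Module):
    """Projects an input feature vector into the shared co-embedding space."""
    def __init__(self, in_dim: int, out_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(), nn.Linear(512, out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize so that dot products equal cosine similarities.
        return F.normalize(self.net(x), dim=-1)

text_proj = Projection(in_dim=768)      # e.g. features from a text encoder (placeholder size)
visual_proj = Projection(in_dim=2048)   # e.g. features from a visual backbone (placeholder size)

text_features = torch.randn(4, 768)     # stand-in batch of text feature vectors
visual_features = torch.randn(4, 2048)  # stand-in batch of visual feature vectors

# Pairwise cosine similarities between all text/visual pairs in the batch; a contrastive
# loss over this matrix would pull matching pairs together during training.
similarity = text_proj(text_features) @ visual_proj(visual_features).T
print(similarity.shape)  # torch.Size([4, 4])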

Cited by 16 publications (12 citation statements)
References 18 publications

“…vitrivr-VR [32] is a multimedia retrieval system in Virtual Reality which builds upon the vitrivr stack, utilizing the same retrieval engine and database. It has shown competitive performance at interactive competitions [33], [34]. It offers novel VR-based ways to present and interact with results, such as a cylindrical result presentation view and a multimedia drawer with which results of a single day can be quickly explored in VR.…”
Section: Participant Team Overviews (mentioning, confidence: 99%)
“…As for LSC'21, more than half of the teams apply this approach to their system. For instance, both vitrivr and vitrivr-VR utilise an approach similar to W2VV++ [13], originally developed and used in video retrieval [33], [34], [45]. In NTU-ILRS, the visual concepts provided by the object detection model from Microsoft Vision API are encoded with FastText [46], while the generated image captions and the user's queries are encoded with Sentence BERT [47].…”
Section: B. Multimodal Embeddings (mentioning, confidence: 99%)
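The excerpt above describes encoding image captions and user queries with Sentence BERT and matching them by similarity. Below is a small sketch of that ranking step, assuming the sentence-transformers package; the model name and captions are placeholders, not the data or checkpoint used by NTU-ILRS.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any Sentence BERT-style checkpoint

# Placeholder captions keyed by image id.
captions = {
    "img_001": "a person riding a bicycle along a beach at sunset",
    "img_002": "two people cooking pasta in a small kitchen",
    "img_003": "a dog catching a frisbee in a park",
}

query = "someone preparing food indoors"

ids = list(captions)
caption_emb = model.encode([captions[i] for i in ids], convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

# Cosine similarity between the query and every caption, printed highest first.
scores = util.cos_sim(query_emb, caption_emb)[0]
for idx in scores.argsort(descending=True):
    print(ids[idx], float(scores[idx]))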
“…The system has three core components: a database specialized for multimedia retrieval, Cottontail DB [2], a retrieval engine, Cineast [27], and the web-based user interface, vitrivr-ng. This modular structure allows for frontends which have different interaction paradigms [13, 29, 32, 33, 34]. In this section, we provide a brief self-contained description of the system and its usage for lifelog retrieval.…”
Section: Vitrivr (mentioning, confidence: 99%)
“…Text Embedding: We extract textual embeddings from the images [32] using a similar approach to W2VV++ [16], as well as a feature based on OpenAI's CLIP [21], both of which can then be queried using textual input. We use the vector retrieval functionality of Cottontail DB for this purpose.…”
Section: Existing Functionality for Lifelog Retrieval (mentioning, confidence: 99%)
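The text embedding feature described above lets a free-text query be compared against precomputed image embeddings via vector retrieval. The following sketch shows the same pattern using CLIP text features, with a brute-force cosine search standing in for Cottontail DB's vector retrieval; the model checkpoint and the random embedding matrix are placeholders.

import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder for image embeddings extracted offline with the matching CLIP image encoder.
image_embeddings = torch.nn.functional.normalize(torch.randn(1000, 512), dim=-1)

def text_query(query: str, k: int = 5) -> list[int]:
    """Return the indices of the k images closest to the query in CLIP space."""
    inputs = tokenizer([query], return_tensors="pt", padding=True)
    with torch.no_grad():
        text_emb = model.get_text_features(**inputs)
    text_emb = torch.nn.functional.normalize(text_emb, dim=-1)
    scores = image_embeddings @ text_emb.squeeze(0)  # cosine similarities
    return torch.topk(scores, k).indices.tolist()

print(text_query("a person riding a bicycle at sunset"))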
“…In vitrivr-VR text input enables query formulation for a number of query modalities. The text entered can be used to search for visual concepts extracted from the images, to perform a full-text search on scene descriptions or optical character recognition (OCR) results, and to perform NNS using a visual-text co-embedding [13, 14].…”
Section: Text Input (mentioning, confidence: 99%)
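The excerpt lists three ways the same text input can be used: concept search, full-text search, and nearest neighbor search in the co-embedding space. Below is a toy sketch of such a dispatch, with in-memory stand-ins for the indexes; all names and data here are hypothetical and not vitrivr-VR's actual API.

from typing import Callable

# Toy in-memory "indexes" keyed by segment id (placeholders).
CONCEPTS = {"seg1": {"dog", "park"}, "seg2": {"kitchen", "pasta"}}
OCR_TEXT = {"seg1": "no entry", "seg2": "fresh pasta daily"}

def search_concepts(text: str) -> list[str]:
    terms = set(text.lower().split())
    return [sid for sid, tags in CONCEPTS.items() if terms & tags]

def search_fulltext(text: str) -> list[str]:
    return [sid for sid, doc in OCR_TEXT.items() if text.lower() in doc.lower()]

def search_co_embedding(text: str) -> list[str]:
    # Placeholder: a real system would embed the text and run vector NNS here.
    return []

BACKENDS: dict[str, Callable[[str], list[str]]] = {
    "concept": search_concepts,
    "fulltext": search_fulltext,
    "co-embedding": search_co_embedding,
}

def run_text_query(text: str, modality: str) -> list[str]:
    """Dispatch the entered text to the selected query modality."""
    return BACKENDS[modality](text)

print(run_text_query("pasta kitchen", "concept"))  # -> ['seg2']
print(run_text_query("fresh pasta", "fulltext"))   # -> ['seg2']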