2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
DOI: 10.1109/cvprw53098.2021.00477
Keyword-based Vehicle Retrieval

Cited by 11 publications (5 citation statements)
References 19 publications
“…These approaches are well‐suited for single animated objects. Time can be fed as a latent representation to the neural radiance field, e.g., by simple concatenation [LSZ*22], as 4D spatiotemporal positional encoding [PSJ*23] or by lifting time to higher dimensions using additional networks [YJM*23,FYW*22,PSJ*23]. Additionally, one can further reduce the dimensionality of the latent space using tensor decomposition, notably 2D‐2D [SZT*23] and 3D‐1D [IRG*23].…”
Section: Related Work
confidence: 99%
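The excerpt above mentions feeding time to a neural radiance field as an extra latent input. As a rough illustration only (none of this is from the cited papers; layer sizes, encoding frequencies, and names are assumptions), a NeRF-style MLP can be conditioned on time by concatenating a positionally encoded time value to the encoded sample position:

```python
# Illustrative sketch of time-conditioning a NeRF-style field by concatenation.
# All dimensions and frequencies below are assumptions for illustration.
import torch
import torch.nn as nn

def positional_encoding(x, num_freqs):
    # Standard sin/cos encoding applied independently to each input dimension.
    freqs = 2.0 ** torch.arange(num_freqs, dtype=x.dtype, device=x.device)
    angles = x[..., None] * freqs                       # (..., dims, num_freqs)
    enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return enc.flatten(start_dim=-2)                    # (..., dims * 2 * num_freqs)

class TimeConditionedField(nn.Module):
    def __init__(self, pos_freqs=10, time_freqs=4, hidden=256):
        super().__init__()
        self.pos_freqs, self.time_freqs = pos_freqs, time_freqs
        in_dim = 3 * 2 * pos_freqs + 1 * 2 * time_freqs
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                       # RGB + density
        )

    def forward(self, xyz, t):
        # xyz: (N, 3) sample positions; t: (N, 1) normalized time in [0, 1].
        feat = torch.cat([positional_encoding(xyz, self.pos_freqs),
                          positional_encoding(t, self.time_freqs)], dim=-1)
        return self.mlp(feat)
```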
“…We compare our OMG with previous state-of-the-art methods in Table 3. It is shown that our OMG achieves the highest MRR:

Team                           MRR
OMG (ours)                     0.3012
Alibaba-UTS-ZJU [1]            0.1869
SDU-XidianU-SDJZU [38]         0.1613
SUNYKorea [33]                 0.1594
Sun Asterisk [30]              0.1571
HCMUS [31]                     0.1560
TUE [37]                       0.1548
JHU-UMD [14]                   0.1364
Modulabs-Naver-KookminU [15]   0.1195
Unimore [36]                   0.1078…”
Section: Evaluation Results
confidence: 99%
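The leaderboard quoted above is reported in MRR (Mean Reciprocal Rank). A minimal sketch of how that metric is computed, with illustrative function and variable names rather than any team's released code:

```python
# Minimal sketch of Mean Reciprocal Rank (MRR): for each query, take the
# reciprocal of the 1-based rank at which the true track appears, then average.

def mean_reciprocal_rank(ranked_ids, ground_truth_ids):
    """ranked_ids: one ranked list of candidate track ids per query.
    ground_truth_ids: the correct track id for each query."""
    reciprocal_ranks = []
    for ranking, gt in zip(ranked_ids, ground_truth_ids):
        try:
            rank = ranking.index(gt) + 1           # 1-based rank of the true track
            reciprocal_ranks.append(1.0 / rank)
        except ValueError:
            reciprocal_ranks.append(0.0)           # true track not retrieved at all
    return sum(reciprocal_ranks) / len(reciprocal_ranks)

# Toy usage: true tracks ranked 1st and 4th -> MRR = (1 + 0.25) / 2 = 0.625
print(mean_reciprocal_rank([["a", "b"], ["c", "d", "e", "f"]], ["a", "f"]))
```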
“…Tien-Phat et al. [31] adapt COOT [8] to model the cross-modal relationships with both appearance and motion attributes. Eun-Ju et al. [33] propose to perform color and type classification for both target and front-rear vehicles, and conduct movement analysis based on the Kalman filter algorithm [13]. DUN [38] uses a pretrained CNN and GloVe [34] to extract modal-specific features and GRUs [3] to exploit temporal information.…”
Section: Text-based Vehicle Retrieval
confidence: 99%
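The excerpt above mentions movement analysis based on a Kalman filter. A minimal sketch of one such analysis, assuming a constant-velocity model over bounding-box centres; the matrices and noise values are illustrative, not parameters from the cited work:

```python
# Illustrative constant-velocity Kalman filter over bounding-box centres.
# State is [x, y, vx, vy]; only the (x, y) centre is observed each frame.
import numpy as np

def smooth_track(centers, dt=1.0, q=1e-2, r=1.0):
    """centers: (T, 2) array of (x, y) box centres; returns (T, 4) filtered states."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)      # state transition
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)      # observation: position only
    Q = q * np.eye(4)                               # process noise (assumed)
    R = r * np.eye(2)                               # measurement noise (assumed)

    x = np.array([centers[0, 0], centers[0, 1], 0.0, 0.0])
    P = np.eye(4)
    states = []
    for z in centers:
        # Predict.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update with the observed centre.
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(4) - K @ H) @ P
        states.append(x.copy())
    return np.stack(states)   # the velocity components give a smoothed motion estimate
```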
“…In the 5th NVIDIA AI City Challenge, the majority of teams [2], [16], [17], [18], [19], [20] chose to extract sentence embeddings of the queries, whereas two teams [21], [22] processed the NL queries using conventional NLP techniques. For cross-modality learning, certain teams [20], [2] used ReID models with the adoption of vision models pre-trained on visual ReID data and language models pre-trained on the given queries from the dataset.…”
Section: Related Work, A. Natural Language-based Vehicle-based Video Re...
confidence: 99%
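For the sentence-embedding approach described above, retrieval typically reduces to ranking candidate vehicle tracks by similarity to the encoded query. A minimal sketch under that assumption, with the text and track encoders abstracted away (the random vectors below merely stand in for encoder outputs):

```python
# Minimal sketch of cross-modal retrieval by cosine similarity in a shared
# embedding space. How the embeddings are produced (sentence encoder for the
# query, ReID-style backbone for vehicle crops) is out of scope here.
import numpy as np

def rank_tracks(query_embedding, track_embeddings):
    """query_embedding: (D,); track_embeddings: (N, D).
    Returns track indices sorted by descending cosine similarity."""
    q = query_embedding / np.linalg.norm(query_embedding)
    t = track_embeddings / np.linalg.norm(track_embeddings, axis=1, keepdims=True)
    similarities = t @ q
    return np.argsort(-similarities)

# Toy usage with random vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
order = rank_tracks(rng.normal(size=256), rng.normal(size=(10, 256)))
print(order[:5])  # the 5 most similar candidate tracks
```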
“…The motion of vehicles is an integral component of the NL descriptions. Consequently, a number of teams [2], [18], [22] have developed specific methods for measuring and representing vehicle motion patterns.…”
Section: Related Work, A. Natural Language-based Vehicle-based Video Re...
confidence: 99%
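A simple, purely illustrative way to measure a motion pattern of the kind discussed above is to classify the signed heading change along a track; the threshold, labels, and sign convention below are assumptions, not any team's actual representation:

```python
# Illustrative sketch: reduce a track to a coarse motion keyword from the
# signed heading change between the start and end of the trajectory.
# With image coordinates (y pointing down) "left"/"right" may be flipped.
import numpy as np

def heading_deg(p_from, p_to):
    dx, dy = p_to[0] - p_from[0], p_to[1] - p_from[1]
    return np.degrees(np.arctan2(dy, dx))

def motion_keyword(centers, turn_threshold_deg=30.0):
    """centers: (T, 2) array of (x, y) box centres along the track."""
    change = heading_deg(centers[-2], centers[-1]) - heading_deg(centers[0], centers[1])
    change = (change + 180.0) % 360.0 - 180.0   # wrap to [-180, 180)
    if change > turn_threshold_deg:
        return "turn left"
    if change < -turn_threshold_deg:
        return "turn right"
    return "go straight"

# Toy usage: a track moving right, then curving upward in (x, y) coordinates.
track = np.array([[0, 0], [1, 0], [2, 0], [3, 1], [3, 2]], dtype=float)
print(motion_keyword(track))  # "turn left" under the stated sign convention
```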