Text-Based Temporal Localization of Novel Events

Paul, Sudipta; Mithun, Niluthpol Chowdhury; Roy-Chowdhury, Amit K.

doi:10.1007/978-3-031-19781-9_33

Cited by 5 publications

(3 citation statements)

References 72 publications


“…Charades-STA Unseen. Paul et al [9] investigated whether models can perform well on unseen videos that were not encountered before. "Unseen videos" is a term used for videos that contain queries consisting of nouns or verbs that were not included in the training set (i.e., unseen queries).…”

Section: Datasets (mentioning)

confidence: 99%

“…We demonstrate the effectiveness of our model by showing superior performance on Charades-STA [1] and QVHighlights [6] datasets. In addition, we verify the robustness of BM-DETR by conducting comprehensive experiments on three challenging datasets: Charades-CD [7], Charades-CG [8], and Charades-STA Unseen [9], containing out-of-distribution test cases that are representative of real-world scenarios.…”

Section: Introduction (mentioning)

confidence: 99%


A Study on the Bedload Discharge Estimation using CNN

Jung, Jun, Kim et al. 2023

Preprint


Localized torrential rain, which has recently increased in frequency due to abnormal climate, accelerates erosion in the river basin and increases sediment transport into the river. The movement of inflowing sediment is one of the most important factors in the development and management of water resources. Among the mechanisms of sediment transport in rivers, bedload is difficult to measure directly because of the risk involved and the inaccuracy of existing measurement methods. Measurement equipment based on new concepts is continuously being developed to overcome these limitations. A representative instrument is the pipe hydrophone, which indirectly measures the bedload discharge by collecting and analyzing acoustic data when soil collides with a metal tube with a built-in microphone. To estimate the bedload discharge, this study acquired data through indoor experiments and applied them to the learning process of a Convolutional Neural Network (CNN). First, an indoor hydraulic experiment device was built with a pipe hydrophone installed at the bottom of the water outlet of the indoor waterway. Then, a system for analyzing and graphing the impact sound of bedload, together with programs for data acquisition and storage, was established. Finally, learning for bedload discharge estimation was conducted using the CNN, and the accuracy of the estimation was reviewed. As a result, the F1-score for bedload discharge estimation was 61%, and the accuracy was higher when the bedload discharge was 3 kg and 10 kg, compared to other weight ranges. Considering that an accuracy of 61% is insufficient to completely trust the estimated result, more efficient measurement would be possible by combining this method with previously developed measurement methods in a complementary manner. In future studies, additional experimental data under various conditions will be secured and applied to increase the accuracy of bedload discharge estimation.
"This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (C20017370001)"


“…Fully-Supervised Video Grounding. Many prior research employs supervised methods [21], [22], [23], [24], [25]. To achieve precise moment localization via language description, it is essential for a video grounding model to implement cross-modal alignment of videos and sentences.…”

Section: Related Work (mentioning)

confidence: 99%

Multi-Hierarchical Category Supervision for Weakly-Supervised Temporal Action Localization

Wang et al. 2021

IEEE Trans. on Image Process.


Early weakly supervised video grounding (WSVG) methods often struggle with incomplete boundary detection due to the absence of temporal boundary annotations. To bridge the gap between video-level and boundary-level annotation, explicit-supervision methods, i.e., generating pseudo-temporal boundaries for training, have achieved great success. However, data augmentations in these methods might disrupt critical temporal information, yielding poor pseudo boundaries. In this paper, we propose a new perspective that maintains the integrity of the original temporal content while introducing more valuable information for expanding the incomplete boundaries. To this end, we propose EtC (Expand then Clarify), which first uses the additional information to expand the initial incomplete pseudo boundaries and subsequently refines the expanded ones to achieve precise boundaries. Motivated by video continuity, i.e., visual similarity across adjacent frames, we use powerful multimodal large language models (MLLMs) to annotate each frame within the initial pseudo boundaries, yielding more comprehensive descriptions for the expanded boundaries. To further reduce the noise in the expanded boundaries, we combine mutual learning with a tailored proposal-level contrastive objective, learning to balance the incomplete yet clean (initial) boundaries against the comprehensive yet noisy (expanded) ones to obtain more precise boundaries. Experiments demonstrate the superiority of our method on two challenging WSVG datasets.
