Sizhe Li scite author profile

Sizhe Li

3 Publications

1 Citation Statement Received

89 Citation Statements Given


Affiliations

Peking University

Publications


Phrase-level Prediction for Video Temporal Localization

Zheng et al. 2022

Video temporal localization aims to locate a period that semantically matches a natural language query in a given untrimmed video. We empirically observe that although existing approaches make steady progress on sentence localization, the performance of phrase localization is far from satisfactory. In principle, a phrase should be easier to localize because fewer combinations of visual concepts need to be considered; this shortfall indicates that existing models only capture the sentence annotation bias in the benchmark and lack sufficient understanding of the intrinsic relationship between simple visual and language concepts, which calls their generalization and interpretability into question. This paper proposes a unified framework that can deal with both sentence-level and phrase-level localization, namely Phrase Level Prediction Net (PLPNet). Specifically, based on the hypothesis that similar phrases tend to focus on similar video cues, while dissimilar ones should not, we build a contrastive mechanism to constrain phrase-level localization without requiring fine-grained phrase boundary annotation in training. Moreover, considering the flexibility of sentences and the wide discrepancy among phrases, we propose a clustering-based batch sampler to ensure that contrastive learning can be conducted efficiently. Extensive experiments demonstrate that our method surpasses state-of-the-art methods of phrase-level temporal localization while maintaining high performance in sentence localization and boosting the model's interpretability and generalization capability. Our code is available at https://github.com/sizhelee/PLPNet.

CCS Concepts: Computing methodologies → Visual content-based indexing and retrieval; Activity recognition and understanding.

Selecting resonances in molecular scattering by anti-Zeno effect

Yang, Zhang, et al. 2023

Utilizing the anti-Zeno effect, we demonstrate that the resonances of ultracold molecular interactions can be selectively controlled by modulating the energy levels of molecules with a dynamic magnetic field. We show numerically that the inelastic scattering cross section of the selected isotopic molecules in a mixed isotopic molecular gas can be boosted by 2-3 orders of magnitude through modulation of the Zeeman splittings. The mechanism of the resonant anti-Zeno effect in ultracold scattering is based on matching the spectral modulation function of the magnetic field with the Floquet-engineered resonance of the molecular collision. The resulting insight provides a recipe for implementing the resonant anti-Zeno effect in the control of molecular interactions, such as the selection of reaction channels between molecules involving shape and Feshbach resonances, and external-field-assisted separation of isotopes.

Phrase-Level Temporal Relationship Mining for Temporal Sentence Localization

Zheng, Chen, et al. 2023

AAAI


In this paper, we address the problem of video temporal sentence localization, which aims to localize a target moment in a video according to a given language query. We observe that existing models suffer a sharp performance drop when dealing with simple phrases contained in the sentence. This reveals a limitation: existing models only capture the annotation bias of the datasets and lack sufficient understanding of the semantic phrases in the query. To address this problem, we propose a phrase-level Temporal Relationship Mining (TRM) framework that employs the temporal relationship between each phrase and the whole sentence to better understand each semantic entity in the sentence. Specifically, we use phrase-level predictions to refine the sentence-level prediction, and use Multiple Instance Learning to improve the quality of phrase-level predictions. We also exploit the consistency and exclusiveness constraints of phrase-level and sentence-level predictions to regularize the training process, thus alleviating the ambiguity of each phrase prediction. The proposed approach sheds light on how machines can understand detailed phrases in a sentence and their compositions in general, rather than learning the annotation biases. Experiments on the ActivityNet Captions and Charades-STA datasets show the effectiveness of our method on both phrase and sentence temporal localization, as well as better model interpretability and generalization when dealing with unseen compositions of seen concepts. Code can be found at https://github.com/minghangz/TRM.

