User-generated videos (UGVs) uploaded from mobile phones to social media sites like YouTube and TikTok are short and non-repetitive. We summarize a transitory UGV into several keyframes in linear time via fast graph sampling based on Gershgorin disc alignment (GDA). Specifically, we first model a sequence of N frames in a UGV as an M-hop path graph G^o for M ≪ N, where the similarity between two frames within M time instants is encoded as a positive edge weight computed from feature similarity. For efficient sampling, we then "unfold" G^o into a 1-hop path graph G, specified by a generalized graph Laplacian matrix L, via one of two graph unfolding procedures with provable performance bounds. We show that maximizing the smallest eigenvalue λ_min(B) of a coefficient matrix B = diag(h) + µL, where h is the binary keyframe selection vector, is equivalent to minimizing a worst-case signal reconstruction error. Instead of maximizing λ_min(B) directly, we maximize its Gershgorin circle theorem (GCT) lower bound λ⁻_min(B) by choosing h via a new fast graph sampling algorithm that iteratively aligns the left-ends of the Gershgorin discs of all graph nodes (frames). Extensive experiments on multiple short video datasets show that our algorithm achieves video summarization performance comparable to or better than state-of-the-art methods, at substantially reduced complexity.
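The relationship between λ_min(B) and its GCT lower bound can be illustrated numerically. The following is a minimal sketch, not the paper's sampling algorithm: it assumes L is a combinatorial Laplacian without self-loops (the abstract allows a more general Laplacian), and the edge weights, the value of µ, the selection vector h, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def path_graph_laplacian(weights):
    """Combinatorial Laplacian of a 1-hop path graph (assumed: no self-loops)."""
    n = len(weights) + 1
    L = np.zeros((n, n))
    for i, w in enumerate(weights):
        L[i, i] += w
        L[i + 1, i + 1] += w
        L[i, i + 1] -= w
        L[i + 1, i] -= w
    return L

def coefficient_matrix(h, L, mu):
    """B = diag(h) + mu * L, with h the binary keyframe selection vector."""
    return np.diag(h) + mu * L

def gershgorin_lower_bound(B):
    """Smallest disc left-end: min_i (B_ii - sum_{j != i} |B_ij|)."""
    radii = np.sum(np.abs(B), axis=1) - np.abs(np.diag(B))
    return np.min(np.diag(B) - radii)

def smallest_eigenvalue(B):
    """Smallest eigenvalue of a symmetric matrix (eigvalsh sorts ascending)."""
    return np.linalg.eigvalsh(B)[0]

# Hypothetical similarities between 6 consecutive frames.
weights = np.array([0.9, 0.8, 0.2, 0.9, 0.7])
L = path_graph_laplacian(weights)

# Hypothetical selection: frames 1 and 4 are chosen as keyframes.
h = np.array([0, 1, 0, 0, 1, 0], dtype=float)
mu = 0.1  # hypothetical trade-off parameter

B = coefficient_matrix(h, L, mu)
lam_min = smallest_eigenvalue(B)
gct_bound = gershgorin_lower_bound(B)

print(f"lambda_min(B)   = {lam_min:.4f}")
print(f"GCT lower bound = {gct_bound:.4f}")
```

Under these assumptions every unsampled node has a disc whose left-end sits at zero, so the bound computed directly from B is loose even though λ_min(B) is strictly positive. This gap is the reason the disc left-ends need to be aligned before the bound becomes a useful objective.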