"Can you believe [1:21]?!"

Yarmand, Matin; Yoon, Dongwook; Dodson, Samuel; Roll, Ido; Fels, Sidney

doi:10.1145/3290605.3300719

Cited by 14 publications

(7 citation statements)

References 35 publications


“…A notable pattern that emerged in these categories was the tendency of users to reference parts of the video in their comments to provide temporal and spatial context. In particular, we noticed several cases of referring to the visual part of the video, which aligns with previous findings on general videos [74]. In contrast to this prior work [74], we observed this practice in software tutorial videos, in which it can be particularly evident since they feature visual demonstrations through screen sharing.…”

Section: Introduction (supporting)

confidence: 88%


“…Since these types of questions have direct relevance to video content, a notable trend that emerged from our analysis was the frequent references to video in these questions, sometimes explicitly citing timestamps, which echoes findings from prior research on referencing behavior in comments on a variety of videos [74]. It implies that our answer pipeline design should account for what in the video a question was about and the context of the tutorial when the question was posed.…”

Section: Implications On the Answer Pipeline (supporting)

confidence: 71%

“…This resulted in 633 questions out of the 5944 comments. Similar to Yarmand et al [74], the lead author performed thematic coding of the set of 633 questions and iteratively discussed with the other authors to validate the codes and resolve any conflicts. After finalizing the codes, we grouped them into four main categories, reflecting the overarching themes of the questions.…”

Section: Methods (mentioning)

confidence: 99%

“…Referencing specific audio or visual content within a video is a common practice during video interactions [59,74] suggests that the ability to easily refer to a part of a video enables a range of different applications, including enhanced engagement in live streams [12,72]. The ability to refer to parts of a video fosters a clear understanding of what others are discussing and facilitates pinpointed feedback or areas of confusion.…”

Section: Video Referencing (mentioning)

confidence: 99%


AQuA: Automated Question-Answering in Software Tutorial Videos with Visual Anchors

Yang, Vermeulen, Fitzmaurice et al. 2024

Proceedings of the CHI Conference on Human Factors in Computing Systems


Figure 1: Overall architecture of our question-answer pipeline AQuA, which generates useful responses to questions made in software tutorial videos. Questions are accompanied by visual anchors, which are specific visual elements of interest in the video. The Visual Recognition Module generates a textual description of the visual anchor. Combining the description with the question, the Retrieval Module retrieves relevant articles to the queries. Resources in yellow boxes are software-specific materials (in this case, for Fusion 360). Along with these retrieved articles, the question text, and the visual anchor description, we include the title and relevant transcript sentences of the tutorial video and feed them into GPT-4 through crafted prompts.

