“…Regarding tasks and technologies, we identified some low-level tasks performed as basic steps, many of which rely on deep learning approaches. For example, for textual video elements, several studies employed optical character recognition [14,21,29,32,49,50,54,64,67,102,109,114,123,138,152,160,161,179,195,255,261,262,265,273,276], keyword extraction [14,40,43,61,92,105,106,109,112,121,128,131,161,206,255,264,271], generic natural language processing methods (e.g., [29,127,128,194,243]), or word embeddings (e.g., [54,…”