“…Pure image pixel comparison is ineffective. We expect that the robot can focus …

Input: recorded video RV
Output: test sequence script TSS
split_tag ← 0
for each frame in RV do
    finger ← getOpenPoseDetection(RV)
    if finger exists then
        gesture_info ⇐ map(frame, split_tag, finger)
    else
        gui_info ⇐ map(frame, split_tag)
        if pre_frames is a gesture frame then
            split_tag++
        end if
    end if
end for
for st ← 0 to split_tag do
    gui ← getMiddleFrame(gui_info, st)
    gui_elements ← getObjectDetection(gui)
    gui_skeleton ← getGUISkeleton(gui_elements)
    gesture_micro ← identifyGesture(gesture_info, st)
    object ← identifyObject(gesture_micro, gui_elements)
    TSS ⇐ getTestSequence(st, gui_skeleton, gesture_micro, object)
end for
…

gui_elements′ ← getObjectDetection(gui_current)
gui_skeleton′ ← getGUISkeleton(gui_elements′)
if serial_number is 0 then
    break
else if serial_number is 1 then
    for each tss in TSSs do
        gui_skeleton ← getTSSGUI(tss, serial_number)
        e ← getSimilarityJudgment(gui_skeleton′, gui_skeleton)
    end for
    TSS′ ← getMaxSimilarScript(e, s_threshold)
    serial_number ← executeAction(TSS′, serial_number, gui_elements′)
else
    gui_skeleton ← getTSSGUI(TSS′, serial_number)
    e ← getSimilarityJudgment(gui_skeleton′, gui_skeleton)
    if e > s_threshold then
        TR ⇐ recordResults()
        serial_number ← executeAction(TSS′, serial_number, gui_elements′)
    else
        …”
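The two steps of the listings that do not depend on a vision model can be illustrated with a minimal Python sketch: splitting the recorded frames into numbered segments by whether a finger is detected, and choosing the recorded script whose GUI skeleton is most similar to the current screen. Everything named here is an assumption made for illustration and is not taken from the excerpt: `split_recording`, `middle_frame`, and `max_similar_script` are hypothetical helpers, `detect_finger` is a stand-in for the finger detector and is called per frame, the middle frame is assumed to be the element at index `len // 2`, and one similarity score is assumed to be kept per script.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple


def split_recording(
    frames: Sequence[Any],
    detect_finger: Callable[[Any], Optional[Any]],
) -> Tuple[Dict[int, List[Tuple[Any, Any]]], Dict[int, List[Any]], int]:
    """Group frames by segment tag, following the first loop of the listing.

    detect_finger is a hypothetical stand-in for the finger detector; it is
    assumed to return None when no finger is visible in the frame.
    """
    split_tag = 0
    gesture_info: Dict[int, List[Tuple[Any, Any]]] = {}
    gui_info: Dict[int, List[Any]] = {}
    prev_was_gesture = False
    for frame in frames:
        finger = detect_finger(frame)
        if finger is not None:
            # Gesture frame: store it under the current segment tag.
            gesture_info.setdefault(split_tag, []).append((frame, finger))
            prev_was_gesture = True
        else:
            # GUI frame: stored first, then the tag advances if the previous
            # frame was a gesture frame (same order as in the listing).
            gui_info.setdefault(split_tag, []).append(frame)
            if prev_was_gesture:
                split_tag += 1
            prev_was_gesture = False
    return gesture_info, gui_info, split_tag


def middle_frame(gui_info: Dict[int, List[Any]], st: int) -> Any:
    """Return the frame at index len // 2 of segment st (assumed definition)."""
    segment = gui_info[st]
    return segment[len(segment) // 2]


def max_similar_script(
    scores: Dict[str, float], s_threshold: float
) -> Optional[str]:
    """Return the script with the highest similarity if it exceeds the threshold.

    scores maps a script name to its similarity value; keeping one score per
    script is an assumption of this sketch.
    """
    if not scores:
        return None
    best = max(scores, key=lambda name: scores[name])
    return best if scores[best] > s_threshold else None


if __name__ == "__main__":
    # Frames named "f*" contain a finger, frames named "g*" show only the GUI.
    demo = ["g0", "g1", "f0", "f1", "g2", "g3", "f2", "g4"]
    gestures, guis, last_tag = split_recording(
        demo, lambda fr: "tip" if fr.startswith("f") else None
    )
    print(last_tag, middle_frame(guis, 0), middle_frame(guis, 1))
    print(max_similar_script({"login": 0.62, "search": 0.91}, 0.8))
```

Because the GUI frame is stored before the tag is incremented, the first GUI frame after a gesture lands in the segment that the gesture closes. The strict comparison in `max_similar_script` mirrors the `e > s_threshold` test of the second listing.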