“…We ran experiments on the above datasets using the following methodologies for the polynomial fusion layer: (3), (4), (5), (6b) (PF-CMF-SR). We set a = 256, d = 128, n = 10, and trained on video sequences of 3 seconds with frame size 128 × 96, as per [2]. For all models, c = a + d + n = 394, implying m = c − n = 384, given that c = m + n by construction.…”
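
The dimension bookkeeping in the excerpt can be checked with a few lines of Python. This is only a minimal sketch: the symbols a, d, n, c, and m come from the quoted text, while the helper name `fused_dims` is hypothetical and not part of the paper or any library.

```python
def fused_dims(a: int, d: int, n: int) -> tuple[int, int]:
    """Return (c, m) for the sizes quoted in the excerpt.

    Hypothetical helper: it only encodes the two stated relations,
    c = a + d + n and c = m + n.
    """
    c = a + d + n  # total size, as stated in the excerpt
    m = c - n      # follows from c = m + n
    return c, m


c, m = fused_dims(a=256, d=128, n=10)
print(c, m)  # 394 384
```

Because c = a + d + n and c = m + n, m always equals a + d, so the value 384 depends only on a and d and not on the choice of n.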