“…Then, to obtain the explanation we are looking for about an event captured in an image, we would have to consider causal relationships that can either be inferred through expert knowledge (Martin, 2018) or intervene such data sets through experimentation, as indicated in He and Geng (2008), taking into account that, in probabilistic language, not having a way to distinguish between giving value to a variable and observing it, prevents modelling cause and effect relationships (Perry, 2003). Thus, taking modelling as an essential step to achieve causal inference, Xin et al (2022) discusses the role of causal inference to improve the interpretability and robustness of machine learning methods, and highlights opportunities in the development of machine learning models with causal capacity adapted for the analysis of mobility considering images or sequential data. In the punctual case on causal inference applied to images, Lopez-Paz et al (2017) propose to use neural causality coefficients (NCCs) that are calculated by applying convolutional neural networks (CNNs) to the pixels of an image, so that the appearance of causality between variables suggests that there is a causal link between the real-world entities themselves, Lebeda et al (2015) have proposed a statistical approach -transfer entropy -to discover and quantify the relationship between camera motion and the motion of a tracked object to predict the location of the tracked object, Fire and Zhu (2013) presented a Bayesian grammar (C-AOG) model for human-perceived causal relationships that can be learned from a video, and Pickup et al (2014) use the causality method, supplemented with computer vision and machine learning techniques, to determine whether a video is playing forward or backward by observing the "arrow of time" in a temporal sequence.…”