“…In terms of evaluation, given that CI is not a well-defined task in language processing, the results may be questioned due to their strict dependence on subjective human criteria. This is a clear point of general improvement (beyond the specific purposes of this work) toward the fair assessment of other related CI approaches such as the Twin Networks method to estimate the probabilities of causation (Vlontzos, A., Kainz, B., and Gilligan-Lee, C. M., 2021), the causal regularization of neural networks to improve their interpretability (Bahadori, M. T., Chalupka, K., Choi, E., Chen, R., Stewart, W. F., and Sun, J., 2017; Shen, Z., Cui, P., Kuang, K., Li, B., and Chen, P., 2018), or the learning of causally disentangled representations using Variational Autoencoders (Suter, R., Miladinović, D., Schölkopf, B., and Bauer, S., 2019; Yang, M., Liu, F., Chen, Z., Shen, X., Hao, J., and Wang, J., 2020).…”