“…The labels required for supervised learning are often orders of magnitude smaller than the fMRI data itself, which is high-dimensional in both space and time. As a result, prior studies often limit model capacity by using shallow networks and/or restrict the input data to activity at the region-of-interest (ROI) level (Chen and Hu, 2018; Dvornek et al., 2018; Koppe et al., 2019; Matsubara et al., 2019; Suk et al., 2016; Wang et al., 2019; Wang et al., 2020) or reduce it to functional connectivity (D'Souza et al., 2019; Fan et al., 2020; Kawahara et al., 2017; Kim and Lee, 2016; Riaz et al., 2020; Seo et al., 2019; Venkatesh et al., 2019; Yang et al., 2019; Zhao et al., 2018). It is also unclear to what extent representations learned for a specific task generalize to other tasks.…”