Brain decoding aims to identify individuals' brain states and brain fingerprints in order to predict behavior. Deep learning provides an important platform for analyzing brain signals at different developmental stages to understand brain dynamics. Because of limitations in their internal architectures and feature extraction techniques, existing machine-learning and deep-learning approaches to fMRI-based brain decoding still fall short in classification performance and explainability. They also pay little attention to the behavioral traits that capture variability between individuals. In the current study, we hypothesized that even in early childhood (as early as 3 years), connectivity between brain regions could decode brain tasks and predict behavioral performance in false-belief tasks. To this end, we proposed an explainable deep learning framework to decode brain states (Theory of Mind (ToM) and pain states) and predict individual performance on ToM-related false-belief tasks in a developmental dataset. For decoding brain tasks, we proposed an explainable spatiotemporal connectivity-based Graph Convolutional Neural Network (Ex-stGCNN) model. We used a dataset (N = 155; children aged 3-12 years and adults) in which participants watched a short, soundless animated movie, "Partly Cloudy," that activated the ToM and pain networks. After scanning, the participants completed a ToM-related false-belief task and were categorized into pass, fail, and inconsistent groups based on their performance. We trained the proposed model separately on Static Functional Connectivity (SFC) and Inter-Subject Functional Correlation (ISFC) matrices. The stimulus-driven feature set (ISFC) captured ToM and pain brain states more accurately, with an average accuracy of 94%, whereas SFC matrices yielded 85% accuracy.
We also validated our results using five-fold cross-validation and achieved an average accuracy of 92%. In addition, we applied the SHAP (SHapley Additive exPlanations) approach to identify the neurobiological brain fingerprints that contributed most to the predictions. We further hypothesized that brain connectivity within the ToM network could predict individual performance on false-belief tasks. To test this, we proposed an Explainable Convolutional Variational Auto-Encoder model that uses functional connectivity (FC) to predict individual performance on false-belief tasks, and it achieved 90% accuracy.
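The difference between the two feature sets can be illustrated with a minimal sketch. It assumes region-averaged time series stored as NumPy arrays, and it uses a common leave-one-out formulation of ISFC; the function names, array shapes, and symmetrization step are illustrative assumptions, not the pipeline used in this study.

```python
import numpy as np

def static_fc(ts):
    """Static FC for one subject (hypothetical helper).

    ts: array of shape (timepoints, regions).
    Returns the (regions, regions) Pearson correlation matrix.
    """
    return np.corrcoef(ts, rowvar=False)

def isfc_leave_one_out(data):
    """Leave-one-out ISFC (assumed formulation, hypothetical helper).

    data: array of shape (subjects, timepoints, regions).
    For each subject, correlates its regional time series with the
    average time series of all other subjects, then symmetrizes.
    Returns an array of shape (subjects, regions, regions).
    """
    n_subj, n_time, n_reg = data.shape
    # z-score each region's time series within each subject
    z = (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)
    total = data.sum(axis=0)
    out = np.empty((n_subj, n_reg, n_reg))
    for s in range(n_subj):
        # average of all other subjects, z-scored per region
        others = (total - data[s]) / (n_subj - 1)
        oz = (others - others.mean(axis=0)) / others.std(axis=0)
        # entry (i, j): Pearson r between subject s region i and others' region j
        c = z[s].T @ oz / n_time
        out[s] = (c + c.T) / 2
    return out

# Toy usage with random data: 5 subjects, 50 timepoints, 4 regions
rng = np.random.default_rng(0)
data = rng.standard_normal((5, 50, 4))
sfc = static_fc(data[0])
isfc = isfc_leave_one_out(data)
```

Because ISFC correlates each subject with the rest of the group, signal that is not shared across subjects (intrinsic fluctuations, subject-specific noise) tends to cancel out, which is why it is described as the stimulus-driven feature set.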