Accurately identifying subjects with attention deficit hyperactivity disorder (ADHD) remains a challenge for both neuroscience research and clinical diagnosis. Traditional approaches to feature extraction and classification usually rely on single-channel models and static measurements (i.e., functional connectivity, FC) computed on small, homogeneous single-site datasets, which limits them and may discard intrinsic information in functional MRI (fMRI). In this study, we propose a new two-stage network structure that combines a separated channel convolutional neural network (SC-CNN) with an attention-based network (SC-CNN-attention) to discriminate subjects with ADHD from healthy controls on a large-scale multi-site database (5 sites, n = 1019). To exploit both the intrinsic temporal features and the temporal dependencies among regions in whole-brain resting-state fMRI, the first stage of the proposed network uses an SC-CNN to learn the temporal features of each brain region, and the second stage uses an attention network to capture temporally dependent features among regions and extract fused features. Using a "leave-one-site-out" cross-validation framework, the proposed method obtained a mean classification accuracy of 68.6% across the five sites, which is higher than the accuracies reported in previous studies. The classification results indicate that the proposed network is robust to data variation and that its performance replicates across sites. Combining the SC-CNN with the attention network effectively captures intrinsic fMRI information for discriminating ADHD in multi-site resting-state fMRI data.
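The leave-one-site-out protocol mentioned above can be illustrated with a minimal sketch: each site is held out once as the test set while the remaining sites form the training set, and the per-site accuracies are then averaged. The function names, the toy site labels, and the unweighted mean are assumptions made for illustration only; the abstract does not specify how the folds were implemented or whether the mean was weighted by site size.

```python
from collections import defaultdict


def leave_one_site_out(sites):
    """Yield (held_out_site, train_idx, test_idx) once per distinct site.

    `sites` holds one site label per subject. Labels are hypothetical.
    """
    by_site = defaultdict(list)
    for idx, site in enumerate(sites):
        by_site[site].append(idx)
    for held_out in sorted(by_site):
        test_idx = by_site[held_out]
        # Training set: every subject that does not belong to the held-out site.
        train_idx = sorted(
            i for site, idxs in by_site.items() if site != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


def mean_accuracy(per_site_accuracy):
    """Unweighted mean over sites (an assumption, not confirmed by the source)."""
    return sum(per_site_accuracy.values()) / len(per_site_accuracy)


# Toy example with three hypothetical sites and six subjects.
toy_sites = ["A", "A", "B", "C", "C", "B"]
folds = list(leave_one_site_out(toy_sites))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

In practice a classifier would be trained on `train_idx` and scored on `test_idx` within each fold, so that no subject from the test site influences training.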