The automatic standardization of nomenclature for anatomical structures in radiotherapy (RT) clinical data is a critical prerequisite for data curation and data-driven research in the era of big data and artificial intelligence, but it is currently an unmet need. Existing methods either cannot handle crossinstitutional datasets or suffer from heavy imbalance and poor-quality delineation in clinical RT datasets. To solve these problems, we propose an automated structure nomenclature standardization framework, 3D Nonlocal Network with Voting (3DNNV). This framework consists of an improved data processing strategy, namely, adaptive sampling and adaptive cropping (ASAC) with voting, and an optimized feature extraction module. The framework simulates clinicians' domain knowledge and recognition mechanisms to identify small-volume organs at risk (OARs) with heavily imbalanced data better than other methods. We used partial data from an open-source head-and-neck cancer dataset to train the model, then tested the model on three cross-institutional datasets to demonstrate its generalizability. 3DNNV outperformed the baseline model, achieving higher average true positive rates (TPR) over all categories on the three test datasets (+8.27%, +2.39%, and +5.53%, respectively). More importantly, the 3DNNV outperformed the baseline on the test dataset, 28.63% to 91.17%, in terms of F1 score for a small-volume OAR with only 9 training samples. The results show that 3DNNV can be applied to identify OARs, even error-prone ones. Furthermore, we discussed the limitations and applicability of the framework in practical scenarios. The framework we developed can assist in standardizing structure nomenclature to facilitate data-driven clinical research in cancer radiotherapy.