Background

HIV-associated neurocognitive disorder (HAND) remains an important yet potentially underdiagnosed manifestation of HIV infection, even though modern combination antiretroviral therapy (cART) achieves effective viral suppression and has greatly reduced the incidence of life-threatening events. Although HIV neurotoxicity is thought to play a central role, the potential of viral genetic signatures as diagnostic and/or prognostic biomarkers has yet to be fully explored.

Results

Using a manually curated sequence metadataset (80 specimens, 2349 sequences), we demonstrated that three genetic features are sufficient to predict HAND status regardless of the tissue sampled; accuracy reached 100% in the hold-out test subset and 94% in the entire dataset. The three genetic features stratified HAND into four distinct clusters. Extrapolating the classification to the 1619 specimens registered in the Los Alamos HIV Sequence Database gave an estimated global HAND prevalence of 46%, with significant regional variation (30–71%). We implemented the R package HANDPrediction to make the key code publicly available.

Conclusions

Our analysis revealed three amino acid positions in the gp120 glycoprotein, providing a basis for the development of novel cART regimens specifically optimized for HAND-associated quasispecies. Moreover, the classifier can readily be translated into a diagnostic biomarker, which warrants prospective validation.

Electronic supplementary material

The online version of this article (10.1186/s12977-018-0401-x) contains supplementary material, which is available to authorized users.