The exposure of oral antiretroviral (ARV) drugs in the female genital tract (FGT) is variable and largely unpredictable. An efficient method for identifying compounds with high tissue penetration would streamline the development of regimens for both HIV preexposure prophylaxis and viral reservoir targeting. Here we describe a cheminformatics investigation of diverse drugs with known FGT penetration, using cluster analysis and quantitative structure-activity relationship (QSAR) modeling. A literature search covering 1950-2012 identified 58 compounds (21 of them ARVs), representing 13 drug classes, with measured concentration data in cervical tissue, vaginal tissue, or cervicovaginal fluid. Cluster analysis revealed significant trends in penetrative ability for certain chemotypes. QSAR models predicting genital tract concentrations normalized to blood plasma concentrations were developed with two machine learning techniques, using the drugs' molecular descriptors and pharmacokinetic parameters as inputs. The QSAR model with the highest predictive accuracy achieved a test-set R² of 0.47. A high volume of distribution, a high probability of being an MRP1 substrate, and a low probability of being an MRP4 substrate were associated with FGT concentrations ≥ 1.5-fold plasma concentrations. However, because of the limited FGT data available, the predictive performance of all models was low. Despite this limitation, we were able to support our findings by correctly predicting the penetration class of rilpivirine and dolutegravir. With more data to enrich the models, we believe these methods could enhance the current approach of clinical testing.
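The two quantities the abstract relies on, the FGT-to-plasma concentration ratio with its 1.5-fold cutoff and the test-set R², can be made concrete with a short sketch. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, both concentrations are assumed to be in comparable units, a ratio exactly equal to 1.5 is assumed to count as "high", and R² is computed with the standard coefficient-of-determination formula, 1 − SS_res/SS_tot, on held-out compounds.

```python
from typing import Sequence

# Cutoff taken from the abstract: FGT concentration >= 1.5-fold plasma concentration.
PENETRATION_THRESHOLD = 1.5


def penetration_ratio(fgt_conc: float, plasma_conc: float) -> float:
    """Hypothetical helper: FGT concentration normalized to blood plasma concentration.

    Assumes both concentrations are expressed in comparable units.
    """
    if plasma_conc <= 0:
        raise ValueError("plasma concentration must be positive")
    return fgt_conc / plasma_conc


def penetration_class(ratio: float, threshold: float = PENETRATION_THRESHOLD) -> str:
    """Hypothetical two-class label; a ratio equal to the threshold counts as 'high'."""
    return "high" if ratio >= threshold else "low"


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot, for a held-out test set."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    ratio = penetration_ratio(3.0, 2.0)
    print(ratio, penetration_class(ratio))  # 1.5 high
    print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0: no better than the mean
```

An R² of 0 means the predictions explain no more variance than always predicting the test-set mean, and 1 means perfect prediction. This is why the reported value of 0.47 reflects moderate, rather than strong, predictive performance.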