P-glycoprotein (P-gp, MDR1) is a promiscuous drug efflux pump of substantial pharmacological importance. Taking advantage of large-scale cytotoxicity screening data involving 60 cancer cell lines, we correlated the differential biological activities of ∼13,000 compounds with cellular P-gp levels. We assembled a large set of 934 high-confidence P-gp substrates and nonsubstrates by enforcing agreement with an orthogonal criterion involving P-gp-overexpressing ADR-RES cells. A support vector machine (SVM) was 86.7% accurate in discriminating P-gp substrates on independent test data, exceeding previous models. Two molecular features had an overarching influence: nearly all P-gp substrates were large (>35 atoms, including H) and dense (specific volume of <7.3 Å³/atom) molecules. Seven other descriptors and 24 molecular fragments ("effluxophores") were found to be enriched in either the substrates or the nonsubstrates and were incorporated into interpretable rule-based models. Biological experiments on an independent P-gp-overexpressing cell line, the vincristine-resistant VK2, allowed us to reclassify six compounds previously annotated as substrates, validating our method's predictive ability. Models are freely available at http://pgp.biozyne.com.
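The two dominant features named above (atom count and specific volume) can be illustrated with a minimal sketch. This is not the published SVM or rule-based model. It assumes only that specific volume means molecular volume divided by the total atom count, and it takes the volume as an input, since the abstract does not say how it is computed. The `Molecule` class, the function names, and the example values are hypothetical. Because the abstract says only that nearly all substrates are large and dense, passing this check is better read as a necessary-style screen than as a substrate prediction.

```python
from dataclasses import dataclass

# Thresholds quoted in the abstract.
MIN_ATOMS = 35             # atom count including hydrogens; substrates exceed this
MAX_SPECIFIC_VOLUME = 7.3  # cubic angstroms per atom; substrates fall below this


@dataclass(frozen=True)
class Molecule:
    """Hypothetical container; not part of the published models."""
    name: str
    n_atoms: int       # total number of atoms, hydrogens included
    volume_a3: float   # molecular volume in cubic angstroms (method is an assumption)


def specific_volume(mol: Molecule) -> float:
    """Volume per atom, assuming specific volume = volume / atom count."""
    if mol.n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    return mol.volume_a3 / mol.n_atoms


def is_large_and_dense(mol: Molecule) -> bool:
    """True if the molecule meets both the size and the density criterion."""
    return mol.n_atoms > MIN_ATOMS and specific_volume(mol) < MAX_SPECIFIC_VOLUME


# Invented illustrative values, not data from the study.
examples = [
    Molecule("large_dense", n_atoms=60, volume_a3=400.0),  # about 6.67 per atom
    Molecule("large_loose", n_atoms=50, volume_a3=400.0),  # 8.0 per atom
    Molecule("small_dense", n_atoms=20, volume_a3=130.0),  # 6.5 per atom, too few atoms
]
screen = {m.name: is_large_and_dense(m) for m in examples}
```

With these invented inputs, only `large_dense` passes both criteria; the other two fail on density and on size, respectively.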