Objective: This study aimed to systematically assess the characteristics and risk of bias of previous studies that have investigated nonlinear machine learning algorithms for warfarin dose prediction. Methods: We systematically searched PubMed, Embase, Cochrane Library, Chinese National Knowledge Infrastructure (CNKI), China Biology Medicine (CBM), China Science and Technology Journal Database (VIP), and Wanfang Database up to March 2022. We assessed the general characteristics of the included studies with respect to the participants, predictors, model development, and model evaluation. The methodological quality of the studies was determined, and the risk of bias was evaluated using the Prediction model Risk of Bias Assessment Tool (PROBAST). Results: From a total of 8996 studies, 23 were assessed in this study, of which 23 (100%) were retrospective, and 11 studies focused on the Asian population. The most common demographic and clinical predictors were age (21/23, 91%), weight (17/23, 74%), height (12/23, 52%), and amiodarone combination (11/23, 48%), while CYP2C9 (14/23, 61%), VKORC1 (14/23, 61%), and CYP4F2 (5/23, 22%) were the most common genetic predictors. Of the included studies, the MAE ranged from 1.47 to 10.86 mg/week in model development studies, from 2.42 to 5.18 mg/week in model development with external validation (same data) studies, from 12.07 to 17.59 mg/week in model development with external validation (another data) studies, and from 4.40 to 4.84 mg/week in model external validation studies. All studies were evaluated as having a high risk of bias. Factors contributing to the risk of bias include inappropriate exclusion of participants (10/23, 43%), small sample size (15/23, 65%), poor handling of missing data (20/23, 87%), and incorrect method of selecting predictors (8/23, 35%). Conclusions: Most studies on nonlinear-machine-learning-based warfarin prediction models show poor methodological quality and have a high risk of bias. The analysis domain is the major contributor to the overall high risk of bias. External validity and model reproducibility are lacking in most studies. Future studies should focus on external validity, diminish risk of bias, and enhance real-world clinical relevance.