Peer-to-peer (P2P) lending faces severe information asymmetry and relies heavily on internal credit scoring systems. This paper proposes a novel credit scoring model that forecasts the probability of default for each applicant and guides lenders' decision-making in P2P lending. The proposal is expected to improve existing P2P credit scoring models in two respects: the classifier and the use of narrative data. We use an advanced gradient boosting decision tree technique (CatBoost) to predict loan defaults. Moreover, we develop a soft information extraction technique based on keyword clustering to compensate for insufficient hard credit data. Experiments on three real-world datasets demonstrate that variables extracted from narrative data are powerful features, and that using narrative data significantly improves predictive performance relative to using hard information alone. A sensitivity analysis reveals that CatBoost outperforms the industry benchmark across different numbers of soft-information clusters; a small number of clusters (e.g., three) is preferred when model performance, computational cost, and comprehensibility are all considered. Finally, we discuss practical implications and explanatory considerations.
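
To make the idea of keyword-clustering-based soft information extraction concrete, the following is a minimal sketch and not the authors' implementation, since the abstract does not specify the procedure. It assumes that keywords are represented as numeric vectors, grouped with plain k-means into three clusters, and that each loan description is then summarized by how many of its words fall into each cluster. The keyword vectors, the seed centroids, and the names `kmeans` and `soft_features` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical 2-D keyword vectors (made-up values, for illustration only).
KEYWORD_VECTORS = {
    "debt": (0.9, 0.1),
    "consolidate": (1.0, 0.2),
    "credit": (0.8, 0.0),
    "wedding": (0.1, 0.9),
    "vacation": (0.0, 1.0),
    "business": (0.5, -0.9),
    "inventory": (0.6, -1.0),
}


def kmeans(points, initial_centroids, iterations=10):
    """Plain Lloyd's k-means; returns the cluster index of each point."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def soft_features(text, keyword_cluster, k):
    """Count how many words of `text` belong to each keyword cluster."""
    counts = [0] * k
    for token in text.lower().split():
        token = token.strip(".,;:!?")
        if token in keyword_cluster:
            counts[keyword_cluster[token]] += 1
    return counts


K = 3  # a small number of clusters, as the abstract suggests
words = list(KEYWORD_VECTORS)
points = [KEYWORD_VECTORS[w] for w in words]
# Assumed deterministic seeds: one keyword per intended theme.
seeds = [KEYWORD_VECTORS["debt"], KEYWORD_VECTORS["wedding"], KEYWORD_VECTORS["business"]]
labels = kmeans(points, seeds)
keyword_cluster = dict(zip(words, labels))

features_a = soft_features("I want to consolidate my credit card debt.", keyword_cluster, K)
features_b = soft_features("Loan for wedding and a small business", keyword_cluster, K)
print(features_a, features_b)
```

In a pipeline of the kind the abstract describes, count vectors such as `features_a` would be appended to the hard credit variables before fitting a classifier such as CatBoost; the classifier step is omitted here to keep the sketch dependency-free.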