The medical literature has been growing exponentially, and its size has become a barrier for physicians trying to locate and extract clinically useful information. Natural language processing (NLP), especially machine learning (ML)-based NLP, is a technology that offers a promising solution. ML-based NLP trains a computational algorithm on a large number of annotated examples so that the computer can "learn" and "predict" the meaning of human language. Although NLP has been widely applied in industry and business, most physicians are still not aware of the huge potential of this technology in medicine, and the implementation of NLP in breast cancer research and management is fairly limited. Using a successful real-world project that identified penetrance papers for breast and other cancer susceptibility genes, this review illustrates how to train and evaluate an NLP-based medical abstract classifier, incorporate it into a semiautomatic meta-analysis procedure, and validate the effectiveness of this procedure. Other implementations of NLP technology in breast cancer research, such as parsing pathology reports and mining electronic healthcare records, are also discussed. We hope this review will help breast cancer physicians and researchers recognize, understand, and apply this technology to meet their own clinical or research needs.
KEYWORDS: breast cancer, genetics, medical literature, natural language processing