Background
Non-small cell lung cancer (NSCLC) is the leading cause of cancer-related deaths worldwide. Although dysbiosis of the lung and gut microbiota has been associated with NSCLC, their relative contributions are unclear; in addition, their roles in distant metastasis (DM) remain elusive.
Results
We surveyed the fecal and sputum (as a proxy for lung) microbiota in healthy controls and NSCLC patients of various stages, and found significant perturbations of the gut and sputum microbiota in patients with NSCLC and DM. Machine-learning models combining both microbiota (mixed models) performed better than models built on either dataset alone in patient stratification, with the highest area under the curve (AUC) value of 0.842. The sputum microbiota contributed more than the gut microbiota in the mixed models; in addition, sputum-only models performed similarly to the mixed models in most cases. Several microbial biomarkers were shared by both microbiota, indicating their similar roles at distinct body sites. Microbial biomarkers of distinct disease stages were mostly shared, suggesting that biomarkers for distant metastasis could be acquired early. Furthermore, Pseudomonas aeruginosa, a species previously associated with wound infections, was significantly more abundant in brain metastasis, indicating that distinct types of DM could have different microbial biomarkers.
Conclusion
Our results indicate that alterations of the sputum microbiota have stronger relationships with NSCLC and distant metastasis than those of the gut microbiota, and strongly support the feasibility of metagenome-based non-invasive disease diagnosis and risk evaluation.

Keywords: gut microbiota, lung microbiota, machine learning, patient stratification, NSCLC, distant metastasis, brain metastasis

Background
Lung cancer (LC) is the leading cause of cancer-related mortality worldwide, with non-small cell lung cancer (NSCLC) being the most common form of LC [1]. Despite the recent development of therapies for NSCLC, tumor metastasis remains the main cause of recurrence and mortality in patients with NSCLC [1]. One of the key challenges is the low heritability of lung cancer susceptibility revealed by genetic studies: although numerous studies have established the important roles of somatic mutations as well as inheritable familial risks [2, 3], genetic influence can explain only 3-15% of the heritability [4, 5], depending on the surveyed population.
Conversely, non-genetic factors, including lifestyle, environmental factors, and lung and gut microbes, are believed to contribute most to the disease. In particular, numerous recent studies have shown that both lung and gut microbiota are involved in the development of LC [6-8]. For example, researchers have used samples from bronchoalveolar lavage fluid (BALF), tissues and spontaneous sputum of lung cancer patients for bacterial identification and microbiome characterization [7, 9-11]. When compared with healthy controls, researchers have id...