Objective. Systemic lupus erythematosus (SLE) is a heterogeneous autoimmune disease that is difficult to treat. There is currently no optimal stratification of patients with SLE, and thus responses to available treatments are unpredictable. Here, we developed a new stratification scheme for patients with SLE, based on the whole-blood transcriptomes of patients with SLE.

Methods. We applied machine learning approaches to RNA-sequencing (RNA-seq) datasets to stratify patients with SLE into four distinct clusters based on their gene expression profiles. A meta-analysis on two recently published whole-blood RNA-seq datasets was carried out and an additional similar dataset of 30 patients with SLE and 29 healthy donors was contributed in this research; 141 patients with SLE and 51 healthy donors were analysed in total.

Results. Examination of SLE clusters, as opposed to unstratified SLE patients, revealed underappreciated differences in the pattern of expression of disease-related genes relative to clinical presentation. Moreover, gene signatures correlated to flare activity were successfully identified.

Conclusion. Given that disease heterogeneity has confounded research studies and clinical trials, our approach addresses current unmet medical needs and provides a greater understanding of SLE heterogeneity in humans. Stratification of patients based on gene expression signatures may be a valuable strategy to harness disease heterogeneity and identify patient populations that may be at an increased risk of disease symptoms. Further, this approach can be used to understand the variability in responsiveness to therapeutics, thereby improving the design of clinical trials and advancing personalised therapy.

Abstract word count: 242

Keywords
SLE, autoimmunity, RNA-seq, transcriptomics, stratification.

Abbreviations
ACR, American College of Rheumatology; ANA, anti-nuclear autoantibodies; BAFF, B cell activating factor of the TNF family; cpm, counts per million; ECOC, error-correcting output codes; ENA, extractible nuclear antigens; FPKM, fragments per kilobase of transcript per million mapped reads; GILZ, glucocorticoid-induced leucine zipper; GO, gene ontology; HPC, high performance computing; ISM, interferon signature metric; KEGG, Kyoto Encyclopedia

MATERIALS AND METHODS
Human subjects
Human subjects in Datasets 1 and 3 have been described previously (table 1) [12, 13]. Patients with SLE and healthy donors in Dataset 2 were recruited from the Monash Medical Centre [14]. Patients with SLE fulfilled the American College of Rheumatology (ACR) classification criteria [15]. The SLE disease activity index 2000 (SLEDAI-2k) [16] and the Physician Global Assessment (PGA; range 0-3) [17] scores were recorded. Blood was collected into PAXgene Blood RNA tubes (BD), which were frozen at -20 °C for later RNA extraction. Patients did not participate in the analysis.

RNA extraction and RNA-sequencing