Rapid development and wide adoption of mass spectrometry-based proteomics technologies have empowered scientists to study proteins and their modifications in complex samples on a large scale. This progress has also created unprecedented challenges for individual labs in storing, managing, and analyzing proteomics data: proprietary software and high-performance computing are costly, and long processing times discourage the on-the-fly changes to data processing settings that exploratory and discovery analyses require. We developed MS-PyCloud, an open-source, cloud computing-based pipeline with graphical user interface (GUI) support, for LC-MS/MS data analysis. The major components of this pipeline include data file integrity validation, MS/MS database search for spectral assignment, false discovery rate estimation, protein inference, determination of protein post-translational modifications, and quantitation of specific (modified) peptides and proteins. To ensure the transparency and reproducibility of data analysis, MS-PyCloud includes open-source software tools with comprehensive testing and versioning for spectrum assignments. Leveraging public cloud computing infrastructure via Amazon Web Services (AWS), MS-PyCloud scales seamlessly based on analysis demand to achieve fast and efficient performance. Application of the pipeline to the analysis of large-scale iTRAQ/TMT LC-MS/MS data sets demonstrated the effectiveness and high performance of MS-PyCloud. The software can be downloaded at: https://bitbucket.org/mschnau/ms-pycloud/downloads/
Rapid development and wide adoption of mass spectrometry-based proteomics technologies have empowered scientists to study proteins and their modifications in complex samples on a large scale 1,2. However, this progress has created unprecedented challenges for researchers who must store, manage, and analyze large-scale mass spectrometry data.
For instance, the Clinical Proteomic Tumor Analysis Consortium (CPTAC) has generated around 1.6 terabytes (TB; 1 TB = 2^40 bytes) of raw mass spectrometry data for the TCGA ovarian cancer proteome using tandem mass spectrometry with the isobaric tags for relative and absolute quantitation (iTRAQ) multiplex labelling method 3,4. Mass spectrometry data analysis is computationally demanding: it involves peptide identification through database search, protein inference, identification of post-translational modifications (PTMs), and quantification of PTMs and global proteins on large-scale, high-throughput data. Currently, proprietary software packages such as Proteome Discoverer (Thermo Scientific), Sorcerer (Sage-N Research), SCAFFOLD (Proteome Software) and PEAKS (Bioinformatics Solutions Inc.) rely on high-performance computing hardware and computer cluster management, both of which are typically expensive to maintain and access. Open-source, cloud-based computing has shown advantages in scalability and financial cost 5. Many individual open-source tools with different search engines are available, e.g. OpenMS/TOPP 6, TPP 7, CPAS (Labkey Software F...