Applying machine learning (ML) algorithms to multi-omics datasets has yielded analyses and interpretations that are not easily achievable with conventional methods. The Cancer Genomics Cloud (CGC), powered by Seven Bridges, offers many features that simplify and streamline ML tasks on the platform. The CGC provides features and tools for every step of an ML project, from data exploration to model generation to production, so that all key steps occur on the platform. The Data Cruncher interactive analysis tool enables users to perform data exploration, modeling, training, and visualization using familiar environments such as JupyterLab and RStudio. We applied this ML methodology to imaging tools to create the SB Image Processing Toolkit. Here we demonstrate the use of this toolkit with specific cancer datasets, including training and analysis, with all steps taking place on the cloud.
The SB Image Processing Toolkit is a collection of deep-learning, preprocessing, and utility tools and workflows designed to perform image class prediction on any type of image data. The toolkit supports a variety of images used in cancer research, such as histopathology and radiology images, as well as common formats like JPG and PNG. It comprises nine distinct tools, including methods for quality control, normalization, preprocessing of histopathology or radiology images, classification, and deep learning. Together, these tools provide an easy-to-use infrastructure for the various stages of ML and image processing. Most importantly, the toolkit was designed to enable users to create and use ML image classifiers on the CGC without any coding experience. All workflows can be found and run via the Public Apps Gallery on the CGC. Users can import image data, run analyses, and compare results all in one environment.
By harnessing the scale and flexibility of cloud computing, the SB Image Processing Toolkit brings substantial speed and cost improvements compared with manually writing and running model training scripts. With its access to data and ease of use, the CGC makes complex machine learning methods accessible to any researcher.
Citation Format: Soner Koc, Jovana Babic, Nevena Nikolic, Ana Stankovic, Daniel Ventre, Manisha Ray, Zelia F. Worman, Sai Lakshmi Subramanian, Dennis A. Dean, Jack DiGiovanna, Brandi Davis Dusenbery. The SB Image Processing Toolkit: Machine learning for cancer research on the CGC [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 6394.