Understanding the spatial organization of tissues is of critical importance for both basic and translational research. While recent advances in tissue imaging are opening an exciting new window into the biology of human tissues, interpreting the data that they create is a significant computational challenge. Cell segmentation, the task of uniquely identifying each cell in an image, remains a substantial barrier for tissue imaging, as existing approaches are inaccurate or require a substantial amount of manual curation to yield useful results. Here, we addressed the problem of cell segmentation in tissue imaging data through large-scale data annotation and deep learning. We constructed TissueNet, an image dataset containing >1 million paired whole-cell and nuclear annotations for tissue images from nine organs and six imaging platforms, an order of magnitude more than all previously published segmentation training datasets. We created Mesmer, a deep learning-enabled segmentation algorithm trained on TissueNet that performs nuclear and whole-cell segmentation in tissue imaging data. We demonstrated that Mesmer has better speed and accuracy than previous methods, generalizes to the full diversity of tissue types and imaging platforms in TissueNet, and achieves human-level performance for whole-cell segmentation. Mesmer enabled the automated extraction of key cellular features, such as subcellular localization of protein signal, which was challenging with previous approaches. We further showed that Mesmer could be adapted to harness cell lineage information present in highly multiplexed datasets, and we used this enhanced version to quantify cell morphology changes during human gestation. All underlying code, data, and models are released with permissive licenses as a community resource.
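As a concrete illustration of how a released model like Mesmer is typically invoked, here is a minimal sketch using the deepcell Python package. The argument names (image_mpp, compartment) and the two-channel input convention reflect our understanding of the package's public Mesmer application and should be checked against the released documentation.

```python
import numpy as np
from deepcell.applications import Mesmer  # pip install deepcell

# Input: (batch, y, x, 2) with channel 0 = nuclear stain and
# channel 1 = membrane/cytoplasm. Random data stands in here
# for a real multiplexed tissue image.
image = np.random.rand(1, 512, 512, 2).astype(np.float32)

app = Mesmer()  # downloads pretrained TissueNet weights on first use

# image_mpp is the resolution in microns per pixel; 'whole-cell'
# requests cell boundaries rather than nuclear masks.
labels = app.predict(image, image_mpp=0.5, compartment='whole-cell')
print(labels.shape)  # (1, 512, 512, 1): integer mask, one label per cell
```

The returned integer mask assigns one label per cell, which is the form that downstream feature-extraction steps, such as quantifying subcellular protein localization, consume.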
Deep learning is transforming the analysis of biological images, but applying these models to large datasets remains challenging. Here we describe the DeepCell Kiosk, cloud-native software that dynamically scales deep learning workflows to accommodate large imaging datasets. To demonstrate the scalability and affordability of this software, we identified cell nuclei in 10⁶ 1-megapixel images in ~5.5 h for ~$250, with a sub-$100 cost achievable depending on cluster configuration. The DeepCell Kiosk can be downloaded at https://github.com/vanvalenlab/kiosk-console; a persistent deployment is available at https://deepcell.org.

Main Text

While deep learning is an increasingly popular approach to extracting quantitative information from biological images, its limitations significantly hinder its widespread adoption. Chief among these limitations are the requirements for expansive sets of training data and computational resources; here, we sought to overcome the latter. While deep learning methods achieve remarkable accuracy on a range of image-analysis tasks, including classification [1], segmentation [2-4], and object tracking [5,6], they have limited throughput even with GPU acceleration. For example, even when running segmentation models on a GPU, typical inference speeds on megapixel-scale images are in the range of 5-10 frames per second, limiting the scope of analyses that can be performed in a timely fashion. The necessary domain knowledge and the associated costs of GPUs pose further barriers to entry, although recent software packages [7-11] have attempted to solve these two issues. While cloud computing has proven effective for other data types [12-15], scaling analyses to large imaging datasets in the cloud while constraining costs remains a considerable challenge.

To meet this need, we developed the DeepCell Kiosk (Fig. 1a). This software package takes in configuration details (user authentication, GPU type, etc.) and creates a cluster in the cloud that runs predefined deep learning-enabled image-analysis pipelines. The cluster is managed by Kubernetes, an open-source framework for running software containers (software bundled with its dependencies so that it can run as an isolated process) across a group of servers; an alternative way to view Kubernetes is as an operating system for cloud computing. Data is submitted to the cluster through a web-based front end, a command-line tool, or an ImageJ plugin. Once submitted, data is placed in a database, where the specified image-analysis pipeline can pick up the dataset, perform the desired analysis, and make the results available for download. Results can be visualized with a variety of visualization software tools [16,17]. To ensure that image-analysis pipelines run efficiently on this cluster, we made two software design choices. First, image-analysis pipelines access trained deep learning models through a centralized model server in the cluster. This strategy enables the cluster to efficiently allocate resources, as the various co...
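A quick back-of-the-envelope check makes the benchmark concrete. The figures below (10⁶ images, ~5.5 h, ~$250, and a single-GPU rate of 5-10 frames per second) all come from the text; the implied cluster parallelism and per-image cost are our own arithmetic, shown only for illustration.

```python
# Back-of-the-envelope throughput math for the DeepCell Kiosk benchmark.
# Figures (10^6 images, ~5.5 h, ~$250, 5-10 fps per GPU) come from the text;
# the implied GPU count is derived here for illustration only.

N_IMAGES = 1_000_000          # 1-megapixel images processed
WALL_HOURS = 5.5              # reported wall-clock time
TOTAL_COST = 250.0            # reported cloud cost in USD
SINGLE_GPU_FPS = (5, 10)      # typical single-GPU inference speed (from text)

cluster_fps = N_IMAGES / (WALL_HOURS * 3600)   # ~50.5 images/s
cost_per_image = TOTAL_COST / N_IMAGES         # ~$0.00025
implied_gpus = tuple(cluster_fps / fps for fps in reversed(SINGLE_GPU_FPS))

print(f"Cluster throughput:  {cluster_fps:.1f} images/s")
print(f"Cost per image:      ${cost_per_image:.5f}")
print(f"Implied parallelism: roughly {implied_gpus[0]:.0f}-{implied_gpus[1]:.0f} GPUs")
```

At the quoted single-GPU speeds, the reported wall-clock time implies on the order of 5-10 GPUs working in parallel, which is exactly the kind of elastic scaling the Kiosk's Kubernetes cluster is designed to manage.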
Live-cell imaging experiments have opened an exciting window into the behavior of living systems. While these experiments can produce rich data, the computational analysis of these datasets is challenging. Single-cell analysis requires that cells be accurately identified in each image and subsequently tracked over time. Increasingly, deep learning is being used to interpret microscopy images with single-cell resolution. In this work, we apply deep learning to the problem of tracking single cells in live-cell imaging data. Using crowdsourcing and a human-in-the-loop approach to data annotation, we constructed a dataset of over 11,000 trajectories of cell nuclei that includes lineage information. Using this dataset, we trained a deep learning model to perform cell tracking within a linear programming framework. Benchmarking demonstrates that our method achieves state-of-the-art performance on the cell tracking task with respect to multiple accuracy metrics. Further, we show that our deep learning-based method generalizes to both fluorescence and brightfield images of the cell cytoplasm, despite never having been trained on those data types, enabling analysis of live-cell imaging data collected across imaging modalities. A persistent cloud deployment of our cell tracker is available at http://www.deepcell.org.
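The phrase "cell tracking within a linear programming framework" can be made concrete with a small sketch: frame-to-frame linkage posed as a linear assignment problem. Everything below is illustrative, assuming plain Euclidean distance as a stand-in for the learned similarity score; the paper's actual model and cost construction differ.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def link_cells(features_t0, features_t1):
    """Link cells between consecutive frames via linear assignment.

    features_t0, features_t1: (n_cells, d) arrays of per-cell feature
    vectors. In the paper, a deep learning model scores cell pairs; here
    Euclidean distance stands in for that learned score.
    """
    # cost[i, j] = dissimilarity of cell i (frame t) and cell j (frame t+1)
    cost = np.linalg.norm(
        features_t0[:, None, :] - features_t1[None, :, :], axis=-1
    )
    # Solve the assignment problem (a special case of a linear program)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))

# Example: three cells tracked across two frames using 2-D centroids
frame0 = np.array([[10.0, 12.0], [40.0, 41.0], [75.0, 80.0]])
frame1 = np.array([[42.0, 40.0], [11.0, 13.0], [74.0, 82.0]])
print(link_cells(frame0, frame1))  # [(0, 1), (1, 0), (2, 2)]
```

Division and cell death break the one-to-one assumption of pure assignment, which is why practical formulations (including the one described here) augment the cost matrix with extra rows and columns for births, deaths, and divisions.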
Deep learning is transforming the ability of life scientists to extract information from images. These techniques have better accuracy than conventional approaches and enable previously impossible analyses. As the capability of deep learning methods expands, they are increasingly being applied to large imaging datasets, but their computational demands present a significant barrier to large-scale image analysis. To meet this challenge, we have developed DeepCell 2.0, a platform for deploying deep learning models on large imaging datasets (>10⁵ megapixel images) in the cloud. This software enables the turnkey deployment of a Kubernetes cluster on all commonly used operating systems. By using a microservice architecture, our platform matches computational operations with their hardware requirements to reduce operating costs. Further, it scales computational resources to meet demand, drastically reducing the time necessary for the analysis of large datasets. A thorough analysis of costs demonstrates that cloud computing is economically competitive for this application. By treating hardware infrastructure as software, this work foreshadows a new generation of software packages for biology in which computational resources are dynamically allocated.
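A toy example shows why matching operations to hardware lowers costs. The task names and hourly node prices below are hypothetical, not real cloud quotes; the point is only that routing CPU-bound steps away from expensive GPU nodes shrinks the bill.

```python
# Toy illustration of matching pipeline steps to hardware to cut cost.
# All task names and hourly prices are hypothetical, not real cloud quotes.

NODE_PRICE_PER_HOUR = {"cpu": 0.10, "gpu": 0.90}  # hypothetical USD rates

# Each microservice step declares the hardware it actually needs.
PIPELINE = [
    {"step": "decode-and-normalize", "needs": "cpu", "hours": 2.0},
    {"step": "model-inference",      "needs": "gpu", "hours": 1.0},
    {"step": "postprocess-and-save", "needs": "cpu", "hours": 1.5},
]

def cost(pipeline, everything_on_gpu=False):
    """Total cost when steps run on matched hardware vs. one big GPU node."""
    return sum(
        step["hours"]
        * NODE_PRICE_PER_HOUR["gpu" if everything_on_gpu else step["needs"]]
        for step in pipeline
    )

print(f"Monolithic GPU node:   ${cost(PIPELINE, everything_on_gpu=True):.2f}")
print(f"Matched microservices: ${cost(PIPELINE):.2f}")
```

In this toy setting, the matched configuration costs $1.25 versus $4.05 for running everything on a GPU node, which is the intuition behind the microservice design choice described above.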