Characterizing the spatial distribution of proteins directly from microscopy images is a difficult problem with numerous applications in cell biology (e.g., identifying motor-related proteins) and clinical research (e.g., identifying cancer biomarkers). Here we describe the design of a system that automatically analyzes punctate protein patterns in microscope images, including quantifying their relationships to microtubules. We constructed the system using confocal immunofluorescence microscopy images from the Human Protein Atlas project for 11 punctate proteins in three cultured cell lines. These proteins have previously been characterized as residing primarily in punctate structures, but their images had all been annotated by visual examination simply as “vesicular”. We showed that these patterns could be distinguished from each other with high accuracy, and we assigned hundreds of proteins whose subcellular localization had not previously been well defined to one of these subclasses. In addition to providing these novel annotations, we built a generative approach to modeling punctate distributions that captures the essential characteristics of the distinct patterns. Such models are expected to be valuable for representing and summarizing each pattern and for constructing systems biology simulations of cell behaviors.
The Human Protein Atlas is a rich source of location proteomics data. In this work, we present an automated approach for processing the Atlas images and classifying the major subcellular patterns in them. We demonstrate that two different classification frameworks (support vector machines and random forests) are effective at determining subcellular locations: we can analyze over 3500 Atlas images with an accuracy of up to 87.5% over all samples, and up to 98.5% when considering only the samples whose classification assignments we are most confident in. Moreover, the features obtained in the two frameworks are highly consistent and generalizable. Finally, we observe that features relating the proteins to cell markers are especially important in automated learning approaches.
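The two accuracy figures above reflect a common reporting practice. Accuracy is measured once over every sample, then again over only the samples whose classifier confidence passes some cutoff. The sketch below shows that computation. It assumes each prediction carries a scalar confidence score, such as the fraction of trees voting for the winning class in a random forest or a probability-calibrated SVM output. The function name, the threshold, and the toy labels are hypothetical illustrations and do not come from the study.

```python
from typing import Sequence, Tuple


def accuracy_at_confidence(
    true_labels: Sequence[str],
    predicted_labels: Sequence[str],
    confidences: Sequence[float],
    threshold: float = 0.0,
) -> Tuple[float, float]:
    """Hypothetical helper: return (accuracy, coverage) over samples with confidence >= threshold.

    Coverage is the fraction of all samples retained at this threshold.
    """
    if not (len(true_labels) == len(predicted_labels) == len(confidences)):
        raise ValueError("all inputs must have the same length")
    # Keep only the (true, predicted) pairs whose confidence passes the cutoff.
    kept = [
        (t, p)
        for t, p, c in zip(true_labels, predicted_labels, confidences)
        if c >= threshold
    ]
    if not kept:
        # No sample is confident enough: accuracy is undefined, coverage is zero.
        return float("nan"), 0.0
    correct = sum(1 for t, p in kept if t == p)
    return correct / len(kept), len(kept) / len(true_labels)


# Toy labels for illustration only (not Atlas data).
true = ["nucleus", "mitochondria", "golgi", "er", "nucleus"]
pred = ["nucleus", "mitochondria", "er", "er", "golgi"]
conf = [0.95, 0.90, 0.40, 0.85, 0.30]
print(accuracy_at_confidence(true, pred, conf))                 # (0.6, 1.0)
print(accuracy_at_confidence(true, pred, conf, threshold=0.8))  # (1.0, 0.6)
```

Raising the threshold trades coverage for accuracy. A figure like "98.5% on the most confident samples" is therefore most informative when it is reported alongside the fraction of samples that were kept.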