Automated image analysis of microscopic images such as protein crystallization images and cellular images is one of the important research areas. If objects in a scene appear at different depths with respect to the camera's focal point, objects outside the depth of field usually appear blurred. Therefore, scientists capture a collection of images with different depths of field. Focal stacking is a technique of creating a single focused image from a stack of images collected with different depths of field. In this paper, we introduce a novel focal stacking technique, FocusALL, which is based on our modified Harris Corner Response Measure. We also propose enhanced FocusALL for application on images collected under high resolution and varying illumination. FocusALL resolves problems related to the assumption that in focus regions have high contrast and high intensity. Especially, FocusALL generates sharper boundaries around protein crystal regions and good in focus images for high resolution images in reasonable time. FocusALL outperforms other methods on protein crystallization images and performs comparably well on other datasets such as retinal epithelial images and simulated datasets.
In this paper, we investigate the performance of two wrapper methods for semi-supervised learning algorithms for classification of protein crystallization images with limited labeled images. Firstly, we evaluate the performance of semi-supervised approach using self-training with naïve Bayesian (NB) and sequential minimum optimization (SMO) as the base classifiers. The confidence values returned by these classifiers are used to select high confident predictions to be used for self-training. Secondly, we analyze the performance of Yet Another Two Stage Idea (YATSI) semi-supervised learning using NB, SMO, multilayer perceptron (MLP), J48 and random forest (RF) classifiers. These results are compared with the basic supervised learning using the same training sets. We perform our experiments on a dataset consisting of 2250 protein crystallization images for different proportions of training and test data. Our results indicate that NB and SMO using both self-training and YATSI semi-supervised approaches improve accuracies with respect to supervised learning. On the other hand, MLP, J48 and RF perform better using basic supervised learning. Overall, random forest classifier yields the best accuracy with supervised learning for our dataset.
BackgroundLarge number of features are extracted from protein crystallization trial images to improve the accuracy of classifiers for predicting the presence of crystals or phases of the crystallization process. The excessive number of features and computationally intensive image processing methods to extract these features make utilization of automated classification tools on stand-alone computing systems inconvenient due to the required time to complete the classification tasks. Combinations of image feature sets, feature reduction and classification techniques for crystallization images benefiting from trace fluorescence labeling are investigated.ResultsFeatures are categorized into intensity, graph, histogram, texture, shape adaptive, and region features (using binarized images generated by Otsu’s, green percentile, and morphological thresholding). The effects of normalization, feature reduction with principle components analysis (PCA), and feature selection using random forest classifier are also analyzed. The time required to extract feature categories is computed and an estimated time of extraction is provided for feature category combinations. We have conducted around 8624 experiments (different combinations of feature categories, binarization methods, feature reduction/selection, normalization, and crystal categories). The best experimental results are obtained using combinations of intensity features, region features using Otsu’s thresholding, region features using green percentile G 90 thresholding, region features using green percentile G 99 thresholding, graph features, and histogram features. Using this feature set combination, 96% accuracy (without misclassifying crystals as non-crystals) was achieved for the first level of classification to determine presence of crystals. Since missing a crystal is not desired, our algorithm is adjusted to achieve a high sensitivity rate. In the second level classification, 74.2% accuracy for (5-class) crystal sub-category classification. Best classification rates were achieved using random forest classifier.ContributionsThe feature extraction and classification could be completed in about 2 s per image on a stand-alone computing system, which is suitable for real time analysis. These results enable research groups to select features according to their hardware setups for real-time analysis.
In this paper, we investigate the performance of classification of protein crystallization images captured during protein crystal growth process. We group protein crystallization images into 3 categories: noncrystals, likely leads (conditions that may yield formation of crystals) and crystals. In this research, we only consider the subcategories of noncrystal and likely leads protein crystallization images separately. We use 5 different classifiers to solve this problem and we applied some data preprocessing methods such as principal component analysis (PCA), min-max (MM) normalization and z-score (ZS) normalization methods to our datasets in order to evaluate their effects on classifiers for the noncrystal and likely leads datasets. We performed our experiments on 1606 noncrystal and 245 likely leads images independently. We had satisfactory results for both datasets. We reached 96.8% accuracy for noncrystal dataset and 94.8% accuracy for likely leads dataset. Our target is to investigate the best classifiers with optimal preprocessing techniques on both noncrystal and likely leads datasets.
The goal of protein crystallization screening is the determination of the main factors of importance to crystallizing the protein under investigation. One of the major issues about determining these factors is that screening is often expanded to many hundreds or thousands of conditions to maximize combinatorial chemical space coverage for maximizing the chances of a successful (crystalline) outcome. In this paper, we propose an experimental design method called “Associative Experimental Design (AED)” and an optimization method includes eliminating prohibited combinations and prioritizing reagents based on AED analysis of results from protein crystallization experiments. AED generates candidate cocktails based on these initial screening results. These results are analyzed to determine those screening factors in chemical space that are most likely to lead to higher scoring outcomes, crystals. We have tested AED on three proteins derived from the hyperthermophile Thermococcus thioreducens, and we applied an optimization method to these proteins. Our AED method generated novel cocktails (count provided in parentheses) leading to crystals for three proteins as follows: Nucleoside diphosphate kinase (4), HAD superfamily hydrolase (2), Nucleoside kinase (1). After getting promising results, we have tested our optimization method on four different proteins. The AED method with optimization yielded 4, 3, and 20 crystalline conditions for holo Human Transferrin, archaeal exosome protein, and Nucleoside diphosphate kinase, respectively.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.