Despite recent improvements in microscope technologies, segmenting and tracking cells in three-dimensional time-lapse images (3D + T images) to extract their dynamic positions and activities remains a considerable bottleneck in the field. We developed a deep learning-based software pipeline, 3DeeCellTracker, by integrating multiple existing and new techniques, including deep learning for tracking. With only one volume of training data, one initial correction, and a few parameter changes, 3DeeCellTracker successfully segmented and tracked ~100 cells in the brains of both semi-immobilized and ‘straightened’ freely moving worms and in a naturally beating zebrafish heart, and ~1000 cells in a 3D cultured tumor spheroid. Although these datasets were imaged with highly divergent optical systems, our method tracked 90–100% of the cells in most cases, which is comparable or superior to previously reported results. These results suggest that 3DeeCellTracker could pave the way for revealing dynamic cell activities in image datasets that have been difficult to analyze.
The importance of thorough analyses of secondary structures, the basic structural units of proteins, cannot be overemphasized. Although recent computational methods have achieved reasonably high accuracy in predicting secondary structures from amino acid sequences, the simple and fundamental empirical approach of characterizing the amino acid composition of secondary structures was applied mainly in the 1970s, using only a small number of structures. To extend this classical approach to a much larger set of structures, here we characterized the amino acid sequences of secondary structures (12 154 alpha-helix units, 4592 3(10)-helix units, 16 787 beta-strand units, and 30 811 "other" units), using the representative three-dimensional protein structure records (1641 protein chains) from the Protein Data Bank. We first examined the length and the amino acid composition of secondary structures, including rank order differences and assignment relationships among amino acids. These compositional results were largely, but not entirely, consistent with previous studies. In addition, we examined the frequency of the 400 amino acid doublets and 8000 triplets in secondary structures based on their relative counts, termed availability. We identified not only some triplets that were specific to a certain secondary structure but also so-called zero-count triplets, which did not occur in a given secondary structure at all, even though they were probabilistically predicted to occur several times. Taken together, the present study revealed essential features of secondary structures and suggests potential applications in secondary structure prediction and the functional design of protein sequences.
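The availability measure and the zero-count test can be illustrated with a minimal Python sketch. The abstract describes availability only as a relative count, so the definition used here (observed count divided by the count expected if residues occurred independently at their observed frequencies) is an assumption, as are the hypothetical function name `triplet_availability` and the `min_expected` threshold used to call a triplet "zero-count".

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
STANDARD = set(AMINO_ACIDS)


def triplet_availability(segments, min_expected=3.0):
    """Return (availability, zero_count) for all 8000 triplets.

    Assumption (not the study's documented formula): availability is
    observed / expected, where expected = number of triplet positions
    multiplied by the product of single-residue frequencies.
    """
    residues = Counter()
    triplets = Counter()
    n_positions = 0
    for seg in segments:
        # Count only the 20 standard residues; skip ambiguous codes such as 'X'.
        residues.update(a for a in seg if a in STANDARD)
        for i in range(len(seg) - 2):
            t = seg[i:i + 3]
            if set(t) <= STANDARD:
                triplets[t] += 1
                n_positions += 1
    total = sum(residues.values())
    if total == 0:
        raise ValueError("no standard residues in input")
    freq = {a: residues[a] / total for a in AMINO_ACIDS}

    availability, zero_count = {}, []
    for combo in product(AMINO_ACIDS, repeat=3):
        t = "".join(combo)
        expected = n_positions * freq[combo[0]] * freq[combo[1]] * freq[combo[2]]
        observed = triplets[t]
        availability[t] = observed / expected if expected > 0 else float("nan")
        # A "zero-count" triplet never occurs although it is expected to.
        if observed == 0 and expected >= min_expected:
            zero_count.append(t)
    return availability, zero_count
```

Passing, for example, all alpha-helix segments and then all beta-strand segments would give one availability table per structure type; the same logic with `repeat=2` covers the 400 doublets.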
One of the important secondary structures in proteins is the β-strand. However, because of its complexity, it is less well characterized than helical structures. Using 1641 representative three-dimensional protein structures from the Protein Data Bank, we characterized β-strand structures based on strand length and amino acid composition, focusing on differences between parallel and antiparallel β-strands. Antiparallel strands were more frequent and slightly longer than parallel strands. Overall, the majority of β-sheets were antiparallel sheets; however, mixed sheets were reasonably abundant, and parallel sheets were relatively rare. Notably, the nonpolar, aliphatic hydrocarbon amino acids valine, isoleucine, and leucine were observed at high frequency in both strand types but were more abundant in parallel than in antiparallel strands. The relative amino acid occurrence in β-sheets, especially in parallel strands, was highly correlated with amino acid hydrophobicity. This correlation was not observed in α-helices or 3(10)-helices. In addition, we examined the frequency of the 400 amino acid doublets and 8000 amino acid triplets in β-strands based on availability, a measure of the relative counts of doublets and triplets. We identified some triplets that were specifically found in either parallel or antiparallel strands. We further identified "zero-count triplets", which did not occur in either parallel or antiparallel strands even though they were probabilistically expected to occur several times. Taken together, the present study revealed essential features of β-strand structures and the differences between parallel and antiparallel β-strands, which could be applied in the future to secondary structure prediction and the functional design of protein sequences.
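A correlation of this kind between relative amino acid occurrence and hydrophobicity could be computed roughly as follows. This is a sketch under stated assumptions: relative occurrence is taken as an amino acid's frequency in a structure divided by its frequency in a background set, the Kyte-Doolittle hydropathy scale stands in for whichever hydrophobicity scale the study used, and all function names are hypothetical.

```python
from collections import Counter
from math import sqrt

# Kyte-Doolittle hydropathy values. The study's own scale is not stated in
# the abstract, so using this scale is an assumption.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def relative_occurrence(structure_residues, background_residues):
    """Assumed definition: frequency in the structure / frequency in the background."""
    s, b = Counter(structure_residues), Counter(background_residues)
    s_total, b_total = sum(s.values()), sum(b.values())
    return {
        aa: (s[aa] / s_total) / (b[aa] / b_total)
        for aa in KYTE_DOOLITTLE
        if b[aa] > 0  # skip amino acids absent from the background
    }


def pearson(xs, ys):
    """Pearson correlation coefficient (inputs must not be constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def hydrophobicity_correlation(structure_residues, background_residues):
    """Correlate relative occurrence with hydropathy across amino acids."""
    rel = relative_occurrence(structure_residues, background_residues)
    aas = sorted(rel)
    return pearson([rel[a] for a in aas], [KYTE_DOOLITTLE[a] for a in aas])
```

Calling `hydrophobicity_correlation` once with parallel-strand residues and once with antiparallel-strand residues, against the same background, would allow the two strand types to be compared on the same footing.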
The amino acid sequences of proteins determine their three-dimensional structures and functions. However, how sequence information is related to structure and function is still enigmatic. In this study, we show that at least part of the sequence information can be extracted by treating the amino acid sequences of proteins as a collection of English words, based on the working hypothesis that protein sequences are composed of short constituent amino acid sequences (SCSs), or “words”. We first confirmed that the English language is highly likely to follow Zipf's law, a special case of a power law. We found that the rank-frequency plot of SCSs in proteins exhibits a similar distribution when low-rank tails are excluded. In comparison with natural English and “compressed” English without spaces between words, amino acid sequences of proteins show larger linear ranges and smaller exponents with heavier low-rank tails, demonstrating that the SCS distribution in proteins is largely scale-free. The distribution pattern of SCSs in proteins is similar across species, but species-specific features are also present. Based on the availability scores of SCSs, we found that sequence motifs are enriched in high-availability sites (i.e., “key words”) and vice versa. In fact, the highest availability peak within a given protein sequence often corresponds directly to a sequence motif. The amino acid composition of high-availability sites within motifs differs from that of entire motifs and of all protein sequences, suggesting the possible functional importance of specific SCSs and their constituent amino acids within motifs. We anticipate that our availability-based word decoding approach will complement sequence alignment approaches in predicting functionally important sites of unknown proteins from their amino acid sequences.
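The rank-frequency analysis can be sketched in Python by counting overlapping k-residue words and fitting a straight line in log-log space. The fixed word length `k`, the `max_rank` cutoff used to drop the tail before fitting, the ordinary least-squares fit, and the function names are assumptions made for illustration, not the study's documented procedure.

```python
from collections import Counter
from math import log


def kmer_rank_frequency(sequences, k):
    """Count every overlapping k-residue word (a stand-in for an SCS) and
    return the counts sorted from most to least frequent."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return sorted(counts.values(), reverse=True)


def zipf_exponent(frequencies, max_rank=None):
    """Fit log(frequency) = c - s * log(rank) by least squares and return s.

    max_rank (hypothetical parameter) keeps only the top-ranked words so
    that the tail can be excluded from the fit. Needs at least two ranks.
    """
    freqs = frequencies[:max_rank] if max_rank else frequencies
    xs = [log(r) for r in range(1, len(freqs) + 1)]
    ys = [log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return -slope
```

For ideal Zipf data, where frequency is proportional to 1/rank, the fitted exponent is 1; a smaller exponent corresponds to a flatter rank-frequency distribution.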