Although high-throughput sequencing has advanced remarkably, analyzing the large volume of rapidly generated biological (DNA/RNA/protein) sequence data remains a critical hurdle. To tackle this issue, the application of natural language processing (NLP) to biological sequence analysis has received increased attention. In this approach, biological sequences are regarded as sentences, while the individual nucleotides/amino acids or k-mers in these sequences represent the words. Embedding, an essential step in NLP, converts these words into vectors. Representation learning is an approach used to learn this transformation, and it can be applied to biological sequences. Vectorized biological sequences can then be used for function and structure estimation, or as input for other probabilistic models. Considering the importance and growing use of representation learning in biological research, in the present study we review the existing knowledge in representation learning for biological sequence analysis.
Mutations in the vacuolar protein sorting 35 (VPS35) gene have been linked to familial Parkinson’s disease (PD), PARK17. VPS35 is a key component of the retromer complex, which plays a central role in endosomal trafficking. However, whether and how VPS35 deficiency or mutation contributes to PD pathogenesis remains unclear. Here, we analyzed human induced pluripotent stem cell (iPSC)-derived neurons from PD patients with the VPS35 D620N mutation and addressed relevant disease mechanisms. In the disease group, dopaminergic (DA) neurons underwent extensive apoptotic cell death. The movement of Rab5a- or Rab7a-positive endosomes was slower, and the endosome fission and fusion frequencies were lower in the PD group than in the healthy control group. Interestingly, vesicles positive for cation-independent mannose 6-phosphate receptor transported by retromers were abnormally localized in glial cells derived from patient iPSCs. Furthermore, we found α-synuclein accumulation in TH-positive DA neurons. Our results demonstrate the induction of cell death, endosomal dysfunction and α-synuclein accumulation in neural cells of the PD group. PARK17 patient-derived iPSCs provide an excellent experimental tool for understanding the pathophysiology underlying PD.
Remarkable advances in high-throughput sequencing have resulted in rapid data accumulation, and analyzing biological (DNA/RNA/protein) sequences to discover new insights in biology has become more critical and challenging. To tackle this issue, the application of natural language processing (NLP) to biological sequence analysis has received increased attention. In this approach, biological sequences are regarded as sentences and the k-mers in these sequences as words. Embedding is an essential step in NLP that converts words into vectors. Learning this transformation is called representation learning, and it can be applied to biological sequences. Vectorized biological sequences can be used for function and structure estimation, or as inputs for other probabilistic models. Given the importance and growing use of representation learning in biology, here we review the existing knowledge in representation learning for biological sequence analysis.
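The sentence/word analogy described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a method taken from the reviewed work: the function names (`kmer_tokenize`, `build_vocab`, `random_embeddings`, `embed_sequence`), the choice of k = 3, the toy sequences, and the averaging step are all assumptions made for the example. In particular, the random vectors merely stand in for embeddings that a representation learning model would learn from data.

```python
import random


def kmer_tokenize(sequence, k=3, stride=1):
    """Split a sequence ("sentence") into overlapping k-mers ("words")."""
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


def build_vocab(sentences):
    """Assign an integer index to every distinct k-mer, in order of first appearance."""
    vocab = {}
    for words in sentences:
        for word in words:
            vocab.setdefault(word, len(vocab))
    return vocab


def random_embeddings(vocab, dim=4, seed=0):
    """Placeholder embedding table: one random vector per k-mer.

    Assumption: a real pipeline would learn these vectors from data;
    random values are used here only to show the data flow.
    """
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocab}


def embed_sequence(words, table):
    """Average the k-mer vectors into a single fixed-length sequence vector."""
    if not words:
        raise ValueError("sequence is shorter than k; no k-mers to embed")
    dim = len(next(iter(table.values())))
    total = [0.0] * dim
    for word in words:
        for j, value in enumerate(table[word]):
            total[j] += value
    return [t / len(words) for t in total]


# Toy DNA sequences (hypothetical, for illustration only).
sequences = ["ATGCGT", "GCGTAA"]
sentences = [kmer_tokenize(s, k=3) for s in sequences]
vocab = build_vocab(sentences)
table = random_embeddings(vocab, dim=4)
vectors = [embed_sequence(words, table) for words in sentences]
```

Each sequence ends up as one fixed-length vector, which is the form that downstream function or structure predictors, or other probabilistic models, would take as input. Averaging is just one simple way to pool word vectors; other pooling or sequence models could be substituted.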