Analysis of viral evolution is a key element of epidemiological surveillance and control. One of the fundamental tools which is widely used to illustrate evolutionary history is the phylogenetic tree. Recently, we have proposed an alternative visualization for the phylogenetic tree using the evolutionary trajectory of its taxa. An evolutionary trajectory is a path starting from a taxon and ending at the root of the tree. In this paper, we propose an embedding of tree nodes by encoding their genetic sequence using a reduced amino acid alphabet and employing the Word2Vec framework. The suggested visualization maintains the phylogenetic relationship between nodes, while their proximity in 3D space depends on three factors: the type of reduced amino acid alphabet; fixed-length genetic patterns used in Word2Vec; and the neighbor effect of adjacent signatures. The results of our experiments showed that the majority of evolutionary history can be described in the embedded space. Moreover, they suggest potential application of our approach as an explanatory tool in studying various aspects: evolutionary dynamics; evolutionary deviation of viral variants; and phylogenetic characteristics, such as formation of new clades. Besides the usual local analysis of point mutations, the developed framework enables studying these aspects based on a more comprehensive global context, including neighboring effects, genetic signatures.
Currently, vaccination is one of the most efficient ways to control and prevent influenza infection. Vaccine production largely relies on the results of laboratory assays, including hemagglutination inhibition and microneutralization assays, which are time-consuming and laborious. Viruses can escape from the immune response that results in the need to revise and update vaccines biannually. The hemagglutination inhibition assay can measure how effectively antibodies against a reference strain bind and block an antigen of the test strain. Various computer-aided models have been developed to optimize candidate vaccine strain selection. A general problem in modeling of antigenic evolution is the representation of genetic sequences for input into the research model. Our motivation stems from the well-known problem of encoding genetic information for modeling antigenic evolution. This paper introduces a two-fold encoding approach based on reduced amino acid alphabet and amino acid index databases called AAindex. We propose to apply a simplified amino acid alphabet in modeling of antigenic evolution. A simplified alphabet, also called a sub-alphabet or reduced amino acid alphabet, implies to use the 20 amino acids being clustered and divided into amino acid groups. The proposed encoding allows to redefine mutations termed for amino acid groups located in reduced alphabets. We investigated 40 reduced amino acid sets and their performance in modeling antigenic evolution. The experimental results indicate that the proposed reduced amino acid alphabets can achieve the performance of the standard alphabet in its accuracy. Moreover, these alphabets provide deeper insight into various aspects of the relationship between mutation and antigenic variation. By checking identified high-impact sites in the Influenza Research Database, we found that not only antigenic sites have a significant influence on antigenicity, but also other amino acids located in close proximity. The results indicate that all selected non-antigenic sites are related to immune responses. According to the Influenza Research Database, these have been experimentally determined to be T-cell epitopes, B-cell epitopes, and MHC-binding epitopes of different classes. This highlighted a caveat: while simulating antigenic evolution, the model should consider not only the genetic information on antigenic sites, but also that of neighboring positions, as they may indirectly impact antigenicity. Additionally, our findings indicate that structural and charge characteristics are the most beneficial in modeling antigenic evolution, which is in agreement with previous studies.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.