2023
DOI: 10.1093/sysbio/syad028

PARNAS: Objectively Selecting the Most Representative Taxa on a Phylogeny

Abstract: The use of next-generation sequencing technology has enabled phylogenetic studies with hundreds of thousands of taxa. Such large-scale phylogenies have become a critical component in genomic epidemiology in pathogens such as SARS-CoV-2 and influenza A virus. However, detailed phenotypic characterization of pathogens or generating a computationally tractable dataset for detailed phylogenetic analyses requires objective subsampling of taxa. To address this need, we propose parnas, an objective and flexible algor…

Cited by 16 publications (5 citation statements)
References 46 publications
“…As North America and Europe were over-represented in this dataset, these were sub-sampled to maintain representative sequences using PARNAS 78. The remaining dataset was separated by segment and aligned using Mafft v7.520 79, and manually trimmed to the open-reading frame using Aliview version 1.26 80. The trimmed alignments were then used to infer a maximum-likelihood phylogenetic tree using IQ-Tree version 2.2.3 81 along with ModelFinder 8 and 1,000 ultrafast bootstraps 82.…”
Section: Methods (citation type: mentioning)
confidence: 99%
“…All H5N1 HPAIV clade 2.3.4.4b sequences available in the EpiFlu database between 1st September 2020 and 22nd January 2024 were downloaded to create a sequence dataset. As North America and Europe were over-represented in this dataset, these were sub-sampled to maintain representative sequences using PARNAS 78. The remaining dataset was separated by segment and aligned using Mafft v7.520 79, and manually trimmed to the open-reading frame using Aliview version 1.26 80. The trimmed alignments were then used to infer a maximum-likelihood phylogenetic tree using IQ-Tree version 2.2.3 81 along with ModelFinder…”
Section: Whole-genome Sequencing and Phylogenetic Analysis (citation type: mentioning)
confidence: 99%
“…The human seasonal H3, swine H3 1990.4.a, and swine H3 2010.1 lineage datasets were aligned separately using mafft v7.490 (33) and a maximum-likelihood tree was inferred for each alignment following automatic model selection with IQ-Tree2 v2.2.2.6 (34, 35). To representatively subsample the datasets, we used PARNAS v0.1.4 (36) with a defined number of sequences (n=75 human H3, n=10 swine H3 1990.4.a, n=10 swine H3 2010.1). The selected sequences from these three datasets were combined with the strains used in the biological assays into one FASTA file.…”
Section: Methods (citation type: mentioning)
confidence: 99%
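
The statement above asks for a fixed number of representatives per dataset. One way to make "representative" concrete, assumed here purely for illustration and not taken from the quoted text, is a k-medoids-style objective: choose the k taxa that minimize the summed distance from every taxon to its nearest chosen representative. The sketch below brute-forces that objective on a tiny, hypothetical distance table. It is not the PARNAS implementation or its command-line interface, and the taxon names, distances, and function names are all invented for the example.

```python
from itertools import combinations

# Hypothetical taxa and pairwise tree distances (illustrative values only).
TAXA = ["A", "B", "C", "D", "E", "F"]
CLOSE_PAIRS = {
    ("A", "B"): 0.1, ("B", "C"): 0.1, ("A", "C"): 0.2,  # first clade
    ("D", "E"): 0.1, ("E", "F"): 0.1, ("D", "F"): 0.2,  # second clade
}
BETWEEN_CLADES = 1.0  # assumed distance for every pair not listed above


def distance(a, b):
    """Symmetric lookup of the distance between two taxa."""
    if a == b:
        return 0.0
    return CLOSE_PAIRS.get((a, b), CLOSE_PAIRS.get((b, a), BETWEEN_CLADES))


def total_cost(reps):
    """Sum over all taxa of the distance to the nearest representative."""
    return sum(min(distance(t, r) for r in reps) for t in TAXA)


def pick_representatives(k):
    """Exhaustively search all k-subsets; feasible only for tiny inputs."""
    return min(combinations(TAXA, k), key=total_cost)


if __name__ == "__main__":
    print(pick_representatives(2))
```

With two representatives requested, the search picks the central taxon of each clade. Exhaustive search grows combinatorially with the number of taxa, so this only serves to show the objective, not a practical method for trees of realistic size.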
“…Threshold distances of 0.24659 and 0.21677 were used for EATRO1125 and Lister427, respectively. Down sampling and subtree generation of representative sequences was performed using PARNAS (0.1.4) 59. The command used for PARNAS included flag: --cover -radius <threshold distance>.…”
Section: VSG Clustering and Family Identification (citation type: mentioning)
confidence: 99%
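
The statement above describes selecting representatives so that a threshold distance is respected. As a rough illustration of that idea, the sketch below uses a simple greedy heuristic: repeatedly pick the taxon that covers the most still-uncovered taxa within the radius. This is an assumed stand-in technique, not the method PARNAS uses and not its command-line interface; the distance table and all names are hypothetical.

```python
# Hypothetical taxa and pairwise distances (illustrative values only).
TAXA = ["A", "B", "C", "D", "E", "F"]
CLOSE_PAIRS = {
    ("A", "B"): 0.1, ("B", "C"): 0.1, ("A", "C"): 0.2,
    ("D", "E"): 0.1, ("E", "F"): 0.1, ("D", "F"): 0.2,
}
FAR = 1.0  # assumed distance for every pair not listed above


def distance(a, b):
    """Symmetric lookup of the distance between two taxa."""
    if a == b:
        return 0.0
    return CLOSE_PAIRS.get((a, b), CLOSE_PAIRS.get((b, a), FAR))


def greedy_cover(radius):
    """Greedy heuristic: every taxon ends up within `radius` of a representative.

    Assumes radius >= 0, so each taxon covers at least itself and the loop ends.
    The result is a valid cover but not guaranteed to be the smallest one.
    """
    uncovered = set(TAXA)
    reps = []
    while uncovered:
        # Candidate that covers the most uncovered taxa (first one wins ties).
        best = max(
            TAXA,
            key=lambda c: sum(distance(c, t) <= radius for t in uncovered),
        )
        reps.append(best)
        uncovered -= {t for t in uncovered if distance(best, t) <= radius}
    return reps


if __name__ == "__main__":
    for radius in (0.05, 0.1, 1.0):
        print(radius, greedy_cover(radius))
```

A smaller radius forces more representatives, and a radius at least as large as the tree's widest pairwise distance collapses the selection to a single taxon.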