Doubly Sparse: Sparse Mixture of Sparse Experts for Efficient Softmax Inference

Liao, Shun; Chen, Ting; Lin, Tian; Zhou, Denny; Wang, Chong

doi:10.48550/arxiv.1901.10668

Cited by 1 publication

(1 citation statement)

References 16 publications


“…Thus, its embeddings are better than other models for sequences belonging to that subset but it also is not ignorant of the rest of the space. Such hybrid approaches have been used previously in machine learning (Peralta et al 2019, Liao et al 2019).…”

Section: Methods (mentioning)

confidence: 99%

Scaling DEPP phylogenetic placement to ultra-large reference trees: a tree-aware ensemble approach

Jiang, McDonald, Perry et al. 2024

Bioinformatics

Motivation: Phylogenetic placement of a query sequence on a backbone tree is increasingly used across biomedical sciences to identify the content of a sample from its DNA content. The accuracy of such analyses depends on the density of the backbone tree, making it crucial that placement methods scale to very large trees. Moreover, a new paradigm has been recently proposed to place sequences on the species tree using single-gene data. The goal is to better characterize the samples and to enable combined analyses of marker-gene (e.g., 16S rRNA gene amplicon) and genome-wide data. The recent method DEPP enables performing such analyses using metric learning. However, metric learning is hampered by a need to compute and save a quadratically growing matrix of pairwise distances during training. Thus, the training phase of DEPP does not scale to more than roughly 10 000 backbone species, a problem that we faced when trying to use our recently released Greengenes2 (GG2) reference tree containing 331 270 species.

Results: This paper explores divide-and-conquer for training ensembles of DEPP models, culminating in a method called C-DEPP. While divide-and-conquer has been extensively used in phylogenetics, applying divide-and-conquer to data-hungry machine-learning methods needs nuance. C-DEPP uses carefully crafted techniques to enable quasi-linear scaling while maintaining accuracy. C-DEPP enables placing 20 million 16S fragments on the GG2 reference tree in 41 h of computation.

Availability and implementation: The dataset and C-DEPP software are freely available at https://github.com/yueyujiang/dataset_cdepp/.
