2020
DOI: 10.48550/arxiv.2004.02441
Preprint

TraDE: Transformers for Density Estimation

Rasool Fakoor,
Pratik Chaudhari,
Jonas Mueller
et al.

Abstract: We present TraDE, an attention-based architecture for auto-regressive density estimation. In addition to a Maximum Likelihood loss we employ a Maximum Mean Discrepancy (MMD) two-sample loss to ensure that samples from the estimate resemble the training data. The use of attention means that the model need not retain conditional sufficient statistics during the process beyond what is needed for each covariate. TraDE performs significantly better than existing approaches such as differentiable flow-based estimators …
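The MMD two-sample loss mentioned in the abstract compares samples drawn from the model against training data in a kernel feature space. Below is a minimal numpy sketch of a biased squared-MMD estimator with an RBF kernel; the kernel choice and bandwidth `sigma` are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Pairwise RBF kernel matrix between rows of x and rows of y.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    # Biased estimate of squared Maximum Mean Discrepancy between
    # the sample sets x and y: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)].
    kxx = rbf_kernel(x, x, sigma).mean()
    kyy = rbf_kernel(y, y, sigma).mean()
    kxy = rbf_kernel(x, y, sigma).mean()
    return kxx + kyy - 2.0 * kxy
```

In training, `y` would be samples generated by the model and `x` a minibatch of training data, with the MMD term added to the negative log-likelihood; samples from the same distribution yield a value near zero, while distribution shift drives it up.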

Cited by 4 publications (4 citation statements)
References 31 publications
“…Furthermore, these proposed future studies can be expanded from [9], which investigated better neural architecture designs for building flow-based models using self-attention for the estimator. Combined with increasing evidence in other research domains applying similar architecture [17], we expect the self-attention-based estimator to provide more expressive density estimations [7,20], where the attention mechanism could be directly augmented from flow indication embedding. We leave this research direction as future work.…”
Section: Discussion
confidence: 85%
“…In [24] the authors model conditional density estimators for multivariate data with conditional sum-product networks that combines tree-based structures with deep models. In [7] the authors combine the transformer model with flows for density estimation. The flow models were also applied for future prediction problems in [29].…”
Section: Related Work
confidence: 99%
“…Despite their success for modeling text, the application of Transformer architectures to tabular data remains limited [16,28,68]. The use of tabular models together with Transformer-like text architectures has also received little attention [33,53].…”
Section: A3 Featurizing Text For Tabular Models
confidence: 99%