2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2019
DOI: 10.1109/cvprw.2019.00360
SuperTML: Two-Dimensional Word Embedding for the Precognition on Structured Tabular Data

Abstract: Tabular data is the most commonly used form of data in industry, according to the Kaggle ML and DS Survey. Gradient Boosting Trees, Support Vector Machines, Random Forests, and Logistic Regression are typically used for classification tasks on tabular data. DNN models using categorical embeddings have also been applied to this task, but all attempts thus far have used one-dimensional embeddings. The recent Super Characters method, which uses two-dimensional word embeddings, achieved state-of-the-art results in text clas…

Cited by 44 publications (32 citation statements)
References 24 publications
“…Buturović et al. designed a tabular-data-to-graphical mapping in which each feature vector is treated as a kernel, which is then applied to an arbitrary base image [17]. Sun et al. experimented with pretrained production-level CNN models, taking a diametrically opposite approach: the literal values of the features are projected graphically onto an image; for example, if a feature has a value of 0.2 for a given participant in the sample, the image would include the actual number 0.2 on it [18].…”
Section: Multimodal Codex Sequence
confidence: 99%
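The graphical projection described in the statement above can be sketched as a simple grid layout: each feature value is converted to its string form and assigned the cell where it would be drawn on the image. This is a minimal sketch of the idea, not the authors' code; the function name, square-grid layout, and 224-pixel canvas size are illustrative assumptions.

```python
import math

def supertml_layout(features, img_size=224):
    """Assign each feature value a grid cell on a blank image canvas.

    Sketch of the two-dimensional embedding idea: the string form of
    each value (e.g. "0.2") would then be drawn as text at the
    returned (x, y) position, and the resulting image fed to a CNN.
    """
    n = len(features)
    cols = math.ceil(math.sqrt(n))            # square-ish grid
    rows = math.ceil(n / cols)
    cell_w, cell_h = img_size // cols, img_size // rows
    return [
        (str(v), (i % cols) * cell_w, (i // cols) * cell_h)
        for i, v in enumerate(features)
    ]

# Four features of one sample -> four text placements on a 224x224 canvas
layout = supertml_layout([0.2, 5.1, "setosa", 3])
```

An actual implementation would rasterize each string at its cell with a drawing library (e.g. Pillow's `ImageDraw.text`) before passing the image to a pretrained CNN.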
“…We also utilized one of the most recently developed automated ML (AutoML) [28] algorithms, the AutoGluon [29] Python library, to find the best predictive ML classification models for our dataset. For DL, we employed two DL classification models proposed for tabular data: SuperTML [30] and TabNet [31]. We also provide their backgrounds in Appendix A.2.…”
Section: ML and DL Algorithm Settings
confidence: 99%
“…Although numerous AutoML packages exist, we utilized the latest and best-performing one, the AutoGluon [29] library. • SuperTML: proposed by Sun et al. [30], SuperTML offers a new way to handle classification on tabular data with deep neural networks by embedding each instance's features into a two-dimensional image. It then uses a pretrained convolutional neural network (CNN) [54], consisting of residual networks (ResNet) [2], to extract a representation of the images, after which fully connected layers (with two hidden layers) classify the input.…”
confidence: 99%
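The classification head described above (pretrained CNN features followed by two hidden fully connected layers) can be sketched in NumPy; the layer sizes and weights here are illustrative placeholders, not the quoted model's parameters.

```python
import numpy as np

def fc_head(features, w1, b1, w2, b2, w3, b3):
    """Two-hidden-layer classification head over CNN features.

    `features` stands in for the representation a pretrained ResNet
    would extract from the rendered image; the weights are placeholders.
    """
    h1 = np.maximum(0.0, features @ w1 + b1)   # hidden layer 1 (ReLU)
    h2 = np.maximum(0.0, h1 @ w2 + b2)         # hidden layer 2 (ReLU)
    logits = h2 @ w3 + b3
    e = np.exp(logits - logits.max())          # numerically stable softmax
    return e / e.sum()

# Illustrative shapes: 512-d CNN features, 64-unit hidden layers, 3 classes
rng = np.random.default_rng(0)
probs = fc_head(
    rng.standard_normal(512),
    rng.standard_normal((512, 64)) * 0.05, np.zeros(64),
    rng.standard_normal((64, 64)) * 0.05, np.zeros(64),
    rng.standard_normal((64, 3)) * 0.05, np.zeros(3),
)
```

In the quoted work, the CNN backbone and this head are trained end to end on the rendered images; the sketch only shows the forward pass of the head.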
“…Since a feature's position in a table, unlike a pixel's position in an image, carries no meaning, CNNs are not applicable to tabular data out of the box. Works attempting this have shown underwhelming results: their performance is "no better than SOTA" [3] or than XGBoost [5].…”
Section: Introduction
confidence: 99%