2021
DOI: 10.1007/s10844-021-00693-2

Hierarchy-based semantic embeddings for single-valued & multi-valued categorical variables

Abstract: In low-resource domains, it is challenging to achieve good performance using existing machine learning methods due to a lack of training data and mixed data types (numeric and categorical). In particular, categorical variables with high cardinality pose a challenge to machine learning tasks such as classification and regression because training requires sufficiently many data points for the possible values of each variable. Since interpolation is not possible, nothing can be learned for values not seen in the …

Cited by 11 publications (3 citation statements)
References 42 publications
“…Categorical variables such as species and habitat were coded as numeric variables using the target encoding approach that could estimate the probability of variables without increasing the dimensionality. 33 The multicollinearity test was conducted using SPSS v.17.0 software (IBM Corporation, Armonk, New York, USA) to examine whether there was a correlation among explanatory variables. In addition, the robustness of variable importance ranking was tested using randomized 5-fold crossvalidation with 1000 replications.…”
Section: Data Set (citation type: mentioning)
confidence: 99%
“…An in-depth analysis of this is an interesting direction for future research. This could be solved by embedding the hierarchy separately, e.g., [50], or imposing restrictions on the embeddings, such as a minimum distance constraint.…”
Section: Future Work (citation type: mentioning)
confidence: 99%
“…An in-depth analysis of this is an interesting direction for future research. This could be solved by embedding the hierarchy separately, e.g., [49], or imposing restrictions on the embeddings, such as a minimum distance constraint.…”
Section: Future Work (citation type: mentioning)
confidence: 99%