Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021
DOI: 10.1145/3404835.3463116

GemNN: Gating-enhanced Multi-task Neural Networks with Feature Interaction Learning for CTR Prediction

Cited by 33 publications (11 citation statements). References 46 publications.

“…The influence of ε's decaying rate on performance has been studied by [11] for channel pruning; similarly, the decaying rate of ε in (4) and (8) is not very important for offline feature selection, and our experiments found that a final value of ε between 1e−4 and 1e−8 has no significant effect on the performance of the model. The key component of the gate function (8) for LPFS++ is the second, arctangent part and its balancing factor α. For simplicity, we will focus on α.…”
Section: Ablation Studies and Experimental Analysis
confidence: 99%
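
To make the quoted setup concrete, below is a minimal, non-authoritative sketch of an LPFS-style smoothed feature gate with an exponentially decaying ε. The exact LPFS++ gate of equation (8) is not given in the excerpt, so the arctangent term weighted by α is only an assumed placeholder; the class name, argument names, and default values are illustrative.

```python
import math

import torch
import torch.nn as nn


class SmoothedFeatureGate(nn.Module):
    """LPFS-style smoothed feature gate with a decaying epsilon.

    The rational term x^2 / (x^2 + eps) is the standard LPFS-style gate;
    the alpha-weighted arctan term is only an assumed placeholder for the
    LPFS++ term discussed in the excerpt (its equation (8) is not
    reproduced there).
    """

    def __init__(self, num_fields, eps_init=1e-1, eps_final=1e-6,
                 decay_steps=100_000, alpha=0.1):
        super().__init__()
        # one learnable gate value per feature field (assumed layout)
        self.gate_param = nn.Parameter(torch.full((num_fields,), 0.5))
        self.eps_init, self.eps_final = eps_init, eps_final
        self.decay_steps, self.alpha = decay_steps, alpha
        self.register_buffer("step", torch.zeros((), dtype=torch.long))

    def current_eps(self):
        # exponential decay from eps_init to eps_final; per the quoted
        # ablation, the exact final value (1e-4 .. 1e-8) matters little
        t = min(self.step.item() / self.decay_steps, 1.0)
        return self.eps_init * (self.eps_final / self.eps_init) ** t

    def forward(self, field_emb):
        # field_emb: (batch, num_fields, emb_dim)
        x, eps = self.gate_param, self.current_eps()
        g = x ** 2 / (x ** 2 + eps)                          # smoothed 0/1 gate
        g = g + self.alpha * (2 / math.pi) * torch.atan(x)   # assumed arctan term
        if self.training:
            self.step.add_(1)
        return field_emb * g.view(1, -1, 1)
```
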
“…The main limitation of function (8), from the perspective of dimensional analysis, is that it is not scale-invariant: the argument x of the arctangent function arctan(x) must be dimensionless.…”
Section: Conclusion, Limitation and Future Work
confidence: 99%
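
A short worked note on the scale issue raised above (equation (8) itself is not reproduced in the excerpt; the rational term below is assumed to be the standard LPFS-style gate):

```latex
\frac{(\lambda x)^{2}}{(\lambda x)^{2} + \lambda^{2}\varepsilon}
  = \frac{x^{2}}{x^{2} + \varepsilon},
\qquad
\arctan(\lambda x) \neq \arctan(x) \quad \text{in general.}
```

A rescaling x → λx (a change of units) can thus be absorbed into ε for the rational term, but there is no analogous compensation for the arctangent term, which is why its argument must be dimensionless.
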
“…MMoE has been selected as the feature extractor for ESCM² and its competitors. It would also make sense to replace it with more powerful models such as AITM [23] and GemNN [3]. For fairness, all models are trained for 300k iterations with the Adam [10] optimizer and the same set of hyperparameters to make the results comparable.…”
Section: Training Protocol
confidence: 99%
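
As a concrete illustration of the quoted protocol (fixed iteration budget, Adam, shared hyperparameters across models), here is a hedged PyTorch sketch; the learning rate, loss, and data handling are assumptions rather than values from the cited paper.

```python
import torch


def train_ctr_model(model, data_iter, total_steps=300_000, lr=1e-3):
    """Fixed-iteration training with Adam and shared hyperparameters.

    Mirrors the quoted protocol only in outline; the learning rate, loss,
    and data_iter (an endless iterator of (features, labels) batches) are
    assumptions, not values from the cited paper.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.BCEWithLogitsLoss()
    model.train()
    for _ in range(total_steps):
        features, labels = next(data_iter)
        logits = model(features).squeeze(-1)      # (batch,) CTR/CVR logits
        loss = loss_fn(logits, labels.float())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```
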
“…A hierarchical model with micro- and macro-behaviour (HM³) [12] is proposed, which utilizes multitask learning and applies the abundant supervisory labels from micro- and macro-behaviours to predict the conversion rate (CVR) in a unified framework. The gating-enhanced multitask neural network (GemNN) [13] is proposed to predict CTR in a coarse-to-fine manner; it allows parameter sharing from upper-level tasks to lower-level tasks and introduces a gating mechanism between the embedding layers and the MLP. The multiple-level sparse sharing model (MSSM) [14] is proposed to represent features flexibly and share information among tasks efficiently; it includes a field-level sparse connection module (FSCM) and a cell-level sparse sharing module (CSSM).…”
Section: Introduction
confidence: 99%
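
The excerpt's description of GemNN (a gate between the embedding layers and the MLP) can be illustrated with the following hedged PyTorch sketch; the sigmoid-over-linear gate and the layer sizes are assumptions, and the actual model's multi-task, coarse-to-fine parameter sharing is not reproduced.

```python
import torch
import torch.nn as nn


class GatedEmbeddingMLP(nn.Module):
    """A gate between the embedding layers and the MLP, in the spirit of
    the excerpt's outline of GemNN. The gate parameterization and hidden
    sizes here are illustrative assumptions.
    """

    def __init__(self, num_fields, emb_dim, hidden=(256, 128)):
        super().__init__()
        in_dim = num_fields * emb_dim
        # element-wise gate over the concatenated field embeddings
        self.gate = nn.Sequential(nn.Linear(in_dim, in_dim), nn.Sigmoid())
        layers, prev = [], in_dim
        for width in hidden:
            layers += [nn.Linear(prev, width), nn.ReLU()]
            prev = width
        layers.append(nn.Linear(prev, 1))         # CTR logit
        self.mlp = nn.Sequential(*layers)

    def forward(self, field_emb):
        # field_emb: (batch, num_fields, emb_dim) from the embedding layers
        flat = field_emb.flatten(start_dim=1)
        gated = flat * self.gate(flat)            # gated embeddings feed the MLP
        return self.mlp(gated)
```
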