2022 | Preprint | DOI: 10.48550/arxiv.2205.03770
Transformer-Empowered 6G Intelligent Networks: From Massive MIMO Processing to Semantic Communication

Abstract: 6G wireless networks are foreseen to speed up the convergence of the physical and cyber worlds and to enable a paradigm shift in the way we deploy and exploit communication networks. Machine learning, in particular deep learning (DL), is going to be one of the key technological enablers of 6G by offering a new paradigm for the design and optimization of networks with a high level of intelligence. In this article, we introduce an emerging DL architecture, known as the transformer, and discuss its potential impa…

Cited by 1 publication (2 citation statements) | References 12 publications
“…In 2017, the authors of [18] introduced a pure-attention-based DNN architecture, dubbed the "transformer", to replace the prevalent RNN models for machine translation tasks. Since then, pure-attention-based DNN models have become increasingly popular in the DL community, while also finding applications in wireless communications [27].…”
Section: A. AttentionNet (mentioning)
confidence: 99%
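The transformer of [18] referenced above replaces recurrence with self-attention. As a rough illustration only (not taken from the cited paper; all names, shapes, and values below are assumptions), a scaled dot-product self-attention layer can be sketched in a few lines of Python:

```python
# Minimal sketch of scaled dot-product self-attention, the building block of
# the pure-attention transformer; shapes and names are illustrative assumptions.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise query-key similarities
    weights = softmax(scores, axis=-1)        # attention distribution per query
    return weights @ V                        # weighted sum of value vectors

# Tiny usage example with random data
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 16))              # 5 tokens, model dimension 16
Wq, Wk, Wv = (rng.standard_normal((16, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)           # -> shape (5, 8)
print(out.shape)
```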
“…The update rule of (27) is derived for the basic optimizer that performs mini-batch gradient descent. In this case, distributed learning in (28) is equivalent to (27). On the other hand, advanced optimizers, such as Adam or AdaMax [31], often take the momentum of gradients and adaptive learning rates into account.…”
Section: B. Distributed Learning (mentioning)
confidence: 99%
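To illustrate the distinction drawn above (a hedged sketch; equations (27) and (28) of the citing paper are not reproduced, and all names and values here are assumptions): with plain mini-batch gradient descent, averaging per-worker gradients gives the same step as a centralized update, whereas a stateful optimizer such as Adam folds gradient momentum and adaptive learning rates into the step, so the equivalence no longer holds update by update.

```python
# Sketch contrasting plain mini-batch SGD with an Adam-style update in a
# gradient-averaging distributed setting; all names and values are illustrative.
import numpy as np

def sgd_step(w, grads, lr=0.1):
    # Averaging worker gradients and taking one SGD step is identical to a
    # centralized SGD step on the combined mini-batch.
    g = np.mean(grads, axis=0)
    return w - lr * g

def adam_step(w, grads, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Adam keeps first/second moment estimates (momentum and adaptive rates),
    # so the distributed update depends on optimizer state, not only on the
    # averaged gradient of the current mini-batch.
    g = np.mean(grads, axis=0)
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * g
    state["v"] = b2 * state["v"] + (1 - b2) * g**2
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), state

# Usage: gradients from two workers on a 3-parameter model
w = np.zeros(3)
grads = [np.array([0.2, -0.1, 0.4]), np.array([0.0, 0.3, -0.2])]
w_sgd = sgd_step(w, grads)
state = {"t": 0, "m": np.zeros(3), "v": np.zeros(3)}
w_adam, state = adam_step(w, grads, state)
print(w_sgd, w_adam)
```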