“…It can be particularly useful to represent the task not just as a sample from this latent variable, but also as the full distribution of the latent variable, e.g., its mean and variance, in order to capture uncertainty about the task [281]. Additionally, Rakelly et al. [185] make use of the Markov property to represent the task distribution conditioned on D as a product of individual distributions, each conditioned on a single transition in D. More generally, this factorization relies on the exchangeability, or permutation invariance, of the transitions, which has been exploited by other meta-RL methods as well [67,16,110,97,162,241], using representations such as neural processes [71] and transformers [226]. Moreover, permutation invariance at the level of episodes has also been used [116].…”
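
The product-of-factors representation described in the passage can be illustrated with a minimal sketch. It assumes that each per-transition factor is a one-dimensional Gaussian summarized by a mean and a variance; this is an assumption made for illustration and is not stated in the passage. Under that assumption, the product of the factors is proportional to a Gaussian whose precision is the sum of the individual precisions, so the result cannot depend on the order of the transitions. The function name `product_of_gaussians` and the factor values are hypothetical; in practice the factors would come from a learned encoder applied to each transition.

```python
import math
import random


def product_of_gaussians(means, variances):
    """Combine 1-D Gaussian factors N(mu_i, var_i) into a single Gaussian.

    The product of Gaussian densities is proportional to a Gaussian with
    precision equal to the sum of the factor precisions and mean equal to
    the precision-weighted average of the factor means.
    """
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    var = 1.0 / total_precision
    mean = var * sum(m * p for m, p in zip(means, precisions))
    return mean, var


# Hypothetical (mean, variance) factors, one per transition in D.
# These numbers are illustrative only, not outputs of any real encoder.
factors = [(0.0, 1.0), (2.0, 1.0), (1.0, 0.5)]
means, variances = zip(*factors)
mean, var = product_of_gaussians(means, variances)

# Permutation invariance: shuffling the transitions leaves the result unchanged
# (up to floating-point rounding).
shuffled = list(factors)
random.Random(0).shuffle(shuffled)
s_means, s_vars = zip(*shuffled)
s_mean, s_var = product_of_gaussians(s_means, s_vars)

assert math.isclose(mean, s_mean) and math.isclose(var, s_var)
print(mean, var)
```

Because every additional factor adds a positive precision, the combined variance shrinks as more transitions are observed, which is one way the full distribution, and not only a sample, can express decreasing uncertainty about the task.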