“…Following [21], the reward function is interpreted inversely as a loss function: $\ell(s_t, a_t) = -r(s_t, a_t)$. Assuming a privacy signal $f(s_t, a_t)$ and an electricity cost signal $g(s_t, a_t)$, the one-step loss function can be defined as follows:

$$\ell(s_t, a_t) = (1 - \lambda)\, f(s_t, a_t) + \lambda\, g(s_t, a_t) \tag{8}$$

where $\lambda \in [0, 1]$ controls the privacy-cost trade-off. Concretely, for $\lambda = 0$ the goal of the agent will be to minimize the expected cumulative privacy signal, while for $\lambda = 1$ it will be to minimize the expected cumulative energy cost.…”
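
The trade-off described in the excerpt can be illustrated with a minimal sketch. It assumes the one-step loss is the convex combination (1 − λ)·f + λ·g, which is the simplest form consistent with the stated endpoints (λ = 0 gives the privacy signal alone, λ = 1 the energy cost alone); the excerpt itself does not show the formula. The function names `one_step_loss` and `reward` and the scalar inputs are hypothetical, chosen only for illustration.

```python
def one_step_loss(privacy_signal: float, energy_cost: float, lam: float) -> float:
    """Assumed form of the one-step loss: (1 - lam) * f + lam * g.

    privacy_signal stands in for f(s_t, a_t) and energy_cost for g(s_t, a_t);
    both are taken as already-computed scalars for a single time step.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * privacy_signal + lam * energy_cost


def reward(privacy_signal: float, energy_cost: float, lam: float) -> float:
    """Reward as the negated loss, following r(s_t, a_t) = -loss(s_t, a_t)."""
    return -one_step_loss(privacy_signal, energy_cost, lam)


if __name__ == "__main__":
    f_t, g_t = 2.0, 5.0  # made-up signal values for one step
    for lam in (0.0, 0.5, 1.0):
        print(lam, one_step_loss(f_t, g_t, lam), reward(f_t, g_t, lam))
```

With λ = 0 the loss reduces to the privacy signal and with λ = 1 to the energy cost, so sweeping λ between the two traces out the privacy-cost trade-off the excerpt refers to.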