“…where $w_k$ is a parameter vector of the output layer; for $i \in [1:k-1]$ and $j \in [1:l]$, $W_i$ and $V_j$ are parameter matrices; $r_i(\cdot)$ and $s_j(\cdot)$ are entry-wise activation functions of layers $i$ and $j$, i.e., for $a \in \mathbb{R}^t$, $r_i(a) = [r_i(a_1), \ldots, r_i(a_t)]$ and $s_j(a) = [s_j(a_1), \ldots, s_j(a_t)]$; and $\sigma(\cdot)$ is the sigmoid function given by $\sigma(p) = 1/(1 + e^{-p})$ (note that $\sigma$ does not appear in the discriminator in [26, Equation (7)], as the discriminator considered in the neural net distance is not a soft classifier mapping to $[0,1]$). We assume that each $r_i(\cdot)$ and $s_j(\cdot)$ is $R_i$- and $S_j$-Lipschitz, respectively, and that each is positive homogeneous, i.e., $r_i(\lambda p) = \lambda r_i(p)$ and $s_j(\lambda p) = \lambda s_j(p)$ for any $\lambda \geq 0$ and $p \in \mathbb{R}$ (the ReLU activation, for instance, satisfies both properties: it is $1$-Lipschitz, and $\max(0, \lambda p) = \lambda \max(0, p)$ for $\lambda \geq 0$). Finally, as modelled in [26], [28]–[30], we assume that the Frobenius norms of the parameter matrices are bounded, i.e.,…”