For example, this can be seen in previous work [8], [53], [55], where input volume is reduced by distilling input sequences to their most useful elements before they are relayed to remote servers for semantic analysis. Other work [22], [52] focuses mainly on task-specific mappings of inputs onto a lower-dimensional space before training more data-efficient models, and recent advances in domain adaptation and transfer learning [26], [39], [48] can likewise be used to learn compressed codes tuned to particular models. However, for any given source distribution, domain adaptation [26], [39], [48] and the other approaches above [8], [53], [55] compress every sampled input to a fixed-length code, ignoring the varying entropy across input examples.
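To make the final point concrete, the following is a minimal illustrative sketch, not an implementation of any cited method: it contrasts the constant bit budget of a fixed-length code with the entropy-dependent budget an adaptive coder could approach. The function name, inputs, and the 128-bit budget are hypothetical choices for illustration.

```python
# Illustrative sketch: a fixed-length code spends the same number of bits
# on every input, regardless of that input's empirical entropy.
import math
from collections import Counter

def empirical_entropy_bits(seq: bytes) -> float:
    """Shannon entropy (bits/symbol) of the empirical symbol distribution."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

low_entropy = b"aaaaaaaaaaaaaaab"   # nearly constant input
high_entropy = bytes(range(16))     # all 16 symbols distinct

FIXED_CODE_BITS = 128  # hypothetical fixed budget applied to both inputs

for name, seq in [("low", low_entropy), ("high", high_entropy)]:
    h = empirical_entropy_bits(seq)
    # An entropy-aware coder could approach h * len(seq) bits for this input,
    # while the fixed-length scheme always emits FIXED_CODE_BITS.
    print(f"{name}: entropy≈{h:.2f} bits/symbol, "
          f"adaptive≈{h * len(seq):.0f} bits vs fixed={FIXED_CODE_BITS} bits")
```

Running this prints roughly 5 bits for the low-entropy input versus 64 bits for the high-entropy one under the adaptive bound, while the fixed-length scheme charges 128 bits to both, which is the per-example variation the approaches above do not exploit.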