We consider the problem of multiclass scheduling in a dynamic wireless setting, where limited bandwidth resources are allocated to serve randomly arriving service demands that belong to different classes in terms of payload data request, delay tolerance, and importance/priority. In addition to heterogeneous traffic, another major challenge stems from random service rates due to time-varying wireless communication channels. Existing scheduling and resource allocation approaches, ranging from simple greedy heuristics and constrained optimization to combinatorics, are tailored to specific network or application configurations and are usually suboptimal. We therefore resort to deep reinforcement learning (DRL) and propose a distributional Deep Deterministic Policy Gradient (DDPG) algorithm combined with Deep Sets to tackle this problem. Furthermore, we present a novel way to use a Dueling Network, which leads to further performance improvement. Our proposed algorithm is tested on both synthetic and real data, showing consistent gains against baseline methods from combinatorics and optimization, as well as against state-of-the-art scheduling metrics. For instance, our method can achieve the same user satisfaction rate as a myopic algorithm using knapsack optimization while using 13% less power and bandwidth resources.
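To make the myopic knapsack baseline mentioned above more concrete, the following is a minimal sketch of what such a scheduler might look like for a single time slot. It is an illustration under stated assumptions, not the baseline as implemented in this work: the function name `myopic_knapsack_schedule`, the per-user `values` (for example, priority weights), the integer `costs` (resource units needed to satisfy each user under the current channel), and the `budget` are all hypothetical. The scheduler is "myopic" in the sense that it maximizes the value served in the current slot only, ignoring future arrivals and channel evolution.

```python
from typing import List, Sequence


def myopic_knapsack_schedule(
    values: Sequence[float], costs: Sequence[int], budget: int
) -> List[int]:
    """Pick users to serve in the current slot via 0/1 knapsack.

    All names and inputs here are illustrative assumptions:
    values[i] is the benefit of satisfying user i now,
    costs[i] is the integer number of resource units user i needs,
    budget is the total number of resource units in this slot.
    """
    n = len(values)
    # best[i][b] = max total value using only the first i users with budget b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, c = values[i - 1], costs[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip user i-1
            if c <= b and best[i - 1][b - c] + v > best[i][b]:
                best[i][b] = best[i - 1][b - c] + v  # option 2: serve user i-1

    # Backtrack through the table to recover which users were served.
    chosen: List[int] = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    chosen.reverse()
    return chosen


if __name__ == "__main__":
    # Four users competing for 5 resource units in one slot.
    print(myopic_knapsack_schedule([3.0, 1.0, 2.5, 2.0], [4, 1, 3, 2], 5))
```

Because the decision depends only on the current slot, such a baseline cannot trade off serving a delay-tolerant user now against keeping resources for a more urgent arrival later, which is the kind of foresight a learned policy is meant to provide.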