Aggregators have emerged as crucial tools for the coordination of distributed, controllable loads. To be used effectively, an aggregator must be able to communicate the available flexibility of the loads they control, as known as the aggregate flexibility to a system operator. However, most of existing aggregate flexibility measures often are slow-timescale estimations and much less attention has been paid to real-time coordination between an aggregator and an operator. In this paper, we consider solving an online optimization in a closed-loop system and present a design of real-time aggregate flexibility feedback, termed the maximum entropy feedback (MEF). In addition to deriving analytic properties of the MEF, combining learning and control, we show that it can be approximated using reinforcement learning and used as a penalty term in a novel control algorithm -the penalized predictive control (PPC), which modifies vanilla model predictive control (MPC). The benefits of our scheme are (1). Efficient Communication. An operator running PPC does not need to know the exact states and constraints of the loads, but only the MEF. (2). Fast Computation. The PPC often has much less number of variables than an MPC formulation. (3). Lower Costs We show that under certain regularity assumptions, the PPC is optimal. We illustrate the efficacy of the PPC using a dataset from an adaptive electric vehicle charging network and show that PPC outperforms classical MPC.