This paper presents a new formulation of the input‐constrained optimal output synchronization problem and proposes an observer‐based distributed optimal control protocol for discrete‐time heterogeneous multiagent systems with input constraints via model‐free reinforcement learning. First, distributed adaptive observers are designed for all agents to estimate the leader's trajectory without requiring knowledge of its dynamics. Subsequently, the optimal control input associated with the optimal value function is derived from the solution of the tracking Hamilton‐Jacobi‐Bellman (HJB) equation, which is generally difficult to solve analytically. To this end, motivated by reinforcement learning techniques, a model‐free Q‐learning policy iteration algorithm is proposed, and an actor‐critic neural network structure is implemented to iteratively find the optimal tracking control input without knowledge of the system dynamics. Moreover, the inputs of all agents are kept within the permitted bounds by incorporating a nonquadratic function into the performance function, so that the input constraints are encoded directly into the optimization problem. Finally, a numerical simulation example is provided to illustrate the effectiveness of the proposed theoretical results.
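For illustration only, a common way to realize such a nonquadratic input penalty, sketched here under the assumption of a symmetric input bound $\bar{\lambda}$, a tracking error $e_k$, and weighting matrices $Q \succeq 0$, $R \succ 0$ (symbols introduced for this sketch rather than taken from the abstract), is

\begin{equation*}
V(e_k) \;=\; \sum_{i=k}^{\infty} \Big( e_i^{\top} Q\, e_i \;+\; \mathcal{W}(u_i) \Big),
\qquad
\mathcal{W}(u) \;=\; 2 \int_{0}^{u} \big( \tanh^{-1}(\bar{\lambda}^{-1} v) \big)^{\!\top} \bar{\lambda} R \,\mathrm{d}v,
\end{equation*}

where $\tanh^{-1}(\cdot)$ acts elementwise and the integral is taken componentwise. With this choice, $\mathcal{W}(u)$ is positive definite, and minimizing the resulting value function yields a control of saturated form, so the optimal input automatically satisfies the bound $\bar{\lambda}$; the specific weights and bound used in the paper may of course differ from this sketch.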