Federated learning (FL) with over-the-air computation can efficiently utilize the communication bandwidth but is susceptible to analog aggregation error. Excluding those devices with weak channel conditions can reduce the aggregation error, but it also limits the amount of local training data for FL, which can reduce the training convergence rate. In this work, we jointly design uplink receiver beamforming and device selection for over-the-air FL over time-varying wireless channels to maximize the training convergence rate. We reformulate this stochastic optimization problem into a mixed-integer program using an upper bound on the global training loss over communication rounds. We then propose a Greedy Spatial Device Selection (GSDS) approach, which uses a sequential procedure to select devices based on a measure capturing both the channel strength and the channel correlation to the selected devices. We show that given the selected devices, the receiver beamforming optimization problem is equivalent to downlink single-group multicast beamforming. To reduce the computational complexity, we also propose an Alternating-optimization-based Device Selection and Beamforming (ADSBF) approach, which solves the receiver beamforming and device selection subproblems alternatingly. In particular, despite the device selection being an integer problem, we are able to develop an efficient algorithm to find its optimal solution. Simulation results with real-world image classification demonstrate that our proposed methods achieve faster convergence with significantly lower computational complexity than existing alternatives. Furthermore, although ADSBF shows marginally inferior performance to GSDS, it offers the advantage of lower computational complexity when the number of devices is large.INDEX TERMS Federated learning, over-the-air aggregation, device selection, receiver beamforming.