This paper describes a three-stage acoustic echo cancellation (AEC) and suppression framework for the ICASSP 2021 AEC Challenge. In the first stage, a partitioned block frequency-domain adaptive filter cancels the linear echo components without distorting the near-end speech, after the time delay between the far-end reference signal and the microphone signal has been compensated. In the second stage, a deep complex U-Net integrated with a gated recurrent unit (GRU) is proposed to further suppress the residual echo components. In the last stage, a very small deep complex U-Net is trained to suppress the non-speech residual components that were not completely suppressed in the second stage, which also further increases the echo return loss enhancement (ERLE) without substantially increasing the computational complexity. Experimental results show that the proposed three-stage framework achieves an ERLE higher than 50 dB in both single-talk and double-talk scenarios, and that it improves the perceptual evaluation of speech quality (PESQ) score by about 0.75 in double-talk scenarios. The proposed framework outperforms the AEC Challenge baseline ResRNN by 0.12 points in terms of the mean opinion score (MOS).
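
To make the delay compensation and the ERLE metric concrete, the following is a minimal sketch and not the authors' implementation. It assumes a cross-correlation delay estimator (the abstract does not state which estimator is used) and the common definition of ERLE as the ratio of microphone-signal power to residual-signal power in decibels. The function names `estimate_delay`, `compensate_delay`, and `erle_db`, and the parameter `max_delay`, are hypothetical.

```python
import numpy as np


def estimate_delay(far_end, mic, max_delay):
    """Estimate the lag (in samples) of the echo in `mic` relative to `far_end`.

    Assumption: the delay is non-negative and at most `max_delay` samples.
    The cross-correlation is computed via the FFT with zero padding, so that
    circular wrap-around does not affect the lags that are searched.
    """
    n = len(far_end) + len(mic) - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two >= n
    spec_far = np.fft.rfft(far_end, nfft)
    spec_mic = np.fft.rfft(mic, nfft)
    # xcorr[k] = sum_n mic[n + k] * far_end[n] for k >= 0
    xcorr = np.fft.irfft(spec_mic * np.conj(spec_far), nfft)
    return int(np.argmax(xcorr[: max_delay + 1]))


def compensate_delay(far_end, delay):
    """Delay the far-end reference by `delay` samples, keeping its length."""
    padded = np.concatenate([np.zeros(delay), far_end])
    return padded[: len(far_end)]


def erle_db(mic, residual, eps=1e-12):
    """ERLE in dB: microphone power over residual power (assumed definition)."""
    mic_power = np.mean(np.square(mic)) + eps
    residual_power = np.mean(np.square(residual)) + eps
    return 10.0 * np.log10(mic_power / residual_power)
```

In a pipeline of the kind described above, the aligned reference returned by `compensate_delay` would be passed to the adaptive filter, and `erle_db` would be evaluated on far-end single-talk segments, where the microphone signal contains only echo.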