Machine learning (ML) is highly data dependent, yet all data contain noise. This matters especially for chaotic systems: owing to the famous “butterfly effect”, small disturbances grow exponentially until they reach the same order of magnitude as the physical solution, so that numerical simulations of chaos become badly polluted. How data noise in chaotic systems influences the short-term and long-term predictions of ML based on such badly polluted data is a fundamental open problem. Ultra-chaos, a new concept in chaos theory, is chaos whose statistics are themselves sensitive to small disturbances. In this paper, the influence of data noise on ML of chaos is investigated in detail by comparing the predictions of three ML models based on a “clean” dataset given by Clean Numerical Simulation (CNS) with those based on a “badly polluted” dataset given by the traditional Runge–Kutta method. It is found that neither data noise nor ML prediction deviation influences the long-term statistics of ML predictions for normal-chaos. However, data noise has a significant influence on the statistics of ML predictions for ultra-chaos, which might pose a great challenge for ML, which is a part of AI.
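
The exponential growth of small disturbances mentioned above can be illustrated with a minimal sketch. It is not the CNS algorithm and not one of the datasets used in this paper. As assumptions made purely for illustration, it uses the Lorenz system with its classical parameters, a fixed-step fourth-order Runge–Kutta integrator in double precision, and a hypothetical initial perturbation `eps`; the function names are likewise illustrative.

```python
import math

def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system (assumed example system)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(f, state, dt):
    """Advance `state` by one classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def separation_history(eps=1e-10, dt=0.01, steps=4000):
    """Distance between two trajectories whose initial x differs by `eps`."""
    a = (1.0, 1.0, 1.0)
    b = (1.0 + eps, 1.0, 1.0)  # hypothetical tiny disturbance
    history = []
    for _ in range(steps):
        a = rk4_step(lorenz, a, dt)
        b = rk4_step(lorenz, b, dt)
        history.append(math.dist(a, b))
    return history

history = separation_history()
# Print the separation every 500 steps to show its growth over time.
for i in range(0, len(history), 500):
    print(f"t = {(i + 1) * 0.01:6.2f}  separation = {history[i]:.3e}")
```

In such a run the separation starts near `eps` and eventually becomes comparable to the size of the attractor itself, which is the sense in which a trajectory computed with ordinary floating-point arithmetic can be regarded as badly polluted after a sufficiently long time.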