Connected and autonomous vehicles (CAVs) are still largely at the experimental stage. Their successful deployment and field implementation require careful consideration of their vulnerabilities to cyberattacks. The primary security vulnerability lies in the controller area network (CAN) protocol, which enables communication among the electronic control units in CAVs. To address this vulnerability and mitigate cyberattacks, machine learning (ML) algorithms can be developed for intrusion detection in the CAN protocol of CAVs. In this research, the data structure of experimental message injection attack datasets from the Hacking and Countermeasure Research Lab is examined. A random forest classifier-based ML model is developed owing to its efficiency in predicting cyberattacks on CAVs from data comprising over 3 million records. Several procedures within the Python programming environment are employed to clean the data before performing the prediction. The prediction for intrusion detection is performed with a 70:30 split of training to testing data, a random state of 11, and 200 estimators. The accuracy is found to be over 92% in all three scenarios. The model can be deployed for real-time investigation of cyberattacks in CAVs, provided that real-time data are available. The data cleaning method developed in this study can also be applied to other ML applications involving large datasets, such as credit card fraud detection and drug discovery.
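The classifier configuration stated above (70:30 split, random state of 11, 200 estimators) can be illustrated with a minimal scikit-learn sketch. This is not the study's actual pipeline: the data are synthetic, the feature layout (CAN ID, DLC, eight payload bytes) and the helper names `make_synthetic_can_frames` and `train_and_score` are hypothetical, and applying the random state of 11 to both the split and the forest is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def make_synthetic_can_frames(n_rows=2000, seed=0):
    """Build a toy feature matrix (hypothetical layout): CAN ID, DLC, 8 payload bytes."""
    rng = np.random.default_rng(seed)
    can_id = rng.integers(0, 0x7FF, size=n_rows)
    dlc = np.full(n_rows, 8)
    payload = rng.integers(0, 256, size=(n_rows, 8))
    labels = rng.integers(0, 2, size=n_rows)  # 1 = injected frame, 0 = normal frame
    # Crude stand-in for an injection pattern: injected frames get ID 0 and a zero payload.
    can_id[labels == 1] = 0
    payload[labels == 1] = 0
    features = np.column_stack([can_id, dlc, payload])
    return features, labels


def train_and_score(features, labels):
    """Fit a random forest with the settings quoted in the text and score it on the held-out 30%."""
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.30, random_state=11
    )
    # Assumption: the same random state is reused for the forest itself.
    model = RandomForestClassifier(n_estimators=200, random_state=11)
    model.fit(x_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(x_test))
    return model, accuracy, len(x_train), len(x_test)


features, labels = make_synthetic_can_frames()
model, accuracy, n_train, n_test = train_and_score(features, labels)
print(f"train rows: {n_train}, test rows: {n_test}, accuracy: {accuracy:.3f}")
```

Because the synthetic attack pattern is trivially separable, the accuracy printed here says nothing about performance on real CAN traffic; the sketch only shows how the stated split and hyperparameters fit together.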