Alkali-silica reaction (ASR) is a form of material degradation in concrete that causes cracking and rebar corrosion, reducing the material's structural integrity, shortening the structure's service life, and raising safety concerns. Ultrasonic nondestructive evaluation (NDE) has proven to be a valuable technique for assessing concrete properties and monitoring ASR progression. However, deploying ultrasonic NDE and analyzing its data require specialized expertise and often rely on the engineer's subjective interpretation. With the surge in computational power, artificial intelligence (AI) and machine learning (ML) algorithms have become popular for automating NDE data analysis, and various industrial sectors are increasingly adopting them, with a growing emphasis on AI-assisted automation. Regulatory agencies are also preparing for this technological shift and anticipate corresponding revisions to standards. Thus, there is an urgent need to identify the capabilities and limitations of current ML technologies for evaluating concrete material properties and damage status. Furthermore, the effects of various factors on ML model performance must be thoroughly investigated.

The study summarized herein evaluated the effectiveness of two ML models, support vector regression (SVR) and a deep neural network (DNN), in predicting ASR-induced concrete damage from long-term ultrasonic monitoring data. Four concrete specimens were cast with artificially induced ASR, and ultrasonic signals and expansion data were collected continuously over more than 500 days. For the SVR model, wave velocity and 12 other wave features were extracted from the ultrasonic signals, and 6 of the 13 features were selected as model inputs.
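As a rough illustration of reducing 13 candidate wave features to 6 model inputs, the sketch below ranks features by the absolute Pearson correlation with measured expansion and keeps the top six. The ranking criterion is an assumption made for this example; the summary does not state how the six features were chosen. The data are synthetic, and every name other than wave velocity (`feature_01` through `feature_12`, `select_top_features`) is a hypothetical placeholder.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


def select_top_features(feature_columns, target, k=6):
    """Return the names of the k features with the largest |correlation| to target."""
    scores = {name: abs(pearson(col, target)) for name, col in feature_columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Synthetic stand-in data; these are NOT the study's measurements.
rng = random.Random(0)
n_samples = 200
# Hypothetical expansion history rising linearly from 0 to 0.3 %.
expansion = [0.3 * i / (n_samples - 1) for i in range(n_samples)]

features = {}
# Assumed trend for illustration: velocity (m/s) drops as expansion grows.
features["wave_velocity"] = [4500.0 - 1500.0 * e + rng.gauss(0, 20) for e in expansion]
# Twelve placeholder features: the first five track expansion, the rest are noise.
for j in range(1, 13):
    weight = 1.0 if j <= 5 else 0.0
    features[f"feature_{j:02d}"] = [weight * e + rng.gauss(0, 0.05) for e in expansion]

selected = select_top_features(features, expansion, k=6)
print(selected)
```

The selected columns would then be passed to the regression model. A correlation filter only captures linear relationships, so a wrapper or model-based selection method may be a better fit for a nonlinear regressor such as SVR.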
Different combinations of training and testing datasets were designed to explore factors that influence prediction performance, including the range of data covered by the training and testing sets and the signal preprocessing method. The findings indicate that model performance improves when the training dataset spans a broader data range than the testing dataset and when signal preprocessing is consistent across datasets.

Additionally, this work studied the effect of temperature on model performance and found significant prediction errors when the temperature of the testing data differed from that of the training data. The study also examined training and testing datasets drawn from two distinct specimen batches, which showed that the SVR model generalized better than the DNN across batches. Lastly, DNN models were trained and tested using both time-domain signals and frequency spectra. The results showed potential for achieving high regression performance (e.g., the ML model was able to predict ASR expansion with relatively high accuracy), although further efforts in model tuning and data preprocessing are still requir...