Intelligent Tutoring Systems (ITSs) have great potential to effectively transform teaching and learning. As more effort has been put into designing and developing ITSs and integrating them into learning and instruction, mixed results about their effectiveness have been reported. It is therefore necessary to investigate how ITSs work in real, natural educational contexts and what challenges arise in applying and evaluating them. Using a systematic literature review, this study analyzed 40 qualifying studies that applied social experiment methods to examine the effectiveness of ITSs during 2011–2022. The results revealed a complicated landscape regarding the effectiveness of ITSs in real educational contexts. Specifically, there was an “intelligent” regional gap in the distribution of countries where ITS studies using social experiment methods were conducted. Compared with learning performance, the impact of ITSs on non-cognitive factors, process-oriented factors, and social outcomes received relatively less attention, calling for more research in this regard. Given the complexities and challenges of real educational settings, some of the studies lacked scientific rigor in experimental design and data analysis. Based on these findings, suggestions for future research and implications were proposed.