Can large language models (LLMs) create tailor-made, convincing arguments to promote false or misleading narratives online? Early work has found that LLMs can generate content perceived as on par with, or even more persuasive than, human-written messages. However, there is still limited evidence regarding LLMs' persuasive capabilities in direct conversations with humans, the scenario in which these models are typically deployed. In this pre-registered study, we analyze the power of AI-driven persuasion in a controlled, harmless setting. To this end, we created a web-based platform where human participants engaged in short, multi-round debates with either human or LLM opponents. Each participant was randomly assigned to one of four treatment conditions in a two-by-two factorial design: (1) the conversation partner was either another human or an LLM; (2) the conversation partner either had or did not have access to basic sociodemographic information about their opponent (and thus could personalize their arguments). We find that personalized LLM debaters were more persuasive than humans 64.4% of the time, conditional on the two not being equally persuasive (an 81.2% relative increase in the odds of higher post-debate agreement; p < 0.01; N = 900). Without personalization, GPT-4 still outperformed humans, but the effect was smaller and not statistically significant (p = 0.30). Further, our analysis suggests that LLMs employ different strategies from those of human debaters: their texts are harder to read and contain more markers associated with logical and analytical reasoning. Overall, our results suggest that concerns about LLM-based persuasion are well founded and have important implications for social media governance and the design of new online environments.
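
The two headline figures are consistent under a simple conversion, assuming (as a reading of the abstract rather than an explicit statement in it) that the 64.4% is the reported odds increase re-expressed as the probability that the personalized LLM is the more persuasive debater, conditional on a non-tie. A minimal sketch of that arithmetic:
\[
\mathrm{OR} \approx 1.812
\;\;\Longrightarrow\;\;
P(\text{LLM more persuasive} \mid \text{not equally persuasive})
= \frac{\mathrm{OR}}{1 + \mathrm{OR}}
\approx \frac{1.812}{2.812}
\approx 0.644 .
\]
That is, an 81.2% relative increase in the odds corresponds to an odds ratio of about 1.812, which maps to the 64.4% win probability quoted above.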