For some problems, it is difficult for humans to judge the quality of AI-proposed solutions. Irving, Christiano, and Amodei (2018) propose that in such cases, a debate between two AI systems may be used to help a human judge select a good answer. We introduce a mathematical framework for modeling this type of debate and propose that the quality of a debate design be measured by the accuracy of the most persuasive answer. We describe a simple instance of the debate framework, called feature debate, and analyze the degree to which such debates track the truth. We argue that despite being very simple, feature debates capture many aspects of practical debates, such as the incentive to confuse the judge or to stall to avoid losing. We analyze two special types of debates: those where arguments constitute independent evidence about the topic, and those where the information bandwidth of the judge is limited.

* We are grateful to Chris van Merwijk, Lukas Finnveden, Michael Cohen, and Michael Dennis for feedback and discussions related to this text. The first author was partially supported by the MEYS-funded project CZ.02.1.01/0.0/0.0/16_019/0000765 "Research Center for Informatics".
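To make these notions concrete, the following is a minimal Monte-Carlo sketch of a feature debate with independent evidence and a bandwidth-limited judge. It is our own illustration rather than the paper's exact formalism: the function name run_feature_debate and the parameters evidence_strength and rounds are invented for this example.

```python
import math
import random

def run_feature_debate(n_features=20, evidence_strength=0.7,
                       rounds=3, trials=10_000, seed=0):
    """Monte-Carlo estimate of judge accuracy in a toy feature debate.

    Hypothetical setup for illustration: a binary answer y is drawn
    uniformly, and each feature is independent evidence about y, with
    P(feature = y) = evidence_strength.  One debater argues for y = 1
    and greedily reveals features equal to 1; the other argues for
    y = 0 and reveals features equal to 0.  The judge performs a naive
    Bayesian update on the revealed feature values only (ignoring which
    debater chose to reveal them) and picks the more likely answer.
    """
    rng = random.Random(seed)
    p = evidence_strength
    step = math.log(p / (1 - p))  # log-likelihood contribution per feature
    correct = 0
    for _ in range(trials):
        y = rng.randint(0, 1)
        features = [rng.random() < (p if y == 1 else 1 - p)
                    for _ in range(n_features)]
        n_ones = sum(features)          # evidence available to the y=1 debater
        n_zeros = n_features - n_ones   # evidence available to the y=0 debater
        # Each debater reveals one supporting feature per round, until it
        # runs out of them (after which it can only stall).
        shown_ones = min(rounds, n_ones)
        shown_zeros = min(rounds, n_zeros)
        llr = (shown_ones - shown_zeros) * step
        verdict = 1 if llr > 0 else 0 if llr < 0 else rng.randint(0, 1)
        correct += (verdict == y)
    return correct / trials

if __name__ == "__main__":
    for r in (1, 3, 10, 20):
        print(f"rounds={r:2d}  judge accuracy = {run_feature_debate(rounds=r):.3f}")
```

In this toy model, when rounds are few, both debaters can usually match each other's revealed evidence, so the naive judge is left at chance; once the weaker side runs out of supporting features, additional rounds let the verdict track the truth. This loosely illustrates the two regimes discussed above: arguments as independent evidence, and a judge with limited information bandwidth.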