Many important uses of AI involve augmenting humans, not replacing them. But there is not yet a widely used and broadly comparable test for evaluating the performance of these human-AI systems relative to humans alone, AI alone, or other baselines. Here we describe such a test and demonstrate its use in three ways. First, in an analysis of 79 recently published results, we find that, surprisingly, the median performance improvement ratio corresponds to no improvement at all, and the maximum improvement is only 36%. Second, we experimentally find a 27% performance improvement when 100 human programmers develop software using GPT-3, a modern generative AI system. Finally, we find that 50 human non-programmers using GPT-3 perform the task about as well as, and less expensively than, the human programmers. Since neither the non-programmers nor the computer could perform the task alone, this illustrates a strong form of human-AI synergy.