Question Answering (QA) is an attractive and challenging area in the NLP community. With the development of QA techniques, much QA software has been deployed in daily life to provide convenient access to information. To investigate the performance of QA software, many benchmark datasets have been constructed to provide various test cases. However, current QA software is mainly tested in a reference-based paradigm, in which the expected outputs (labels) of test cases must be annotated with considerable human effort before testing. As a result, neither just-in-time testing during usage nor extensible testing on massive unlabeled real-life data is feasible, which keeps current testing of QA software from being flexible and sufficient. In this work, we propose a novel testing method, qaAskeR+, with five new Metamorphic Relations for QA software. qaAskeR+ does not rely on the annotated labels of test cases. Instead, based on the idea that a correct answer should imply a piece of reliable knowledge that conforms with any other correct answer, qaAskeR+ tests QA software by inspecting its behavior on multiple recursively asked questions that are relevant to the same or some further enriched knowledge. Experimental results show that qaAskeR+ reveals numerous violations that indicate actual answering issues in various mainstream QA software, without using any pre-annotated labels.
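To make the label-free idea concrete, the following is a minimal sketch of one possible metamorphic check in this spirit: ask a question, build a follow-up question from the returned answer, and flag an inconsistency between the two answers as a violation. The `answer` function is a hypothetical stub standing in for a real QA system, and the relation shown (a factoid answer substituted into a yes/no confirmation question) is a simplified illustration, not necessarily one of qaAskeR+'s five Metamorphic Relations.

```python
def answer(question: str) -> str:
    """Hypothetical stub for a real QA system under test."""
    knowledge = {
        "Who wrote Hamlet?": "Shakespeare",
        "Did Shakespeare write Hamlet?": "yes",
    }
    return knowledge.get(question, "unknown")


def metamorphic_check(question: str) -> bool:
    """Test the QA system without any pre-annotated label.

    The first answer is folded into a follow-up confirmation
    question; the two answers must be mutually consistent.
    Returns True if the relation holds, False on a violation.
    """
    first = answer(question)  # e.g. "Shakespeare"
    # Turn "Who wrote X?" into a yes/no confirmation question.
    follow_up = question.replace("Who wrote", f"Did {first} write")
    second = answer(follow_up)  # e.g. "yes"
    # A "no" (or any non-affirmative answer) signals a likely
    # answering issue, with no reference label ever consulted.
    return second.lower() == "yes"


print(metamorphic_check("Who wrote Hamlet?"))  # True: answers agree
```

Because the oracle is consistency between the system's own answers rather than a gold label, such a check can in principle run on unlabeled real-life questions at test time.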