Abstract: The CADE ATP System Competition (CASC) evaluates the performance of sound, fully automatic, classical logic, ATP systems. The evaluation is in terms of the number of problems solved, the number of acceptable proofs and models produced, and the average runtime for problems solved, in the context of a bounded number of eligible problems chosen from the TPTP problem library, and a specified time limit for each solution attempt. The CADE-22 ATP System Competition (CASC-22) was held on 5th August 2009. The design of t…
“…In SMT-COMP 2016 there were 603 conflicts (solvers returning different results) on 73 benchmarks caused by three solvers giving incorrect results for various reasons. In the CASC competition [25], there is a period of testing where soundness is checked and resolved, and there have been a number of solvers later disqualified from the competition due to unsoundness. In our experience, adding a new feature to a theorem prover is a highly complex task and it is easy to introduce unsoundness, or general incorrectness, especially in areas of the code that are encountered during proof search infrequently.…”
Abstract. This paper attempts to address the question of how best to assure the correctness of saturation-based automated theorem provers using our experience with developing the theorem prover Vampire. We describe the techniques we currently employ to ensure that Vampire is correct and use this to motivate future challenges that need to be addressed to make this process more straightforward and to achieve better correctness guarantees.
“…[13], often combined with an SS-portfolio approach. Leading competition versions of solvers for the "main" divisions of the first-order logic theorem proving competition CASC [27], namely E [24], iProver [16] and Vampire [18], are all SS-portfolio solver instances. E subsequently runs several different superposition strategies found by a machine learning approach.…”
Section: Participation Of Portfolios (2016, mentioning)
I discuss the question of whether portfolio solvers support advances in automated reasoning. A portfolio solver is the combination of a collection of core solvers. I distinguish syntactic combinations from semantic combinations and argue that the former are useful for competitions, whereas the latter foster progress in automated reasoning.
“…The SAT [1], SMT-COMP [2] and CASC [21] competitions respectively focus on comparing SAT solvers, SMT solvers and automated theorem provers (ATPs). Each of them benefits from uniform common formats supported by every contestant tool (e.g.…”
This short paper presents a compilation of feedback about online runtime verification competitions from an active contestant. In particular, it points out several issues and how they could possibly be fixed.