A Data-Driven Metric of Hardness for WSC Sentences

Isaak, Nicos; Michael, Loizos

doi:10.29007/398z

Cited by 4 publications

(18 citation statements)

References 5 publications

(7 reference statements)

“…1). Contributors are allowed to take on this second role if they meet two requirements: first, the percentage of their valid and approved (by other Evaluators) schemas among those that they have contributed that far exceeds a certain threshold (which we have set to be 90%, corresponding to the bar for near adult human abilities on the WSC [3]); second, their score (which we discuss later) is above a certain other threshold. Contributors who are also Evaluators choose the role in which they interact with WinoFlexi at login time.…”

Section: Contributing and Evaluating (mentioning)

confidence: 99%

“…Towards this goal, we follow a single-step approach for labeling schemas with a hardness score which indirectly shows if a schema is considered hard to answer by a machine; Winograd schemas are accordingly labeled as such by the computed hardness index. For this purpose we use a recent tool [3] that can take any Winograd schema and output a score that shows its hardness index. The hardness index is presented to the Contributors and the Evaluators.…”

Section: Un-validated Schemas (mentioning)

confidence: 99%

“…Hardness Metric Tool: For the purpose of this experiment, we randomly selected 57 schema-halves of the WinoFlexi-library, and compared their hardness index to that of 57 schema-halves of the Winograd-library taken from a previous work [3]. Fig.…”

Section: Quantitative Analysis (mentioning)

confidence: 99%

“…Unlike the Turing Test, which is based on short free-form conversations during which a machine attempts to imitate a human, machines passing the WSC are expected to demonstrate the ability to think without having to pretend to be somebody else [1]. Passing the challenge requires resolving pronouns in certain sentences where shallow parsing techniques seem not to be directly applicable, and where the use of world knowledge and the ability to reason seem necessary [2,3]. Although the challenge is, by design, easy for humans, the development of new Winograd schemas is, itself, too troublesome for humans lacking inspiration and creativity [4].…”

Section: Introduction (mentioning)

confidence: 99%

WinoFlexi: A Crowdsourcing Platform for the Development of Winograd Schemas

Isaak

Michael

2019

Lecture Notes in Computer Science

Self Cite

The Winograd Schema Challenge, the task of resolving pronouns in certain carefully-structured sentences, has received considerable interest in the past few years as an alternative to the Turing Test. Systems developed to tackle this challenge have typically been evaluated on a small set of hand-crafted collections of sentences, since the development of new sentences by individuals is itself a rather challenging task, requiring care and creativity. In this paper we approach the problem of developing Winograd schemas via the introduction of WinoFlexi, a flexible online crowdsourcing system. Our empirical evaluation of the system's performance suggests that WinoFlexi allows crowdworkers to develop Winograd schemas of quality similar to that of most typical existing collections.

“…On a second front, we expect that the adoption and use of WSC-based CAPTCHAs will encourage more AI researchers to work on the problem of actually trying to solve the WSC, and perhaps, in the process, help towards the building of machines able to reason with commonsense knowledge. At the same time, it will also present AI researchers with the novel challenge of automating the construction of new WSC instances, or evaluating how hard they might be to humans (as pursued, for example, in [13]).…”

Section: Discussion (mentioning)

confidence: 99%

Using the Winograd Schema Challenge as a CAPTCHA

Isaak

Michael

EPiC Series in Computing

Self Cite

CAPTCHAs have established themselves as a standard technology to confidently distinguish humans from bots. Beyond the typical use for security reasons, CAPTCHAs have helped promote AI research in challenge tasks such as image classification and optical character recognition. It is, therefore, natural to consider what other challenge tasks for AI could serve a role in CAPTCHAs. The Winograd Schema Challenge (WSC), a certain form of hard pronoun resolution tasks, was proposed by Levesque as such a challenge task to promote research in AI. Based on current reports in the literature, the WSC remains a challenging task for bots, and is, therefore, a candidate to serve as a form of CAPTCHA. In this work we investigate whether this a priori appropriateness of the WSC as a form of CAPTCHA can be justified in terms of its acceptability by the human users in relation to existing CAPTCHA tasks. Our empirical study involved a total of 329 students, aged between 11 and 15, and showed that the WSC is generally faster and easier to solve than, and equally entertaining with, the most typical existing CAPTCHA tasks.

Experience and prediction: a metric of hardness for a novel litmus test

Isaak

Michael

2021

Journal of Logic and Computation

Self Cite

In the past decade, the Winograd schema challenge (WSC) has become a central aspect of the research community as a novel litmus test. Consequently, the WSC has spurred research interest because it can be seen as the means to understand human behavior. In this regard, the development of new techniques has made possible the usage of Winograd schemas in various fields, such as the design of novel forms of CAPTCHAs. Work from the literature that established a baseline for human adult performance on the WSC has shown that not all schemas are the same, meaning that they could potentially be categorized according to their perceived hardness for humans. In this regard, this hardness metric could be used in future challenges or in the WSC CAPTCHA service to differentiate between Winograd schemas. Recent work of ours has shown that this could be achieved via the design of an automated system that is able to output the hardness indexes of Winograd schemas, albeit with limitations regarding the number of schemas it could be applied on. This paper adds to previous research by presenting a new system that is based on machine learning, able to output the hardness of any Winograd schema faster and more accurately than any other previously used method. Our developed system, which works within two different approaches, namely the random forest and deep learning (LSTM-based), is ready to be used as an extension of any other system that aims to differentiate between Winograd schemas, according to their perceived hardness for humans. At the same time, along with our developed system we extend previous work by presenting the results of a large-scale experiment that shows how human performance varies across Winograd schemas.
