Evaluation of an integrated multi-task machine learning system with humans in the loop

Steinfeld, Aaron; Bennett, Spencer; Cunningham, Kyle; Lahut, Matt; Quinones, Pablo-Alejandro; Wexler, Django; Siewiorek, Dan; Hayes, Jordan; Cohen, Paul R.; Fitzgerald, Julie C.; Hansson, Othar; Pool, Mike; Drummond, Mark

doi:10.1145/1660877.1660901

Cited by 13 publications

(11 citation statements)

References 9 publications

“…This section provides an overview of the test; a complete description can be found elsewhere [19]. The evaluation was designed to present participants with a challenging email overload workload that satisfied the following criteria.…”

Section: The Conference Planning Test (mentioning)

confidence: 99%

“…The test emails included anonymized real emails and fabricated ones, the latter necessary in part to make the emails consistent with the simulated world [19]. A team of undergraduate English majors was employed to create a detailed backstory email corpus, independent messages detailing one or more tasks, and noise messages, which were unrelated to the conference.…”

Section: The Conference Planning Test (mentioning)

confidence: 99%

“…An evaluation score, designed by external program evaluators, summarized overall performance into a single objective score ranging from 0.000 to 1.000, with higher scores reflecting better performance (for full details, see [10,19]). It was important that this score be tied to objective conference planning performance rather than a technology-specific algorithm (for example, F1 for classification).…”

Section: Evaluation Score (mentioning)

confidence: 99%

“…A large-scale user test evaluated several versions of RADAR over three years [19]. The test measured RADAR's performance using quantitative metrics acquired through data logging, including an evaluation score that summarizes overall performance along with qualitative metrics collected with a post-test user survey [20].…”

Section: Introduction (mentioning)

confidence: 99%

Agent-assisted task management that reduces email overload

Faulring, Myers, Mohnkern, et al. 2010

Proceedings of the 15th International Conference on Intelligent User Interfaces

Self Cite

RADAR is a multiagent system with a mixed-initiative user interface designed to help office workers cope with email overload. RADAR agents observe experts to learn models of their strategies and then use the models to assist other people who are working on similar tasks. The agents' assistance helps a person to transition from the normal email-centric workflow to a more efficient task-centric workflow. The Email Classifier learns to identify tasks contained within emails and then inspects new emails for similar tasks. A novel task-management user interface displays the found tasks in a to-do list, which has integrated support for performing the tasks. The Multitask Coordination Assistant learns a model of the order in which experts perform tasks and then suggests a schedule to other people who are working on similar tasks. A novel Progress Bar displays the suggested schedule of incomplete tasks as well as the completed tasks. A large evaluation demonstrated that novice users confronted with an email overload test performed significantly better (a 37% better overall score with a factor of four fewer errors) when assisted by the RADAR agents.

“…In fashion, the human component of algorithm evaluation is necessary [19,23]. Guided by this, we identify candidate applications where a fashion ontology enhanced with a better subjective data representation would likely be helpful.…”

Section: Fashion Ontology Use Cases (mentioning)

confidence: 99%

Extending Knowledge Graphs with Subjective Influence Networks for Personalized Fashion

Bollacker, Díaz-Rodríguez, Li 2018

Designing Cognitive Cities

This chapter shows Stitch Fix's industry case as an applied fashion application in cognitive cities. Fashion goes hand in hand with the economic development of better methods in smart and cognitive cities, leisure activities and consumption. However, extracting knowledge and actionable insights from fashion data still presents challenges due to the intrinsic subjectivity needed to effectively model the domain. Fashion ontologies help address this, but most existing such ontologies are "clothing" ontologies, which consider only the physical attributes of garments or people and often model subjective judgements only as opaque categorizations of entities. We address this by proposing a supplementary ontological approach in the fashion domain based on subjective influence networks. We enumerate a set of use cases this approach is intended to address and discuss possible classes of prediction questions and machine learning experiments that could be executed to validate or refute the model. We also present a case study on business models and monetization strategies for digital fashion, a domain that is fast-changing and gaining the battle in the digital domain.
