Data cleaning techniques typically rely on quality rules to identify violating tuples and then fix these violations using repair algorithms. Oftentimes the rules, which encode business logic, can only be defined on a target report generated by transformations over multiple data sources. This creates a situation where the violations detected in the report are decoupled, in space and time, from the actual source of errors. In addition, any repair applied to the report would have to be repeated whenever the data sources change. Finally, even if repairing the report is possible and affordable, it would be of little help in identifying and analyzing the actual sources of errors so that future violations at the target can be prevented. In this paper, we propose a system to address this decoupling. The system takes quality rules defined over the output of a transformation and computes explanations of the errors seen on the output. This is performed both at the target level, to describe these errors, and at the source level, to prescribe actions to resolve them. We present scalable techniques to detect, propagate, and explain errors. We also study the effectiveness and efficiency of our techniques on the TPC-H benchmark for different scenarios and classes of quality rules.
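As a minimal sketch of the first step described above, consider detecting violating tuples under one common class of quality rules, a functional dependency. The report schema, the rule (zip determines city), and the data below are illustrative assumptions, not the paper's actual setup:

```python
from collections import defaultdict

def fd_violations(rows, lhs, rhs):
    """Flag tuples violating the functional dependency lhs -> rhs:
    tuples that share lhs values but disagree on rhs values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        groups[key].append(row)
    bad = []
    for grp in groups.values():
        # More than one distinct rhs value for the same lhs key: violation
        if len({tuple(r[a] for a in rhs) for r in grp}) > 1:
            bad.extend(grp)
    return bad

# Hypothetical report: the rule says zip code should determine city
report = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "Newark"},    # conflicting tuple
    {"zip": "89052", "city": "Henderson"},
]
violations = fd_violations(report, lhs=["zip"], rhs=["city"])
```

The sketch only flags violations on the report; tracing them back through the transformation to the source tables is the harder problem the paper addresses.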
Abstract: There is a growing need for task-oriented natural language dialog systems that can interact with a user to accomplish a given objective. Recent work on building task-oriented dialog systems has emphasized the need to acquire task-specific knowledge from un-annotated conversational data. In our work, we acquire task-specific knowledge by defining the sub-task as the key unit of a task-oriented conversation. We propose an unsupervised, Apriori-like algorithm that extracts the sub-tasks and their valid orderings from un-annotated human-human conversations. Modeling dialogs as a combination of sub-tasks and their valid orderings naturally captures the variability in conversations. It also allows us to map our dialog model to AIML constructs and therefore use off-the-shelf AIML interpreters to build task-oriented chatbots. We conduct experiments on real-world data sets to establish the effectiveness of the sub-task extraction process. We codify the extracted sub-tasks in an AIML knowledge base and build a chatbot using this knowledge base. We also demonstrate the usefulness of the chatbot in automatically handling customer requests through a user evaluation study.
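The extraction of valid sub-task orderings can be sketched as an Apriori-style support-counting pass over conversations. The sub-task labels, conversations, and support threshold below are illustrative assumptions, not the paper's data or exact algorithm:

```python
def frequent_orderings(conversations, min_support):
    """Apriori-style pass: keep sub-task orderings (a precedes b)
    that occur in at least min_support conversations."""
    counts = {}
    for conv in conversations:
        seen = set()
        for i, a in enumerate(conv):
            for b in conv[i + 1:]:
                seen.add((a, b))
        for pair in seen:  # count each ordering at most once per conversation
            counts[pair] = counts.get(pair, 0) + 1
    return {p for p, c in counts.items() if c >= min_support}

# Hypothetical conversations, each a sequence of sub-task labels
convs = [
    ["greet", "verify", "resolve", "close"],
    ["greet", "verify", "escalate", "close"],
    ["greet", "resolve", "close"],
]
orderings = frequent_orderings(convs, min_support=3)
```

The surviving orderings capture the variability across conversations: any sequence of sub-tasks consistent with the frequent orderings is a plausible dialog flow.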
In this paper we address the problem of extracting important (and unimportant) discourse patterns from call center conversations. Call centers provide dialog-based call-in support for customers to address their queries, requests, and complaints. A call center is the direct interface between an organization and its customers, and it is important to capture the voice of the customer by gathering insights into the customer experience. We have observed that the calls received at a call center contain segments that follow specific patterns typical of the issue being addressed in the call. We present methods to extract such patterns from the calls. We show that by aggregating over a few hundred calls, specific discourse patterns begin to emerge for each class of calls. Further, we show that such discourse patterns are useful for classifying calls and for identifying the parts of a call that provide insights into customer behaviour.
Abstract: Developments in semantic search technology have motivated the need for efficient and scalable entity annotation techniques. We demonstrate RAD: a tool for Rapid Annotator Development on a document collection. RAD builds on a recent approach [1] that translates entity annotation rules into equivalent operations on the inverted index of the collection to directly generate an annotation index (which can be used in search applications). To make the framework scalable, we use an industrial-strength indexer, Lucene [2], and introduce some modifications to its API. The index also serves as a suitable representation for quick comparisons with an indexed ground truth of annotations on the same collection, allowing us to evaluate the precision and recall of the annotations. RAD achieves at least an order-of-magnitude speedup over the standard approach of annotating a document at a time, as adopted by GATE [3]. The speedup factor grows with the size of the collection, making RAD scalable. We cache intermediate results from the index operations, enabling quick updates of the annotation index as well as speedy evaluation when rules are modified. This makes RAD suitable for rapid, interactive development of annotators.
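The core idea of evaluating annotation rules on the inverted index rather than document-at-a-time can be sketched as follows. This is a toy positional index and a dictionary-style phrase rule, assumed for illustration; it is not RAD's or Lucene's actual API:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each token to its (doc_id, position) postings."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for pos, tok in enumerate(text.lower().split()):
            index[tok].append((doc_id, pos))
    return index

def annotate_phrase(index, phrase):
    """Evaluate a multi-token entity rule as a merge of posting lists:
    keep positions where the rule's tokens occur consecutively."""
    tokens = phrase.lower().split()
    hits = set(index.get(tokens[0], []))
    for offset, tok in enumerate(tokens[1:], start=1):
        shifted = {(d, p - offset) for d, p in index.get(tok, [])}
        hits &= shifted  # intersect, aligning each token to the phrase start
    return sorted(hits)  # (doc_id, start_position) of each annotation

docs = ["New York is large", "I love New York", "York was new"]
idx = build_inverted_index(docs)
matches = annotate_phrase(idx, "New York")
```

Because the rule touches only the posting lists of its own tokens, cost scales with the frequency of those tokens rather than with the total size of the collection, which is the intuition behind the reported speedup.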
Protecting sensitive information while preserving the shareability and usability of data is becoming increasingly important. In call centers, a great deal of customer-related sensitive information is stored in audio recordings. In this work, we address the problem of protecting sensitive information in audio recordings and speech transcripts. We present a semi-supervised method that models sensitive information as a directed graph. We demonstrate the effectiveness of this approach by applying it to the problem of detecting and locating credit card transactions in real-life conversations between agents and customers in a call center.