Inverse Reinforcement Learning (IRL) is an approach to reward discovery from demonstration, in which an agent recovers the reward function of a Markov decision process by observing an expert acting in the domain. In the standard setting, it is assumed that the expert acts (nearly) optimally and that a large number of trajectories, i.e., training examples, are available for reward discovery (and, consequently, for learning domain behavior). Neither assumption is practical: trajectories are often noisy, and examples can be scarce. Our novel approach incorporates advice-giving into the IRL framework to address these issues. Inspired by preference elicitation, a domain expert provides advice on states and actions (features) by stating preferences over them. We evaluate our approach on several domains and show that, with small amounts of targeted preference advice, learning is possible from noisy demonstrations and requires far fewer trajectories than learning from trajectories alone.
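The sketch below illustrates the general idea under strong simplifying assumptions: a linear reward over state features, expert feature expectations estimated from (possibly noisy) demonstrations, and advice given as pairwise preferences over features. The function name, margin values, and numbers are illustrative, not taken from the paper.

```python
import numpy as np

def irl_with_preference_advice(mu_expert, mu_candidates, advice,
                               advice_weight=1.0, lr=0.05, iters=500):
    """Toy margin-based IRL step: fit reward weights w so that the expert's
    feature expectations score higher than each candidate policy's, while also
    respecting pairwise feature preferences supplied as advice.
    A simplified sketch, not the exact algorithm from the paper."""
    d = mu_expert.shape[0]
    w = np.zeros(d)
    for _ in range(iters):
        grad = np.zeros(d)
        # Margin term: push expert feature expectations above each candidate's.
        for mu_c in mu_candidates:
            if w @ (mu_expert - mu_c) < 1.0:       # hinge-style margin
                grad -= (mu_expert - mu_c)
        # Advice term: the preferred feature should carry more reward weight.
        for pref, dispref in advice:
            if w[pref] < w[dispref] + 0.1:         # soft preference margin
                grad[pref] -= advice_weight
                grad[dispref] += advice_weight
        w -= lr * grad
    return w / (np.linalg.norm(w) + 1e-8)

# Made-up example: 3 state features, one candidate policy, one advice pair
# stating "prefer states exhibiting feature 0 over states exhibiting feature 1".
mu_expert = np.array([0.8, 0.1, 0.3])
mu_candidates = [np.array([0.4, 0.5, 0.3])]
advice = [(0, 1)]
print(irl_with_preference_advice(mu_expert, mu_candidates, advice))
```

Even with noisy estimates of the expert's feature expectations, the advice term keeps the recovered weights oriented toward the expert's stated preferences, which is the effect the abstract describes.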
Advice-giving has long been explored in the artificial intelligence community as a way to build robust learning algorithms when data are noisy, incorrect, or insufficient. While logic-based systems were used effectively in building expert systems, the role of the human has recently been restricted to that of a "mere labeler." We hypothesize and demonstrate that probabilistic logic provides an effective and natural way for the expert to specify domain advice. Specifically, we consider different types of advice-giving in relational domains where noise can arise from systematic errors or from the class imbalance inherent in the domains. The advice is provided as logical statements or privileged features that are then explicitly considered by an iterative learning algorithm at every update. Our empirical evidence shows that human advice can effectively accelerate learning in noisy, structured domains where humans have so far been used merely as labelers or as designers of the (initial or final) structure of the model.
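As an illustration of how such advice could enter an iterative (boosting-style) update, the toy sketch below shifts each example's functional gradient by the net count of advice rules that prefer or avoid its positive label. The trust parameter lam and the counting scheme are assumptions made for this sketch, not the paper's exact formulation.

```python
import numpy as np

def advice_gradients(y, prob_pos, n_pref, n_avoid, lam=0.5):
    """Residuals for the next weak learner, nudged toward the expert's advice.

    y        : observed (possibly noisy) binary labels
    prob_pos : current model's probability of the positive class
    n_pref   : number of advice rules labeling each example as preferred (positive)
    n_avoid  : number of advice rules labeling each example as avoided (negative)
    """
    residual = y - prob_pos                   # standard log-loss functional gradient
    advice_push = lam * (n_pref - n_avoid)    # expert pushes the label up or down
    return residual + advice_push

# Toy example: 4 training examples, an uninformative current model,
# and advice rules firing on examples 1 and 3.
y        = np.array([1, 0, 1, 1])
prob_pos = np.array([0.5, 0.5, 0.5, 0.5])
n_pref   = np.array([0, 1, 0, 1])
n_avoid  = np.array([0, 0, 0, 0])
print(advice_gradients(y, prob_pos, n_pref, n_avoid))
```

The point of the sketch is that advice is consulted at every update rather than only once up front, which is how it can counteract systematic label noise or class imbalance.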
Adverse drug events (ADEs) are a major concern and point of emphasis for the medical profession, government, and society in general. Methods that extract ADEs from observational data must be evaluated, and in particular it is important to know what is already reported in the literature. Consequently, we employ a novel relation extraction technique based on a recently developed probabilistic logic learning algorithm that exploits human advice. We demonstrate on a standard adverse drug events database that the proposed approach successfully extracts existing adverse drug events from a limited amount of training data and compares favorably with state-of-the-art probabilistic logic learning methods.
Incorporating richer human inputs, including qualitative constraints such as monotonic and synergistic influences, has long been adopted within AI. Inspired by this, we consider the problem of using such influence statements in the successful gradient-boosting framework. We develop a unified framework for both classification and regression settings that can effectively and efficiently incorporate such constraints to accelerate learning toward a better model. Our results on a large number of standard domains and two particularly novel real-world domains demonstrate the superiority of using domain knowledge over treating the human as a mere labeler.
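For a concrete, if simplified, illustration of monotonic influence statements in gradient boosting, the snippet below uses scikit-learn's built-in monotonic_cst option (scikit-learn >= 1.0) as a stand-in for the paper's unified framework; the data, constraint pattern, and feature roles are made up for the example.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 3))
# Synthetic target: increasing in feature 0, decreasing in feature 1, noise otherwise.
y = 2 * X[:, 0] - 3 * X[:, 1] + 0.1 * rng.normal(size=500)

# Influence statements as monotonic constraints:
# +1 = monotonically increasing, -1 = decreasing, 0 = unconstrained.
model = HistGradientBoostingRegressor(monotonic_cst=[1, -1, 0], max_iter=200)
model.fit(X, y)

# Sanity check: the prediction should not decrease when the "increasing" feature grows.
x = np.array([[0.2, 0.5, 0.5],
              [0.8, 0.5, 0.5]])
print(model.predict(x))
```

Encoding the constraint directly in the learner is what lets a small amount of qualitative knowledge substitute for additional labeled data, which is the trade-off the abstract emphasizes.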