We formalize a new statistical machine learning paradigm, called infinite-label learning, which annotates a data point with more than one relevant label from a candidate set that pools both the finite labels observed at training and a potentially infinite number of previously unseen labels. Infinite-label learning fundamentally expands the scope of conventional multi-label learning and better meets the practical requirements of various real-world applications, such as image tagging, ads-query association, and article categorization. However, how can we learn a labeling function that is capable of assigning to a data point labels that were absent from the training set? To answer this question, we draw on recent work on zero-shot learning, where the key is to represent a class/label by a vector of semantic codes, as opposed to treating it as an atomic label. We validate infinite-label learning with a PAC bound in theory and with empirical studies on both synthetic and real data.
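To make the idea of representing a label by a vector of semantic codes concrete, the following is a minimal sketch, not the authors' model. It assumes a bilinear compatibility score x^T W l between a data point x and a label vector l, with a fixed threshold; the names `score` and `annotate`, the matrix `W`, and all label vectors are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def score(x: Vector, W: Matrix, label_vec: Vector) -> float:
    """Bilinear compatibility x^T W l (an assumed form, not taken from the paper)."""
    return sum(
        x[i] * W[i][j] * label_vec[j]
        for i in range(len(x))
        for j in range(len(label_vec))
    )


def annotate(x: Vector, W: Matrix, label_vecs: Dict[str, Vector],
             threshold: float = 0.0) -> List[str]:
    """Return every label whose score exceeds the threshold.

    Because a label enters only through its semantic vector, labels that
    were never seen at training can be scored the same way as seen ones.
    """
    return [name for name, vec in label_vecs.items()
            if score(x, W, vec) > threshold]


# Hypothetical toy setup: W would normally be learned from training data.
W = [[1.0, 0.0],
     [0.0, 1.0]]
x = [1.0, -0.5]
labels = {
    "cat": [1.0, 0.0],    # seen at training (assumed)
    "dog": [0.0, 1.0],    # seen at training (assumed)
    "tiger": [0.8, 0.2],  # unseen label, described only by its semantic vector
}
predicted = annotate(x, W, labels)
```

In this toy setup the unseen label is handled by the same scoring function as the seen ones, which is the property the abstract relies on; how W is learned and which score is used are left open here.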
With the recent trend of applying machine learning in every aspect of human life, it is important to incorporate fairness into the core of predictive algorithms. We address the problem of predicting the quality of public speeches while being fair with respect to sensitive attributes of the speakers, e.g., gender and race. We use TED talks as an input repository of public speeches because it features speakers from a diverse community and has a wide outreach. Utilizing the theories of Causal Models and Counterfactual Fairness together with state-of-the-art neural language models, we propose a mathematical framework for fair prediction of public speaking quality. We employ grounded assumptions to construct a causal model capturing how different attributes affect public speaking quality. This causal model is used to generate counterfactual data for training a fair predictive model. Our framework is general enough to utilize any assumption within the causal model. Experimental results show that while prediction accuracy is comparable to recent work on this dataset, our predictions are counterfactually fair with respect to a novel metric when compared to true data labels. The FairyTED setup not only allows organizers to make informed and diverse selections of speakers from the unobserved counterfactual possibilities, but also ensures that viewers and new users are not influenced by unfair and unbalanced ratings from arbitrary visitors to the ted.com website when deciding to view a talk.
We study the problem of counting the number of popular matchings in a given instance. A popular matching instance consists of agents A and houses H, where each agent ranks a subset of houses according to their preferences. A matching is an assignment of agents to houses. A matching M is more popular than a matching M′ if the number of agents that prefer M to M′ is greater than the number of agents that prefer M′ to M. A matching M is called popular if there exists no matching more popular than M. McDermid and Irving gave a polynomial-time algorithm for counting the number of popular matchings when the preference lists are strictly ordered. We first consider the case of ties in preference lists. Nasre proved that the problem of counting the number of popular matchings is #P-hard when there are ties. We give an FPRAS for this problem. We then consider the popular matching problem where preference lists are strictly ordered but each house has a capacity associated with it. We give a switching graph characterization of popular matchings in this case. Such characterizations were studied earlier for the case of strictly ordered preference lists (McDermid and Irving) and for preference lists with ties (Nasre). We use our characterization to prove that counting popular matchings in the capacitated case is #P-hard.
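The "more popular than" relation defined above can be illustrated with a small sketch. It assumes strictly ordered preference lists, unit house capacities, and that an agent considers being unmatched worse than any house on its list; the function names, agents, and houses are hypothetical and this is not one of the counting algorithms discussed in the abstract.

```python
from typing import Dict, List, Optional

Prefs = Dict[str, List[str]]          # agent -> strictly ordered list of houses
Matching = Dict[str, Optional[str]]   # agent -> assigned house, or None


def rank(prefs: Prefs, agent: str, house: Optional[str]) -> int:
    """Position of the house in the agent's list; unmatched ranks last."""
    lst = prefs[agent]
    return lst.index(house) if house in lst else len(lst)


def more_popular(prefs: Prefs, m1: Matching, m2: Matching) -> bool:
    """True if strictly more agents prefer m1 to m2 than prefer m2 to m1."""
    prefer_m1 = sum(
        1 for a in prefs if rank(prefs, a, m1.get(a)) < rank(prefs, a, m2.get(a))
    )
    prefer_m2 = sum(
        1 for a in prefs if rank(prefs, a, m2.get(a)) < rank(prefs, a, m1.get(a))
    )
    return prefer_m1 > prefer_m2


# Hypothetical instance with two agents and two houses.
prefs = {"a1": ["h1", "h2"], "a2": ["h1"]}
m_a = {"a1": "h2", "a2": "h1"}
m_b = {"a1": "h1", "a2": None}
m_empty = {"a1": None, "a2": None}

a_beats_b = more_popular(prefs, m_a, m_b)          # one vote each way
a_beats_empty = more_popular(prefs, m_a, m_empty)  # both agents prefer m_a
```

In this instance neither `m_a` nor `m_b` is more popular than the other, since each is preferred by exactly one agent, while both agents prefer `m_a` to the empty matching. Checking that a matching is popular requires comparing it against every other matching, which is why structural characterizations such as switching graphs are used instead of brute force.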