Given a data set D containing millions of data points and a data consumer who is willing to pay $X to train a machine learning (ML) model over D, how should we distribute this $X across the data points to reflect their "value"? In this paper, we define the "relative value of data" via the Shapley value, as it uniquely possesses properties with appealing real-world interpretations, such as fairness, rationality, and decentralizability. For general, bounded utility functions, the Shapley value is known to be challenging to compute: to get Shapley values for all N data points, it requires O(2^N) model evaluations for exact computation and O(N log N) for (ε, δ)-approximation. In this paper, we focus on one popular family of ML models based on K-nearest neighbors (KNN). The most surprising result is that for unweighted KNN classifiers and regressors, the Shapley values of all N data points can be computed, exactly, in O(N log N) time, an exponential improvement in computational complexity. Moreover, for (ε, δ)-approximation, we develop an algorithm based on Locality Sensitive Hashing (LSH) with only sublinear complexity O(N^{h(ε,K)} log N) when ε is not too small and K is not too large. We empirically evaluate our algorithms on up to 10 million data points; even our exact algorithm is up to three orders of magnitude faster than the baseline approximation algorithm, and the LSH-based approximation algorithm accelerates the value calculation even further. We then extend our algorithms to other scenarios, such as (1) weighted KNN classifiers, (2) settings in which data points are grouped by different data curators, and (3) settings in which data analysts providing computation also require proper valuation. Some of these extensions, although also exponentially improved, are less practical for exact computation (e.g., O(N^K) complexity for weighted KNN). We thus propose a Monte Carlo approximation algorithm, which is O(N (log N)^2 / (log K)^2) times more efficient than the baseline approximation algorithm.
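To make the exact result concrete, the following is a minimal sketch (not the authors' implementation) of an O(N log N) exact Shapley computation for an unweighted KNN classifier and a single test point. It assumes the commonly cited recursion for unweighted KNN Shapley values: sort training points by distance to the test point, give the farthest point a value of 1[label match]/N, then sweep inward adding (difference in label match)/K · min(K, rank)/rank. Function and variable names are illustrative.

import numpy as np

def knn_shapley_single_test(X_train, y_train, x_test, y_test, K):
    """Sketch: exact Shapley values of all N training points for an
    unweighted KNN classifier and one test point, in O(N log N) time."""
    N = len(X_train)
    # Sort training points by distance to the test point (the O(N log N) step).
    dists = np.linalg.norm(X_train - x_test, axis=1)
    order = np.argsort(dists)                      # order[0] = nearest neighbor
    match = (y_train[order] == y_test).astype(float)

    s = np.zeros(N)
    # Farthest point: its value is 1[label match] / N.
    s[N - 1] = match[N - 1] / N
    # Sweep from the second-farthest point toward the nearest (i is a 0-based rank).
    for i in range(N - 2, -1, -1):
        s[i] = s[i + 1] + (match[i] - match[i + 1]) / K * min(K, i + 1) / (i + 1)

    # Undo the sort so values align with the original training order.
    shapley = np.zeros(N)
    shapley[order] = s
    return shapley

Summing the returned values over many test points (and averaging) would give per-point values for a test set; the single-test case above is the core step the complexity claim refers to.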
Perovskite solar cells (PSCs) employing 3D organic-inorganic hybrid perovskite photoabsorbers have made tremendous progress, with state-of-the-art power conversion efficiency (PCE) exceeding 25% during the last dozen years. [1] However, the ambient instability of 3D perovskite materials remains a critical obstacle to realistic applications of PSCs. [2,3] One strategy for addressing the poor stability is to reduce the structural dimensionality of perovskites via the introduction of long-chain organic ligands, forming Ruddlesden-Popper (RP) quasi-2D perovskites. [4,5] The organic ligands are bound to the 3D inorganic framework via coulombic interactions, resulting in a layered structure. The general formula of RP-2D perovskites is (L)₂Aₙ₋₁PbₙI₃ₙ₊₁ (n = 1, 2, 3, 4…), where A is a methylammonium (MA⁺), formamidinium (FA⁺), or cesium (Cs⁺) cation, L is a bulky organic ligand, e.g., butylammonium (BA⁺) or 2-phenylethylammonium (PEA⁺), and n is the number of [PbI₆]⁴⁻ octahedral sheets per layer. [4-7] The incorporation of hydrophobic bulky organic ligands not only enhances the stability of perovskites by minimizing the permeation of water molecules but also increases the formation energy of perovskites, mitigating thermal degradation and ion migration. [8-10] These merits, alongside quantum confinement, give quasi-2D perovskites great potential for optoelectronic applications, with wide tunability of the bandgap and photophysical properties. [7] Unfavorably, quasi-2D perovskites are generally associated with a large exciton binding energy (hundreds of meV) due to the insulating nature of the bulky organic ligands and the specific layered arrangement. [11,12] As a result, charge transport and extraction are hindered in quasi-2D PSCs. To date, the highest reported PCEs of quasi-2D PSCs (n ≤ 5) remain around 18%, [13-15] showing considerable performance gaps with respect to 3D PSCs. The PCE (η) of photovoltaic cells is determined by the general relation η = (Jsc × Voc × FF) / Plight (Jsc is the short-circuit current density, Voc is the open-circuit voltage, FF is the fill factor, and Plight is the illumination intensity). In quasi-2D PSCs, the relatively low Jsc is particularly restrictive for the overall PCE.

Organic-inorganic hybrid quasi-2D perovskites have shown excellent stability for perovskite solar cells (PSCs), while the poor charge transport in quasi-2D perovskites significantly undermines their power conversion efficiency (PCE). Here, studies on water-controlled crystal growth of quasi-2D perovskites are presented to achieve high-efficiency solar cells. It is demonstrated that (BA)₂MA₄Pb₅I₁₆-based PSCs (n = 5) processed with water-containing precursors display an increased short-circuit current density (Jsc) of 19.01 mA cm⁻² and a PCE over 15%. The enhanced performance is attributed to synergetic growth of the 3D and 2D phase components aided by the formed hydrate (MAI·H₂O), leading to modulation of the crystal orientation and the phase distribution of the various n-value components, which facilitate interphase charge tr...
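As a quick numerical illustration of the PCE relation above: only the Jsc of 19.01 mA cm⁻² comes from the reported results; the Voc and FF values below are assumed placeholders chosen so the arithmetic lands near the reported 15% PCE under standard AM 1.5G illumination (100 mW cm⁻²).

# Illustrative arithmetic for eta = (Jsc * Voc * FF) / Plight.
J_sc    = 19.01e-3   # A cm^-2  (reported short-circuit current density)
V_oc    = 1.10       # V        (assumed, for illustration only)
FF      = 0.72       # -        (assumed, for illustration only)
P_light = 100e-3     # W cm^-2  (standard AM 1.5G illumination)

eta = (J_sc * V_oc * FF) / P_light
print(f"PCE = {eta:.1%}")   # ~15.1% with these assumed Voc and FF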
Adversarial attacks against natural language processing (NLP) systems, which make seemingly innocuous modifications to inputs, can induce the target models to make arbitrary mistakes. Though they raise great concerns, such adversarial attacks can also be leveraged to estimate the robustness of NLP models. Compared with adversarial example generation in continuous data domains (e.g., images), generating adversarial text that preserves the original meaning is challenging, since the text space is discrete and non-differentiable. To handle these challenges, we propose a target-controllable adversarial attack framework, T3, which is applicable to a range of NLP tasks. In particular, we propose a tree-based autoencoder to embed the discrete text data into a continuous representation space, upon which we optimize the adversarial perturbation. A novel tree-based decoder is then applied to regularize the syntactic correctness of the generated text and to manipulate it at either the sentence (T3(SENT)) or word (T3(WORD)) level. We consider two of the most representative NLP tasks: sentiment analysis and question answering (QA). Extensive experimental results and human studies show that T3-generated adversarial texts can successfully manipulate the NLP models into outputting the targeted incorrect answer without misleading human readers. Moreover, we show that the generated adversarial texts have high transferability, which enables black-box attacks in practice. Our work sheds light on an effective and general way to examine the robustness of NLP models. Our code is publicly available at https://github.com/AI-secure/T3/.
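As a rough illustration of the continuous-space attack idea, the sketch below optimizes a perturbation on a sentence embedding toward an attacker-chosen target label. It is not the T3 implementation: T3 uses a tree-based autoencoder and decoder, whereas Encoder and Classifier here are generic placeholder modules, and all hyperparameters are illustrative.

import torch
import torch.nn as nn

class Encoder(nn.Module):            # placeholder for T3's tree-based encoder
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
    def forward(self, tokens):
        _, h = self.rnn(self.emb(tokens))
        return h.squeeze(0)          # continuous sentence representation z

class Classifier(nn.Module):         # placeholder target model (e.g., sentiment)
    def __init__(self, dim=64, classes=2):
        super().__init__()
        self.fc = nn.Linear(dim, classes)
    def forward(self, z):
        return self.fc(z)

encoder, victim = Encoder(), Classifier()
tokens = torch.randint(0, 1000, (1, 12))          # toy input sentence
target_label = torch.tensor([1])                  # attacker-chosen target class

z = encoder(tokens).detach()
delta = torch.zeros_like(z, requires_grad=True)   # adversarial perturbation
opt = torch.optim.Adam([delta], lr=0.05)

for _ in range(100):
    logits = victim(z + delta)
    # Push the model toward the target label while keeping the perturbation small.
    loss = nn.functional.cross_entropy(logits, target_label) + 0.1 * delta.norm()
    opt.zero_grad(); loss.backward(); opt.step()

# In T3, z + delta would then be passed through the tree-based decoder to
# produce a syntactically well-formed adversarial sentence.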