Given a data set D containing millions of data points and a data consumer who is willing to pay $X to train a machine learning (ML) model over D, how should we distribute this $X across the data points to reflect their "value"? In this paper, we define the "relative value of data" via the Shapley value, as it uniquely possesses properties with appealing real-world interpretations, such as fairness, rationality, and decentralizability. For general, bounded utility functions, the Shapley value is known to be challenging to compute: to get Shapley values for all N data points, it requires O(2^N) model evaluations for exact computation and O(N log N) for (ε, δ)-approximation. In this paper, we focus on one popular family of ML models based on K-nearest neighbors (KNN). The most surprising result is that for unweighted KNN classifiers and regressors, the Shapley values of all N data points can be computed, exactly, in O(N log N) time, an exponential improvement in computational complexity. Moreover, for (ε, δ)-approximation, we develop an algorithm based on Locality Sensitive Hashing (LSH) with only sublinear complexity O(N^{h(ε,K)} log N) when ε is not too small and K is not too large. We empirically evaluate our algorithms on up to 10 million data points; even our exact algorithm is up to three orders of magnitude faster than the baseline approximation algorithm, and the LSH-based approximation algorithm accelerates the value calculation even further. We then extend our algorithms to other scenarios, such as (1) weighted KNN classifiers, (2) settings in which data points are grouped by different data curators, and (3) settings in which data analysts providing computation also require proper valuation. Some of these extensions, although also exponentially improved, are less practical for exact computation (e.g., O(N^K) complexity for weighted KNN). We thus propose a Monte Carlo approximation algorithm, which is O(N (log N)^2 / (log K)^2) times more efficient than the baseline approximation algorithm.
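To make the exact result concrete, the following is a minimal sketch (not the authors' implementation) of an O(N log N) exact Shapley computation for an unweighted KNN classifier and a single test point. It assumes the commonly cited recursion for unweighted KNN Shapley values: sort training points by distance to the test point, give the farthest point a value of 1[label match]/N, then sweep inward adding (difference in label match)/K · min(K, rank)/rank. Function and variable names are illustrative.

import numpy as np

def knn_shapley_single_test(X_train, y_train, x_test, y_test, K):
    """Sketch: exact Shapley values of all N training points for an
    unweighted KNN classifier and one test point, in O(N log N) time."""
    N = len(X_train)
    # Sort training points by distance to the test point (the O(N log N) step).
    dists = np.linalg.norm(X_train - x_test, axis=1)
    order = np.argsort(dists)                      # order[0] = nearest neighbor
    match = (y_train[order] == y_test).astype(float)

    s = np.zeros(N)
    # Farthest point: its value is 1[label match] / N.
    s[N - 1] = match[N - 1] / N
    # Sweep from the second-farthest point toward the nearest (i is a 0-based rank).
    for i in range(N - 2, -1, -1):
        s[i] = s[i + 1] + (match[i] - match[i + 1]) / K * min(K, i + 1) / (i + 1)

    # Undo the sort so values align with the original training order.
    shapley = np.zeros(N)
    shapley[order] = s
    return shapley

Summing the returned values over many test points (and averaging) would give per-point values for a test set; the single-test case above is the core step the complexity claim refers to.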
Perovskite solar cells (PSCs) employing 3D organic-inorganic hybrid perovskite photoabsorbers have made tremendous progress, with state-of-the-art power conversion efficiency (PCE) exceeding 25% during the last dozen years. [1] However, the ambient instability of 3D perovskite materials remains a critical obstacle to realistic applications of PSCs. [2,3] One strategy for addressing the poor stability is to reduce the structural dimensionality of perovskites via the introduction of long-chain organic ligands, forming Ruddlesden-Popper (RP) quasi-2D perovskites. [4,5] The organic ligands are bound to the 3D inorganic framework via coulombic interactions, resulting in a layered structure. The general formula of RP-2D perovskites is (L)₂Aₙ₋₁PbₙI₃ₙ₊₁ (n = 1, 2, 3, 4…), where A is a methylammonium (MA⁺), formamidinium (FA⁺), or cesium (Cs⁺) cation, L is a bulky organic ligand, e.g., butylammonium (BA⁺) or 2-phenylethylammonium (PEA⁺), and n is the number of [PbI₆]⁴⁻ octahedral sheets per layer. [4-7] The incorporation of hydrophobic bulky organic ligands not only enhances the stability of perovskites by minimizing the permeation of water molecules but also increases the formation energy of perovskites, mitigating thermal degradation and ion migration. [8-10] These merits, alongside quantum confinement, give quasi-2D perovskites great potential for optoelectronic applications, with wide tunability of the bandgap and photophysical properties. [7] Unfavorably, quasi-2D perovskites are generally associated with a large exciton binding energy (hundreds of meV) due to the insulating nature of the bulky organic ligands and the specific layered arrangement. [11,12] As a result, charge transport and extraction are hindered in quasi-2D PSCs. To date, the highest reported PCEs of quasi-2D PSCs (n ≤ 5) remain around 18%, [13-15] showing considerable performance gaps with respect to 3D PSCs. The PCE (η) of photovoltaic cells is determined by the general relation η = (Jsc × Voc × FF) / Plight (Jsc is the short-circuit current density, Voc is the open-circuit voltage, FF is the fill factor, and Plight is the illumination intensity). In quasi-2D PSCs, the relatively low Jsc is particularly restrictive for the overall PCE.

Organic-inorganic hybrid quasi-2D perovskites have shown excellent stability for perovskite solar cells (PSCs), while the poor charge transport in quasi-2D perovskites significantly undermines their power conversion efficiency (PCE). Here, studies on water-controlled crystal growth of quasi-2D perovskites are presented to achieve high-efficiency solar cells. It is demonstrated that (BA)₂MA₄Pb₅I₁₆-based PSCs (n = 5) processed with water-containing precursors display an increased short-circuit current density (Jsc) of 19.01 mA cm⁻² and a PCE over 15%. The enhanced performance is attributed to synergetic growth of the 3D and 2D phase components aided by the formed hydrate (MAI·H₂O), leading to modulation of the crystal orientation and the phase distribution of the various n-value components, which facilitate interphase charge tr...
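As a quick numerical illustration of the PCE relation above: only the Jsc of 19.01 mA cm⁻² comes from the reported results; the Voc and FF values below are assumed placeholders chosen so the arithmetic lands near the reported 15% PCE under standard AM 1.5G illumination (100 mW cm⁻²).

# Illustrative arithmetic for eta = (Jsc * Voc * FF) / Plight.
J_sc    = 19.01e-3   # A cm^-2  (reported short-circuit current density)
V_oc    = 1.10       # V        (assumed, for illustration only)
FF      = 0.72       # -        (assumed, for illustration only)
P_light = 100e-3     # W cm^-2  (standard AM 1.5G illumination)

eta = (J_sc * V_oc * FF) / P_light
print(f"PCE = {eta:.1%}")   # ~15.1% with these assumed Voc and FF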
Adversarial attacks against natural language processing (NLP) systems, which make seemingly innocuous modifications to inputs, can induce the target models to make arbitrary mistakes. Though they raise great concerns, such adversarial attacks can also be leveraged to estimate the robustness of NLP models. Compared with adversarial example generation in continuous data domains (e.g., images), generating adversarial text that preserves the original meaning is challenging, since the text space is discrete and non-differentiable. To handle these challenges, we propose a target-controllable adversarial attack framework, T3, which is applicable to a range of NLP tasks. In particular, we propose a tree-based autoencoder to embed the discrete text data into a continuous representation space, upon which we optimize the adversarial perturbation. A novel tree-based decoder is then applied to regularize the syntactic correctness of the generated text and to manipulate it at either the sentence (T3(SENT)) or word (T3(WORD)) level. We consider two of the most representative NLP tasks: sentiment analysis and question answering (QA). Extensive experimental results and human studies show that T3-generated adversarial texts can successfully manipulate the NLP models into outputting the targeted incorrect answer without misleading human readers. Moreover, we show that the generated adversarial texts have high transferability, which enables black-box attacks in practice. Our work sheds light on an effective and general way to examine the robustness of NLP models. Our code is publicly available at https://github.com/AI-secure/T3/.
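As a rough illustration of the continuous-space attack idea, the sketch below optimizes a perturbation on a sentence embedding toward an attacker-chosen target label. It is not the T3 implementation: T3 uses a tree-based autoencoder and decoder, whereas Encoder and Classifier here are generic placeholder modules, and all hyperparameters are illustrative.

import torch
import torch.nn as nn

class Encoder(nn.Module):            # placeholder for T3's tree-based encoder
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
    def forward(self, tokens):
        _, h = self.rnn(self.emb(tokens))
        return h.squeeze(0)          # continuous sentence representation z

class Classifier(nn.Module):         # placeholder target model (e.g., sentiment)
    def __init__(self, dim=64, classes=2):
        super().__init__()
        self.fc = nn.Linear(dim, classes)
    def forward(self, z):
        return self.fc(z)

encoder, victim = Encoder(), Classifier()
tokens = torch.randint(0, 1000, (1, 12))          # toy input sentence
target_label = torch.tensor([1])                  # attacker-chosen target class

z = encoder(tokens).detach()
delta = torch.zeros_like(z, requires_grad=True)   # adversarial perturbation
opt = torch.optim.Adam([delta], lr=0.05)

for _ in range(100):
    logits = victim(z + delta)
    # Push the model toward the target label while keeping the perturbation small.
    loss = nn.functional.cross_entropy(logits, target_label) + 0.1 * delta.norm()
    opt.zero_grad(); loss.backward(); opt.step()

# In T3, z + delta would then be passed through the tree-based decoder to
# produce a syntactically well-formed adversarial sentence.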