Zujie Wen scite author profile

Zujie Wen

4Publications

16Citation Statements Received

102Citation Statements Given

How they've been cited

How they cite others

115

102

Affiliations

Antea Group (France)

Publications

Order By: Most citations

R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling

Hu¹,

Mi²,

Wen³

et al. 2021

View full text Add to dashboard Cite

Human language understanding operates at multiple levels of granularity (e.g., words, phrases, and sentences) with increasing levels of abstraction that can be hierarchically combined. However, existing deep models with stacked layers do not explicitly model any sort of hierarchical process. This paper proposes a recursive Transformer model based on differentiable CKY style binary trees to emulate the composition process. We extend the bidirectional language model pre-training objective to this architecture, attempting to predict each word given its left and right abstraction nodes. To scale up our approach, we also introduce an efficient pruned tree induction algorithm to enable encoding in just a linear number of composition steps. Experimental results on language modeling and unsupervised parsing show the effectiveness of our approach. 1

show abstract

Interactive Question Clarification in Dialogue via Reinforcement Learning

Hu¹,

Wen²,

Wang

et al. 2020

View full text Add to dashboard Cite

Coping with ambiguous questions has been a perennial problem in real-world dialogue systems. Although clarification by asking questions is a common form of human interaction, it is hard to define appropriate questions to elicit more specific intents from a user. In this work, we propose a reinforcement model to clarify ambiguous questions by suggesting refinements of the original query. We first formulate a collection partitioning problem to select a set of labels enabling us to distinguish potential unambiguous intents. We list the chosen labels as intent phrases to the user for further confirmation. The selected label along with the original user query then serves as a refined query, for which a suitable response can more easily be identified. The model is trained using reinforcement learning with a deep policy network. We evaluate our model based on real-world user clicks and demonstrate significant improvements across several different experiments.

show abstract

SeaD: End-to-end Text-to-SQL Generation with Schema-aware Denoising

Kuan¹,

Wang²,

Wang³

et al. 2022

View full text Add to dashboard Cite

On the WikiSQL 1 benchmark, most methods tackle the challenge of text-to-SQL with predefined sketch slots and build sophisticated sub-tasks to fill these slots. Though achieving promising results, these methods suffer from over-complex model structure. In this paper, we present a simple yet effective approach that enables auto-regressive sequenceto-sequence model to robust text-to-SQL generation. Instead of formulating the task of text-to-SQL as slot-filling, we propose to train sequence-to-sequence model with Schemaaware Denoising (SeaD), which consists of two denoising objectives that train model to either recover input or predict output from two novel erosion and shuffle noises. These modelagnostic denoising objectives act as the auxiliary tasks for structural data modeling during sequence-to-sequence generation. In addition, we propose a clause-sensitive execution guided (EG) decoding strategy to overcome the limitation of EG decoding for generative model. The experiments show that the proposed method improves the performance of sequence-to-sequence model in both schema linking and grammar correctness and establishes new state-of-the-art on WikiSQL benchmark. Our work indicates that the capacity of sequence-to-sequence model for text-to-SQL may have been under-estimated and could be enhanced by specialized denoising task.

show abstract

Query Distillation: BERT-based Distillation for Ensemble Ranking

Zhang¹,

Liu²,

Wen³

et al. 2020

View full text Add to dashboard Cite

Recent years have witnessed substantial progress in the development of neural ranking networks, but also an increasingly heavy computational burden due to growing numbers of parameters and the adoption of model ensembles. Knowledge Distillation (KD) is a common solution to balance the effectiveness and efficiency. However, it is not straightforward to apply KD to ranking problems. Ranking Distillation (RD) has been proposed to address this issue, but only shows effectiveness on recommendation tasks. We present a novel two-stage distillation method for ranking problems that allows a smaller student model to be trained while benefitting from the better performance of the teacher model, providing better control of the inference latency and computational burden. We design a novel BERT-based ranking model structure for list-wise ranking to serve as our student model. All ranking candidates are fed to the BERT model simultaneously, such that the self-attention mechanism can enable joint inference to rank the document list. Our experiments confirm the advantages of our method, not just with regard to the inference latency but also in terms of higher-quality rankings compared to the original teacher model.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Zujie Wen

R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling

Interactive Question Clarification in Dialogue via Reinforcement Learning

SeaD: End-to-end Text-to-SQL Generation with Schema-aware Denoising

Query Distillation: BERT-based Distillation for Ensemble Ranking

Contact Info

Product

Resources

About