Avi Caciularu scite author profile

A new maximum likelihood estimation approach for blind channel equalization, using variational autoencoders (VAEs), is introduced. Significant and consistent improvements in the error rate of the reconstructed symbols, compared to constant modulus equalizers, are demonstrated. In fact, for the channels that were examined, the performance of the new VAE blind channel equalizer was close to the performance of a nonblind adaptive linear minimum mean square error equalizer. The new equalization method enables a significantly lower latency channel acquisition compared to the constant modulus algorithm (CMA). The VAE uses a convolutional neural network with two layers and a very small number of free parameters. Although the computational complexity of the new equalizer is higher compared to CMA, it is still reasonable, and the number of free parameters to estimate is small.

show abstract

Paraphrasing vs Coreferring: Two Sides of the Same Coin

Meged¹,

Caciularu²,

Shwartz³

et al. 2020

View full text Add to dashboard Cite

We study the potential synergy between two different NLP tasks, both confronting predicate lexical variability: identifying predicate paraphrases, and event coreference resolution. First, we used annotations from an event coreference dataset as distant supervision to re-score heuristically-extracted predicate paraphrases. The new scoring gained more than 18 points in average precision upon their ranking by the original scoring method. Then, we used the same re-ranking features as additional inputs to a state-of-the-art event coreference resolution model, which yielded modest but consistent improvements to the model's performance. The results suggest a promising direction to leverage data and models for each of the tasks to the benefit of the other.

show abstract

Unsupervised Linear and Nonlinear Channel Equalization and Decoding Using Variational Autoencoders

Caciularu

Burshtein

2020

IEEE Trans. Cogn. Commun. Netw.

View full text Add to dashboard Cite

Scalable Attentive Sentence Pair Modeling via Distilled Sentence Embedding

Barkan

Razin

Malkiel

et al. 2020

AAAI

View full text Add to dashboard Cite

Recent state-of-the-art natural language understanding models, such as BERT and XLNet, score a pair of sentences (A and B) using multiple cross-attention operations – a process in which each word in sentence A attends to all words in sentence B and vice versa. As a result, computing the similarity between a query sentence and a set of candidate sentences, requires the propagation of all query-candidate sentence-pairs throughout a stack of cross-attention layers. This exhaustive process becomes computationally prohibitive when the number of candidate sentences is large. In contrast, sentence embedding techniques learn a sentence-to-vector mapping and compute the similarity between the sentence vectors via simple elementary operations. In this paper, we introduce Distilled Sentence Embedding (DSE) – a model that is based on knowledge distillation from cross-attentive models, focusing on sentence-pair tasks. The outline of DSE is as follows: Given a cross-attentive teacher model (e.g. a fine-tuned BERT), we train a sentence embedding based student model to reconstruct the sentence-pair scores obtained by the teacher model. We empirically demonstrate the effectiveness of DSE on five GLUE sentence-pair tasks. DSE significantly outperforms several ELMO variants and other sentence embedding methods, while accelerating computation of the query-candidate sentence-pairs similarities by several orders of magnitude, with an average relative degradation of 4.6% compared to BERT. Furthermore, we show that DSE produces sentence embeddings that reach state-of-the-art performance on universal sentence representation benchmarks. Our code is made publicly available at https://github.com/microsoft/Distilled-Sentence-Embedding.

show abstract

Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space

Geva¹,

Caciularu²,

Wang³

et al. 2022

Preprint

View full text Add to dashboard Cite

Transformer-based language models (LMs) are at the core of modern NLP, but their internal prediction construction process is opaque and largely not understood. In this work, we make a substantial step towards unveiling this underlying prediction process, by reverseengineering the operation of the feed-forward network (FFN) layers, one of the building blocks of transformer models. We view the token representation as a changing distribution over the vocabulary, and the output from each FFN layer as an additive update to that distribution. Then, we analyze the FFN updates in the vocabulary space, showing that each update can be decomposed to sub-updates corresponding to single FFN parameter vectors, each promoting concepts that are often humaninterpretable. We then leverage these findings for controlling LM predictions, where we reduce the toxicity of GPT2 by almost 50%, and for improving computation efficiency with a simple early exit rule, saving 20% of computation on average. 1 * Equal contribution. † Work done during an internship at AI2. 1 https://github.com/aviclu/ffn-values.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Avi Caciularu

Blind Channel Equalization Using Variational Autoencoders

Paraphrasing vs Coreferring: Two Sides of the Same Coin

Unsupervised Linear and Nonlinear Channel Equalization and Decoding Using Variational Autoencoders

Scalable Attentive Sentence Pair Modeling via Distilled Sentence Embedding

Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space

Contact Info

Product

Resources

About