Inspired by the success of self attention mechanism and Transformer architecture in sequence transduction and image generation applications, we propose novel self attention-based architectures to improve the performance of adversarial latent codebased schemes in text generation. Adversarial latent code-based text generation has recently gained a lot of attention due to its promising results. In this paper, we take a step to fortify the architectures used in these setups, specifically AAE and ARAE. We benchmark two latent code-based methods (AAE and ARAE) designed based on adversarial setups. In our experiments, the Google sentence compression dataset is utilized to compare our method with these methods using various objective and subjective measures. The experiments demonstrate the proposed (self) attention-based models outperform the state-of-the-art in adversarial code-based text generation. *
Text generation is of particular interest in many NLP applications such as machine translation, language modeling, and text summarization. Generative adversarial networks (GANs) achieved a remarkable success in high quality image generation in computer vision, and recently, GANs have gained lots of interest from the NLP community as well. However, achieving similar success in NLP would be more challenging due to the discrete nature of text. In this work, we introduce a method using knowledge distillation to effectively exploit GAN setup for text generation. We demonstrate how autoencoders (AEs) can be used for providing a continuous representation of sentences, which is a smooth representation that assign non-zero probabilities to more than one word. We distill this representation to train the generator to synthesize similar smooth representations. We perform a number of experiments to validate our idea using different datasets and show that our proposed approach yields better performance in terms of the BLEU score and Jensen-Shannon distance (JSD) measure compared to traditional GAN-based text generation approaches without pre-training.
We introduce novel language model (LM) adaptation approaches using the latent Dirichlet allocation (LDA) model. Observed n-grams in the training set are assigned to topics using soft and hard clustering. In soft clustering, each n-gram is assigned to topics such that the total count of that n-gram for all topics is equal to the global count of that n-gram in the training set. Here, the normalized topic weights of the n-gram are multiplied by the global n-gram count to form the topic n-gram count for the respective topics. In hard clustering, each n-gram is assigned to a single topic with the maximum fraction of the global n-gram count for the corresponding topic. Here, the topic is selected using the maximum topic weight for the n-gram. The topic n-gram count LMs are created using the respective topic n-gram counts and adapted by using the topic weights of a development test set. We compute the average of the confidence measures: the probability of word given topic and the probability of topic given word. The average is taken over the words in the n-grams and the development test set to form the topic weights of the n-grams and the development test set respectively. Our approaches show better performance over some traditional approaches using the WSJ corpus.
In this paper, we introduce the weighting of topic models in mixture language model adaptation using n-grams of the topic models. Topic clusters are formed by using a hardclustering method assigning one topic to one document based on the maximum number of words chosen from a topic for that document in Latent Dirichlet Allocation (LDA) analysis. The n-grams of the topic generated by hardclustering are used to compute the mixture weights of the component topic models. Instead of using all the words of the training vocabulary, selected words are used for LDA analysis, which are chosen by incorporating some information retrieval techniques. The proposed n-gram weighting approach shows significant reduction in perplexity and word error rate (WER) against a unigram weighting approach used in the literature.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.