The acoustic models in state-of-the-art speech recognition systems are based on phones in context that are represented by hidden Markov models. This modeling approach may be limited in that it is hard to incorporate long-span acoustic context. Exemplar-based approaches are an attractive alternative, in particular if massive data and computational power are available. Yet, most of the data at Google are unsupervised and noisy. This paper investigates an exemplar-based approach under this not yet well understood data regime. A log-linear rescoring framework is used to combine the exemplar-based features on the word level with the first-pass model. This approach guarantees at least baseline performance and focuses the refined modeling on words with sufficient data. Experimental results for the Voice Search and YouTube tasks are presented.

Index Terms: Exemplar-based speech recognition, conditional random fields, speech recognition
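The word-level combination described in the abstract can be sketched as a standard log-linear model. The following formulation is a generic illustration, not the paper's exact parameterization; the feature names and weights are assumptions:

```latex
% Hedged sketch: generic log-linear rescoring over hypotheses W
% given acoustics X. Here f_0 denotes the first-pass model score,
% f_1, ..., f_M denote exemplar-based word-level features, and the
% lambda_i are trained combination weights (all names illustrative).
p(W \mid X) =
  \frac{\exp\bigl(\sum_{i=0}^{M} \lambda_i f_i(W, X)\bigr)}
       {\sum_{W'} \exp\bigl(\sum_{i=0}^{M} \lambda_i f_i(W', X)\bigr)}
```

Note that setting $\lambda_0 = 1$ and all other weights to zero recovers the first-pass model exactly, which is consistent with the abstract's claim that at least baseline performance is guaranteed.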
INTRODUCTION

State-of-the-art speech recognition systems are based on hidden Markov models (HMMs) to represent phones in context. These models are convenient due to their simplicity and compactness. However, it is hard to incorporate long-span acoustic context into this type of model without pooling observations from different examples at the frame level.

Non-parametric, exemplar-based approaches such as k-nearest neighbors (kNN) appear to be an attractive alternative for overcoming this limitation of conventional HMMs and may be more effective at capturing the large variability of speech. In this paper, we investigate an exemplar-based (also known as template-based) rescoring approach to speech recognition, which can be considered a variant of kNN on (pre-)segmented acoustic units such as words.

As with most non-parametric approaches, the main concerns about exemplar-based speech recognition are that it requires large amounts of data and, thus, massive computational power. The origin of the complexity is twofold. First, there is no compact representation as in the case of conventional HMMs, and all data need to be memorized and processed. Second, the Dynamic Time Warping (DTW) distance [1, 2] is used to measure the similarity between two templates. Using