Many software startups and research and development efforts are working to harness the power of big data and create software with the potential to improve almost every aspect of human life. As these efforts increase, full consideration needs to be given to the engineering aspects of big data software. Since these systems exist to make predictions on complex and continuous massive datasets, they pose unique problems during the specification, design, and verification of software that needs to be delivered on time and within budget. But given the nature of big data software, can this be done? Does big data software engineering really work? This article explores details of big data software, discusses the main problems encountered when engineering big data software, and proposes avenues for future research.
We show that the Fisher-Rao Riemannian metric is a natural, intrinsic tool for computing shape geodesics. When a parameterized probability density function is used to represent a landmark-based shape, the modes of deformation are automatically established through the Fisher information of the density. Consequently, given two shapes parameterized by the same density model, the geodesic distance between them under the action of the Fisher-Rao metric is a convenient shape distance measure. It has the advantage of being an intrinsic distance measure and invariant to reparameterization. We first model shape landmarks using a Gaussian mixture model and then compute geodesic distances between two shapes using the Fisher-Rao metric corresponding to the mixture model. We illustrate our approach by computing Fisher geodesics between 2D corpus callosum shapes. Shape representation via the mixture model and shape deformation via the Fisher geodesic are hereby unified in this approach.
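As a rough illustration of how a density model induces a metric on its parameters, the sketch below approximates the Fisher information matrix of a mixture with respect to its component means. It is a minimal example under stated assumptions, not the paper's method: it uses a 1D, equal-weight mixture with a shared standard deviation, and the helper names `gmm_pdf` and `fisher_information`, the grid, and the step size are all hypothetical choices made here.

```python
import numpy as np

def gmm_pdf(x, means, sigma):
    """Equal-weight 1D Gaussian mixture with a shared standard deviation (assumed model)."""
    means = np.asarray(means, dtype=float)
    diffs = x[:, None] - means[None, :]
    comps = np.exp(-0.5 * (diffs / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return comps.mean(axis=1)

def fisher_information(means, sigma, grid, eps=1e-5):
    """Approximate g_ij = integral of (d_i p)(d_j p) / p over x.

    Derivatives with respect to the means use central finite differences;
    the integral is a Riemann sum on a uniform grid.
    """
    means = np.asarray(means, dtype=float)
    p = gmm_pdf(grid, means, sigma)
    dx = grid[1] - grid[0]
    grads = []
    for i in range(means.size):
        step = np.zeros_like(means)
        step[i] = eps
        plus = gmm_pdf(grid, means + step, sigma)
        minus = gmm_pdf(grid, means - step, sigma)
        grads.append((plus - minus) / (2.0 * eps))
    grads = np.array(grads)
    safe_p = np.maximum(p, 1e-300)  # guard against division by zero in the far tails
    integrand = grads[:, None, :] * grads[None, :, :] / safe_p
    return integrand.sum(axis=-1) * dx

grid = np.linspace(-20.0, 20.0, 20001)
G = fisher_information([-1.0, 1.5], sigma=1.0, grid=grid)
print(G)
```

For a single Gaussian the result should approach the known value 1/sigma^2; for overlapping components the off-diagonal entries are generally nonzero and have to be evaluated numerically, as done here.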
This paper proposes a new affine registration algorithm for matching two point sets in ℝ² or ℝ³. The input point sets are represented as probability density functions, using either Gaussian mixture models or discrete density models, and the problem of registering the point sets is treated as aligning the two distributions. Since polynomials transform as symmetric tensors under an affine transformation, the distributions' moments, which are the expected values of polynomials, also transform accordingly. Therefore, instead of solving the harder problem of aligning the two distributions directly, we solve the softer problem of matching the distributions' moments. By formulating a least-squares problem for matching moments of the two distributions up to degree three, the resulting cost function is a polynomial that can be efficiently optimized using techniques originating from algebraic geometry: the global minimum of this polynomial can be determined by solving a system of polynomial equations. The algorithm is robust in the presence of noise and outliers, and we validate the proposed algorithm on a variety of point sets with varying degrees of deformation and noise.
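The claim that moments transform predictably under an affine map can be checked numerically for the two lowest degrees. The sketch below is only an illustration of that property, not the registration algorithm itself (which matches moments up to degree three and solves a polynomial system): the point set `X`, the matrix `A`, the translation `t`, and the helper `first_two_moments` are hypothetical values and names introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))               # hypothetical source point set in the plane
A = np.array([[1.2, 0.3], [-0.4, 0.9]])     # hypothetical linear part of the affine map
t = np.array([0.5, -1.0])                   # hypothetical translation
Y = X @ A.T + t                             # transformed point set: y = A x + t

def first_two_moments(points):
    """Mean and covariance of the discrete density with equal mass on each point."""
    mean = points.mean(axis=0)
    centered = points - mean
    cov = centered.T @ centered / len(points)
    return mean, cov

mean_x, cov_x = first_two_moments(X)
mean_y, cov_y = first_two_moments(Y)

# Degree one transforms as a vector, degree two as a symmetric 2-tensor.
print(np.allclose(mean_y, A @ mean_x + t))
print(np.allclose(cov_y, A @ cov_x @ A.T))
```

In a registration setting `A` and `t` are unknown, and relations of this kind, extended to higher-degree moments, become the equations to be solved for them.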
Density estimation for observational data plays an integral role in a broad spectrum of applications, e.g., statistical data analysis and information-theoretic image registration. Of late, wavelet-based density estimators have gained in popularity due to their ability to approximate a large class of functions, adapting well to difficult situations such as when densities exhibit abrupt changes. The decision to work with wavelet density estimators brings along with it theoretical considerations (e.g., non-negativity, integrability) and empirical issues (e.g., computation of basis coefficients) that must be addressed in order to obtain a bona fide density. In this paper, we present a new method to accurately estimate a non-negative density which directly addresses many of the problems in practical wavelet density estimation. We cast the estimation procedure in a maximum likelihood framework which estimates the square root of the density √p, allowing us to obtain the natural non-negative density representation (√p)². Analysis of this method will bring to light a remarkable theoretical connection with the Fisher information of the density and, consequently, lead to an efficient constrained optimization procedure to estimate the wavelet coefficients. We illustrate the effectiveness of the algorithm by evaluating its performance on mutual information-based image registration, shape point set alignment, and empirical comparisons to known densities. The present method is also compared to fixed and variable bandwidth kernel density estimators.
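The square-root parameterization described above can be made concrete with a short derivation. This is a sketch under assumptions: the orthonormal basis functions \psi_k and the coefficients c_k are notation introduced here for illustration, and the abstract does not state the expansion in this form.

```latex
\begin{align*}
\sqrt{p(x)} &= \sum_{k} c_k \, \psi_k(x),
\qquad \int \psi_j(x) \, \psi_k(x) \, dx = \delta_{jk}, \\
p(x) &= \Bigl( \sum_{k} c_k \, \psi_k(x) \Bigr)^{2} \;\ge\; 0, \\
\int p(x) \, dx &= \sum_{j,k} c_j c_k \int \psi_j(x) \, \psi_k(x) \, dx
 = \sum_{k} c_k^{2} = 1 .
\end{align*}
```

Under these assumptions, non-negativity holds by construction, and the unit-integral requirement becomes a unit-norm constraint on the coefficient vector, which is one way to see why a constrained optimization over the coefficients arises.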
Shape matching plays a prominent role in the comparison of similar structures. We present a unifying framework for shape matching that uses mixture models to couple both the shape representation and deformation. The theoretical foundation is drawn from information geometry wherein information matrices are used to establish intrinsic distances between parametric densities. When a parameterized probability density function is used to represent a landmark-based shape, the modes of deformation are automatically established through the information matrix of the density. We first show that given two shapes parameterized by Gaussian mixture models (GMMs), the well-known Fisher information matrix of the mixture model is also a Riemannian metric (actually, the Fisher-Rao Riemannian metric) and can therefore be used for computing shape geodesics. The Fisher-Rao metric has the advantage of being an intrinsic metric and invariant to reparameterization. The geodesic, computed using this metric, establishes an intrinsic deformation between the shapes, thus unifying both shape representation and deformation. A fundamental drawback of the Fisher-Rao metric is that it is not available in closed form for the GMM. Consequently, shape comparisons are computationally very expensive. To address this, we develop a new Riemannian metric based on generalized ϕ-entropy measures. In sharp contrast to the Fisher-Rao metric, the new metric is available in closed form. Geodesic computations using the new metric are considerably more efficient. We validate the performance and discriminative capabilities of these new information geometry-based metrics by pairwise matching of corpus callosum shapes. We also study the deformations of fish shapes that have various topological properties. A comprehensive comparative analysis is also provided using other landmark-based distances, including the Hausdorff distance, the Procrustes metric, landmark-based diffeomorphisms, and the bending energies of the thin-plate (TPS) and Wendland splines.