Symmetric positive definite (SPD) data have become a hot topic in machine learning. Rather than a linear Euclidean space, SPD data generally lie on a nonlinear Riemannian manifold. To overcome the problems caused by high data dimensionality, dimensionality reduction (DR) is a key subject for SPD data, and bilinear transformation plays a vital role in it. Because linear operations are not supported in nonlinear spaces such as Riemannian manifolds, directly applying Euclidean DR methods to SPD matrices is inadequate and complicates both modeling and optimization. This research proposes an SPD data DR method based on Riemannian manifold tangent spaces and global isometry (RMTSISOM-SPDDR). The main contributions are as follows: (1) Any tangent space of a Riemannian manifold is a Hilbert space isomorphic to a Euclidean space. For SPD manifolds in particular, tangent spaces consist of symmetric matrices, which largely preserve the form and attributes of the original SPD data. For this reason, RMTSISOM-SPDDR transfers the bilinear transformation from manifolds to tangent spaces. (2) Through the log transformation, the original SPD data are mapped to the tangent space at the identity matrix under the affine invariant Riemannian metric (AIRM). In this way, the geodesic distance between an original data point and the identity matrix equals the Euclidean distance between the corresponding tangent vector and the origin. (3) The bilinear transformation is then determined by an isometric criterion that keeps the geodesic distance on the high-dimensional SPD manifold as close as possible to the Euclidean distance in the tangent space of the low-dimensional SPD manifold. The resulting transformation is used for the DR of the original SPD data. Experiments on five commonly used datasets show that RMTSISOM-SPDDR is superior to five advanced SPD data DR algorithms.
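The distance property stated in contribution (2) can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the function names (`spd_log`, `airm_distance`, `bilinear_reduce`), the random test matrix, and the choice of a random orthonormal projection `W` are all assumptions made for illustration. In RMTSISOM-SPDDR the projection would instead be learned from the isometric criterion.

```python
import numpy as np

def spd_log(X):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T  # V diag(log w) V^T

def airm_distance(X, Y):
    """AIRM geodesic distance: || log(X^{-1/2} Y X^{-1/2}) ||_F."""
    w, V = np.linalg.eigh(X)
    X_inv_sqrt = (V / np.sqrt(w)) @ V.T
    M = X_inv_sqrt @ Y @ X_inv_sqrt
    M = (M + M.T) / 2  # remove round-off asymmetry before eigh
    return np.linalg.norm(spd_log(M), "fro")

def bilinear_reduce(X, W):
    """Bilinear transformation W^T X W; SPD whenever W has full column rank."""
    return W.T @ X @ W

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
X = A @ A.T + 6 * np.eye(6)  # hypothetical 6x6 SPD sample

# Geodesic distance to the identity vs. norm of the tangent vector at the identity.
geo = airm_distance(np.eye(6), X)
tan = np.linalg.norm(spd_log(X), "fro")

# Hypothetical projection (random orthonormal columns), used only to show the shapes.
W, _ = np.linalg.qr(rng.standard_normal((6, 3)))
Y = bilinear_reduce(X, W)
```

Here `geo` and `tan` agree up to floating-point error, and `Y` is a 3x3 SPD matrix, which is the form the reduced data take under a bilinear transformation.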
Tensor data are becoming increasingly common in machine learning. Compared with vector data, tensor data suffer more severely from the curse of dimensionality. The motivation of this paper is to combine the Hilbert-Schmidt Independence Criterion (HSIC) with tensor algebra to create a new dimensionality reduction algorithm for tensor data. The paper makes three contributions. (1) An HSIC-based algorithm is proposed in which the dimension-reduced tensor is determined by maximizing the HSIC between the dimension-reduced and high-dimensional tensors. (2) A tensor algebra-based algorithm is proposed in which the high-dimensional tensors are projected onto a subspace and the projection coordinates are taken as the dimension-reduced tensors. The subspace is determined by minimizing the distance between the high-dimensional tensor data and their projections onto the subspace. (3) By combining these two algorithms, a new dimensionality reduction algorithm, called PDMHSIC, is proposed, in which the dimensionality reduction must satisfy two criteria at the same time: HSIC maximization and subspace projection distance minimization. The proposed algorithm is a new attempt to combine HSIC with other algorithms, and it achieves better experimental results on 8 commonly used datasets than 7 other well-known algorithms.
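As a rough illustration of the quantity being maximized in contribution (1), the sketch below computes the standard biased empirical HSIC estimate, tr(KHLH)/(n-1)^2, with Gaussian kernels. It is not the PDMHSIC algorithm: it works on flattened vector samples rather than tensors, and the kernel choice, the bandwidth `sigma`, and the function names are assumptions made for this example.

```python
import numpy as np

def gaussian_gram(Z, sigma=1.0):
    """Gaussian kernel Gram matrix for the rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * Z @ Z.T
    return np.exp(-np.maximum(d2, 0.0) / (2 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    K = gaussian_gram(X, sigma)
    L = gaussian_gram(Y, sigma)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X_high = rng.standard_normal((200, 5))       # hypothetical high-dimensional samples
Y_dependent = X_high[:, :2]                  # a reduction that keeps two coordinates
Y_independent = rng.standard_normal((200, 2))  # unrelated low-dimensional samples

hsic_dep = hsic(X_high, Y_dependent)
hsic_ind = hsic(X_high, Y_independent)
```

A reduced representation that depends on the original data (`Y_dependent`) yields a larger HSIC value than an unrelated one (`Y_independent`), which is the intuition behind choosing the reduction that maximizes HSIC.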
This paper describes our system, which placed third in the Multilingual Track (subtask 11), fourth in the Code-Mixed Track (subtask 12), and seventh in the Chinese Track (subtask 9) of SemEval 2022 Task 11: MultiCoNER Multilingual Complex Named Entity Recognition. Our system's key contributions are as follows: 1) for multilingual NER tasks, we offer a unified framework with which one can easily execute single-language or multilingual NER tasks; 2) for low-resource code-mixed NER tasks, one can easily enhance a dataset by applying several simple data augmentation methods; and 3) for Chinese tasks, we propose a model that can capture Chinese lexical semantic, lexical border, and lexical graph structural information. Finally, our system achieves macro-F1 scores of 77.66, 84.35, and 74.00 on subtasks 11, 12, and 9, respectively, during the testing phase.