Multivariate data sets (MDSs) of enormous size, containing a certain ratio of noise/outliers, are generated routinely in various application domains. A major issue tightly coupled with these MDSs is how to compute their similarity indexes with the available resources in the presence of noise/outliers, a problem that has been addressed with both classical and non-metric based approaches. However, classical techniques are sensitive to outliers, and most of the non-classical approaches are either problem/application specific or overly complex. Therefore, the development of an efficient and reliable algorithm for MDSs, with minimum time and space complexity, is highly encouraged by the research community. In this paper, a non-metric based similarity measure algorithm for MDSs is presented that successfully addresses the aforementioned issues, particularly noise and computational time. The technique finds the similarity indexes of noisy MDSs, of both equal and variable sizes, while using minimal resources, i.e., space and time. Experiments were conducted with both benchmark and real-time MDSs to evaluate the proposed algorithm's performance against its rival algorithms, which are traditional dynamic programming based and sequential similarity measure algorithms. Experimental results show that the proposed scheme outperforms its counterpart algorithms in terms of time and space and effectively tolerates a considerable portion of noisy data.

INDEX TERMS Similarity index, multivariate data set, outliers, longest common subsequence.

I. INTRODUCTION
Recent technological advancements, particularly in sensors and actuators, have led to the generation of enormous multivariate data sets (MDSs) in different application areas, e.g., wireless sensor networks, the Internet of Things (IoT), scientific experiments, industrial control processes, educational testbeds, the web, and databases [1].
An MDS is defined as a set of related numbers or values associated with a specific entity in an organization. In other words, a group of univariate data sets arranged in columns is known as an MDS [2]. Mathematically, it is represented as a matrix $X_{m,n}$, where $m$ and $n$ correspond to the number of rows and columns, respectively. These MDSs are thoroughly examined, using various classical and non-classical approaches, to discover valuable information that is used to determine the correlating or distinguishing factors of entities.

The associate editor coordinating the review of this manuscript and approving it for publication was Chongsheng Zhang.

One of the major issues closely linked with MDSs is finding their similarity indexes in the presence of noise/outliers, which is not possible with existing techniques. Generally, two MDSs, $X_{i,j}$ and $Y_{m,n}$, are considered similar if most of their elements are highly correlated [3]. The MDS similarity problem is an active research area in both computer science and mathematics because it arises in different real-world application environments, e.g., DNA analysis, sensors-based real...