The huge amount of data generated by organizations, and its value in fields such as decision making, data security, and research, has made data classification an essential process. Data classification is the process of grouping data with similar characteristics into categories, and the categories can be chosen according to the output we are looking for. Classifying data allows us to predict the nature of future data sets and to discover useful patterns in them. This project aims at classifying gene data sets. A gene data set is the information collected from a set of genes subjected to a specific test. Such data can be used for medical research: studying the patterns in these data sets allows us to predict which genes are more vulnerable to a particular disease, thereby allowing us to prevent the disease from manifesting at its very beginning. As the saying goes, prevention is better than cure. In this paper, such classification is attempted using a supervised machine learning algorithm, the Support Vector Machine (SVM). Many algorithms exist for classification, but SVM has several advantages over the others. It is capable of both classification and regression. It works well with structured, semi-structured, and unstructured data. It also uses a kernel function which, when chosen appropriately, lets it handle complex problems in which the classes are not linearly separable. In summary, this project takes gene data sets as input and produces classified groups as output.
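As a rough illustration of the idea of supervised classification with an SVM, the following is a minimal sketch and not the method used in this paper. It assumes a purely synthetic data set in which each sample holds the expression levels of two hypothetical genes, with labels +1 and -1 standing in for two hypothetical classes. It trains a linear SVM by sub-gradient descent on the regularized hinge loss. The function names, the data generator, and all parameter values are assumptions made for the example, and no kernel function is used here.

```python
import random


def make_synthetic_expression_data(n_per_class=20, seed=0):
    """Generate hypothetical two-gene expression samples for two classes.

    Class +1 is centred at (2, 2) and class -1 at (-2, -2), with uniform
    noise in [-0.5, 0.5]. This is synthetic data, not real gene data.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((1, 2.0), (-1, -2.0)):
        for _ in range(n_per_class):
            X.append([center + rng.uniform(-0.5, 0.5) for _ in range(2)])
            y.append(label)
    return X, y


def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=50, seed=0):
    """Fit a linear SVM by sub-gradient descent on the hinge loss.

    lam is the L2 regularization strength and lr a fixed learning rate
    (both illustrative values, not tuned).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * decision_value(w, b, X[i])
            # Shrink the weights (gradient of the L2 regularizer).
            w = [(1.0 - lr * lam) * wj for wj in w]
            # Samples inside the margin or misclassified add a hinge-loss step.
            if margin < 1.0:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the side of the separating line."""
    return 1 if decision_value(w, b, x) >= 0 else -1


X_train, y_train = make_synthetic_expression_data(seed=0)
X_test, y_test = make_synthetic_expression_data(seed=1)
w, b = train_linear_svm(X_train, y_train)
predictions = [predict(w, b, x) for x in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

Real gene data sets typically have many more features than samples and are rarely this cleanly separated, so in practice one would likely rely on a library implementation (for example, scikit-learn's `SVC`) with a suitable kernel rather than this hand-written linear trainer.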