Lung cancer occurs in both men and women and is caused by the uncontrolled growth of cells in the lungs. It leads to serious breathing problems during both inhalation and exhalation. According to the World Health Organization, cigarette smoking and passive smoking are the principal contributors to lung cancer. The mortality rate due to lung cancer is increasing day by day in young as well as older people, compared with other cancers. Despite the availability of high-tech medical facilities for careful diagnosis and effective treatment, the mortality rate has not yet been brought under adequate control. It is therefore highly necessary to take precautions at the initial stage, so that symptoms and effects can be detected early for better diagnosis. Machine learning now has a great influence on the health care sector because of its high computational capability for early disease prediction with accurate data analysis. In this paper we analyze various machine learning classifier techniques for classifying the lung cancer data available in the UCI Machine Learning Repository into benign and malignant. The input data is preprocessed and converted into binary form, and several well-known classifier techniques in the Weka tool are then used to classify the data set into cancerous and non-cancerous. The comparison reveals that the proposed RBF classifier achieved an accuracy of 81.25% and is considered the effective classifier technique for lung cancer data prediction.
The newly proposed weighted k nearest neighbour classifier is called the standard deviation K nearest neighbour (SDKNN) classifier. It is based on the principle of standard deviation, which measures the spread of an attribute about its mean. The spread of an attribute plays a significant role in improving the classification accuracy on a dataset. Most methods for finding the nearest neighbour compute the distance between two points using the Euclidean distance. Our proposed technique is instead based on a new distance formula for finding the nearest neighbour in KNN: the distance between two points is modified according to the standard deviations of the attributes. In the first attempt, we use the standard deviations of the attributes as powers when calculating the distance between training and test instances, in order to improve classification accuracy. In the second attempt, the distance in K nearest neighbour is computed based on the mean of the attributes' standard deviations, to further improve classification accuracy. Our concept is implemented on the Pima Indian Diabetes Dataset (PIDD). The analysis is carried out by splitting the dataset into 90% training data and 10% testing data. We found that the proposed technique gives an average classification accuracy of 83.2%, a considerable improvement over other conventional techniques.
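The abstract does not state the exact SDKNN distance formula, so the following is only a minimal sketch of one possible reading. It assumes the distance between two instances is the sum over attributes of |x_j - y_j| raised to the power of that attribute's standard deviation (first attempt), or to the mean of all attribute standard deviations (second attempt). The function names `attribute_stds`, `sd_distance`, and `sdknn_predict`, the `use_mean_std` flag, and the use of the population standard deviation computed on the training data are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from statistics import pstdev


def attribute_stds(train_rows):
    """Population standard deviation of each attribute (column) of the training rows."""
    return [pstdev(column) for column in zip(*train_rows)]


def sd_distance(x, y, powers):
    """Assumed distance: sum over attributes of |x_j - y_j| ** powers[j]."""
    return sum(abs(a - b) ** p for a, b, p in zip(x, y, powers))


def sdknn_predict(train_rows, train_labels, query, k=3, use_mean_std=False):
    """Majority vote among the k training rows closest to `query`.

    use_mean_std=False: each attribute's own standard deviation is its power
                        (assumed reading of the "first attempt").
    use_mean_std=True:  the mean of all standard deviations is used as a single
                        shared power (assumed reading of the "second attempt").
    """
    stds = attribute_stds(train_rows)
    if use_mean_std:
        mean_std = sum(stds) / len(stds)
        powers = [mean_std] * len(stds)
    else:
        powers = stds

    # Rank training rows by the assumed distance to the query point.
    ranked = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: sd_distance(pair[0], query, powers),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Calling `sdknn_predict(train_rows, train_labels, query, k=3)` returns the majority label among the three nearest training rows under the assumed distance. In practice the attributes would probably need to be put on comparable scales first, since a power applied to a difference smaller than 1 shrinks it while the same power applied to a larger difference grows it; the abstract does not say how this is handled.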