Outlier detection is a fast-moving area of healthcare data analysis and a major concern for health insurance providers. Most Medicare data is real-world data, and outlier analysis plays a crucial role in ensuring its validity and reliability. Detecting outliers in medical data is a complex task because the data has many variables and is multivariate in nature. This paper presents a model-based approach in which outliers are detected and assigned labels. An outlier, or suspicious case, is defined as a case that is expected to involve fraud. The methodology is carried out in two phases to develop a Supervised Outlier Detection Approach in healthcare Claims (SODAC). In the first phase, data preprocessing, feature selection uses a filter method, and a set-grouping hierarchy is used to select the best feature subset and organize the features. The second phase, outlier detection, combines the classic statistical and distance-based approaches. The Gaussian probability density function is applied to the data values to evaluate the distribution of the data. The distance-based approach outputs outlier codes: it partitions the input dataset, applies the statistical mean to each subset, and then uses a derived multi-aggregate metric to consolidate the data instances of the partitions (subsets). Distance-based outlier detection (dod) is performed by calculating the maximum distance of the input's data objects (instances) from the inner average statistical mean of their neighborhood. A data object whose value goes beyond the maximum or minimum of the calculated measure is termed suspicious, and this type of outlier detection is called indicative fraud potential. As illustrated in the experiments on publicly available real-world data, the results were relatively stable for large-scale data.
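The detection phase above is described only in prose, so the following Python sketch is one hypothetical reading of it, not the SODAC implementation. It assumes univariate values (for example, claim amounts), contiguous near-equal partitions, and the median of the subset means as the multi-aggregate metric. It also swaps in a median-absolute-deviation (MAD) estimate for the spread, which does not come from the paper. A value is flagged when its Gaussian density, under the aggregated center and spread, falls below the density at the edge of a [lower, upper] band. That condition is equivalent to the value lying outside the band. The names `gaussian_pdf`, `partition`, and `flag_suspicious` are illustrative.

```python
import math
from statistics import mean, median


def gaussian_pdf(x, mu, sigma):
    """Density of x under a normal distribution N(mu, sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def partition(values, k):
    """Split values into at most k contiguous subsets of near-equal size (assumed scheme)."""
    size = math.ceil(len(values) / k)
    return [values[i:i + size] for i in range(0, len(values), size)]


def flag_suspicious(values, k=4, z=2.5):
    """Hypothetical sketch: return (indices of suspicious values, (lower, upper) band)."""
    # Statistical mean of each partition (subset).
    subset_means = [mean(s) for s in partition(values, k)]
    # Assumed multi-aggregate metric: median of the subset means, so a single
    # contaminated subset cannot drag the center far.
    center = median(subset_means)
    # Swapped-in robust spread: MAD scaled by 1.4826 to estimate a normal sigma.
    scale = 1.4826 * median([abs(v - center) for v in values])
    if scale == 0:
        raise ValueError("spread is zero; cannot fit a Gaussian")
    lower, upper = center - z * scale, center + z * scale
    # Density at the band edge; any value with lower density lies outside [lower, upper].
    cutoff = gaussian_pdf(upper, center, scale)
    flagged = [i for i, v in enumerate(values) if gaussian_pdf(v, center, scale) < cutoff]
    return flagged, (lower, upper)


claims = [100, 102, 98, 101, 99, 103, 97, 100, 950, 101, 99, 100]
flagged, band = flag_suspicious(claims)
print(flagged)  # [8]
```

The median aggregate and MAD spread are choices made for the sketch. With a plain mean and standard deviation, a single extreme claim inflates both statistics and can hide itself.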
Detecting fraudulent and abusive cases in healthcare is one of the most challenging problems for data mining studies. Existing studies lack real data for analysis and focus on a very partial version of the problem, covering only a specific actor, healthcare service, or disease. In this article, the proposed strategy identifies fraudulent behaviors in Medicare claims data using several predictors as model inputs. The methodology involves a preprocessing phase and a model development phase. In the preprocessing phase, features are mined by estimating their feature importance scores, and instances are labeled by applying classification rules to the whole dataset, which yields a transformed dataset. In the development phase, a random forest (RF) combined with the Synthetic Minority Over-sampling Technique (SMOTE) is applied to the training and testing data. Specifically, SMOTE is adapted to balance the data, sort misclassified instances, and find the interesting instances. The results show that RF with SMOTE improves classifier performance compared with the plain RF method.
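SMOTE is named above but not described. As a reference point, the following standard-library Python sketch shows the textbook SMOTE oversampling step, in which each synthetic sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. It does not reproduce the paper's adaptation for sorting misclassified instances. The function names (`smote`, `balance`) and the parameters `k` and `seed` are illustrative assumptions.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic minority samples by interpolating toward near neighbours."""
    if n_synthetic <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute the indices of the k nearest minority neighbours of each sample.
    neighbours = []
    for i, p in enumerate(minority):
        ranked = sorted((euclidean(p, q), j) for j, q in enumerate(minority) if j != i)
        neighbours.append([j for _, j in ranked[:k]])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform in [0, 1): position along the segment base -> nn
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until it matches the majority count (binary case)."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = sum(1 for label in y if label != minority_label)
    new = smote(minority, max(0, n_majority - len(minority)), k=k, seed=seed)
    return list(X) + new, list(y) + [minority_label] * len(new)
```

In a full pipeline, the balanced set would then be used to train the random forest. For production use, imbalanced-learn's `SMOTE.fit_resample` and scikit-learn's `RandomForestClassifier` provide maintained implementations of the two pieces.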