Current research efforts on neurological diseases are focused on identifying novel disease biomarkers to aid in diagnosis, provide accurate prognostic information and monitor disease progression. With advances in detection and quantification methods in genomics, proteomics and metabolomics, saliva has emerged as a promising sample source for detecting disease biomarkers. Collecting saliva offers multiple advantages over the biological fluids currently tested: it is a non-invasive, painless and simple procedure that requires no expert training and carries no undesirable side effects for patients. Here, we review the existing literature on salivary biomarkers and examine their validity in diagnosing and monitoring neurodegenerative and neuropsychiatric disorders such as autism and Alzheimer's, Parkinson's and Huntington's disease. Based on the available research, amyloid beta peptide, tau protein, lactoferrin, alpha-synuclein, DJ-1 protein, chromogranin A, huntingtin protein, DNA methylation disruptions, and micro-RNA profiles display a reliable degree of consistency and validity as disease biomarkers.
The huge amounts of data generated by media sensors in health monitoring systems, by medical diagnoses that produce media (audio, video, image, and text) content, and by health service providers are too complex and voluminous to be processed and analyzed by traditional methods. Data mining approaches offer the methodology and technology to transform these heterogeneous data into meaningful information for decision making. This paper studies data mining applications in healthcare. Mainly, we study k-means clustering algorithms on large datasets and present an enhancement to k-means clustering that requires k or fewer passes over a dataset. The proposed algorithm, which we call G-means, uses a greedy approach to produce the preliminary centroids and then takes k or fewer passes over the dataset to adjust these center points. Our experiments, applied incrementally to the same dataset, show that G-means outperforms k-means in terms of entropy and F-scores. The experiments also yield better results for G-means in terms of the coefficient of variance and the execution time.
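The two-stage idea described above (greedy preliminary centroids, then a bounded number of adjustment passes) can be sketched as follows. This is a minimal illustration and not the authors' G-means: the abstract does not specify the greedy rule, so the sketch assumes farthest-first traversal for seeding, and the names `greedy_seeds` and `bounded_kmeans` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def greedy_seeds(points: List[Point], k: int) -> List[List[float]]:
    """Pick k preliminary centroids greedily.

    Assumption: farthest-first traversal stands in for the paper's
    unspecified greedy rule. Start from the first point, then repeatedly
    add the point farthest from all seeds chosen so far.
    """
    seeds = [list(points[0])]
    while len(seeds) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, s) for s in seeds))
        seeds.append(list(farthest))
    return seeds


def assign(points: List[Point], centroids: List[List[float]]) -> List[int]:
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def bounded_kmeans(points: List[Point], k: int) -> Tuple[List[List[float]], List[int]]:
    """Adjust the greedy seeds with at most k passes over the data."""
    centroids = greedy_seeds(points, k)
    labels = assign(points, centroids)
    for _ in range(k):  # pass budget: k or fewer
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = assign(points, centroids)
        if new_labels == labels:  # converged early, so fewer than k passes
            break
        labels = new_labels
    return centroids, labels
```

Calling `bounded_kmeans(points, k)` on a list of numeric tuples returns the adjusted centroids and one cluster label per point. The early exit is what allows the run to finish in fewer than k passes when the greedy seeds are already close to the final centers.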
Purpose
The purpose of this paper is to propose a new model to enhance the auto-indexing of Arabic texts. The model extracts new relevant words by relating words chosen by previous classical methods to new words using data mining rules.
Design/methodology/approach
The proposed model uses an association rule algorithm to extract frequent sets of related items, and from them the relationships between words in the texts to be indexed and words from texts that belong to the same category. The extracted word associations are represented as sets of words that frequently appear together.
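The approach above can be illustrated with a small sketch that mines frequent word pairs from texts of one category and uses them to suggest additional index terms. It is a simplified stand-in and not the authors' algorithm: it considers only 2-itemsets rather than full frequent-set mining, the thresholds and the function names `frequent_pairs` and `expand_index_terms` are hypothetical, and documents are assumed to be already tokenized and stemmed.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def frequent_pairs(docs: Iterable[List[str]], min_support: int) -> Dict[Tuple[str, str], int]:
    """Count word pairs that co-occur in at least min_support documents."""
    counts: Counter = Counter()
    for doc in docs:
        # Each document contributes a pair at most once (set semantics).
        for pair in combinations(sorted(set(doc)), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_support}


def expand_index_terms(
    terms: Set[str],
    docs: List[List[str]],
    min_support: int = 2,
    min_confidence: float = 0.6,
) -> List[str]:
    """Suggest new index words associated with already-chosen terms.

    A rule src -> dst is kept when the pair is frequent and
    confidence = support(src, dst) / support(src) meets the threshold.
    """
    word_counts = Counter(w for doc in docs for w in set(doc))
    suggestions = set()
    for (a, b), pair_count in frequent_pairs(docs, min_support).items():
        for src, dst in ((a, b), (b, a)):
            if src in terms and dst not in terms:
                if pair_count / word_counts[src] >= min_confidence:
                    suggestions.add(dst)
    return sorted(suggestions)
```

Here `terms` plays the role of the words chosen by a classical indexing method, and `docs` the texts from the same category. The returned words are candidates to add to the index.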
Findings
The proposed methodology shows significant enhancement in terms of accuracy, efficiency and reliability when compared to previous works.
Research limitations/implications
The stemming algorithm can be further enhanced. Arabic has many grammatical rules, and the more of these rules are integrated into the stemming algorithm, the better the stemming will be. The stop-list can also be improved by adding more words that should be excluded from the indexing mechanism. Numbers should be added to the list as well, and a thesaurus could be used, because it links different phrases or words with the same meaning to each other, which improves the indexing mechanism. The authors also invite researchers to add more pre-requisite texts to obtain better results.
Originality/value
In this paper, the authors present a full text-based auto-indexing method for Arabic text documents. The method extracts new relevant words by using data mining rules, an approach that has not been investigated before. It uses an association rule mining algorithm to extract frequent sets of related items, and from them the relationships between words in the texts to be indexed and words from texts that belong to the same category. The benefits of the method are demonstrated using empirical work involving several Arabic texts.