We describe an in-depth analysis of spam-filtering performance of a simple Naive Bayes learner and two current variants. A set of seven mailboxes comprising about 65,000 mails from seven different users, as well as a representative snapshot of 25,000 mails which were received over 18 weeks by a single user, were used for evaluation. Our main motivation was to test whether two variants of Naive Bayes learning, SpamAssassin and CRM114, were superior to simple Naive Bayes learning, represented by SpamBayes. Surprisingly, we found that the performance of these systems was remarkably similar and that the extended systems have significant weaknesses which are not apparent for the simpler Naive Bayes learner. The simpler Naive Bayes learner, SpamBayes, also offers the most stable performance in that it deteriorates least over time. Overall, SpamBayes should be preferred over the more complex variants.
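As a point of reference for the "simple Naive Bayes learner" discussed above, the following is a minimal sketch of a word-level Naive Bayes spam score with add-one smoothing. It is an illustration only: it does not reproduce the tokenization or score combination used by SpamBayes, SpamAssassin, or CRM114, and the function names `train` and `spam_log_odds` are hypothetical.

```python
import math
from collections import Counter


def train(messages):
    """Build per-class word counts from (text, label) pairs.

    Labels are assumed to be "spam" or "ham", and both classes are
    assumed to appear at least once in the training data.
    """
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def spam_log_odds(text, model):
    """Return log P(spam | text) - log P(ham | text); > 0 suggests spam."""
    word_counts, doc_counts, vocab = model
    # Start from the log ratio of the class priors.
    score = math.log(doc_counts["spam"] / doc_counts["ham"])
    totals = {c: sum(word_counts[c].values()) for c in word_counts}
    for word in text.lower().split():
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        # Add-one (Laplace) smoothing avoids zero probabilities.
        p_spam = (word_counts["spam"][word] + 1) / (totals["spam"] + len(vocab))
        p_ham = (word_counts["ham"][word] + 1) / (totals["ham"] + len(vocab))
        score += math.log(p_spam / p_ham)
    return score
```

A positive score indicates that the words in the message were, on balance, more frequent in the spam training mails than in the legitimate ones.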
In colorectal cancer (CRC), an increase in the stromal (S) area accompanied by a reduction of the epithelial (E) area has been suggested as an indication of tumor progression. Therefore, an automated image analysis method capable of discriminating E and S areas would allow an improved diagnosis. Immunofluorescence staining was performed on paraffin-embedded sections from colorectal tumors (16 samples from patients with liver metastasis and 18 without). Noncancerous tumor-adjacent mucosa (n = 5) and normal mucosa (n = 4) were taken as controls. Epithelial cells were identified by an anti-keratin 8 (K8) antibody. Large tissue areas (5–63 mm²/slide) including tumor center, tumor front, and adjacent mucosa were scanned using an automated microscopy system (TissueFAXS). With our newly developed algorithms, we showed that there is more K8-immunoreactive E in the tumor center than in tumor-adjacent and normal mucosa. Comparing patients with and without metastasis, the E/S ratio decreased by 20% in the tumor center and by 40% at the tumor front in metastatic samples. The reduction of E might be due to a more aggressive phenotype in metastasis patients. The novel software allowed a detailed morphometric analysis of cancer tissue compartments as a tool for objective quantitative measurement, while reducing analysis time and increasing the reproducibility of the data.
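The E/S ratio reported above can be illustrated with a small sketch. The abstract does not describe the segmentation algorithm, so the approach here is an assumption: pixels inside a tissue mask are counted as epithelial when their K8 intensity reaches a fixed threshold, and as stromal otherwise. The names `es_ratio` and `relative_change` and all numeric values are hypothetical.

```python
def es_ratio(k8_image, tissue_mask, threshold):
    """Ratio of epithelial to stromal pixel counts inside the tissue mask.

    k8_image: 2D list of K8 fluorescence intensities.
    tissue_mask: 2D list of booleans, True where tissue is present.
    threshold: assumed intensity cutoff separating E from S pixels.
    """
    epithelial = stromal = 0
    for k8_row, mask_row in zip(k8_image, tissue_mask):
        for intensity, in_tissue in zip(k8_row, mask_row):
            if not in_tissue:
                continue  # background pixels belong to neither compartment
            if intensity >= threshold:
                epithelial += 1
            else:
                stromal += 1
    if stromal == 0:
        raise ValueError("no stromal pixels; E/S ratio is undefined")
    return epithelial / stromal


def relative_change(reference, value):
    """Fractional change of value relative to reference (-0.2 means -20%)."""
    return (value - reference) / reference
```

Comparing the ratio of a metastatic sample against a non-metastatic reference with `relative_change` yields the kind of percentage decrease quoted in the abstract.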
Isogenic populations of animals still show a surprisingly large amount of phenotypic variation between individuals. Using a GFP reporter that has been shown to predict longevity and resistance to stress in isogenic populations of the nematode Caenorhabditis elegans, we examined residual variation in expression of this GFP reporter. We found that when we separated the populations into the brightest 3% and the dimmest 3%, we also saw variation in relative expression patterns that distinguished the bright and dim worms. Using a novel image processing method which is capable of directly analyzing worm images, we found that bright worms (after normalization to remove variation between bright and dim worms) had expression patterns that correlated with other bright worms but that dim worms fell into two distinct expression patterns. We analyzed a small set of worms with confocal microscopy to validate these findings, and found that the activity loci in these clusters are caused by extremely bright intestine cells. We also found that the vast majority of the fluorescent signal for all worms came from intestinal cells as well, which may indicate that the activity of intestinal cells is responsible for the observed patterns. Phenotypic variation in C. elegans is still not well understood, but our proposed novel method to analyze complex expression patterns offers a way to enable a better understanding.
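The idea of comparing expression patterns after normalizing away overall brightness can be sketched as follows. This is not the authors' image processing method, which the abstract does not specify. It assumes each worm has already been reduced to a one-dimensional intensity profile of equal length (for example, along the body axis), and the names `normalize` and `pearson` are hypothetical.

```python
import math


def normalize(profile):
    """Scale a profile to unit total intensity, removing overall brightness."""
    total = sum(profile)
    if total == 0:
        raise ValueError("profile has no signal")
    return [value / total for value in profile]


def pearson(a, b):
    """Pearson correlation between two equal-length profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a flat profile")
    return cov / math.sqrt(var_a * var_b)
```

Two worms whose profiles differ only by a brightness factor correlate perfectly after normalization, whereas worms with different spatial patterns do not, which is the distinction the clustering of dim worms relies on.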
We develop and discuss automated and self-adaptive systems for detecting and classifying botnets based on machine learning techniques and integration of human expertise. The proposed concept is purely passive and is based on analyzing information collected at three levels: (i) the payload of single packets received, (ii) observed access patterns to the darknet at the level of network traffic, and (iii) observed contents of TCP/IP traffic at the protocol level. We illustrate experiments based on real-life data collected with a darknet set up for this purpose to show the potential of the proposed concept for (i) and (ii). For (iii), we use a small spamtrap because darknets cannot capture TCP/IP traffic data, so this experiment is not purely passive. However, traffic moving through a network could be analyzed in a similar way to obtain a purely passive system for this step as well.