Current computational predictions of splice sites largely depend on the sequence patterns of known intronic sequence features (ISFs) described in the classical intron definition model (IDM). The computation-oriented IDM (CO-IDM) clearly provides more specific and concrete information for describing intron flanks of splice sites (IFSSs). In the paper, we proposed a novel approach of fuzzy decision trees (FDTs) which utilize 1) weighted ISFs of twelve uni-frame patterns (UFPs) and forty-five multi-frame patterns (MFPs) and 2) gain ratios to improve the performances in identifying an intron. First, we fuzzified extracted features from genomic sequences using membership functions with an unsupervised self-organizing map (SOM) technique. Then, we brought in different viewpoints of globally weighting and crossly referring in generating fuzzy rules which are interpretable and useful for biologists to verify whether a sequence is an intron or not. Finally, the experimental results revealed the effectiveness of the proposed method in improving the identification accuracy. Besides, we also implemented an on-line intronic identifier to infer an unknown genomic sequence.
Keywords Intron identification, fuzzy decision tree, intronic sequence feature, computation-oriented intron definition model, self-organizing mapI.