Background
DNA clustering is an important technique for automatically discovering the inherent relationships within large collections of DNA sequences, yet clustering quality still leaves considerable room for improvement. The similarity metric between DNA sequences is one of the key factors in clustering. Alignment-free methods are a popular way to compute DNA sequence similarity: rather than matching strings directly, they usually map a sequence into a feature space based on the probability distribution of its words. Existing alignment-free models, e.g. k-tuple, use only word frequency and ignore other useful information contained in a DNA sequence, such as the categories of nucleotide bases and the positions of words. Combining several types of information is expected to yield better data mining results. We therefore present a new alignment-free model that uses such compound information to improve DNA clustering quality.

Results
This paper proposes a Category-Position-Frequency (CPF) model, which uses the word frequency, position and nucleotide-base category information in DNA sequences. The CPF model converts a DNA sequence into three sequences according to the categories of nucleotide bases and then derives a 12-dimensional feature vector. The feature values are computed by an entropy-based model that takes both local word frequency and position information into account. We conducted DNA clustering experiments on several datasets and compared CPF with mainstream alignment-free models, including k-tuple, DMk, TSM, AMI and CV. The experiments show that the CPF model outperforms these models in terms of clustering results and optimal settings.

Conclusions
The following conclusions can be drawn from the experiments:
(1) The hybrid information model performs better than models based on word frequency alone.
(2) For DNA sequences of no more than 5000 characters, the preferred sliding-window size for CPF is two, which substantially improves system performance.
(3) The CPF model achieves efficient, stable performance and generalizes broadly.
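The abstract does not give the CPF formulas, so the following is only a minimal Python sketch under stated assumptions, meant to show how category, position and frequency information can be combined into a 12-dimensional vector. It assumes the three category sequences come from the standard binary groupings of bases (purine/pyrimidine, amino/keto, weak/strong hydrogen bonding); with a sliding window of two, each binary sequence has 2² = 4 possible words, giving 3 × 4 = 12 features. The scoring in the hypothetical `window_feature` function (and its `blocks` parameter) is a stand-in, not the paper's entropy model: it splits the window positions into equal segments, takes the Shannon entropy of each word's occurrences across those segments as the positional term, and uses it to scale the word's relative frequency. A frequency-only `ktuple_vector` is included for contrast with the k-tuple baseline.

```python
import math
from itertools import product

# ASSUMPTION: the three category sequences use the standard IUPAC binary groupings;
# the abstract only says bases are split "according to the categories of nucleotide bases".
CATEGORY_MAPS = {
    "purine/pyrimidine": {"A": "R", "G": "R", "C": "Y", "T": "Y"},
    "amino/keto": {"A": "M", "C": "M", "G": "K", "T": "K"},
    "weak/strong": {"A": "W", "T": "W", "C": "S", "G": "S"},
}
ALPHABETS = {name: sorted(set(m.values())) for name, m in CATEGORY_MAPS.items()}


def to_category_sequences(dna):
    """Rewrite a DNA string as three binary-alphabet sequences (non-ACGT symbols are dropped)."""
    dna = dna.upper()
    return {name: "".join(m[b] for b in dna if b in m) for name, m in CATEGORY_MAPS.items()}


def window_feature(seq, alphabet, window=2, blocks=4):
    """HYPOTHETICAL scoring, not the paper's formula.

    Each word of length `window` gets its relative frequency, scaled by the
    normalised Shannon entropy of where it occurs across `blocks` equal segments
    of the sequence: evenly spread words keep their full frequency, words
    confined to a single segment keep half of it.
    """
    if blocks < 2:
        raise ValueError("blocks must be at least 2")
    words = ["".join(p) for p in product(alphabet, repeat=window)]
    n_windows = len(seq) - window + 1
    counts = {w: [0] * blocks for w in words}  # per-segment (local) counts
    for i in range(max(n_windows, 0)):
        segment = i * blocks // n_windows  # positional segment of window i
        counts[seq[i:i + window]][segment] += 1
    features = []
    for w in words:
        total = sum(counts[w])
        if total == 0:
            features.append(0.0)
            continue
        freq = total / n_windows
        entropy = -sum((c / total) * math.log(c / total) for c in counts[w] if c)
        features.append(freq * (1.0 + entropy / math.log(blocks)) / 2.0)
    return features


def cpf_vector(dna, window=2, blocks=4):
    """Concatenate the three category scores: 3 x 2**window values (12 for window=2)."""
    vector = []
    for name, seq in to_category_sequences(dna).items():
        vector.extend(window_feature(seq, ALPHABETS[name], window, blocks))
    return vector


def ktuple_vector(dna, k=2):
    """Frequency-only baseline: relative frequency of each of the 4**k DNA words."""
    dna = dna.upper()
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    n = len(dna) - k + 1
    for i in range(max(n, 0)):
        word = dna[i:i + k]
        if word in counts:
            counts[word] += 1
    return [counts[w] / n if n > 0 else 0.0 for w in words]


def euclidean(u, v):
    """Distance between two feature vectors, usable by any standard clustering method."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    s1 = "ACGTACGTTAGCATGCAGTCAGGT"
    s2 = "ACGTACGATAGCTTGCAGTCAGCA"
    v1, v2 = cpf_vector(s1), cpf_vector(s2)
    print(len(v1), len(ktuple_vector(s1)))  # 12 16
    print(round(euclidean(v1, v2), 4))
```

The pairwise distances from `euclidean` could be passed to any standard clustering algorithm. Because the category alphabets are binary, the vector length stays at 12 for a window of two regardless of sequence length, whereas the k-tuple baseline grows as 4^k; the sketch does not attempt to reproduce the paper's experiments or results.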
In China, the extensive use of plastic-film mulching (PFM), combined with poor awareness of rational plastic-film use and the absence of plastic residue recycling, has caused a range of environmental pollution problems. Replacing common polyethylene plastic film (PE film) with degradable film can effectively alleviate this situation. This study investigated substituting biodegradable film for PE film in the agricultural production of processed tomato in the Xinjiang region. Using bare soil as the control, we compared the effects of PE film and biodegradable film mulching on crop growth, yield, and economic benefits in processed tomato. The results indicated that: (1) biodegradable film about 8 μm thick can meet the requirements of mechanized operations, and its mulching effect was fully consistent with that of PE film; (2) four kinds of biodegradable film can meet the growth and development requirements of processed tomato, although they differ slightly from PE film in raising temperature and retaining water; (3) under the current production conditions and mode of the Xinjiang region, plastic-film planting can ensure a net profit of 11,400–16,400 CNY per hectare, and the net profit of biodegradable film planting was essentially equal to that of PE film; (4) about 50%–70% of the biodegradable film had ruptured and degraded by the time of processed tomato harvest, which prevented the film from winding around the harvester and improved the efficiency and commodity rate of the harvest operation. Because biodegradable film mulching leaves no residual pollution, it can be regarded as an alternative to conventional plastic-film mulching in agriculture and supports the sustainable development of agroecosystems in the Xinjiang region.