Non-coding variants in the human genome strongly influence some traits and complex diseases through their regulatory and modifying effects. Consequently, a growing number of computational methods have been developed to predict the effects of variants in human non-coding sequences. However, users who are unfamiliar with how these methods perform find it difficult to choose an appropriate one from the dozens available. To address this problem, we assessed 12 performance measures of 24 methods on four independent non-coding variant benchmark datasets: (Ⅰ) rare germline variants from ClinVar, (Ⅱ) rare somatic variants from COSMIC, (Ⅲ) a common regulatory variant dataset, and (Ⅳ) a disease-associated common variant dataset. The 24 tested methods performed differently under different conditions, indicating that each has its own strengths and weaknesses depending on the scenario. Importantly, the performance of existing methods was acceptable for rare germline variants from ClinVar, with areas under the curve (AUCs) of 0.4481-0.8033, and poor for rare somatic variants from COSMIC (AUCs: 0.4984-0.7131), the common regulatory variant dataset (AUCs: 0.4837-0.6472), and the disease-associated common variant dataset (AUCs: 0.4766-0.5188). We also compared the performance of the 24 methods in predicting non-coding de novo mutations in autism spectrum disorder and found that the CADD and CDTS methods showed better performance. In summary, we assessed the performance of 24 computational methods under diverse scenarios, providing preliminary advice for proper tool selection and new method development in interpreting non-coding variants.
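For readers less familiar with the AUC figures quoted above, the following is a minimal sketch of how an area under the ROC curve can be computed from a method's scores and benchmark labels. It uses the pairwise (Mann-Whitney) formulation; the function name `auc` and the toy scores are illustrative assumptions, not data or code from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    scores: predicted scores, higher meaning "more likely functional".
    labels: 1 for positive variants, 0 for negative variants.
    (Names and label encoding are assumptions made for this sketch.)
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example (made-up scores): three of four positive/negative pairs are ordered correctly.
print(auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking of positives against negatives, which is why values close to 0.5 indicate poor discrimination.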
Rank aggregation aims to combine multiple rank lists into a single one and has wide applications in recommender systems, link prediction, metasearch, proposal selection, and so on. Some existing studies have summarized and compared different rank aggregation algorithms. However, most of them cover only a few algorithms, the data used to test the algorithms do not have clear statistical properties, and the metric used to quantify the aggregated results has certain limitations. Moreover, each algorithm is claimed to be superior to existing ones when it is proposed, yet the baseline algorithms, the testing samples, and the application scenarios differ from case to case. Therefore, it is still unclear which algorithm is better for a particular task. Here we review nine rank aggregation algorithms and compare their performance in aggregating a small number of long rank lists. We use an algorithm to generate different types of rank lists with known statistical properties and adopt a more reliable metric to quantify the aggregation results. We find that heuristic algorithms, despite their simplicity, work well when the rank lists are full and highly similar. In some cases, they can match or even surpass the optimization-based algorithms in performance. Ties in the lists reduce the quality of the consensus rank and increase fluctuations. The quality of the aggregated rank changes non-monotonically with the number of rank lists to be combined. Overall, the FAST algorithm outperforms all others on three different types of rank lists and is sufficient for the task of aggregating a small number of long rank lists.
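To make the notion of a simple heuristic aggregator concrete, here is a minimal sketch of the Borda count, a classic positional heuristic. The abstract does not list which nine algorithms were reviewed, so choosing Borda here is an assumption for illustration only; the function name `borda_aggregate` and the toy lists are likewise hypothetical. The sketch assumes full rank lists without ties.

```python
from collections import defaultdict


def borda_aggregate(rank_lists):
    """Aggregate full, tie-free rank lists with the Borda count.

    Each list orders items from best to worst. An item at position p in a
    list of length n earns n - 1 - p points; items are then sorted by total
    points. Equal totals are broken alphabetically (an arbitrary choice
    made for this sketch).
    """
    scores = defaultdict(int)
    for ranking in rank_lists:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - 1 - position
    return sorted(scores, key=lambda item: (-scores[item], str(item)))


# Toy input: three rankings of the same four items.
lists = [
    ["a", "b", "c", "d"],
    ["b", "a", "c", "d"],
    ["a", "c", "b", "d"],
]
print(borda_aggregate(lists))  # ['a', 'b', 'c', 'd']
```

Heuristics of this kind need only one pass over the lists, whereas optimization-based methods search for a consensus that minimizes a distance to all input lists.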
Contagion models of disease spread, which predict how epidemics grow over time, have been widely studied on social networks. These models are mostly simulated with a discrete-time Monte Carlo method, in which time is discretized into uniform steps and transition rates between states are replaced by transition probabilities. In this paper, we propose using a continuous-time approach, the Gillespie algorithm, which enables fast simulation of stochastic processes and is event-driven rather than relying on equally spaced time steps. We show how the method can be adapted to epidemic models, mainly the susceptible-infected (SI) model and the susceptible-infected-susceptible (SIS) model, and confirm the accuracy of the method with numerical simulations. Building on this accuracy, we make some changes to the epidemic models to make them more applicable.
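The event-driven idea can be sketched as follows. This is a minimal illustration of the Gillespie direct method for an SIS process on a network, not the authors' implementation; the function name `gillespie_sis`, the adjacency-dictionary format, and the parameter names `beta` (per-edge infection rate) and `mu` (per-node recovery rate) are assumptions made for the example. Setting `mu = 0` reduces it to the SI model.

```python
import random


def gillespie_sis(adjacency, initially_infected, beta, mu, t_max, seed=None):
    """Simulate SIS dynamics on a network with the Gillespie direct method.

    adjacency: dict mapping each node to a list of its neighbours.
    Returns a list of (time, number_of_infected_nodes) pairs.
    """
    rng = random.Random(seed)
    infected = set(initially_infected)
    t = 0.0
    history = [(t, len(infected))]

    while infected:
        # Every susceptible-infected edge can transmit at rate beta.
        si_edges = [(i, j) for i in infected for j in adjacency[i]
                    if j not in infected]
        infection_rate = beta * len(si_edges)
        recovery_rate = mu * len(infected)
        total_rate = infection_rate + recovery_rate
        if total_rate == 0:
            break  # no event can happen any more

        # The waiting time to the next event is exponentially distributed.
        t += rng.expovariate(total_rate)
        if t > t_max:
            break

        # Pick the event type in proportion to its share of the total rate.
        if rng.random() * total_rate < infection_rate:
            _, target = rng.choice(si_edges)
            infected.add(target)
        else:
            infected.remove(rng.choice(sorted(infected)))
        history.append((t, len(infected)))

    return history


# Toy example: a ring of five nodes, one initial infection.
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
trajectory = gillespie_sis(ring, [0], beta=1.0, mu=0.5, t_max=10.0, seed=42)
print(len(trajectory), "recorded states")
```

Because time jumps directly from one event to the next, no steps are spent on intervals in which nothing changes, which is the main contrast with the uniform-step Monte Carlo scheme described above. Recomputing the list of susceptible-infected edges at every event keeps the sketch short but is not efficient for large networks.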