AbstractIdentification of patients with life-threatening diseases including leukemias or infections such as tuberculosis and COVID-19 is an important goal of precision medicine. We recently illustrated that leukemia patients are identified by machine learning (ML) based on their blood transcriptomes. However, there is an increasing divide between what is technically possible and what is allowed because of privacy legislation. To facilitate integration of any omics data from any data owner world-wide without violating privacy laws, we here introduce Swarm Learning (SL), a decentralized machine learning approach uniting edge computing, blockchain-based peer-to-peer networking and coordination as well as privacy protection without the need for a central coordinator thereby going beyond federated learning. Using more than 14,000 blood transcriptomes derived from over 100 individual studies with non-uniform distribution of cases and controls and significant study biases, we illustrate the feasibility of SL to develop disease classifiers based on distributed data for COVID-19, tuberculosis or leukemias that outperform those developed at individual sites. Still, SL completely protects local privacy regulations by design. We propose this approach to noticeably accelerate the introduction of precision medicine.
Breadth-first search (BFS) is one of the most fundamental processing algorithms in graph theory. We previously presented a scalable BFS algorithm based on Beamer's directionoptimizing algorithm for non-uniform memory access (NUMA)-based systems, in which the NUMA architecture was carefully considered. This paper presents our new implementation that reduces remote memory access in a top-down direction of direction-optimizing algorithm. We also discuss numerical results obtained on the SGI UV 2000 and UV 300 systems, which are shared-memory supercomputers based on a cache coherent (cc)-NUMA architecture that can handle thousands of threads on a single operating system. Our implementation has achieved performance rates of 219 billion edges per second on a Kronecker graph with 2 34 vertices and 2 38 edges on a rack of an SGI UV 300 system with 1,152 threads. This result exceeds the fastest entry for a sharedmemory system on the current Graph500 list presented in November 2015, which includes our previous implementation.
In the mid 1980s, a typical large data visualization, involved the input of geometry in the order of 10,000 triangles, for rendering into about a million display pixels. However, today, we see input data sets growing to a billion triangles, while the generated display has only grown to about 10 million pixels. Consequently, large data visualization have, over the years, changed from a data expansion, to a data compression function.This four orders of magnitude change has evoked renewed interests in ray tracing rendering techniques, of which its performance can be made more input geometry-quantity insensitive. This is as opposed to scanline rendering techniques, such as OpenGL, that can be made more output pixel-quantity insensitive.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.