RiBoSOM

Yang, Yu; Stathis, Dimitrios; Sharma, Prashant; Paul, Kolin; Hemani, Ahmed; Grabherr, Manfred; Ahmad, Rafi

doi:10.1145/3229631.3229650

Cited by 13 publications (6 citation statements)

References 34 publications


“…3) CGRA fabric: We compare the SOM implementation on GAP9 with a CGRA that targets dense linear algebra applications, including SOM [3]. The authors employed two CGRA fabrics, namely Dynamically Reconfigurable Resource Array (DRRA) [20] for dense linear algebra and Distributed Memory Architecture (DiMArch) [21] for streaming scratchpad memory, connected through a configuration network-on-chip (NoC), as described in [3]. Each DRRA cell includes a 16-bit fixed-point arithmetic Data Processing Unit (DPU), a register file, and a sequencer responsible for cell configuration.…”

Section: Results (mentioning)


“…Leveraging hardware acceleration is an alternative approach to enhance the efficiency of computationally intensive algorithms. Previous research conducted by [3] has proposed the utilization of SOM for accelerating genome identification processes. The CGRA implementation [9] observed a less than 1% quality loss when training SOM networks using 16-bit fixed-point number representations, compared to 32-bit FP implementation.…”

Section: Related Work (mentioning)

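The 16-bit fixed-point representation mentioned above can be illustrated with a short sketch. It assumes a Q4.11 layout (1 sign bit, 4 integer bits, 11 fractional bits) with round-to-nearest conversion and saturation; the cited works do not state which format the CGRA used, so `FRAC_BITS` and every function name here are hypothetical. The sketch computes the squared Euclidean distance a SOM uses to find the best-matching unit, once in fixed point and once in double precision, so the quantization error can be compared.

```python
# Minimal sketch of 16-bit fixed-point arithmetic for a SOM distance computation.
# Assumption (hypothetical): Q4.11 format, i.e. 1 sign bit, 4 integer bits, 11 fractional bits.
FRAC_BITS = 11                      # hypothetical choice of fractional bits
SCALE = 1 << FRAC_BITS              # 2**11 = 2048
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def saturate(q: int) -> int:
    """Clamp an integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, q))


def to_fixed(x: float) -> int:
    """Convert a float to Q4.11, rounding to nearest (Python's round: ties to even)."""
    return saturate(round(x * SCALE))


def to_float(q: int) -> float:
    """Convert a Q4.11 integer back to a float."""
    return q / SCALE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q4.11 values. The raw product has 22 fractional bits, so add
    half an LSB, shift right by FRAC_BITS (round to nearest, ties upward), then saturate."""
    return saturate((a * b + (1 << (FRAC_BITS - 1))) >> FRAC_BITS)


def sq_dist_fixed(x: list[float], w: list[float]) -> float:
    """Squared Euclidean distance with 16-bit operands and a wide accumulator
    (the accumulator width is another assumption)."""
    acc = 0
    for xi, wi in zip(x, w):
        d = saturate(to_fixed(xi) - to_fixed(wi))
        acc += fixed_mul(d, d)
    return to_float(acc)


def sq_dist_float(x: list[float], w: list[float]) -> float:
    """Reference squared Euclidean distance in double precision."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w))


if __name__ == "__main__":
    x, w = [0.1, 0.2, 0.3], [0.4, 0.1, 0.3]
    exact, approx = sq_dist_float(x, w), sq_dist_fixed(x, w)
    print(f"float: {exact:.6f}  fixed: {approx:.6f}  "
          f"rel. error: {abs(approx - exact) / exact:.3%}")
```

For these example inputs the relative error of the distance is roughly 0.1%. Whether a given format keeps the overall SOM quality loss below 1%, as reported for the CGRA implementation, depends on the data, map size, and learning schedule, none of which this sketch models.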

“…In our context, SOMs encapsulate a compressed representation of genomic data, able to distinguish between different pathogens without processing the entire sampled DNA data. For instance, the algorithm proposed in [3] uses 40k random fragments of the DNA sequence of two strains of E. Coli bacteria to train two SOM networks to classify subsequent sequences of the bacterial strains. One of the core benefits of this approach is its ability to work with fragments of the DNA sequence instead of requiring fully assembled DNA, as has been attempted by [4] in the past.…”

Section: Introduction (mentioning)


“…In contrast, the CGRA-based implementation achieves a speedup of 11.05×. Additionally, the FP8 implementation exhibits a lower energy consumption than the 16-bit CGRA implementation [3], thanks to dedicated algorithmic optimizations. We achieve a remarkable 6.72× improvement in energy efficiency compared to FP32, while the CGRA achieves a 3.15× improvement in large SOM networks.…”

Section: Introduction (mentioning)


“…This section overviews the implemented SOM algorithm, the PULP platform, and the smallFloats data types. This paper uses a circular SOM for bacterial genome recognition described in [3]. Algorithm 1 summarizes the main computational steps.…”

Section: Introduction (mentioning)

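The classification scheme described in the citation statements above (one SOM trained per bacterial strain on DNA fragments, with later fragments assigned to one of the strains) can be sketched as follows. Several details are assumptions rather than facts from the source: fragments are encoded as normalized dinucleotide (k = 2) frequency vectors, "circular" is read as a 1-D ring of neurons whose neighborhood wraps around, the learning rate and neighborhood width decay linearly, and a fragment is assigned to the strain whose SOM has the smaller best-matching-unit distance. All names (`encode`, `train_ring_som`, `classify`) are hypothetical.

```python
import math
import random
from itertools import product

K = 2  # hypothetical k-mer length used for encoding
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def encode(fragment: str) -> list[float]:
    """Encode a DNA fragment as a normalized k-mer frequency vector (assumed encoding)."""
    counts = [0.0] * len(KMERS)
    for i in range(len(fragment) - K + 1):
        idx = KMER_INDEX.get(fragment[i:i + K])
        if idx is not None:          # skip k-mers containing N or other symbols
            counts[idx] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def sq_dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bmu(weights: list[list[float]], x: list[float]) -> int:
    """Index of the best-matching unit (closest neuron) for input x."""
    return min(range(len(weights)), key=lambda j: sq_dist(weights[j], x))


def ring_distance(i: int, j: int, n: int) -> int:
    """Grid distance between neurons i and j on a ring of n neurons (wraps around)."""
    d = abs(i - j)
    return min(d, n - d)


def train_ring_som(samples, n_neurons=16, epochs=10, lr0=0.5, sigma0=4.0, seed=0):
    """Online SOM training on a ring topology with linearly decaying schedules."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[rng.random() / dim for _ in range(dim)] for _ in range(n_neurons)]
    total_steps, step = epochs * len(samples), 0
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            win = bmu(weights, x)
            for j in range(n_neurons):
                # Gaussian neighborhood measured along the ring
                h = math.exp(-ring_distance(win, j, n_neurons) ** 2 / (2 * sigma ** 2))
                weights[j] = [w + lr * h * (xi - w) for w, xi in zip(weights[j], x)]
            step += 1
    return weights


def classify(fragment: str, soms: dict) -> str:
    """Assign a fragment to the strain whose SOM has the smallest BMU distance."""
    x = encode(fragment)
    errors = {label: sq_dist(w[bmu(w, x)], x) for label, w in soms.items()}
    return min(errors, key=errors.get)


if __name__ == "__main__":
    # Synthetic stand-ins for two strains: AT-rich vs GC-rich random fragments.
    rng = random.Random(42)

    def fragment(probs, length=100):
        return "".join(rng.choices("ACGT", weights=probs, k=length))

    at_rich, gc_rich = [0.4, 0.1, 0.1, 0.4], [0.1, 0.4, 0.4, 0.1]
    soms = {
        "strain A": train_ring_som([encode(fragment(at_rich)) for _ in range(40)]),
        "strain B": train_ring_som([encode(fragment(gc_rich)) for _ in range(40)], seed=1),
    }
    print(classify(fragment(gc_rich), soms))
```

The AT-rich and GC-rich sequences are only placeholders. Two real strains of the same species are likely to differ far more subtly, which is one reason the cited work trains on 40k fragments per strain rather than a few dozen.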


Optimizing Self-Organizing Maps for Bacterial Genome Identification on Parallel Ultra-Low-Power Platforms

Mirsalari, Yousefzadeh, Tagliavini et al., 2023

2023 30th IEEE International Conference on Electronics, Circuits and Systems (ICECS)

Self-citation


Pathogenic bacteria significantly threaten human health, highlighting the need for precise and efficient methods for swiftly identifying bacterial species. This paper addresses the challenges associated with performing genomics computations for pathogen identification on embedded systems with limited computational power. We propose an optimized implementation of Self-Organizing Maps (SOMs) targeting a parallel ultra-low-power platform based on the RISC-V instruction set architecture. We propose two mapping methods for implementing the SOM algorithm on a parallel cluster, coupled with software techniques to improve the throughput. Orthogonally to parallelization, we investigate the impact of smaller-than-32-bit floating-point formats (smallFloats) on energy savings, precision, and performance. Our experimental results show that all smallFloat formats achieve 100% classification accuracy. The parallel variants achieve speed-ups of 1.98×, 3.79×, and 6.83× on 2, 4, and 8 cores, respectively. Compared with a 16-bit fixed-point implementation on a coarse-grained reconfigurable architecture (CGRA), the FP8 implementation achieves, on average, 1.42× higher energy efficiency, a 1.51× speed-up, and a 50% reduction in memory footprint. Furthermore, FP8 vectorization increases the average speed-up by 2.5×.
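The speed-ups quoted in this abstract can be turned into parallel efficiency (speed-up divided by core count) and into the Karp–Flatt metric, an experimentally determined serial fraction e = (1/S - 1/p) / (1 - 1/p) for speed-up S on p cores. This is only arithmetic on the reported numbers; the cited paper does not use the Karp–Flatt metric, and the function names are illustrative.

```python
# Reported speed-ups over single-core execution (from the abstract above).
REPORTED_SPEEDUPS = {2: 1.98, 4: 3.79, 8: 6.83}


def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear speed-up achieved."""
    return speedup / cores


def karp_flatt(speedup: float, cores: int) -> float:
    """Karp-Flatt experimentally determined serial fraction."""
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


if __name__ == "__main__":
    for cores, s in sorted(REPORTED_SPEEDUPS.items()):
        print(f"{cores} cores: efficiency {parallel_efficiency(s, cores):.1%}, "
              f"serial fraction {karp_flatt(s, cores):.4f}")
```

For 2, 4, and 8 cores this gives efficiencies of about 99%, 95%, and 85%, and serial fractions of about 0.010, 0.018, and 0.024. A serial fraction that grows with the core count usually points to parallelization overhead (for example synchronization or memory contention) rather than a fixed sequential portion, although the abstract does not say where the overhead comes from.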



eBrainII: a 3 kW Realtime Custom 3D DRAM Integrated ASIC Implementation of a Biologically Plausible Model of a Human Scale Cortex

Stathis, Sudarshan, Yang et al., 2020

J Sign Process Syst

Self-citation


Artificial Neural Networks (ANNs) such as CNNs/DNNs and LSTMs are not biologically plausible. Despite their initial success, they cannot attain the cognitive capabilities enabled by the dynamic hierarchical associative memory systems of biological brains. Biologically plausible spiking models of brain structures, e.g., the cortex, basal ganglia, and amygdala, have greater potential to achieve brain-like cognitive capabilities. The Bayesian Confidence Propagation Neural Network (BCPNN) is a biologically plausible spiking model of the cortex. A real-time human-scale BCPNN model requires 162 TFlop/s and 50 TB of synaptic weight storage accessed with a bandwidth of 200 TB/s. The spiking bandwidth is relatively modest at 250 GB/s. A hand-optimized implementation of rodent-scale BCPNN on Tesla K80 GPUs requires 3 kW, from which we extrapolate that a human-scale network would require 3 MW. These power numbers rule out such implementations for field deployment as cognition engines in embedded systems. The key innovation this paper reports is that it is feasible and affordable to implement real-time BCPNN as a custom tiled application-specific integrated circuit (ASIC) in 28 nm technology with custom 3D DRAM (eBrainII), which consumes 3 kW at human scale and 12 W at rodent scale. Such implementations eminently fulfill the demands of field deployment.


AMR-Diag: Neural network based genotype-to-phenotype prediction of resistance towards β-lactams in Escherichia coli and Klebsiella pneumoniae

Avershina, Sharma, Taxt et al., 2021

Computational and Structural Biotechnology Journal

Self-citation

