Proceedings of the 28th ACM International Conference on Supercomputing 2014
DOI: 10.1145/2597652.2597670

Scaling up matrix computations on shared-memory manycore systems with 1000 CPU cores

Abstract: While the growing number of cores per chip allows researchers to solve larger scientific and engineering problems, the parallel efficiency of the deployed parallel software starts to decrease. This scalability problem affects both vendor-provided and open-source software and wastes CPU cycles and energy. Expecting CPUs with hundreds of cores to be imminent, we have designed a new framework to perform matrix computations for massively many cores. Our performance analysis on manycore systems shows that th…
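
The abstract breaks off above, but the computational pattern such frameworks organize is well known: partition the matrices into tiles and schedule independent tile tasks across the cores. The following is a minimal, hypothetical sketch of that pattern in Python; the tile size, worker count, and function names are illustrative assumptions, not the paper's framework or API.

# Hypothetical sketch: tile-based, task-parallel matrix multiplication,
# the general style of computation a manycore matrix framework schedules.
# TILE and WORKERS are illustrative assumptions, not the paper's values.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

TILE = 256      # assumed tile size; real frameworks tune this per cache level
WORKERS = 8     # assumed worker count; a 1000-core system would use far more

def tiled_matmul(A, B):
    n = A.shape[0]
    C = np.zeros((n, n))
    def compute_tile(i, j):
        # Each task owns one output tile, so tasks write disjoint memory.
        for k in range(0, n, TILE):
            C[i:i+TILE, j:j+TILE] += A[i:i+TILE, k:k+TILE] @ B[k:k+TILE, j:j+TILE]
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        for i in range(0, n, TILE):
            for j in range(0, n, TILE):
                pool.submit(compute_tile, i, j)   # pool waits on exit
    return C

A = np.random.rand(1024, 1024)
B = np.random.rand(1024, 1024)
assert np.allclose(tiled_matmul(A, B), A @ B)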

Cited by 11 publications (4 citation statements); references 20 publications.
“…Even if the number of arithmetic operations is reduced by 100×, the overhead of lookups and cache misses is so dominant that switching to sparse matrices would not pay off. The gap is widened even further by the use of steadily improving, highly tuned, numerical libraries that allow for extremely fast dense matrix multiplication, exploiting the minute details of the underlying CPU or GPU hardware [16,9]. Also, non-uniform sparse models require more sophisticated engineering and computing infrastructure.…”
Section: Motivation and High Level Considerations
confidence: 99%
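
The trade-off the citing authors quantify (a 100× reduction in arithmetic against lookup and cache-miss overhead) can be observed directly. Below is a minimal sketch, assuming NumPy and SciPy are available, that times a tuned dense multiply against a CSR sparse multiply with about 1% nonzeros; the matrix size and density are illustrative choices, not values from either paper.

# Minimal sketch (assumes NumPy and SciPy): dense GEMM vs. sparse matmul.
# With ~1% nonzeros the sparse operand does ~100x fewer multiply-adds, yet
# index lookups and irregular memory access shrink the observed gap far
# below 100x, and the tuned dense kernel can still come out ahead.
import time
import numpy as np
import scipy.sparse as sp

n = 2000
rng = np.random.default_rng(0)
dense = rng.random((n, n))
mask = rng.random((n, n)) < 0.01           # keep ~1% of entries
sparse = sp.csr_matrix(dense * mask)
x = rng.random((n, n))

t0 = time.perf_counter()
dense @ x                                  # tuned dense BLAS kernel
t_dense = time.perf_counter() - t0

t0 = time.perf_counter()
sparse @ x                                 # CSR: indirect loads per nonzero
t_sparse = time.perf_counter() - t0

print(f"dense: {t_dense:.3f}s  sparse (1% nnz): {t_sparse:.3f}s")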
“…Each region is composed of multiple core groups connected through intra-region interconnection interfaces. This architecture aims to make full use of program locality and features high performance, high scalability, and flexible physical implementation [3].…”
Section: Proprietary CPU: Matrix-2000+
confidence: 99%
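
The locality argument in this statement is the usual motivation for pinning worker threads to one core group, so that threads sharing data also share that group's caches and interconnect. Below is a hypothetical sketch of that placement policy, assuming a Linux host with at least 16 logical CPUs and made-up group boundaries (the real Matrix-2000+ topology is not given here); a production runtime would do this in native code, for example via hwloc, since Python's GIL limits actual parallel compute.

# Hypothetical sketch: pinning worker threads to a "core group" so threads
# that share data stay within one region's caches. The group boundaries are
# illustrative assumptions, not the Matrix-2000+ topology.
# Requires Linux (os.sched_setaffinity is Linux-only).
import os
import threading

CORE_GROUPS = [range(0, 8), range(8, 16)]   # assumed: two groups of 8 cores

def worker(group_id, task):
    # pid 0 applies the affinity mask to the calling thread only.
    os.sched_setaffinity(0, set(CORE_GROUPS[group_id]))
    task()

def task():
    s = sum(i * i for i in range(10**6))    # placeholder compute kernel
    print(f"thread {threading.get_ident()} done: {s}")

threads = [threading.Thread(target=worker, args=(g, task)) for g in (0, 1)]
for t in threads: t.start()
for t in threads: t.join()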