Explicit Runge-Kutta methods (RKMs) are among the most popular classes of formulas for the approximate numerical integration of nonstiff initial value problems. However, high-order Runge-Kutta methods require more function evaluations per integration step than, for example, Adams methods used in PECE mode, and so, with RKMs, it is especially important to avoid rejected steps. Steps are often rejected when certain derivatives of the solution are very large for part of the region of integration. This corresponds, for example, to regions where the solution has a sharp front or, in the limit, some derivative of the solution is discontinuous. In these circumstances the assumption that the local truncation error is changing slowly is invalid, and so any step-choosing algorithm is likely to produce an unacceptable step. In this paper we derive a family of explicit Runge-Kutta formulas. Each formula is very efficient for problems with smooth solutions as well as for problems having rapidly varying solutions. Each member of this family consists of a fifth-order formula that contains imbedded formulas of all orders 1 through 4. By computing solutions at several different orders, it is possible to detect sharp fronts or discontinuities before all the function evaluations defining the full Runge-Kutta step have been computed. We can then either accept a lower order solution or abort the step, depending on which course of action seems appropriate. The efficiency of the new algorithm is demonstrated on the DETEST test set as well as on some difficult test problems with sharp fronts or discontinuities.
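The abstract leaves the coefficients of the fifth-order family to the paper itself, but the embedding idea can be sketched with the simplest possible pair: a second-order Heun step whose first stage already yields a first-order Euler solution. The program below is a minimal illustration under that assumption, not the authors' formula; the right-hand side and tolerance are made up. The point is that solutions of two different orders share function evaluations, so a large discrepancy between them can be detected, and the step aborted or a lower-order result accepted, before further work is done.

    ! Hypothetical sketch: one step of a 1st/2nd-order embedded pair
    ! (Euler inside Heun), not the fifth-order family of the paper.
    program embedded_step
      implicit none
      real :: t, y, h, k1, k2, y1, y2, err
      t = 0.0;  y = 1.0;  h = 0.1
      k1 = f(t, y)
      y1 = y + h*k1                  ! first-order (Euler) solution
      k2 = f(t + h, y + h*k1)
      y2 = y + 0.5*h*(k1 + k2)       ! second-order (Heun) solution
      err = abs(y2 - y1)             ! error estimate, free from the embedding
      if (err > 1.0e-3) then
         print *, 'orders disagree: abort step or accept low-order result'
      else
         y = y2                      ! accept the higher-order solution
      end if
      print *, 't =', t + h, '  y =', y, '  err =', err
    contains
      real function f(t, y)          ! sample right-hand side: y' = -2ty
        real, intent(in) :: t, y
        f = -2.0*t*y
      end function f
    end program embedded_step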
Many metrics are used for measuring the performance of a parallel algorithm running on a parallel processor. This article introduces a new metric that has some advantages over the others. Its use is illustrated with data from the Linpack benchmark report and the winners of the Gordon Bell Award.
Abstract. This paper examines common implementations of linear algebra algorithms, such as matrix-vector multiplication, matrix-matrix multiplication, and the solution of linear equations. The different versions are examined for efficiency on a computer architecture that uses vector processing and has pipelined instruction execution. By using the advanced architectural features of such machines, one can usually achieve maximum performance, and tremendous improvements in execution speed can be seen over conventional computers.

1. Introduction. In this paper we describe why existing algorithms for linear algebra are not usually suited for computers that employ advanced concepts such as pipelining and vector constructs to achieve enhanced performance. We examine the process of refitting or reorganizing an underlying algorithm to conform to the computer architecture, thereby gaining tremendous improvements in execution speed while sacrificing neither accuracy nor algorithm clarity. This reorganization, where it can be done, is usually conceptually simple at the algorithm level. This paper will not address the issues involved with parallel processing; for a survey of parallel algorithms in linear algebra, see the review paper by Heller [8].

We will not concern ourselves here with an actual implementation on a specific architecture: to do so, one must understand all the subtleties and nuances of that architecture and risk obscuring the fundamental ideas. Rather, we use the features of a vector pipeline machine to understand how various aspects interrelate and how they can be put together to achieve very high execution rates. We use the term architecture in reference to the organization of the computer as seen by the programmer or algorithm designer. Within the architecture we focus on the instruction set and memory references, and their interaction in terms of performance.

We will concentrate our examination on the behavior of linear algebra algorithms for dense problems that can be accommodated in the main memory of a computer. The solutions proposed here do not, in general, carry over to sparse matrices because of the short vector lengths and the indirect addressing schemes that are so prevalent in sparse matrix calculations. For a discussion of methods for handling the sparse matrix case, see [5], [7]. We will focus in particular on algorithms written in Fortran and assembly language. Fortran is an appropriate language, given the scientific nature of the application; occasional use of assembly language enables us to gain the maximum speed possible.
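One concrete instance of the reorganization described above can be sketched under the assumption of Fortran's column-major storage: the choice of loop order in the update y = y + A*x. The two subroutines below (whose names are illustrative, not from the paper) compute identical results, but the ji ordering sweeps each column of A with stride-one accesses that a vector pipeline can stream, while the textbook ij inner-product ordering strides through memory by n.

    ! Two orderings of y = y + A*x for an n-by-n matrix A.
    subroutine matvec_ji(n, a, x, y)   ! column (SAXPY) orientation
      implicit none
      integer, intent(in) :: n
      real, intent(in)    :: a(n,n), x(n)
      real, intent(inout) :: y(n)
      integer :: i, j
      do j = 1, n
         do i = 1, n                   ! stride-one sweep of column j
            y(i) = y(i) + a(i,j)*x(j)
         end do
      end do
    end subroutine matvec_ji

    subroutine matvec_ij(n, a, x, y)   ! row (dot-product) orientation
      implicit none
      integer, intent(in) :: n
      real, intent(in)    :: a(n,n), x(n)
      real, intent(inout) :: y(n)
      integer :: i, j
      do i = 1, n
         do j = 1, n                   ! stride-n accesses to a(i,j)
            y(i) = y(i) + a(i,j)*x(j)
         end do
      end do
    end subroutine matvec_ij

On a machine whose vector registers load contiguous memory quickly, the first form keeps the pipeline full; the only change is the order of two loops, which is the kind of conceptually simple algorithm-level reorganization the abstract describes.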
Many versions of the fast Fourier transform require a reordering of either the input or the output data that corresponds to reversing the order of the bits in the array index. There has been a surprisingly large number of papers on this subject in the recent literature. This paper collects 30 methods for bit reversing an array. Each method was recoded into a uniform style in Fortran and its performance measured on several different machines, each with a different memory system. This paper includes a description of how the memories of the machines operate, to motivate two new algorithms that perform substantially better than the others.
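For reference, a hypothetical baseline, not one of the 30 surveyed methods nor either of the two new memory-aware algorithms, is the naive in-place permutation that reverses the m index bits of each position and swaps each pair once:

    ! Naive sketch: bit-reverse an array of length n = 2**m in place.
    subroutine bitrev(n, m, x)
      implicit none
      integer, intent(in)    :: n, m
      real,    intent(inout) :: x(0:n-1)
      integer :: i, j, b, r
      real    :: tmp
      do i = 0, n - 1
         r = 0
         j = i
         do b = 1, m                  ! reverse the m index bits of i
            r = 2*r + mod(j, 2)
            j = j/2
         end do
         if (r > i) then              ! swap each pair exactly once
            tmp = x(i);  x(i) = x(r);  x(r) = tmp
         end if
      end do
    end subroutine bitrev

Its accesses jump across the whole array, which is precisely where the memory-system behavior the paper measures comes into play.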