Richard Neves scite author profile

Richard Neves

5 Publications

20 Citation Statements Received

23 Citation Statements Given


Affiliations

University of Colorado Boulder, IBM Research - Thomas J. Watson Research Center, University of Colorado System

Publications


Whole-program optimization for time and space efficient threads

Grunwald

Neves

1996


University of Colorado, Campus Box 430, Boulder, CO 80309-0430, grunwald@cs.colorado.edu

Richard Neves, P.O. Box 218, IBM T. J. Watson Research, Yorktown Heights, NY 10598, rneves@watson.ibm.com

Abstract

Modern languages and operating systems often encourage programmers to use threads, or independent control streams, to mask the overhead of some operations and simplify program structure. Multitasking operating systems use threads to mask communication latency, either with hardware devices or users. Client-server applications typically use threads to simplify the complex control flow that arises when multiple clients are used. Recently, the scientific computing community has started using threads to mask network communication latency in massively parallel architectures, allowing computation and communication to be overlapped. Lastly, some architectures implement threads in hardware, using those threads to tolerate memory latency.

In general, it would be desirable if threaded programs could be written to expose the largest degree of parallelism possible, or to simplify the program design. However, threads incur time and space overheads, and programmers often compromise simple designs for performance. In this paper, we show how to reduce time and space thread overhead using control flow and register liveness information inferred after compilation. Our techniques work on binaries, are not specific to a particular compiler or thread library, and reduce the overall execution time of fine-grain threaded programs by 15-30%. We use execution-driven analysis and an instrumented operating system to show why the execution time is reduced and to indicate areas for future work.

ASPLOS VII, 10/96, MA, USA. © 1996 ACM 0-89791-767-7/96/0010...$3.50

[...] are primarily interested in scientific applications, the techniques we describe may be applicable to a wide range of application domains including threaded databases, client-server applications, and in-kernel operating system threads.


The DINO User's Manual

Derby

Eskow

Neves

et al.

1990


Efficient compile-time/run-time contraction of fine grain data parallel codes

Neves

Schnabel

1994


Threaded Runtime Support for Execution of Fine Grain Parallel Code on Coarse Grain Multiprocessors

Neves

Schnabel

1997

Journal of Parallel and Distributed Computing


Runtime support for execution of fine grain parallel code on coarse grain multiprocessors

Neves

Schnabel


