Custom-Instruction Synthesis for Extensible-Processor Platforms

Sun, Fei; Ravi, S.; Raghunathan, Anand; Jha, Niraj K.

doi:10.1109/tcad.2003.822133

Cited by 88 publications

(49 citation statements)

References 37 publications


“…For both benchmarks the cheapest memory organization (8x8 clustered RF) had a feasible layout, which LICCA discovered; the runtime of our CAD flow was approximately one minute; this is significantly faster than the runtime required to find an 8-input 8-output ISE using most existing techniques [3,9,12]; this runtime is comparable to the runtime of the ISE identification method of Verma et al [13], which, to the best of our knowledge, is the fastest optimal algorithm published to date for this problem.…”

Section: Runtime (mentioning)

confidence: 88%

Memory organization and data layout for instruction set extensions with architecturally visible storage

Athanasopoulos, Brisk, Leblebici, et al. 2009

Proceedings of the 2009 International Conference on Computer-Aided Design


Present application-specific embedded systems tend to choose instruction set extensions (ISEs) based on limitations imposed by the available data bandwidth to custom functional units (CFUs). Adoption of the optimal ISE for an application would, in many cases, impose a formidable cost increase in order to achieve the required data bandwidth. In this paper we propose a novel methodology for laying out data in memories, generating high-bandwidth memory systems by making use of existing low-bandwidth, low-cost ones, and designing custom functional units, all with the desirable data bandwidth for only a fraction of the additional cost required by traditional techniques.


“…Therefore, variations of graph partitioning algorithms may be investigated to transform monolithic SI graphs into modular SIs and to determine, which properties modular SIs demand. This can be used to modify the 'pruning' step in state-of-the-art automatic SI detection (see for instance [SRRJ04,VBI07]). To exploit the feature to share Atoms between different SIs, techniques like data-path merging [BKS04] may be adapted to identify reusable Atoms.…”

Section: Future Work (mentioning)

confidence: 99%

RISPP: A run-time adaptive reconfigurable embedded processor

Bauer, Shafique, Henkel 2009

2009 International Conference on Field Programmable Logic and Applications


I hereby declare under oath that I wrote the submitted work independently, that I have fully cited the sources, Internet sources, and aids used, and that I have in every case marked as borrowed, with citation of the source, those passages of the work (including tables, maps, and figures) that are taken verbatim or in substance from other works or from the Internet. Lars Bauer

Acknowledgements

I want to thank my advisor Prof. Jörg Henkel for the inspirations, discussions, and opportunities he provided and shared with me. He managed to guide and to challenge me while giving me all freedom to follow my ideas and interests. Working with him was a nice experience and it definitely had a strong influence on my independent approach to work.

I also want to thank all colleagues from the Chair for Embedded Systems for the nice discussions and the good time. In the last two months before submitting my thesis, especially Thomas Ebi and Sebastian Kobbe provided consistent support by helping me manage the daily workload and by sharing their coffee machines, which I cannot appreciate enough. Additionally, it is especially due to the secretaries and technicians that we can research in a good working environment, and I explicitly want to acknowledge their work during all the time.

Special thanks go to my colleague and roommate Muhammad Shafique. Without him, the work would not have been what it became. The technical discussions on application and architecture aspects improved the quality of this work more than once. I also want to thank the Master students that I supervised in the scope of this thesis.

It was a nice experience to collaborate with colleagues from the groups (in alphabetical order) of Prof. 

Abstract

Reconfigurable embedded processors are a special class of processors comprising an extended instruction set that is implemented using a reconfigurable fabric. The instruction-set extension is typically application specific, but it is not required to finalize it when designing the processor. The reconfigurable fabric (e.g. a field-programmable gate array (FPGA)) allows the accelerators that implement the instruction-set extension to be reconfigured during run time without affecting the functionality of the working processor. Therefore, the accelerators, and thus the instruction-set extension, may be adapted according to the requirements of a running application.

State-of-the-art reconfigurable processors require that the application programmer (or compiler) determines during compile time 'which' reconfigurations shall be performed and 'when' they shall be performed, i.e. which accelerators shall be loaded to a particular part of the reconfigurable fabric at a certain time. The problem is that it is typically not known during compile time which applications execute at the same time (i.e. in a multi-tasking environment), demanding the reconfigurable fabric. Additionally, it is not necessa...


“…Such instructions are able to do the work of multiple instructions of a general-purpose processor. Extended instructions include fusion instructions, (21) SIMD/vector instructions and FLIX (22) instructions. Flexible Length Instruction Xtensions (FLIX) are VLIW-like instructions whereby multiple operations can be performed in a single instruction.…”

Section: Baseline Processor Description (mentioning)

confidence: 99%

Architectural Exploration of Heterogeneous Multiprocessor Systems for JPEG

Shee, Erdos, Parameswaran 2007

Int J Parallel Prog


Multicore processors have been utilized in embedded systems and general computing applications for some time. However, these multicore chips execute multiple applications concurrently, with each core carrying out a particular task in the system. Such systems can be found in gaming, automotive real-time systems and video / image encoding devices. These systems are commonly deployed to overcome deadline misses, which are primarily due to overloading of a single multitasking core. In this paper, we explore the use of multiple cores for a single application, as opposed to multiple applications executing in a parallel fashion. A single application is parallelized using two different methods: one, a master-slave model; and two, a sequential pipeline model. The systems were implemented using Tensilica's Xtensa LX processors with queues as the means of communication between two cores. In the master-slave model, we utilized a coarse-grained approach whereby a main core distributes the workload to the remaining cores and reads the processed data before writing the results back to file. In the pipeline model, a lower granularity is used. The application is partitioned into multiple sequential blocks, each block representing a stage in a sequential pipeline. For both models we applied a number of differing configurations ranging from a single core to a nine-core system. We found that without any optimization for the seven-core system, the sequential pipeline approach has a more efficient area usage, with an area increase to speedup ratio of 1.83 compared to the master-slave approach of 4.34. With selective optimization in the pipeline approach, we obtained speedups of up to 4.6× with an area increase of only 3.1× (area increase to speedup ratio of just 0.68).

* National ICT Australia is funded through the Australian Government's Backing Australia's Ability initiative, in part through the Australian Research Council.
