2004
DOI: 10.1109/tcad.2003.822133

Custom-Instruction Synthesis for Extensible-Processor Platforms

Abstract: Efficiency and flexibility are critical, but often conflicting, design goals in embedded system design. The recent emergence of extensible processors promises a favorable tradeoff between efficiency and flexibility, while keeping design turnaround times short. Current extensible processor design flows automate several tedious tasks, but typically require designers to manually select the parts of the program that are to be implemented as custom instructions. In this work, we describe an automatic metho…

Cited by: 88 publications (49 citation statements)
References: 37 publications
“…For both benchmarks the cheapest memory organization (8x8 clustered RF) had a feasible layout, which LICCA discovered; the runtime of our CAD flow was approximately one minute; this is significantly faster than the runtime required to find an 8-input 8-output ISE using most existing techniques [3,9,12]; this runtime is comparable to the runtime of the ISE identification method of Verma et al [13], which, to the best of our knowledge, is the fastest optimal algorithm published to date for this problem.…”
Section: Runtime; citation type: mentioning
confidence: 88%
“…Therefore, variations of graph partitioning algorithms may be investigated to transform monolithic SI graphs into modular SIs and to determine, which properties modular SIs demand. This can be used to modify the 'pruning' step in state-of-the-art automatic SI detection (see for instance [SRRJ04,VBI07]). To exploit the feature to share Atoms between different SIs, techniques like data-path merging [BKS04] may be adapted to identify reusable Atoms.…”
Section: Future Work; citation type: mentioning
confidence: 99%
“…Such instructions are able to do the work of multiple instructions of a general-purpose processor. Extended instructions include fusion instructions, (21) SIMD/vector instructions and FLIX (22) instructions. Flexible Length Instruction Xtensions (FLIX) are VLIW-like instructions whereby multiple operations can be performed in a single instruction.…”
Section: Baseline Processor Description; citation type: mentioning
confidence: 99%