Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)
DOI: 10.1109/isca.1998.694790

Exploiting fine-grain thread level parallelism on the MIT multi-ALU processor

Abstract: Modern computer systems extract parallelism from problems at two extremes of granularity: instruction-level parallelism (ILP) and coarse-thread parallelism. VLIW and superscalar processors exploit ILP with a grain size of a single instruction, while multiprocessors extract parallelism from coarse threads with a granularity of many thousands of instructions. The parallelism available at these two extremes is limited. The ILP in applications is restricted by control flow and data dependencies [17], and the hardwa…

Cited by 44 publications (30 citation statements)
References: 15 publications

“…The M-Machine employed an on-chip cluster switch to connect the register bypass networks for three processors; an instruction writing to a remote register injects its result into the switch, which delivers the data to a waiting instruction on a remote processor [6]. The MIT RAW processor took this strategy further, by using a 4x4 mesh network to interconnect its processor tiles between execution units [7].…”
Section: Related Work (mentioning; confidence: 99%)

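The register-level communication described in this excerpt can be illustrated with a small Python simulation. It is a sketch under stated assumptions, not the M-Machine's actual hardware: the classes `Cluster` and `ClusterSwitch`, the per-register valid bit, the FIFO delivery order, and the register names are all hypothetical, chosen only to show a remote register write being injected into a switch and releasing an instruction that waits on that register.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical processor cluster: a register file plus one valid bit per register."""
    name: str
    regs: dict = field(default_factory=dict)   # register name -> value
    valid: dict = field(default_factory=dict)  # register name -> True once written

    def write_local(self, reg, value):
        # A write stores the value and marks the register valid.
        self.regs[reg] = value
        self.valid[reg] = True

    def mark_empty(self, reg):
        # Clear the valid bit so consumers of `reg` wait for a new value.
        self.valid[reg] = False

    def ready(self, *srcs):
        # An instruction may issue only when all of its source registers are valid.
        return all(self.valid.get(r, False) for r in srcs)


class ClusterSwitch:
    """Hypothetical switch that buffers remote register writes and delivers them in FIFO order."""

    def __init__(self, clusters):
        self.clusters = {c.name: c for c in clusters}
        self.pending = deque()

    def inject(self, dest, reg, value):
        # A producer writing a register on another cluster hands the result to the switch.
        self.pending.append((dest, reg, value))

    def deliver_one(self):
        # Deliver the oldest pending write; this sets the destination's valid bit.
        if not self.pending:
            return False
        dest, reg, value = self.pending.popleft()
        self.clusters[dest].write_local(reg, value)
        return True


def run_demo():
    a, b = Cluster("A"), Cluster("B")
    switch = ClusterSwitch([a, b])

    b.write_local("r1", 10)
    b.mark_empty("r5")  # B's add will read r5, which cluster A produces
    blocked_before = not b.ready("r1", "r5")

    a.write_local("r2", 32)
    switch.inject("B", "r5", a.regs["r2"])  # A's result targets B's r5
    switch.deliver_one()

    # B's waiting instruction "r6 = r1 + r5" can now issue.
    assert b.ready("r1", "r5")
    b.write_local("r6", b.regs["r1"] + b.regs["r5"])
    return blocked_before, b.regs["r6"]


if __name__ == "__main__":
    print(run_demo())  # (True, 42)
```

The consumer never polls memory: it simply cannot issue until the switch delivers the value and sets the register's valid bit, which is the sense in which the excerpt says data is delivered "to a waiting instruction."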
“…Unlike the full/empty bits like fine-grain synchronization [21,3,11,2,17,15], which tags the entire memory of the machine by associating additional access state bits with each word in memory, the design of SSB is motivated by the following observation: at any instance only a small fraction of memory locations is actively participating in synchronization [25].…”
Section: SSB: Supporting Efficient Fine-grain Synchronization on Many… (mentioning; confidence: 99%)

“…HEP [21], Tera [3], MDP [11], Alewife [18,2], M-Machine [17], Cray MTA-2 [1], the MT processor in Eldorado [15], and others associate additional access state bits (e.g., full/empty bits) with each word in entire memory. Fine-grain synchronization is achieved by accessing those word-level state bits in memory.…”
Section: Related Work (mentioning; confidence: 99%)

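As an illustration of the word-level full/empty bits this excerpt describes, the Python sketch below models one common pairing of semantics: a synchronizing store waits until a word is empty, writes it, and marks it full, while a synchronizing load waits until the word is full, reads it, and marks it empty. The names `FullEmptyMemory`, `store_sync`, `load_sync`, and `producer_consumer` are hypothetical, and threads blocking on a condition variable stand in for hardware stalls or retries; the machines cited differ in which access modes they actually provide.

```python
import threading


class FullEmptyMemory:
    """Word-addressed memory in which every word carries a full/empty state bit."""

    def __init__(self, num_words):
        self.data = [0] * num_words
        self.full = [False] * num_words  # one state bit per word of memory
        self.cond = threading.Condition()

    def store_sync(self, addr, value):
        # Wait until the word is empty, write it, then mark it full.
        with self.cond:
            self.cond.wait_for(lambda: not self.full[addr])
            self.data[addr] = value
            self.full[addr] = True
            self.cond.notify_all()

    def load_sync(self, addr):
        # Wait until the word is full, read it, then mark it empty.
        with self.cond:
            self.cond.wait_for(lambda: self.full[addr])
            value = self.data[addr]
            self.full[addr] = False
            self.cond.notify_all()
            return value


def producer_consumer(n_items):
    """Pass n_items values through a single synchronized word."""
    mem = FullEmptyMemory(num_words=4)
    received = []

    def producer():
        for i in range(n_items):
            mem.store_sync(0, i * i)

    def consumer():
        for _ in range(n_items):
            received.append(mem.load_sync(0))

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


if __name__ == "__main__":
    print(producer_consumer(5))  # [0, 1, 4, 9, 16]
```

Because the producer cannot overwrite word 0 until the consumer has emptied it, the two threads alternate strictly and no value is lost. The `full` list holds one bit for every word of memory. That per-word cost is what the SSB excerpt above contrasts with its observation that only a small fraction of locations synchronize at any instant.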
“…We applied their model to four specifically synthesized blocks: three units from the M-Machine, a fine-grained multicomputer designed at MIT and Stanford [14], and the global placement of Magic, a synthesized controller chip from Stanford's Flash multiprocessor [15] (minus the artificially long hand-routed MiscBus). From Figures 4 and 5 (we only show one M-Machine plot for brevity), we see that the model is a good fit for the wire length distributions of these designs, which span a wide range of gate count.…”
Section: Wire Length Distributions (mentioning; confidence: 99%)