Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs

Chen, Xi E.; Aamodt, Tor M.

doi:10.1109/micro.2008.4771779

Cited by 21 publications (13 citation statements)

References 31 publications

“…According to formula 11, after the mapping method is adopted the FFT does not improve much when the processing length is shorter than the cache capacity. If it is longer, however, the improvement is obvious [14]. To verify the effectiveness of this method, we will implement the radix-2…”

Section: Effective FFT Mapping Methods Based on Superscalar Processor (mentioning; confidence: 99%)

“…After data partitioning, the problem of how to fetch the data must be considered. In practical applications the data volume is usually huge, so it is typically stored in the DSP's external memory (usually SDRAM) [14]. Therefore, the DSP's DMA can be used to fetch data from SDRAM according to the partitioning method shown in formula 12 and move it to on-chip memory.…”

Section: An Effective Mapping Method for FFT Based on the TS201 (mentioning; confidence: 99%)
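The excerpt does not reproduce formula 12, so the sketch below only assumes one plausible partitioning: the length-N input is viewed as an N1 x N2 row-major matrix in SDRAM, and each column (N1 elements spaced N2 apart) is one segment to be staged in on-chip memory. On the DSP this would be a strided DMA transfer; here plain Python indexing stands in for it, and the function names and the `on_chip_capacity` check are hypothetical.

```python
def dma_gather_column(external, n1, n2, col, on_chip_capacity):
    """Copy column `col` of an n1 x n2 row-major array into a local buffer.

    Hypothetical stand-in for a strided (2-D) DMA transfer: reads n1 elements
    spaced n2 apart, starting at index `col`.
    """
    if n1 > on_chip_capacity:
        raise ValueError("segment does not fit in on-chip memory")
    return [external[col + k * n2] for k in range(n1)]


def dma_scatter_column(external, n1, n2, col, local):
    """Write a processed local buffer back to column `col` of external memory."""
    for k in range(n1):
        external[col + k * n2] = local[k]


# Usage: stage each column on chip, process it, and write it back.
external = list(range(12))          # pretend SDRAM contents, viewed as 3 x 4
for col in range(4):
    local = dma_gather_column(external, 3, 4, col, on_chip_capacity=8)
    local = [2 * v for v in local]  # placeholder for the per-segment FFT
    dma_scatter_column(external, 3, 4, col, local)
print(external)  # every element doubled
```

Staging whole columns rather than single elements is the point of the assumed layout: each segment is fetched once, processed entirely on chip, and written back once.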

An efficient FFT-mapping method based on cache optimization

Zhu, Liu, Gao, et al. 2015

IET International Radar Conference 2015

Fast Fourier Transform (FFT) is an important technique in real-time signal processing systems, so the efficiency with which the FFT algorithm is mapped onto the hardware is highly significant. We first examine how the FFT executes on the processor, then analyze its memory accesses under the cache mechanism and find that the cache hit rate directly affects FFT execution time. We therefore propose an efficient mapping method that splits a long FFT into multiple segments so that every segment is shorter than the cache capacity. This raises the cache hit rate and, in turn, the execution efficiency. Finally, the new method is tested on ADI's TS201 digital signal processor, and the results show that FFT execution time is greatly reduced.
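The abstract does not spell out how the long FFT is split. One standard way to build a length-N FFT purely from short FFTs is the "four-step" decomposition with N = N1·N2: length-N1 FFTs down the columns, a twiddle-factor multiply, length-N2 FFTs along the rows, and a transposed write-out. The sketch below assumes that decomposition (it is not necessarily the authors' exact method), assumes N1 and N2 are powers of two chosen so that one segment fits in cache, and favors clarity over speed.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def four_step_fft(x, n1, n2):
    """Length-(n1*n2) FFT computed only from FFTs of length n1 and n2.

    x is treated as an n1 x n2 row-major matrix: element (a, b) is x[n2*a + b].
    """
    n = n1 * n2
    assert len(x) == n
    # Step 1: length-n1 FFT down each column b (elements spaced n2 apart).
    cols = [fft([x[n2 * a + b] for a in range(n1)]) for b in range(n2)]
    # Step 2: multiply by twiddle factors W_N^(k1*b).
    for b in range(n2):
        for k1 in range(n1):
            cols[b][k1] *= cmath.exp(-2j * cmath.pi * k1 * b / n)
    # Step 3: length-n2 FFT along each row k1.
    rows = [fft([cols[b][k1] for b in range(n2)]) for k1 in range(n1)]
    # Step 4: transposed write-out; result index is k1 + n1*k2.
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[k1 + n1 * k2] = rows[k1][k2]
    return out


x = [complex(i, 0) for i in range(16)]
y = four_step_fft(x, 4, 4)                          # only length-4 FFTs are computed
print(max(abs(a - b) for a, b in zip(y, fft(x))))   # close to 0 (rounding only)
```

Each column or row FFT touches only N1 or N2 points, which is what lets a segment stay cache-resident; the price is the extra twiddle pass and the strided accesses in steps 1 and 4.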

“…Karkhanis and Smith described a "first-order" performance model [11], which was later refined [6,2,5]. Instructions are (quickly) processed one by one to obtain certain statistics, like the CPI in the absence of miss events, the number of branch mispredictions, the number of non-overlapped long data cache misses, and so on.…”

Section: Structural Core Models (mentioning; confidence: 99%)
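The quoted description can be condensed into a cycle estimate: base cycles from the miss-free CPI, plus a penalty for each miss event. The sketch below is a deliberately simplified version with one fixed penalty per branch misprediction and per non-overlapped long data-cache miss; the published model derives these penalties in more detail, and the `IntervalStats` fields and the parameter values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class IntervalStats:
    instructions: int
    base_cpi: float            # CPI in the absence of miss events
    branch_mispredictions: int
    long_dcache_misses: int    # non-overlapped long data-cache misses only


def first_order_cycles(s: IntervalStats, mispredict_penalty: float,
                       memory_latency: float) -> float:
    """Base cycles plus one fixed penalty per miss event (simplified first-order estimate)."""
    base = s.instructions * s.base_cpi
    branch = s.branch_mispredictions * mispredict_penalty
    memory = s.long_dcache_misses * memory_latency
    return base + branch + memory


stats = IntervalStats(instructions=1_000_000, base_cpi=0.5,
                      branch_mispredictions=2_000, long_dcache_misses=500)
cycles = first_order_cycles(stats, mispredict_penalty=15, memory_latency=200)
print(cycles, cycles / stats.instructions)  # 630000.0 0.63
```

Counting only non-overlapped misses is what lets a single memory latency stand in for a whole cluster of misses that overlap behind the first one.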

“…A practical use of the BADCO methodology may use sampling to obtain a representative set of traces [25].² We used the SimpleScalar EIO tracing feature [1], which is included in the Zesto simulation package. Other known methods for reproducible simulations include, for instance, System-[…] the same sequence of instructions.…”

Section: Trace Generation (mentioning; confidence: 99%)

BADCO: Behavioral Application-Dependent Superscalar Core Models

Velásquez, Michaud, Seznec 2013

International Journal of Parallel Programming

Microarchitecture research and development rely heavily on simulators. The ideal simulator should be simple and easy to develop; it should be precise, accurate, and very fast. But the ideal simulator does not exist, and microarchitects use different sorts of simulators at different stages of processor development, depending on which matters most, accuracy or simulation speed. Approximate microarchitecture models, which trade accuracy for simulation speed, are very useful for research and design-space exploration, provided the loss of accuracy remains acceptable. Behavioral superscalar core modeling is one way to trade accuracy for simulation speed when the focus of the study is not the core itself. In this approach, a superscalar core is viewed as a black box that emits requests to the uncore at certain times. A behavioral core model can be connected to a detailed uncore model. Behavioral core models are built from detailed simulations; once the time to build a model is amortized, significant simulation speedups can be obtained. We describe and study a new method for defining behavioral models for modern superscalar cores. The proposed Behavioral Application-Dependent Superscalar Core model, BADCO, predicts the execution time of a thread running on a superscalar core with an error of less than 10% in most cases. We show that BADCO is qualitatively accurate, being able to predict how performance changes when the uncore changes. The simulation speedups we obtained are typically between one and two orders of magnitude.
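To make the black-box idea concrete, the toy below reduces a core to a trace of uncore requests, each with a delay and an optional dependency on an earlier request, and replays it against uncores of different latency. It is not BADCO's model-construction method, which builds its models from detailed simulations; the `Request` type and the `replay` function are hypothetical and only show how a behavioral model's predicted time depends on the interaction between uncore latency and request dependencies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    delay: int                 # core cycles from "ready" to issuing this request
    depends_on: Optional[int]  # index of an earlier request it waits for, or None


def replay(trace: List[Request], uncore_latency: int) -> int:
    """Cycle at which the last request completes, for a fixed-latency uncore.

    Requests issue in order; a request becomes ready once the previous one has
    issued and, if it has a dependency, once that earlier request has completed.
    """
    done: List[int] = []
    last_issue = 0
    for req in trace:
        ready = last_issue
        if req.depends_on is not None:
            ready = max(ready, done[req.depends_on])
        issue = ready + req.delay
        done.append(issue + uncore_latency)
        last_issue = issue
    return max(done, default=0)


trace = [Request(10, None), Request(5, None), Request(3, 0)]
print(replay(trace, 100), replay(trace, 50))  # 213 113
```

Halving the uncore latency from 100 to 50 cycles shortens the run from 213 to 113 cycles: the dependent chain (requests 0 and 2) pays the latency twice, while request 1 overlaps with request 0.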

“…Karkhanis and Smith [104] use the interval model to explore the processor design space automatically and identify processor configurations that represent Pareto-optimal design points with respect to performance, energy and chip area for a particular application or set of applications. Chen and Aamodt [27] extend the interval model by proposing ways to include hardware prefetching and account for a limited number of miss status handling registers (MSHRs). Hong and Kim [84] present a first-order model for GPUs which shares some commonalities with the interval model described here.…”

Section: Follow-on Work (mentioning; confidence: 99%)
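As a rough illustration of why a limited number of MSHRs matters to such a model, the sketch below assumes a cluster of independent long-latency misses that could otherwise overlap completely: with n MSHRs only n can be outstanding at once, so the stall grows in steps of one memory latency. This batching rule is a simplifying assumption for illustration, not Chen and Aamodt's actual model, which (per its title and the quoted summary) also accounts for pending cache hits and prefetching.

```python
import math


def miss_cluster_penalty(num_misses: int, num_mshrs: int, memory_latency: int) -> int:
    """Stall cycles for a cluster of independent long-latency misses (toy model).

    Assumes the misses could all overlap, but each occupies one MSHR, so at most
    num_mshrs are outstanding at once; each batch costs one memory latency.
    """
    if num_misses <= 0:
        return 0
    if num_mshrs <= 0:
        raise ValueError("need at least one MSHR")
    return math.ceil(num_misses / num_mshrs) * memory_latency


# Usage: eight independent misses, 200-cycle memory latency.
for mshrs in (1, 4, 8, 16):
    print(mshrs, miss_cluster_penalty(8, mshrs, 200))
```

With one MSHR the eight misses serialize (1600 cycles); with eight or more they fully overlap (200 cycles), which is the case an unlimited-MSHR interval model implicitly assumes.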