We propose two hardware mechanisms to decrease energy consumption on massively parallel graphics processors for ray tracing. First, we use a streaming data model and configure part of the L2 cache into a ray stream memory to enable efficient data processing through ray reordering. This increases L1 hit rates and reduces off‐chip memory energy substantially through better management of off‐chip memory access patterns. To evaluate this model, we augment our architectural simulator with a detailed memory system simulation that includes accurate control, timing and power models for memory controllers and off‐chip dynamic random‐access memory . These details change the results significantly over previous simulations that used a simpler model of off‐chip memory, indicating that this type of memory system simulation is important for realistic simulations that involve external memory. Secondly, we employ reconfigurable special‐purpose pipelines that are constructed dynamically under program control. These pipelines use shared execution units that can be configured to support the common compute kernels that are the foundation of the ray tracing algorithm. This reduces the overhead incurred by on‐chip memory and register accesses. These two synergistic features yield a ray tracing architecture that reduces energy by optimizing both on‐chip and off‐chip memory activity when compared to a more traditional approach.