Visual sensors combined with video-analysis algorithms can enhance applications in surveillance, healthcare, intelligent vehicle control, human-machine interfaces, etc. Several hardware solutions for video analysis exist. Analog on-sensor processing solutions [1] integrate processing with the image sensor, but the precision loss of analog signal processing prevents them from realizing complex algorithms, and they lack flexibility. Vision processors [2,3] achieve high GOPS figures by combining a processor array for parallel operations with a decision processor for the remaining ones; however, converting the parallel data of the processor array into scalar data for the decision processor creates a throughput bottleneck, and the parallel memory accesses lead to high power consumption. Privacy is also a critical issue when deploying visual sensors, because video data can be exposed by the image sensors or processors. This risk applies to all of the above solutions, since inputting or outputting video data is unavoidable.

iVisual is characterized as follows: 1) Privacy is protected by integrating a 2790fps CMOS image sensor (CIS), a 76.8GOPS vision processor and 1Mb of storage. iVisual is a light-in, answer-out SoC: no video data need to be revealed outside the chip. 2) A feature processor eliminates the throughput bottleneck and increases throughput by 36%. 3) A power efficiency of 205GOPS/W, 5× better than previous works [2,3], is achieved by introducing the feature processor, a gated-clock scheme and reduced memory accesses.

iVisual integrates the CIS, the bitplane memory and three processors: the GP, the feature processor (FP) and the decision processor (DP). The GP is a parallel-data-in, parallel-data-out processor and controls the bitplane memory. The FP is a parallel-data-in, scalar-out processor and therefore eliminates the throughput bottleneck of data conversion. The DP handles scalar-in, scalar-out operations, usually decisions that further control the program execution of the GP and FP.

The CIS is frame-pipelined with the GP, FP and DP to increase hardware utilization. The port of the bitplane memory is shared by the CIS and the GP, and port collisions are handled automatically. Sharing the port reduces SRAM area by 64% and die area by 16%, with an average collision probability below 0.1%.

The GP, FP and DP work concurrently. For each instruction, the availability of the required resources is checked, including resources in the other processors, and the instruction is executed only when all of them are available. This simple scheme keeps the inter-processor communication needed to synchronize the three processors to a minimum and increases throughput by 23% compared with tightly coupled processors [2]. The clocks of unused resources are turned off to reduce power.

The GP execution unit is a SIMD processor array with 128 processing elements (PEs). A PE cache between the PE array and the bitplane memory reduces memory accesses by 94%, saving 726mW of power, while the cache itself consumes 134mW. Various bitplane-memory access patterns and storage-allocation schemes are provided to reduce the program size and increase storage density. To enhance flexibility, each PE is indexed and has...
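
The port-sharing behavior described above can be pictured with a small sketch. The following C fragment is a hypothetical arbiter for the single bitplane-memory port shared by the CIS and the GP; the paper states only that collisions are handled automatically with below 0.1% average probability, so the fixed-priority policy here (sensor readout wins, the GP replays its access in the next cycle) is an assumption for illustration.

/* Hypothetical one-decision-per-cycle arbiter for the shared bitplane-memory
 * port. The priority policy is assumed, not taken from the iVisual design.   */
#include <stdbool.h>
#include <stdio.h>

typedef enum { GRANT_NONE, GRANT_CIS, GRANT_GP } Grant;

static Grant arbitrate(bool cis_req, bool gp_req, bool *gp_stall)
{
    *gp_stall = cis_req && gp_req;   /* collision: GP replays its access next cycle */
    if (cis_req) return GRANT_CIS;   /* sensor readout is never delayed             */
    if (gp_req)  return GRANT_GP;
    return GRANT_NONE;
}

int main(void)
{
    bool stall;
    Grant g = arbitrate(true, true, &stall);          /* the rare CIS/GP collision   */
    printf("grant=%d gp_stall=%d\n", (int)g, (int)stall);
    return 0;
}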
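
The loosely coupled issue rule can likewise be sketched behaviorally. In this hypothetical C model, each of the GP, FP and DP issues its next instruction only when every resource the instruction names, possibly in another processor, is free; otherwise it stalls for a cycle. The specific resource list and the clock-gating hook are illustrative assumptions, not the iVisual microarchitecture.

/* Behavioral sketch of the resource-availability issue rule (assumed resource
 * names; clock gating indicated only as a comment).                            */
#include <stdbool.h>
#include <stdio.h>

enum { RES_BITPLANE_PORT, RES_PE_ARRAY, RES_FP_ACCUM, RES_DP_FLAGS, NUM_RES };

static bool busy[NUM_RES];              /* set while a resource is held          */

/* Returns true and claims the resources if the instruction can issue now.      */
static bool try_issue(const char *instr, unsigned needs)
{
    for (int r = 0; r < NUM_RES; r++)
        if (((needs >> r) & 1u) && busy[r])
            return false;               /* stall: some required resource is busy */

    for (int r = 0; r < NUM_RES; r++)
        if ((needs >> r) & 1u) {
            busy[r] = true;
            /* clock_enable(r): only resources actually used are clocked         */
        }
    printf("issued: %s\n", instr);
    return true;
}

int main(void)
{
    /* A GP instruction holds the PE array and the bitplane-memory port.         */
    try_issue("GP: load bitplane into PE array",
              (1u << RES_PE_ARRAY) | (1u << RES_BITPLANE_PORT));

    /* An FP instruction that needs the PE array must wait until it is released. */
    if (!try_issue("FP: accumulate PE results",
                   (1u << RES_PE_ARRAY) | (1u << RES_FP_ACCUM)))
        printf("FP stalls this cycle (PE array busy)\n");

    busy[RES_PE_ARRAY] = busy[RES_BITPLANE_PORT] = false;    /* GP completes      */
    try_issue("FP: accumulate PE results",
              (1u << RES_PE_ARRAY) | (1u << RES_FP_ACCUM));
    return 0;
}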
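
Finally, the effect of the PE cache can be approximated with a toy model: a small direct-mapped row cache between the 128-PE array and the bitplane memory serves repeated accesses to the same row locally, so only misses reach the SRAM. The cache size, organization and access pattern below are assumptions; the paper reports only the resulting 94% access reduction and the associated power saving.

/* Toy direct-mapped row cache in front of the bitplane memory; sizes and the
 * access pattern are assumptions chosen only to show how hits filter SRAM
 * traffic.                                                                     */
#include <stdio.h>

#define CACHE_ROWS 8                     /* assumed number of cached rows        */

static int  tag[CACHE_ROWS];
static int  valid[CACHE_ROWS];
static long mem_accesses, total_accesses;

/* Fetch one bitplane row for the PE array; only misses touch the SRAM.         */
static void fetch_row(int row)
{
    int slot = row % CACHE_ROWS;
    total_accesses++;
    if (!valid[slot] || tag[slot] != row) {     /* miss: read bitplane memory    */
        mem_accesses++;
        tag[slot]   = row;
        valid[slot] = 1;
    }
}

int main(void)
{
    /* A window-based kernel revisits the same neighbouring rows many times.     */
    for (int pass = 0; pass < 10; pass++)
        for (int row = 0; row < 8; row++)
            fetch_row(row);
    printf("SRAM accesses avoided: %.0f%%\n",
           100.0 * (total_accesses - mem_accesses) / total_accesses);
    return 0;
}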