Single instruction, multiple threads

From Infogalactic: the planetary knowledge core
(Redirected from SIMD lane)
Jump to: navigation, search

Single instruction, multiple thread (SIMT) is an execution model used in parallel computing where single instruction, multiple data (SIMD) is combined with multi threading.

The processors, say a number p of them, seem to execute many more than p tasks. This is achieved by each processor having multiple "threads" (or "work-items" or "Sequence of SIMD Lane operations"), which execute in lock-step, and are analogous to SIMD "lanes".[1]

The SIMT execution model has been implemented on several GPUs and is relevant for general-purpose computing on graphics processing units (GPGPU), e.g. some supercomputers combine CPUs with GPUs.

SIMT was introduced by Nvidia:[2][3]

<templatestyles src="Template:Blockquote/styles.css" />

[Nvidia's Tesla GPU microarchitecture] (first available November 8, 2006 as implemented in the "G80" GPU chip) introduced the single-instruction multiple-thread (SIMT) execution model where multiple independent threads execute concurrently using a single instruction.

ATI Technologies (now AMD) released a competing product slightly later on May 14, 2007, the TeraScale 1-based "R600" GPU chip.

As access time of all the widespread RAM types (e.g. DDR SDRAM, GDDR SDRAM, XDR DRAM, etc.) is still very low, some engineers came up with the idea to hide that latency, that inevitably comes with each memory access. Strictly, the latency-hiding is a feature of the zero-overhead scheduling implemented by modern GPUs... this might or might not be considered to be a property of 'SIMT' itself.

SIMT is intended to limit instruction fetching overhead,[4] i.e. the latency that comes with memory access, and is used in modern GPUs (including, but not limited to those of Nvidia and AMD) in combination with 'latency hiding' to enable high-performance execution despite considerable latency in memory-access operations. This is where the processor is oversubscribed with computation tasks, and is able to quickly switch between tasks when it would otherwise have to wait on memory. This strategy is comparable to multithreading in CPUs (not to be confused with multi-core).[5]

A downside of SIMT execution is the fact that thread-specific control-flow is performed using "masking", leading to poor utilization where a processor's threads follow different control-flow paths. For instance, to handle an IF-ELSE block where various threads of a processor execute different paths, all threads must actually process both paths (as all threads of a processor always execute in lock-step), but masking is used to disable and enable the various threads as appropriate. Masking is avoided when control flow is coherent for the threads of a processor, i.e. they all follow the same path of execution. The masking strategy is what distinguishes SIMT from ordinary SIMD, and has the benefit of inexpensive synchronization between the threads of a processor.[6]

Nvidia CUDA AMD OpenCL Henn&Pratt
Thread Work-item Sequence of SIMD Lane operations
Warp Wavefront Thread of SIMD Instructions
Block Workgroup Body of vectorized loop
Grid NDRange Vectorized loop

See also

References

  1. Lua error in package.lua at line 80: module 'strict' not found.
  2. Lua error in package.lua at line 80: module 'strict' not found.
  3. Lua error in package.lua at line 80: module 'strict' not found.
  4. Lua error in package.lua at line 80: module 'strict' not found.
  5. Lua error in package.lua at line 80: module 'strict' not found.
  6. Lua error in package.lua at line 80: module 'strict' not found.


<templatestyles src="Asbox/styles.css"></templatestyles>