Bulldozer (microarchitecture)

From Infogalactic: the planetary knowledge core
Bulldozer - Family 15h
Produced: from late 2011 to present
Common manufacturer(s): GlobalFoundries
Min. feature size: 32 nm
Instruction set: x86-64
Socket(s): AM3+, G34, C32
Predecessor: Family 10h (K10)
Successor: Piledriver - Family 15h (2nd-gen)
Core name(s): Zambezi, Interlagos, Valencia

The AMD Bulldozer Family 15h is a microprocessor microarchitecture for the FX and Opteron line of processors, developed by AMD for the desktop and server markets.[1][2] Bulldozer is the codename for this family of microarchitectures. It was released on October 12, 2011 as the successor to the K10 microarchitecture.

Bulldozer was designed from scratch rather than developed from earlier processors.[3] The core is specifically aimed at computing products with TDPs of 10 to 125 watts. AMD claimed dramatic performance-per-watt efficiency improvements in high-performance computing (HPC) applications with Bulldozer cores.

The Bulldozer cores support most of the instruction sets implemented by Intel processors available at the time of their introduction (including SSE4.1, SSE4.2, AES, CLMUL, and AVX), as well as new instruction sets proposed by AMD: ABM, XOP, FMA4 and F16C.[4][5]

Overview

According to AMD, Bulldozer-based CPUs are built on GlobalFoundries' 32 nm silicon-on-insulator (SOI) process technology and reuse DEC's approach to multithreaded performance. According to AMD's press materials, the design "balances dedicated and shared computer resources to provide a highly compact, high units count design that is easily replicated on a chip for performance scaling."[6] In other words, AMD hoped to eliminate some of the "redundant" elements that naturally creep into multicore designs, making better use of its hardware while using less power.

Bulldozer-based implementations built on 32 nm SOI with HKMG arrived in October 2011 for both servers and desktops. The server segment included the dual-chip (16-core) Opteron processor codenamed Interlagos (for Socket G34) and the single-chip (4, 6 or 8 cores) Valencia (for Socket C32), while Zambezi (4, 6 and 8 cores) targeted desktops on Socket AM3+.[7][8]

Bulldozer is the first major redesign of AMD's processor architecture since 2003, when the firm launched its K8 processors. It features two 128-bit FMA-capable FPUs that can be combined into one 256-bit FPU. These are accompanied by two integer clusters, each with four pipelines (the fetch/decode stage is shared). Bulldozer also introduced an L2 cache shared between the two clusters. AMD calls this design a "module". A 16-core processor design would feature eight of these modules,[9] and the operating system recognizes each module as two logical cores.

The modular architecture consists of a multithreaded shared L2 cache and the FlexFPU, which uses simultaneous multithreading. Each physical integer core (two per module) is single-threaded. This contrasts with Intel's Hyper-Threading, where two simultaneous threads share the resources of a single physical core.[10]
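As a rough illustration of how resources scale with module count, the sketch below models a chip as dies × modules per die. It derives totals from the per-module figures given in this article: two integer cores, one shared FPU and 2 MB of L2 per module, plus four 2 MB L3 subcaches per die. The class and constant names are hypothetical. The L3 figure assumes a fully enabled die.

```python
from dataclasses import dataclass

# Per-module and per-die figures taken from this article; this is an
# illustrative model, not an AMD or operating-system API.
INT_CORES_PER_MODULE = 2  # two integer clusters, each seen by the OS as a core
FPUS_PER_MODULE = 1       # one shared FlexFPU (two 128-bit FMAC pipelines)
L2_MB_PER_MODULE = 2      # 2 MB L2 shared by both integer clusters
L3_MB_PER_DIE = 8         # four 2 MB L3 subcaches on a fully enabled die


@dataclass(frozen=True)
class BulldozerChip:
    """Hypothetical description of a package as dies x modules per die."""
    dies: int
    modules_per_die: int

    @property
    def modules(self) -> int:
        return self.dies * self.modules_per_die

    @property
    def logical_cores(self) -> int:
        # Each module is reported to the OS as two logical cores.
        return self.modules * INT_CORES_PER_MODULE

    @property
    def shared_fpus(self) -> int:
        return self.modules * FPUS_PER_MODULE

    @property
    def l2_mb(self) -> int:
        return self.modules * L2_MB_PER_MODULE

    @property
    def l3_mb(self) -> int:
        # Assumes every die keeps its full L3; some models (e.g. FX-4130) ship with less.
        return self.dies * L3_MB_PER_DIE


zambezi = BulldozerChip(dies=1, modules_per_die=4)     # e.g. FX-8150
interlagos = BulldozerChip(dies=2, modules_per_die=4)  # 16-core Opteron 6200

for name, chip in [("Zambezi", zambezi), ("Interlagos", interlagos)]:
    print(f"{name}: {chip.modules} modules, {chip.logical_cores} cores, "
          f"{chip.shared_fpus} FPUs, {chip.l2_mb} MB L2, {chip.l3_mb} MB L3")
```

The model makes the CMT trade-off explicit: doubling the module count doubles the integer cores, FPUs and L2 together. Within a module, two cores share a single FPU.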

Architecture

Bulldozer core

[Figure: Block diagram of a complete Bulldozer module, showing two integer clusters]
[Figure: Block diagram of a four-module design with eight integer clusters]
[Figure: Memory topology of a Bulldozer server]
  • AMD has re-introduced the "Clustered Integer Core" microarchitecture, an approach developed by DEC in 1996 with the Alpha 21264 RISC microprocessor. This technology is informally called CMT (Clustered Multi-Thread), and AMD formally calls the unit a "module". In terms of hardware complexity and functionality, a module is equivalent to a dual-core processor in integer throughput and to a single-core processor in floating-point throughput: for every two integer cores there is one floating-point unit. The floating-point unit behaves like a single core with SMT. It runs two threads, but both share the floating-point resources of one core.
    • A module couples two "conventional" x86 out-of-order processing cores. Each core shares the early pipeline stages (e.g. L1i, fetch, decode), the FPUs, and the L2 cache with the other core in the module.
  • Each module has the following independent hardware resources:[11][12]
    • 2 MB of L2 cache per module (shared between the two integer clusters in the module)
    • 16 KB, 4-way L1d (way-predicted) per cluster, and a 64 KB, 2-way L1i per module, one way for each of the two clusters[13][14][15]
    • Two dedicated integer clusters
      - each consists of two ALUs and two AGUs, capable of a total of four independent arithmetic and memory operations per clock per cluster
      - duplicating integer schedulers and execution pipelines offers dedicated hardware to each of two threads which increases performance in some multi-threaded integer cases
      - the second integer cluster increases the area of the Bulldozer module by around 12%, which at chip level adds about 5% to total die area[16]
    • Two symmetrical 128-bit FMAC (fused multiply–add capable) floating-point pipelines per module, which can be unified into one large 256-bit-wide unit if one of the integer cores dispatches an AVX instruction, plus two symmetrical x87/MMX/SSE-capable FPPs for backward compatibility with software not optimized for SSE2
  • All modules on a die share the L3 cache and an advanced dual-channel memory subsystem (IMC, integrated memory controller).
  • A module has 213 million transistors in an area of 30.9 mm² (including the 2 MB shared L2 cache) on an Orochi die.[17]
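A worked example of the L1 cache geometry implied by these figures follows, under two assumptions not stated in this article: 64-byte cache lines and 4 KB pages. It uses the standard set-associative relation sets = size / (ways × line size). It then checks whether the line-offset and set-index bits fit within the 12-bit page offset. When they do not, as for the 64 KB 2-way L1i, a virtually indexed cache can in principle place the same physical line in different sets (aliasing). The helper name is hypothetical.

```python
# Standard set-associative cache geometry: sets = size / (ways * line_size).
# Assumptions: 64-byte lines and 4 KB pages (neither is stated in the article).
LINE_SIZE = 64
PAGE_SIZE = 4096  # 4 KB pages, so the page offset is 12 bits


def cache_geometry(size_bytes: int, ways: int, line_size: int = LINE_SIZE):
    """Return (sets, offset_bits, index_bits) for a set-associative cache."""
    sets = size_bytes // (ways * line_size)
    offset_bits = line_size.bit_length() - 1  # log2 of a power of two
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits


l1d = cache_geometry(16 * 1024, ways=4)  # 16 KB, 4-way L1d per cluster
l1i = cache_geometry(64 * 1024, ways=2)  # 64 KB, 2-way L1i per module

page_offset_bits = PAGE_SIZE.bit_length() - 1
for name, (sets, off, idx) in [("L1d", l1d), ("L1i", l1i)]:
    fits = off + idx <= page_offset_bits
    print(f"{name}: {sets} sets, {off} offset + {idx} index bits, "
          f"within 4 KB page offset: {fits}")
```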

Instruction set extensions

  • Support for Intel's Advanced Vector Extensions (AVX) instruction set, which supports 256-bit floating-point operations, and for SSE4.1, SSE4.2, AES and CLMUL. Bulldozer also supports new 128-bit instruction sets proposed by AMD (XOP, FMA4 and F16C),[18] which provide the functionality of the SSE5 instruction set formerly proposed by AMD but are compatible with the AVX coding scheme.
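On Linux, one way to see which of these extensions a CPU reports is to read the `flags` line of `/proc/cpuinfo`. The sketch below maps the names used in this article to the flag names the Linux kernel prints. For example, Linux reports CLMUL as `pclmulqdq` and SSE4.1/SSE4.2 as `sse4_1`/`sse4_2`. The function names are hypothetical helpers, not part of any library.

```python
from __future__ import annotations

# Linux /proc/cpuinfo flag names for the extensions this article lists.
# Linux reports CLMUL as "pclmulqdq" and SSE4.1/SSE4.2 as "sse4_1"/"sse4_2".
BULLDOZER_EXTENSIONS = {
    "SSE4.1": "sse4_1",
    "SSE4.2": "sse4_2",
    "AES": "aes",
    "CLMUL": "pclmulqdq",
    "AVX": "avx",
    "ABM": "abm",
    "XOP": "xop",
    "FMA4": "fma4",
    "F16C": "f16c",
}


def missing_extensions(flags_line: str) -> list[str]:
    """Return the extensions whose flag is absent from a cpuinfo 'flags' line."""
    # A flags line looks like "flags\t\t: fpu vme ... avx ..."; keep the part after ':'.
    flags = set(flags_line.split(":", 1)[-1].split())
    return [name for name, flag in BULLDOZER_EXTENSIONS.items() if flag not in flags]


def read_flags_line(path: str = "/proc/cpuinfo") -> str:
    """Return the first 'flags' line of a Linux cpuinfo file, or '' if none."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return line
    return ""
```

On a Linux machine, `missing_extensions(read_flags_line())` returns an empty list when every listed extension is reported. On Intel processors, XOP and FMA4 will always appear as missing, since they are AMD-only extensions.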

Process technology and clock frequency

  • 11-metal-layer 32 nm SOI process using GlobalFoundries' first-generation high-k metal gate (HKMG) technology
  • Turbo Core 2 performance boost, which raises the clock frequency by up to 500 MHz with all threads active (for most workloads) and by up to 1 GHz with half of the threads active, within the TDP limit.[19]
  • The chip operates at 0.775 to 1.425 V, achieving clock frequencies of 3.6 GHz or more[17]
  • Min-Max TDP: 25 – 140 watts

Cache and memory interface

  • Up to 8 MB of L3 cache per die, shared among all cores on that die (8 MB for four modules in the desktop segment; 16 MB for eight modules across two dies in the server segment). Each die's L3 is divided into four 2 MB subcaches, capable of operating at 2.2 GHz at 1.1125 V[17]
  • Native DDR3 memory support up to DDR3-1866[20]
  • Dual-channel DDR3 integrated memory controller for desktop and for server/workstation Opteron 42xx "Valencia";[21] quad-channel DDR3 integrated memory controller[22] for server/workstation Opteron 62xx "Interlagos"
  • AMD claims support for two DIMMs of DDR3-1600 per channel; two DIMMs of DDR3-1866 on a single channel will be down-clocked to DDR3-1600.
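The memory figures above translate into theoretical peak bandwidth as transfers per second × 8 bytes (a 64-bit DDR3 channel) × number of channels. The sketch below encodes the stated down-clocking rule as a hypothetical helper. Native support reaches DDR3-1866, but two DIMMs on a channel cap the speed at DDR3-1600. DDR3-1866 is treated as 1866 MT/s (its nominal rate is slightly higher). The result is a theoretical ceiling, not measured throughput.

```python
BYTES_PER_TRANSFER = 8  # a DDR3 channel is 64 bits wide


def effective_speed_mts(rated_mts: int, dimms_per_channel: int) -> int:
    """Hypothetical rule from the text: up to DDR3-1866 natively,
    down-clocked to DDR3-1600 with two DIMMs on one channel."""
    if dimms_per_channel not in (1, 2):
        raise ValueError("the article only describes one or two DIMMs per channel")
    cap = 1866 if dimms_per_channel == 1 else 1600
    return min(rated_mts, cap)


def peak_bandwidth_gbs(speed_mts: int, channels: int) -> float:
    """Theoretical peak in GB/s: MT/s x 8 bytes x channels / 1000."""
    return speed_mts * BYTES_PER_TRANSFER * channels / 1000


# Example configurations (hypothetical): dual-channel desktop with one DIMM per
# channel, and quad-channel Interlagos with two DIMMs per channel.
desktop_speed = effective_speed_mts(1866, dimms_per_channel=1)
server_speed = effective_speed_mts(1866, dimms_per_channel=2)
print(desktop_speed, peak_bandwidth_gbs(desktop_speed, channels=2))
print(server_speed, peak_bandwidth_gbs(server_speed, channels=4))
```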

I/O and socket interface

  • HyperTransport Technology rev. 3.1 (3.20 GHz, 6.4 GT/s, 25.6 GB/s, 16-bit-wide link) [first implemented in the HY-D1 revision "Magny-Cours" on the Socket G34 Opteron platform in March 2010 and "Lisbon" on the Socket C32 Opteron platform in June 2010]
  • Socket AM3+ (AM3r2)
    • 942-pin, DDR3 support only
    • Bulldozer processors can retain backward compatibility with Socket AM3 motherboards (at the motherboard manufacturer's discretion and if BIOS updates are provided[23][24]); however, this is not officially supported by AMD. AM3+ motherboards are backward-compatible with AM3 processors.[25]
  • For the server segment, the existing Socket G34 (LGA1974) and Socket C32 (LGA1207) are used.
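The HyperTransport figures listed above are related by simple arithmetic. A 3.2 GHz double-data-rate link gives 6.4 GT/s, and a 16-bit (2-byte) width gives 12.8 GB/s in each direction. The sketch below assumes that the quoted 25.6 GB/s counts both directions together; the helper name is hypothetical.

```python
from __future__ import annotations


def ht_link_bandwidth(clock_ghz: float, width_bits: int) -> tuple[float, float, float]:
    """Return (GT/s, GB/s per direction, GB/s both directions) for one link."""
    gt_per_s = clock_ghz * 2                   # double data rate: two transfers per clock
    per_direction = gt_per_s * width_bits / 8  # bits per transfer -> bytes
    return gt_per_s, per_direction, per_direction * 2


gts, one_way, aggregate = ht_link_bandwidth(3.2, 16)
print(f"{gts:.1f} GT/s, {one_way:.1f} GB/s per direction, {aggregate:.1f} GB/s total")
```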

Processors


Chipset and I/Os for 1st CMT generation

The first revenue shipments of Bulldozer-based Opteron processors were announced on September 7, 2011.[26] The FX-4100, FX-6100, FX-8120 and FX-8150 were released in October 2011, with the remaining FX-series processors released at the end of the first quarter of 2012.

Desktop

Model    Modules  Base clock  Turbo, all cores  Turbo, half cores  L2 cache  L3 cache  TDP
FX-8150  4        3.6 GHz     3.9 GHz           4.2 GHz            4 × 2 MB  8 MB      125 W
FX-8120  4        3.1 GHz     3.4 GHz           4.0 GHz            4 × 2 MB  8 MB      125 W
FX-8100  4        2.8 GHz     3.1 GHz           3.7 GHz            4 × 2 MB  8 MB      95 W
FX-6200  3        3.8 GHz     4.0 GHz           4.1 GHz            3 × 2 MB  8 MB      125 W
FX-6120  3        3.5 GHz     3.9 GHz           4.1 GHz            3 × 2 MB  8 MB      95 W
FX-6100  3        3.3 GHz     3.6 GHz           3.9 GHz            3 × 2 MB  8 MB      95 W
FX-4170  2        4.2 GHz     4.3 GHz           4.3 GHz            2 × 2 MB  8 MB      125 W
FX-4130  2        3.8 GHz     3.9 GHz           4.0 GHz            2 × 2 MB  4 MB      125 W
FX-4100  2        3.6 GHz     3.7 GHz           3.8 GHz            2 × 2 MB  8 MB      95 W

All models: DDR3-1866 memory support, Turbo Core 2.0, Socket AM3+.

Major Sources: CPU-World[27] and Xbit-Labs[28]
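The desktop figures can be checked for consistency against the Turbo Core 2 limits stated in the process technology section: up to +500 MHz with all threads active and up to +1 GHz with half of them. Clock values are in MHz, copied from the desktop model list. Reading the lower turbo figure as the all-threads value and the higher one as the half-threads value is an assumption. The function names are hypothetical.

```python
from __future__ import annotations

# (model, base, all-core turbo, half-core turbo) in MHz, from the desktop list.
# Assumption: the lower turbo figure is the all-threads value and the higher
# one is the half-threads value.
FX_MODELS = [
    ("FX-8150", 3600, 3900, 4200),
    ("FX-8120", 3100, 3400, 4000),
    ("FX-8100", 2800, 3100, 3700),
    ("FX-6200", 3800, 4000, 4100),
    ("FX-6120", 3500, 3900, 4100),
    ("FX-6100", 3300, 3600, 3900),
    ("FX-4170", 4200, 4300, 4300),
    ("FX-4130", 3800, 3900, 4000),
    ("FX-4100", 3600, 3700, 3800),
]

ALL_THREADS_CAP_MHZ = 500    # Turbo Core 2: up to +500 MHz with all threads active
HALF_THREADS_CAP_MHZ = 1000  # Turbo Core 2: up to +1 GHz with half the threads active


def boost_deltas(base: int, all_core: int, half_core: int) -> tuple[int, int]:
    """Return the (all-threads, half-threads) boost above base clock in MHz."""
    return all_core - base, half_core - base


def within_turbo_core_2_caps(base: int, all_core: int, half_core: int) -> bool:
    all_delta, half_delta = boost_deltas(base, all_core, half_core)
    return (0 <= all_delta <= ALL_THREADS_CAP_MHZ
            and all_delta <= half_delta <= HALF_THREADS_CAP_MHZ)


for model, base, all_core, half_core in FX_MODELS:
    all_delta, half_delta = boost_deltas(base, all_core, half_core)
    ok = within_turbo_core_2_caps(base, all_core, half_core)
    print(f"{model}: +{all_delta} MHz (all), +{half_delta} MHz (half), within caps: {ok}")
```

Integer MHz values are used instead of GHz floats so that the comparisons are exact.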

There are two series of Bulldozer-based processors for servers: Opteron 4200 series (code named Valencia, with up to four modules) and Opteron 6200 series (code named Interlagos, with up to 8 modules).[29]

Performance

Performance on Linux

On 24 October 2011, first-generation tests by Phoronix found that Bulldozer's performance was somewhat lower than expected.[30] In many tests the CPU performed at the same level as the older-generation Phenom 1060T.

The performance later substantially increased, as various compiler optimizations and CPU driver fixes were released.[31][32]

Performance on Windows

The first Bulldozer CPUs were met with a mixed response. The FX-8150 performed poorly in benchmarks that were not highly threaded. It fell behind the second-generation Intel Core i* series processors and was matched or even outperformed by AMD's own Phenom II X6 at lower clock speeds. In highly threaded benchmarks, the FX-8150 performed on par with the Phenom II X6 or the Intel Core i7 2600K, depending on the benchmark. Given the more consistent overall performance of the Intel Core i5 2500K at a lower price, these results left many reviewers underwhelmed. The processor was also found to be extremely power-hungry under load compared to Intel's Sandy Bridge, especially when overclocked.[33][34]

On 13 October 2011, AMD stated on its blog that "there are some in our community who feel the product performance did not meet their expectations". It also showed benchmarks on real-world applications where the FX-8150 outperformed the Sandy Bridge i7 2600K and AMD's Phenom II X6 1100T.[35]

In January 2012, Microsoft released two hotfixes for Windows 7 and Server 2008 R2 that marginally improve the performance of Bulldozer CPUs by addressing the thread scheduling concerns raised after the release of Bulldozer.[36][37][38]
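The scheduling concern comes down to where threads land relative to modules. The sketch below contrasts two placements under an assumption: logical CPUs 2k and 2k+1 are the two integer cores of module k (real enumeration depends on firmware and the OS). "Compact" placement fills both cores of a module first. "Spread" placement puts one thread per module before doubling up, so each thread gets its own front end, FPU and L2. Compact placement leaves whole modules idle, which can save power and, per Turbo Core 2, allow higher clocks when half of the threads are idle. The functions are hypothetical illustrations, not the logic of the Microsoft hotfixes. Which policy is faster depends on the workload.

```python
from __future__ import annotations

# Assumption for illustration: logical CPUs 2k and 2k+1 are the two integer
# cores of module k. Real enumeration depends on firmware and the OS.


def module_of(cpu: int) -> int:
    return cpu // 2


def compact_placement(num_threads: int, num_modules: int) -> list[int]:
    """Fill both cores of a module before using the next module."""
    if num_threads > 2 * num_modules:
        raise ValueError("more threads than logical cores")
    return list(range(num_threads))


def spread_placement(num_threads: int, num_modules: int) -> list[int]:
    """Put one thread on each module first, then use the second cores."""
    if num_threads > 2 * num_modules:
        raise ValueError("more threads than logical cores")
    order = [2 * m for m in range(num_modules)] + [2 * m + 1 for m in range(num_modules)]
    return order[:num_threads]


for policy in (compact_placement, spread_placement):
    cpus = policy(4, num_modules=4)
    modules = sorted({module_of(c) for c in cpus})
    print(f"{policy.__name__}: CPUs {cpus} on modules {modules}")
```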

On 6 March 2012, AMD posted a knowledge base article stating that there was a compatibility problem between FX processors and certain games on the widely used digital game distribution platform Steam. AMD stated that it had provided a BIOS update to several motherboard manufacturers (namely Asus, Gigabyte Technology, MSI and ASRock) that would fix the problem.[39]

In September 2014, AMD CEO Rory Read conceded the Bulldozer design had not been a "game-changing part", and that AMD had to live with the design for four years.[40]

Overclocking

On 31 August 2011, AMD and a group of well-known overclockers including Brian McLachlan, Sami Mäkinen, Aaron Schradin and Simon Solotko set a new world record for CPU frequency using an unreleased, overclocked FX-8150 Bulldozer processor. Before that day the record stood at 8.309 GHz, but the Bulldozer, combined with liquid helium cooling, reached a new high of 8.429 GHz. The record was later broken at 8.58 GHz by Andre Yang using liquid nitrogen.[41][42] On August 22, 2014, using an FX-8370, The Stilt from Team Finland achieved a maximum CPU frequency of 8.722 GHz.[43]

Revisions

Piledriver is the AMD codename for its improved second-generation microarchitecture based on Bulldozer. AMD Piledriver cores are found in Socket FM2 Trinity and Richland based series of APUs and CPUs and the Socket AM3+ Vishera based FX-series of CPUs.

Steamroller is the AMD codename for its third-generation microarchitecture based on an improved version of Piledriver. Steamroller cores are found in the Socket FM2+ Kaveri based series of APUs and CPUs.

On 12 October 2011, AMD revealed Excavator to be the codename for the fourth-generation Bulldozer core.[44] Excavator will initially be implemented in the 4th Generation A-series Fusion APU line in 2015. Reports indicate this APU will be codenamed Carrizo.[45]

False advertising lawsuit

In November 2015, AMD was sued under the California Consumers Legal Remedies Act and Unfair Competition Law for misrepresenting the specifications of Bulldozer chips. The suit alleged that each module is exposed to the operating system as two logical CPU cores that cannot technically operate independently of each other. On that basis, it claimed AMD had falsely advertised octa-core Bulldozer chips as having eight independent cores, when they were effectively quad-core chips given their module count.[46]

See also

References

  1. (citation details unavailable)
  2. (citation details unavailable)
  3. (citation details unavailable)
  4. (citation details unavailable)
  5. (citation details unavailable)
  6. (citation details unavailable)
  7. (citation details unavailable)
  8. (citation details unavailable)
  9. (citation details unavailable)
  10. http://cdn3.wccftech.com/wp-content/uploads/2013/07/AMD-Steamroller-vs-Bulldozer.jpg
  11. (citation details unavailable)
  12. (citation details unavailable)
  13. (citation details unavailable)
  14. (citation details unavailable)
  15. (citation details unavailable)
  16. (citation details unavailable)
  17. (citation details unavailable)
  18. (citation details unavailable)
  19. (citation details unavailable)
  20. (citation details unavailable)
  21. (citation details unavailable)
  22. (citation details unavailable)
  23. (citation details unavailable)
  24. (citation details unavailable)
  25. AM3 processors will work in the AM3+ socket, but Bulldozer chips will not work in non-AM3+ motherboards. Archived December 10, 2010, at the Wayback Machine.
  26. (citation details unavailable)
  27. (citation details unavailable)
  28. (citation details unavailable)
  29. (citation details unavailable)
  30. (citation details unavailable)
  31. (citation details unavailable)
  32. (citation details unavailable)
  33. (citation details unavailable)
  34. (citation details unavailable)
  35. (citation details unavailable)
  36. (citation details unavailable)
  37. (citation details unavailable)
  38. (citation details unavailable)
  39. (citation details unavailable)
  40. AMD: next-generation microarchitecture will make up for muted Bulldozer reception (PC Gamer, Oct. 8, 2014)
  41. (citation details unavailable)
  42. AMD Bulldozer Speed Record Broken Again at 8.58GHz
  43. (citation details unavailable)
  44. (citation details unavailable)
  45. New confirmed details on AMD's 2014 APU lineup, Kaveri delayed (VR-Zone)
  46. (citation details unavailable)
