Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World

Slide 1

Multi-Core Processor Technology: Maximizing CPU Performance in a Power-Constrained World
Paul Teich
Business Strategy, CPG Server/Workstation, AMD
paul.teich @

Slide 2

The Issues
- Silicon designers can choose from a variety of methods to increase processor performance
- Commercial end-customers are demanding:
  - More capable systems with more capable processors
  - That new systems stay within their existing power/thermal infrastructure
- Processor frequency and power consumption seem to scale in lockstep
- How can the industry-standard PC and server industries stay on our historic performance curve without burning a hole in our motherboards?
- This session is not about process technology

Slide 3

Session Outline
- Definition: What is a processor?
- Core Design
- System Architecture
- Manufacturing, Power, and Thermals
- Multi-Core Processor Architecture
- Performance Impacts

Slide 4

What is a Processor?
- A single chip package that fits in a socket
- ≥1 core (not much point in <1 core…)
  - Cores can have functional units, cache, etc. associated with them, just as today
  - Cores can be fast or slow, just as today
- Shared resources
  - More cache
  - Other integration: Northbridge, memory controllers, high-speed serial links, etc.
- One system interface no matter how many cores
  - Number of signal pins doesn't scale with number of cores

Slide 5

A Representative Multi-Core Processor
- Dual-core AMD Opteron™ processor is 199 mm² in 90nm
- Single-core AMD Opteron processor is 193 mm² in 130nm
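As a rough sanity check on those two die sizes, the sketch below applies an idealized shrink rule, in which area scales with the square of the feature size. That rule is an assumption, not something the slide states: real designs do not shrink ideally, and the dual-core die is not simply two copies of the single-core die. The die areas and process nodes are the only values taken from the slide.

```python
# Idealized shrink estimate: area scales with the square of the feature size.
# The die areas and nodes come from the slide; the scaling rule is an assumption.

def ideal_shrink_area(area_mm2: float, old_nm: float, new_nm: float) -> float:
    """Area of the same design after an ideal linear shrink from old_nm to new_nm."""
    return area_mm2 * (new_nm / old_nm) ** 2

single_core_130nm = 193.0  # mm^2, from the slide
dual_core_90nm = 199.0     # mm^2, from the slide

one_die_at_90nm = ideal_shrink_area(single_core_130nm, 130.0, 90.0)
two_dies_at_90nm = 2 * one_die_at_90nm

print(f"Single-core die shrunk to 90nm: {one_die_at_90nm:.1f} mm^2")
print(f"Two such dies side by side:     {two_dies_at_90nm:.1f} mm^2")
print(f"Actual dual-core die:           {dual_core_90nm:.1f} mm^2")
```

Under this assumption, a process shrink from 130nm to 90nm roughly halves the area of a given design, which is consistent with two cores fitting in about the same silicon budget as one core did a generation earlier.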

Slide 6

Multi-Core Processor Architecture

Slide 7

Core Design
- Frequency is only as good as the rest of the core architecture
[Block diagram: AMD Opteron processor core architecture. Labeled blocks: Fetch; Branch Prediction; L1 Icache 64KB; Scan/Align/Decode; Fastpath; Microcode Engine; µops; Instruction Control Unit (72 entries); Int Decode & Rename; FP Decode & Rename; Res; AGU; ALU; MULT; 36-entry FP scheduler; FADD; FMUL; FMISC; 44-entry Load/Store Queue; L1 Dcache 64KB]

Slide 8

Core Design
- Functional units
  - Superscalar is known territory
  - Diminishing returns for adding more functional blocks
  - Alternatives like VLIW have been considered and rejected by the market
  - Single-threaded architectural performance is pegged
- Data paths
  - Increasing bandwidth between functional units in a core makes a difference
  - Such as comprehensive 64-bit design, but then where to?

Slide 9

Core Design
- Pipeline
  - Deeper pipeline buys frequency at the expense of increased cache miss penalty and lower instructions per clock
  - Shallow pipeline gives better instructions per clock at the expense of frequency scaling
  - Max frequency per core requires deeper pipelines
  - Industry converging on middle ground… 9 to 11 stages
  - Successful RISC CPUs are in the same range
- Cache
  - Cache size buys performance at the expense of die size; it's a direct hit to manufacturing cost
  - Deep pipeline cache miss penalties are reduced by larger caches
  - Not always the best match for shallow pipeline cores, as cache miss penalties are not as steep
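The pipeline and cache trade-off above can be illustrated with the standard effective-CPI model: time per instruction = (base CPI + miss rate × miss penalty in cycles) / frequency. The sketch below is only an illustration. The base CPI values, frequencies, miss rates, and 100 ns memory latency are all hypothetical numbers chosen for the example, not figures from the presentation.

```python
# Effective-CPI model for a deep vs. a shallow pipeline.
# All numeric inputs below are hypothetical, chosen only to illustrate the trade-off.

MEMORY_LATENCY_NS = 100.0  # assumed fixed DRAM latency, independent of core clock

def penalty_cycles(memory_latency_ns: float, freq_ghz: float) -> float:
    """A fixed latency in ns costs more cycles as the clock gets faster."""
    return memory_latency_ns * freq_ghz

def time_per_instruction_ns(base_cpi: float, miss_rate: float,
                            memory_latency_ns: float, freq_ghz: float) -> float:
    """Average time per instruction including cache-miss stalls."""
    effective_cpi = base_cpi + miss_rate * penalty_cycles(memory_latency_ns, freq_ghz)
    return effective_cpi / freq_ghz

designs = {
    "deep pipeline":    {"base_cpi": 1.25, "freq_ghz": 3.0},  # higher clock, lower IPC
    "shallow pipeline": {"base_cpi": 1.00, "freq_ghz": 2.0},  # lower clock, higher IPC
}

for name, d in designs.items():
    small_cache = time_per_instruction_ns(d["base_cpi"], 0.002, MEMORY_LATENCY_NS, d["freq_ghz"])
    large_cache = time_per_instruction_ns(d["base_cpi"], 0.001, MEMORY_LATENCY_NS, d["freq_ghz"])
    gain = 1.0 - large_cache / small_cache
    print(f"{name}: miss penalty {penalty_cycles(MEMORY_LATENCY_NS, d['freq_ghz']):.0f} cycles, "
          f"{small_cache:.3f} ns -> {large_cache:.3f} ns per instruction "
          f"({gain:.1%} faster with the larger cache)")
```

With these assumed inputs, the deep pipeline pays more cycles per miss and gains proportionally more from halving the miss rate, which matches the slide's point that larger caches suit deep pipelines better than shallow ones.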

Slide 10

Manufacturing
- Moore's Law isn't dead, more transistors for everyone!
  - But… it doesn't really mention scaling transistor power
- Chemistry and physics at nano-scale
  - Stretching materials science
  - Voltage doesn't scale yet
  - Transistor leakage current is increasing
- As manufacturing economies and frequency increase, power consumption is increasing disproportionately
- There are no process or architectural quick fixes

Slide 11

Transistors Are Not Free
- The number of transistors in a core determines basic power consumption
- Architectural efficiency matters a lot when designing new cores
  - More functional units mean more transistors
  - Deeper pipelines mean more transistors
  - Larger caches mean more transistors

Slide 12

Static Current vs. Frequency
- Non-linear as processors approach max frequency
[Chart: static current (y-axis, 0 to 15) versus frequency (x-axis, ticks at 1.0 and 1.5). Region labels: Very High Leakage and Power; Fast, High Power; Fast, Low Power; Embedded Parts]

Slide 13

Power vs. Frequency
- In AMD's process, for 200MHz frequency steps, two steps back on frequency cuts power consumption by ~40% from maximum frequency
- (Gross relative numbers summarized from a pile of real data)
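One way to see why a modest frequency reduction can cut power this much is the textbook dynamic-power relation P ∝ C·V²·f. The sketch below uses that relation to back out the supply-voltage reduction that would be consistent with a ~40% power cut. It is an assumption-laden illustration: the model ignores leakage, the presentation does not state any voltages, and the 2.4GHz maximum is borrowed from the single-core frequency quoted on the benchmark slide.

```python
import math

# Textbook dynamic power model: P is proportional to C * V^2 * f.
# Leakage is ignored, and the voltage ratio computed here is implied by the
# model, not stated in the presentation.

def dynamic_power_ratio(freq_ratio: float, voltage_ratio: float) -> float:
    """Relative dynamic power after scaling frequency and voltage."""
    return freq_ratio * voltage_ratio ** 2

f_max_ghz = 2.4   # single-core frequency from the benchmark slide
step_ghz = 0.2    # 200MHz frequency steps, from the slide
f_reduced_ghz = f_max_ghz - 2 * step_ghz

freq_ratio = f_reduced_ghz / f_max_ghz
target_power_ratio = 0.60  # "~40%" power reduction, from the slide

# Solve freq_ratio * v^2 = target for the implied voltage ratio v.
implied_voltage_ratio = math.sqrt(target_power_ratio / freq_ratio)

print(f"Frequency ratio:        {freq_ratio:.3f}")
print(f"Power at same voltage:  {dynamic_power_ratio(freq_ratio, 1.0):.3f}")
print(f"Implied voltage ratio:  {implied_voltage_ratio:.3f}")
```

Under this model, frequency alone accounts for only part of the saving; the rest would have to come from running the slower part at a lower voltage, which is why power falls faster than frequency.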

Slide 14

Thermal Density Decreases
- Hot spots
  - Twice as many as in single-core
  - Farther apart than in single-core
  - With the frequency delta, cooler than in single-core
- Θ_CA is the same for single-core at n and dual-core at n-2
  - Larger die spreads heat more evenly in the package
  - Use the same heat sink, slightly better cooling with dual-core
  - Works for this processor generation and the next; Θ_CA changes over major generations
- Thermal diode accuracy becomes an issue with dual-core
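For readers unfamiliar with the Θ_CA figure, it is the case-to-ambient thermal resistance of the cooling solution, and the usual relation is T_case = T_ambient + Θ_CA × P. The short sketch below shows how a heat sink requirement follows from that relation. The temperatures and the 90 W power figure are hypothetical placeholders, not values from the presentation.

```python
# Case-to-ambient thermal resistance: T_case = T_ambient + theta_CA * P.
# All numeric inputs are hypothetical and only illustrate the relation.

def case_temperature_c(ambient_c: float, theta_ca_c_per_w: float, power_w: float) -> float:
    """Steady-state case temperature for a given cooling solution and power."""
    return ambient_c + theta_ca_c_per_w * power_w

def required_theta_ca(case_max_c: float, ambient_c: float, power_w: float) -> float:
    """Largest thermal resistance (degrees C per watt) that keeps the case within its limit."""
    return (case_max_c - ambient_c) / power_w

ambient_c = 40.0   # assumed air temperature at the heat sink
case_max_c = 70.0  # assumed maximum case temperature
power_w = 90.0     # assumed processor power

theta = required_theta_ca(case_max_c, ambient_c, power_w)
print(f"Required theta_CA: {theta:.3f} C/W")
print(f"Case temperature at that theta_CA: "
      f"{case_temperature_c(ambient_c, theta, power_w):.1f} C")
```

If two processors dissipate the same total power, the same Θ_CA requirement applies to both, which is the sense in which one heat sink design can serve the single-core part at n and the dual-core part at n-2.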

Slide 15

Total Effect on Dual-Core Frequencies
- Substantially lower power with lower frequency
- Thermals are easier to manage at any frequency
- Result is dual-core running at n-2 in the same thermal envelope as single-core running at top speed

Slide 16

Multi-Core Processor Architecture
- Why integrate?
  - Most functions are very small compared to the cores and cache
  - All integrated logic runs at core frequency regardless of I/O speeds
- What to integrate?
  - Northbridge crossbar switch is key
    - Look for innovation and differentiation in how cores are connected on-chip
    - Must integrate Northbridge to integrate anything else…
  - Memory controller to reduce memory latency and further reduce the need for cache
  - High-speed serial links for system I/O
- What not to integrate?
  - Most Southbridge functions
  - Graphics

Slide 17

AMD Opteron Processor Integrated Northbridge
[Block diagram. Per-core connections for CPU 0 and CPU 1: Data, Probes, Requests, Int. Northbridge blocks: System Request Interface (SRI); Advanced Programmable Interrupt Controller (APIC); Crossbar (XBAR); Memory Controller (MCT); DRAM Controller (DCT). Path labels: 64-bit Data; 64-bit Command/Address; 16-bit Data/Command/Address. External interfaces: HyperTransport™ Link 0, HyperTransport Link 1, HyperTransport Link 2; RAS/CAS/Cntl; DRAM Data]

Slide 18

Multi-Core: Where Processor and System Collide
- Scales performance
  - Dedicated resources for two simultaneous threads
- Multiple cores will contend for memory and I/O bandwidth
  - Northbridge is the bottleneck
  - Integrating the Northbridge eliminates much of the bottleneck
  - Northbridge architecture has a significant impact on performance
  - Cores, cache, and Northbridge must be balanced for optimal performance
- More aggregate performance for:
  - Multi-threaded applications
  - Transactions: many instances of the same application
  - Multi-tasking
- Thread scheduling is handled by the OS
  - BIOS informs Windows of thread execution resources
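From the application side, the execution resources that the OS schedules onto are visible as a logical CPU count. The sketch below is a minimal illustration of sizing a worker pool from that count and handing it independent work items, in the spirit of the "many instances of the same work" case. The `handle_transaction` function is a hypothetical stand-in, and in CPython a process pool would normally replace the thread pool for CPU-bound work.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def handle_transaction(n: int) -> int:
    """Hypothetical stand-in for one independent unit of work."""
    return n * n

# os.cpu_count() reports the logical CPUs the OS exposes; it can return None.
logical_cpus = os.cpu_count() or 1
worker_count = max(1, logical_cpus)

# The OS, not the application, decides which core runs each worker thread.
with ThreadPoolExecutor(max_workers=worker_count) as pool:
    results = list(pool.map(handle_transaction, range(8)))

print(f"{logical_cpus} logical CPUs reported, {worker_count} workers used")
print(results)
```

The application only expresses how much independent work exists; placement of that work on cores is left to the scheduler, as the slide notes.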

Slide 19

Early Benchmark Estimates
- Decoder
  - 2P/2C: 2 proc. single-core
  - 4P/4C: 4 proc. single-core
  - 2P/4C: 2 proc. dual-core
  - 4P/8C: 4 proc. dual-core
- Frequencies
  - Single-core = 2.4GHz
  - Dual-core = 2.0GHz
- Identical system configs
  - Memory, disks, network, etc.
  - Early dual-core validation system used, different motherboards
SPEC and the benchmark name SPECint are registered trademarks of the Standard Performance Evaluation Corporation. SPEC scores for AMD Opteron Model 270 and 870 based systems are estimated.

Slide 20

Call to Action
- Most application software doesn't need to do anything special to benefit from dual-core
- Be aware that, for a processor within a given power envelope:
  - Fewer cores will clock faster than more cores
    - Single-threaded, performance-sensitive applications
  - More cores will out-perform fewer cores for:
    - Multi-threaded applications
    - Multi-tasking response times
    - Transaction processing
- Processor architecture impacts multi-core performance
  - Process technology is just the ante
  - Integration enables a balanced high-performance architecture
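The trade-off between fewer, faster cores and more, slower cores can be sketched with Amdahl's law. The example below estimates how much of a workload must run in parallel before a dual-core part at 2.0GHz matches a single-core part at 2.4GHz, the frequencies quoted on the benchmark slide. It assumes performance scales linearly with clock frequency and ignores memory and I/O contention, so it is a rough guide rather than a prediction of the benchmark results.

```python
# Amdahl's-law sketch: dual-core at a lower clock vs. single-core at a higher clock.
# Assumes performance scales linearly with frequency and ignores memory/I-O contention.

def relative_throughput(cores: int, freq_ghz: float, parallel_fraction: float,
                        baseline_freq_ghz: float) -> float:
    """Throughput relative to one core running at baseline_freq_ghz."""
    amdahl_speedup = 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)
    return (freq_ghz / baseline_freq_ghz) * amdahl_speedup

def break_even_parallel_fraction(cores: int, freq_ghz: float,
                                 baseline_freq_ghz: float) -> float:
    """Parallel fraction at which the multi-core part matches the baseline."""
    freq_ratio = freq_ghz / baseline_freq_ghz
    return (1.0 - freq_ratio) / (1.0 - 1.0 / cores)

SINGLE_CORE_GHZ = 2.4  # from the benchmark slide
DUAL_CORE_GHZ = 2.0    # from the benchmark slide

break_even = break_even_parallel_fraction(2, DUAL_CORE_GHZ, SINGLE_CORE_GHZ)
print(f"Break-even parallel fraction: {break_even:.3f}")

for p in (0.0, 0.5, 1.0):
    ratio = relative_throughput(2, DUAL_CORE_GHZ, p, SINGLE_CORE_GHZ)
    print(f"parallel fraction {p:.1f}: dual-core delivers {ratio:.2f}x the single-core throughput")
```

Under these assumptions, a purely single-threaded task runs slower on the dual-core part, while workloads that are more than about one-third parallel come out ahead.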

Slide 21

Community Resources
- Windows Hardware & Driver Central (WHDC)
- Technical Communities
- Non-Microsoft Community Sites
- Microsoft Public Newsgroups
- Technical Chats and Webcasts
- Microsoft Blogs

Slide 22

Additional Resources
- Email: paul.teich @
- WinHEC Presentations
  - "x86 Everywhere," Chris Herring, AMD
  - "Amplifying Desktop Application Performance on Dual-Core PC Platforms," Rich Brunner, AMD
- Web Resources
  - AMD Multi-Core Opteron™ Processor Multi-Core White Paper
  - HyperTransport™ Consortium
