Reformulating the WRF Model for Graphics Processors

Slide 1

Reformulating the WRF Model for Graphics Processors
By John Ciolek
Local-scale NWP on a $5K PC?
16th Meeting of the DMCC

Slide 2

May 4, 2009

Slide 3

Video Gaming Industry
Estimated size of the gaming industry
  2005: $31.3 Billion
  2006: $36.1 Billion
  2007: $42.8 Billion
Trend toward more realistic images
  Requires ever more capable rendering hardware
  Created explosive growth in graphics processors

Slide 4

Graphics Cards
Designed to plug into a standard PC bus
Control rendering of pixels, voxels, facets, etc.
Controlled by the central processing unit (CPU)
Contain many processors
  Graphics Processing Unit (GPU) (analogous to the CPU)
Stream processing
  Input set of data (stream)
  Kernel operates on the stream
  Performs one or more operations
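The stream-processing idea on this slide can be sketched in a few lines of Python. This is only an illustration of the concept, not GPU code: `run_kernel` and `scale_and_offset` are hypothetical names, and a plain loop stands in for the many processors that would each handle one stream element.

```python
from typing import Callable, List, Sequence


def run_kernel(kernel: Callable[[float], float],
               stream: Sequence[float]) -> List[float]:
    """Apply the same kernel independently to every element of the stream."""
    # On a GPU each element would be handled by its own thread; this
    # sequential loop only mimics that behaviour.
    return [kernel(x) for x in stream]


def scale_and_offset(x: float) -> float:
    """Hypothetical kernel performing two operations: multiply, then add."""
    return 2.0 * x + 1.0


stream = [0.0, 1.0, 2.0, 3.0]
result = run_kernel(scale_and_offset, stream)
print(result)
```

Because the kernel touches each element independently, the work can be spread across as many processors as are available.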

Slide 5

GPUs
Maximize number of processors
Minimize cache and control structures

Slide 6

Memory Access
Relies on limited memory
Slower access to main system memory
Note how threads are organized:
  Grids
  Blocks
  Threads
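The grid, block, and thread organization can be illustrated with a small Python sketch of the usual one-dimensional CUDA indexing convention, where a thread's global index is its block index times the block size plus its index within the block. The function names here are hypothetical, and the block size of 256 is just an example value.

```python
from typing import List


def global_thread_ids(num_blocks: int, threads_per_block: int) -> List[int]:
    """Enumerate the global index of every thread in a 1-D grid."""
    ids = []
    for block_idx in range(num_blocks):              # blocks within the grid
        for thread_idx in range(threads_per_block):  # threads within a block
            ids.append(block_idx * threads_per_block + thread_idx)
    return ids


def blocks_needed(n_items: int, threads_per_block: int) -> int:
    """Smallest number of blocks that covers n_items (ceiling division)."""
    return (n_items + threads_per_block - 1) // threads_per_block


print(global_thread_ids(2, 4))
print(blocks_needed(1000, 256))
```

When the item count is not a multiple of the block size, the last block contains spare threads, so a real kernel would check its index against the item count before touching memory.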

Slide 7

Programmer Accessibility
Vendors created Application Programming Interfaces (APIs)
  Programmers can access the GPU's capabilities
Graphics card programming languages
  Vendor-specific: CUDA, Brook, Cell
  Generic: OpenCL
GPUs gained more programmer functionality
  BLAS, FFT, PhysX

Slide 8

Explosive Growth in GPU Cores and Performance

Slide 9

Price/Performance Explosion
[Chart: TeraFLOPS/$Million for the following systems]
NVIDIA Tesla (960 cores)
PlayStation 3 Cluster (8 PS3s)
Earth Simulator (5,120 procs)
Blue Gene/L (65,536 procs)
Roadrunner (19,440 procs)
Cray (1 proc)
ASCI Red (4,510 procs)
Cray Y-MP (8 procs)

Slide 10

Current GPU Cost Examples

Slide 11

Serious Experimenters
23.2 TeraFLOPS!
Running Folding@home
6,240 streaming processors
13 GTX 295 graphics cards
14 CPU cores
Cost ~ $15,000
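To put this machine on the same TeraFLOPS/$Million scale used in the price/performance chart, the slide's own figures can be plugged into a short calculation. This is a rough sketch: the cost is the slide's approximate value, the 23.2 TeraFLOPS figure is taken at face value, and the function name is hypothetical.

```python
def teraflops_per_million(teraflops: float, cost_dollars: float) -> float:
    """Convert a performance figure and a cost into TeraFLOPS per $1M."""
    return teraflops / (cost_dollars / 1_000_000)


# Figures quoted on the slide.
ratio = teraflops_per_million(23.2, 15_000)
cores_per_card = 6240 // 13  # streaming processors per GTX 295 card

print(f"{ratio:.0f} TeraFLOPS per $million")
print(f"{cores_per_card} streaming processors per card")
```

On these numbers the system works out to roughly 1,500 TeraFLOPS per $million.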

Slide 12

Serious Science
Astrophysics
Electrodynamics
Life sciences
Nanotechnology simulations
Computational fluid dynamics
Finance
Chemistry
Molecular dynamics
Etc.

Slide 13

The WRF Connection
John Michalakes (NCAR)
  Developing & optimizing WRF
Team working on reformulating WRF for GPUs
  Mostly for CUDA on NVIDIA cards
Claim: “Most recent performance improvements came from CPU speed increases”
  No recoding was required
  This won't continue to be the case

Slide 14

What’s the Catch?
Need to identify segments of code that can be reformulated for stream processing
Recode those segments
Recompile & link (with optimization switches)
Must manage memory access
  Machine-specific
Need to use a limited instruction set
CUDA allows upward portability on NVIDIA devices

Slide 15

WRF Reformulation Process
Identify target WRF packages
Benchmark performance of current coding
Identify quick improvement actions
  Using CUDA compiler switches
  CUDA intrinsic functions
  FORTRAN to C conversion
Rewrite code
  Rethink how to implement algorithms
  Will take the most time
Revalidate

Slide 16

Early Successes
Early work on microphysics kernel
  0.4% of code
  25% of elapsed time
Results:
  5 to 20x speed increase for this kernel
  Translates to 1.25 to 1.3x overall improvement
  Limited by Amdahl’s Law
Based on a straightforward rewrite
  Did not attempt CUDA optimizations
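The overall improvement quoted on this slide follows from Amdahl's Law, which can be checked with a short Python sketch. The function name is hypothetical; the inputs are the slide's own figures (25% of elapsed time, accelerated 5 to 20 times).

```python
def amdahl_speedup(fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when `fraction` of the run time is accelerated
    by `kernel_speedup` and the remainder is unchanged (Amdahl's Law)."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


low = amdahl_speedup(0.25, 5.0)    # kernel runs 5x faster
high = amdahl_speedup(0.25, 20.0)  # kernel runs 20x faster
limit = 1.0 / (1.0 - 0.25)         # ceiling if the kernel took no time at all

print(f"overall speedup: {low:.2f}x to {high:.2f}x (limit {limit:.2f}x)")
```

With only a quarter of the elapsed time in the accelerated kernel, the overall gain cannot exceed about 1.33x however fast that kernel becomes, which is why more of the model has to be moved to the GPU.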

Slide 17

Microphysics Kernel Improvements
Compiler switch: use_fast_math
Eliminated temporary array storage
Graph is based on recent results (March 2009)

Slide 18

Other Key Findings
Need to:
  Reduce transfers between memories
  Maximize number of threads actively running
  Enhance fine-grained parallelism
    Supports “strong scaling”
    N times more threads ~ N times better performance
  Explore hardware-specific optimization
Work is continuing on the WRF rewrite
  Next WRF release will have a GPU switch
  Need additional help from the community
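The "strong scaling" claim (N times more threads giving about N times better performance on a fixed problem) can be quantified with a small sketch. The function name and the timings below are hypothetical and chosen only to show the arithmetic; they are not measurements from WRF.

```python
def strong_scaling_efficiency(t_base: float, t_scaled: float,
                              n_times_threads: float) -> float:
    """Fraction of the ideal N-fold speedup actually achieved when the
    same problem is run with N times more threads."""
    speedup = t_base / t_scaled
    return speedup / n_times_threads


# Hypothetical timings (seconds) for one fixed problem size.
ideal = strong_scaling_efficiency(80.0, 10.0, 8)   # perfect 8x speedup
actual = strong_scaling_efficiency(80.0, 12.5, 8)  # 6.4x speedup

print(f"ideal: {ideal:.2f}, example: {actual:.2f}")
```

An efficiency near 1.0 means the code scales strongly. Values well below 1.0 usually point to memory transfers or idle threads, the first two items on the slide's list.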

Slide 19

Target WRF Kernels
Single Moment 5 Cloud Microphysics
5th Order Positive Definite Tracer Advection
KPP-generated Chemical-kinetics Solver
Long-wave Radiation Physics
Short-wave Radiation Physics

Slide 20

Quote: “I wouldn’t recommend groups go out and buy GPU clusters just yet (to run WRF), but maybe before the end of the year…” John Michalakes

Slide 21

The Beginning…
John Ciolek
