AMD: The Future of GPUs - Campus Party
AMD Presentations
January 19, 10:00 – 12:00 – The Future of GPUs
January 20, 10:00 – 12:00 – Accelerated Computing
Roberto Brandão
AMD Latin America
The Future of GPUs
Today’s GPUs focused on
GAMING
ENTERTAINMENT
PRODUCTIVITY
DirectX® 11 Tessellation
Image comparison: No Tessellation (DirectX® 10) vs. Tessellation (DirectX® 11)
Images courtesy of Unigine Corp.
DirectX® 11 Multi-Threading
Application, DirectX runtime, and DirectX driver can each run in separate threads
Tasks like loading a texture or compiling a shader can execute in parallel with main rendering thread
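The idea of running a load in parallel with the main rendering thread can be sketched on the CPU as follows. This is only a conceptual illustration in Python, not DirectX code: `load_texture`, `render_until_loaded`, and the texture name `"brick"` are hypothetical names chosen for the example, and the sleeps stand in for real disk and draw work.

```python
import queue
import threading
import time

def load_texture(name, ready):
    """Stand-in for a slow load; a real loader would read and decode a file."""
    time.sleep(0.05)              # simulated disk and decode latency
    ready.put((name, b"pixels"))  # hand the finished texture to the render thread

def render_until_loaded(expected, ready):
    """Main loop: keeps producing frames and picks up the load once it finishes."""
    textures, frames = {}, 0
    while expected not in textures:
        frames += 1  # stand-in for drawing one frame
        try:
            # The short timeout plays the role of the time spent on one frame.
            name, data = ready.get(timeout=0.01)
            textures[name] = data
        except queue.Empty:
            pass  # nothing finished yet; draw the next frame
    return textures, frames

def run_demo():
    ready = queue.Queue()
    worker = threading.Thread(target=load_texture, args=("brick", ready))
    worker.start()
    textures, frames = render_until_loaded("brick", ready)
    worker.join()
    return textures, frames
```

The render loop never waits for the whole load; it only checks for finished work between frames.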
Diagram comparison: DirectX® 10 vs. DirectX® 11
Order Independent Transparency (OIT)
Efficient rendering of many overlapping transparent objects
Smoke, fire, hair, foliage, fences, water, glass
Rendering transparent objects correctly requires sorting
Blending is an order dependent operation
DirectCompute 11 simplifies OIT by sorting transparent pixels in one shader pass
Uses atomic operations and append buffers
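Why blending needs sorting can be shown with a small CPU sketch. It assumes the standard "over" blend and models only the per-pixel sort step; the GPU approach on the slide gathers fragments with atomic operations and append buffers, which is not reproduced here. The fragment dictionaries and function names are hypothetical.

```python
def blend_over(dst, src_color, src_alpha):
    """Standard 'over' alpha blend of one fragment onto the current color."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_color, dst))

def composite(fragments, background):
    """Blend fragments in exactly the order given."""
    color = background
    for frag in fragments:
        color = blend_over(color, frag["color"], frag["alpha"])
    return color

def resolve_pixel(fragments, background):
    """Sort one pixel's fragments back to front (largest depth first), then blend."""
    ordered = sorted(fragments, key=lambda f: f["depth"], reverse=True)
    return composite(ordered, background)

background = (0.0, 0.0, 0.0)
red_near = {"color": (1.0, 0.0, 0.0), "alpha": 0.5, "depth": 1.0}
blue_far = {"color": (0.0, 0.0, 1.0), "alpha": 0.5, "depth": 2.0}

# Unsorted blending gives a different color for each submission order.
wrong_order = composite([red_near, blue_far], background)
right_order = composite([blue_far, red_near], background)

# With the per-pixel sort, the submission order no longer matters.
sorted_a = resolve_pixel([red_near, blue_far], background)
sorted_b = resolve_pixel([blue_far, red_near], background)
```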
DirectX® 11 OIT in Action
Order-Independent Transparency
Simple Alpha Blending
Skeleton exposed
Arm bleeds through body
Render Post-Processing
Apply filter kernel to every pixel in rendered image
– Depth of field, motion blur, tone mapping, edge detection, smoothing, sharpening
Requires data from neighbouring pixels
Example: constant time filter spreading
– Accurately simulates certain lens effects such as depth of field
– Novel processing technique developed at AMD in conjunction with UC Berkeley
DirectCompute greatly simplifies implementation while increasing performance and visual fidelity
– Alpha buffer tricks no longer needed – fewer artifacts
– Shared memory optimizations – better performance
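To make "apply a filter kernel to every pixel" and "requires data from neighbouring pixels" concrete, here is a minimal sketch of a plain 3x3 box blur on a grayscale image stored as nested lists. It is a simple gather filter chosen for illustration, not the constant time filter spreading technique named above, and `box_blur` is a hypothetical helper.

```python
def box_blur(image):
    """3x3 box filter: each output pixel is the mean of its in-bounds neighbours."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0.0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    # Skip neighbours that fall outside the image.
                    if 0 <= ny < height and 0 <= nx < width:
                        total += image[ny][nx]
                        count += 1
            out[y][x] = total / count
    return out

# A single bright pixel is spread over its neighbourhood.
image = [
    [0.0, 0.0, 0.0],
    [0.0, 9.0, 0.0],
    [0.0, 0.0, 0.0],
]
blurred = box_blur(image)
```

Every output pixel reads up to nine input pixels, which is why shared memory that lets neighbouring threads reuse the same reads can improve performance.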
DirectX® 11 Depth of Field in Action
Filter Spreading
Legacy Method
Noticeable halos
Hard silhouette
Shadow Rendering
HDAO (High Definition Ambient Occlusion)
Detects “valleys” in scene geometry and darkens them according to depth
Contact hardened shadows
Sharpens shadow edges where they contact the casting object and makes edges increasingly blurry as they get farther away
DirectX® 11 Shadows in Action
DirectX 10.1 Shadows
DirectX 11 Contact Hardened Shadows
Images from S.T.A.L.K.E.R.: Call of Pripyat (GSC Game World)
Lighting Post Effects
Realistic Nighttime lighting
HDR bloom
Lens flare
Atmospheric scattering
Light trails
3D color grading
Motion blur
Anti-Aliasing
Smoothes jagged edges around objects
More obvious in moving images and at lower resolutions
Takes multiple samples of image
More samples = higher quality, but also much more work
Radeon products support 2x, 4x, and 6x sample modes
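The trade-off between sample count and quality can be sketched by estimating how much of one pixel a shape covers. The example assumes a regular n x n grid of sample points and a hypothetical edge `y = 0.5 * x + 0.2`; real hardware sample patterns differ, so this only illustrates the trend.

```python
def inside(x, y):
    """Hypothetical shape: the region below the edge y = 0.5 * x + 0.2."""
    return y < 0.5 * x + 0.2

def pixel_coverage(n):
    """Estimate the covered fraction of a unit pixel with an n x n sample grid."""
    hits = 0
    for i in range(n):
        for j in range(n):
            # Sample at the centre of each grid cell.
            x = (i + 0.5) / n
            y = (j + 0.5) / n
            if inside(x, y):
                hits += 1
    return hits / (n * n)

# Exact covered area: the integral of 0.5 * x + 0.2 over x in [0, 1].
EXACT = 0.45

def coverage_error(n):
    """Absolute error of the estimate; the work grows as n * n samples."""
    return abs(pixel_coverage(n) - EXACT)
```

With one sample the pixel is either fully in or fully out, which produces the jagged edge; denser sampling approaches the true coverage at the cost of more samples per pixel.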
Image comparison: No AA, 2x AA, 4x AA, 6x AA
EQAA Modes
Diagram legend: color sample location, coverage sample location, pixel boundary
No AA
2x MSAA
2x EQAA: 4 coverage samples
4x MSAA
4x EQAA: 8 coverage samples
8x MSAA
8x EQAA: 16 coverage samples
Tessellating the Right Way
Can add significant detail to a scene while effectively compressing geometry
But excessive use of tessellation can be inefficient for today’s GPUs
– Poor utilization of rasterizers
– Overshading
– Too many polygon edges for MSAA
Brute force approach is wasteful
Chart: overshade per pixel (axis 1 to 8) for 25, 15, 5, and 1 pixel triangles
16 pixel triangle: 100% rasterizer utilization
1 pixel triangle: 6.25% rasterizer utilization
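The two utilization figures are consistent with a rasterizer that emits up to 16 pixels per triangle per clock. That 16-pixel width is an assumption inferred from the slide's numbers, and the helper below is a hypothetical back-of-the-envelope model that ignores triangle shape.

```python
RASTER_WIDTH_PIXELS = 16  # assumption: pixels emitted per triangle per clock

def rasterizer_utilization(triangle_pixels, width=RASTER_WIDTH_PIXELS):
    """Fraction of rasterizer output slots doing useful work for one triangle."""
    clocks = -(-triangle_pixels // width)  # ceiling division
    return triangle_pixels / (clocks * width)

full = rasterizer_utilization(16)  # every slot used
tiny = rasterizer_utilization(1)   # one slot used out of sixteen
```

Under this model a 1 pixel triangle occupies a full clock while using one slot in sixteen, which is the 6.25% on the slide.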
Morphological Anti-Aliasing
Post-process filtering technique accelerated with DirectCompute
Delivers full-scene anti-aliasing
Not limited to polygon edges, alpha-tested surfaces, etc.
Faster than super-sampling
Performance similar to edge-detect CFAA, but applies to all edges
Compatible with any DirectX® 9/10/11 application
Including games with no AA support
Enabled via AMD Catalyst Control Center™
Image comparison: No AA vs. Morphological AA
Images captured from Aliens vs. Predator by Rebellion
Image comparison: No AA, 4x MSAA, MLAA, MSAA + MLAA
Today’s GPUs focused on
GAMING
ENTERTAINMENT
PRODUCTIVITY
Power savings improvement
We are visual beings
Consumers are looking for a better visual experience in an environment with variable content
Content formats and sources have more diversity than ever
New applications will demand computing power that is impossible on today’s hardware
Verbal perception: words are processed at only 150 words per minute
Visual perception: pictures and video are processed 400 to 2000 times faster
Enhanced Multimedia Capabilities
Windows® Aero Mode: Playback of HD videos in high quality with Windows® Aero mode enabled
Enhanced UVD2: Hardware-accelerated decode of dual 1080p HD video streams
Video Gamma: Independent from Windows® desktop for a superior user experience
Brighter Whites: “Blue Stretch” processing increases the blue value of white colors for brighter videos
Dynamic Video Range: Control of levels of black and white during playback
Power Management: Enables new customers for all levels of graphics
Superior HDMI Audio and Video Features
Enhanced Home Theatre Audio Experience
HDMI 1.3a Dolby TrueHD & DTS-HD Master Audio: Full support for premium Blu-ray audio formats
– Dolby TrueHD, DTS-HD Master Audio, AC-3 and DTS
High quality surround sound: Up to 8 channels of 192kHz / 24-bit audio
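As a quick check of what 8 channels of 192kHz / 24-bit audio implies, the raw data rate can be worked out directly. The sketch assumes uncompressed PCM with no framing overhead; `pcm_bitrate` is a hypothetical helper.

```python
def pcm_bitrate(channels, sample_rate_hz, bits_per_sample):
    """Raw bits per second for uncompressed PCM audio."""
    return channels * sample_rate_hz * bits_per_sample

bits_per_second = pcm_bitrate(8, 192_000, 24)
megabits_per_second = bits_per_second / 1_000_000  # about 36.9 Mbit/s
```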
Advanced Display Quality
HDMI 1.3a Deep Color & x.v.Color: Over 1 billion colors output through HDMI
– 12-bpc output, 10-bpc (4:4:4) meaningfully derived
Wide range of colors: Full support for wide-gamut x.v.Color video signals
Improvements already reached consumers
Chart: processor utilization (axis 0% to 80%), ATI Stream
Adobe Flash plugin used by Youtube.com
– Better image quality and video smoothness
– Lower processor usage
Convert your DVD videos into near HD quality with DVD Upscaling
Designed to help dramatically improve the quality of your movies
Take Your DVDs to Near HD Quality
Better video quality from a DVD (DVD Upscaling)
Better definition and sharpness of video streams based on MPEG-2 (DVD) for high definition displays
Image comparison: DVD vs. Upscaled DVD
Dramatically Improve Online Video Quality
Watch online videos with smooth playback and sharper, vibrant image quality
Make online video come to life!
Today’s GPUs focused on
GAMING
ENTERTAINMENT
PRODUCTIVITY
Introducing Next-Gen Desktop Configurations for …
DCC
CAD
Image courtesy of StudioGPU
Image courtesy Todd Daniele
Driver version 8.66 (ATI Catalyst™ 9.10) or above is required to support ATI Eyefinity technology. To enable a third display requires one panel with a DisplayPort connector.
Oil & Gas
Medical
Image courtesy Barco Medical Systems
Maximum Flexibility in Display Configuration*
6x1 Portrait Display Group
3x1 Landscape Display Group
3x1 Landscape Display Group Plus 3 Extended
3x1 Display Group Plus 1 Extended
1x3 Portrait Display Group
Screen images courtesy Todd Daniele and University of Hertfordshire
Single GPU 4K Output for CAD and DCC*
Image courtesy University of Hertfordshire, D.Atkins
Image courtesy Todd Daniele
*Planned features, specifications, and/or capabilities of top SKU of upcoming ATI FirePro™ professional graphics cards. Subject to change without notice.
Distinctive Features - High Quality Rendering
Full 30-bit display pipeline produces more than one billion colors and enables you to see more of your data*
Up to 1600 Stream Processors enable you to push visual effects farther than ever before
Images courtesy Barco Medical Systems
Images courtesy Studio GPU
* Requires 30-bit monitor for true 30-bit color display.
AMD Support for 30-bit Color* in Adobe® Photoshop®
8-bit per color component: 16.7 million colors**
10-bit per color component: Over 1 billion colors**
* Requires 30-bit monitor for true 30-bit color display. **Simulated images.
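The color counts follow from the bit depth: with three color components, 8 bits each gives 24 bits per pixel and 10 bits each gives 30. A minimal sketch, with `color_count` as a hypothetical helper:

```python
def color_count(bits_per_component, components=3):
    """Number of distinct colors for a given per-component bit depth."""
    return 2 ** (bits_per_component * components)

colors_24_bit = color_count(8)   # 16,777,216, about 16.7 million
colors_30_bit = color_count(10)  # 1,073,741,824, just over 1 billion
```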
AMD Stream Technology: Using the GPU to Enhance the Notebook PC Experience
Gaming, Entertainment, Productivity
Balanced Platform: Developers leverage AMD GPUs and CPUs for enhanced application performance and user experience
Open Standards: Industry-standard OpenCL™ and DirectCompute 11 enable cross-platform development
Performance and Battery Life: Massively parallel, programmable GPU architecture enables dramatic performance and power efficiency
* ATI Stream technology requires both enabled graphics and an enabled application
ATI Stream-Enabled Applications & Games
MediaShow 5, MediaShow Espresso, PowerDirector 8, PowerDirector 7
SimHD™ Plug-in for TotalMedia Theatre
Roxio Creator™ 2010, Roxio Creator™ 2010 Pro
Aliens vs. Predator, S.T.A.L.K.E.R.: Call of Pripyat, DiRT 2
Video Transcoding Sample: No GPU Acceleration (using four CPU cores)
CPU Usage: 100%
GPU Usage: 1%
Time to finish: 1h 52m
Total energy: 0.23 kWh
Peak power: 145W
Energy price: $0.1539
Video Transcoding Sample: ATI GPU Acceleration (using hundreds of Stream Processors)
Figures in parentheses are the values without GPU acceleration.
CPU Usage: 45% (100%)
GPU Usage: 35% (1%)
Time to finish: 26m (1h 52m)
Total energy: 0.11 kWh (0.23 kWh)
Peak power: 198W (145W)
Energy price: $0.07 ($0.15)
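The transcoding figures can be turned into a speedup and a saving with a few lines of arithmetic. The sketch uses only the numbers quoted in the sample and treats the total-energy figures as kilowatt-hours; the variable and function names are hypothetical.

```python
# Figures from the transcoding sample above.
cpu_only = {"minutes": 112, "energy_kwh": 0.23, "cost": 0.1539}
gpu_accelerated = {"minutes": 26, "energy_kwh": 0.11, "cost": 0.07}

def speedup(before_minutes, after_minutes):
    """How many times faster the accelerated run finishes."""
    return before_minutes / after_minutes

def saving(before, after):
    """Fractional reduction relative to the baseline."""
    return 1.0 - after / before

time_speedup = speedup(cpu_only["minutes"], gpu_accelerated["minutes"])
energy_saving = saving(cpu_only["energy_kwh"], gpu_accelerated["energy_kwh"])
cost_saving = saving(cpu_only["cost"], gpu_accelerated["cost"])
```

Peak power is higher with the GPU active (198W against 145W), but the run is short enough that total energy still drops by roughly half.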
CONNECTIVITY
AMD Radeon™ HD 6000 Series Graphics
Get immersed with AMD Eyefinity technology
Get amazing Eye-Definition graphics with DirectX® 11
Get fast applications and incredible video with AMD EyeSpeed technology
2 x miniDP (designed for DisplayPort 1.2)
HDMI 1.4a
2 x DVI (DL-DVI + SL-DVI)
POWER MANAGEMENT
Performance per watt
• US datacenters consume more power than five 1000 megawatt nuclear power plants – at a cost of almost $3 billion
• This is 150% more than the consumption in 2001
Power savings improvement
AMD PowerTune Technology
Clamps GPU TDP to a pre-determined level
Integrated control processor monitors GPU activity in real time
GPU includes counters across all blocks which are monitored and applied to an algorithm to infer power draw
Dynamically adjusts clock to enforce TDP
Provides direct control over GPU power draw (as opposed to indirect via clock/voltage tweaks)
Algorithmic approach guarantees consistent performance across each product variant
No longer need to constrain default clock speeds to allow for outlier applications
User controllable via AMD OverDrive Utility
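The control idea described above (activity counters feed an algorithm that infers power, and the clock is adjusted to stay under TDP) can be sketched as a toy model. This is not AMD's algorithm: the block names, weights, wattages, clock step, and the assumption that dynamic power scales linearly with clock are all hypothetical values picked for illustration.

```python
def estimate_power(activity, weights, idle_watts, clock_mhz, base_clock_mhz):
    """Infer power from per-block activity counters (0.0 to 1.0 each).

    Assumption: dynamic power scales linearly with clock frequency.
    """
    dynamic = sum(weights[block] * level for block, level in activity.items())
    return idle_watts + dynamic * (clock_mhz / base_clock_mhz)

def enforce_tdp(activity, weights, idle_watts, tdp_watts, base_clock_mhz,
                step_mhz=10, min_clock_mhz=300):
    """Lower the clock in steps until the inferred power fits under the TDP."""
    clock = base_clock_mhz
    while (clock > min_clock_mhz and
           estimate_power(activity, weights, idle_watts, clock, base_clock_mhz)
           > tdp_watts):
        clock -= step_mhz
    return clock

# Hypothetical per-block weights in watts at full activity and base clock.
WEIGHTS = {"shader": 120.0, "texture": 40.0, "memory": 30.0}
game = {"shader": 0.7, "texture": 0.6, "memory": 0.5}
outlier = {"shader": 1.0, "texture": 1.0, "memory": 1.0}

game_clock = enforce_tdp(game, WEIGHTS, 30.0, 200.0, 800)
outlier_clock = enforce_tdp(outlier, WEIGHTS, 30.0, 200.0, 800)
```

In this toy model the game workload stays at the full clock, while the outlier workload is stepped down until the inferred power is back under the limit, so the default clock does not have to be set for the worst case.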
PowerTune – Game Power Draw
Games consistently operate at lower power than peak apps
With PowerTune, each product variant is tuned to maximize game performance
Outlier applications are still handled gracefully
Accommodates future application power draw
Chart: Max Total ASIC Power (W), axis 110 to 200, for Lost Planet DX10, Crysis DX10, Resident Evil 5 DX10, BattleForge DX10.1, Furmark 1.65, 3DMark 03 GT4, Perlin Noise, OCCT SC8100
AMD PowerTune Technology
Chart: AMD Radeon™ HD 6950, 3DMark Vantage: Perlin Noise. GPU core clock (MHz, axis 300 to 900) and FPS (axis 0 to 300) over time (seconds 1 to 31)
THE FUTURE OF GPUs
More performance
Better power management
GPU Everywhere
One Design, Fewer Watts, Massive Capability
Dual-Core CPU + Northbridge + Discrete-level DirectX® 11 GPU = “Zacate” AMD Fusion APU
Separate chips: 66 sq. mm, 13 watts; 117 sq. mm, 25 watts; 59 sq. mm, 8 watts
“Zacate” AMD Fusion APU: 75 sq. mm, 18 watts
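Adding up the three separate chips and comparing the totals with the single APU gives a rough sense of the saving. The sketch uses only the area and wattage figures from the slide and does not assume which figure belongs to which chip; the variable names are hypothetical.

```python
# (area in sq. mm, watts) for the three separate chips and for the APU.
separate_chips = [(66, 13), (117, 25), (59, 8)]
apu_area, apu_watts = 75, 18

total_area = sum(area for area, _ in separate_chips)
total_watts = sum(watts for _, watts in separate_chips)

area_reduction = 1.0 - apu_area / total_area     # about 69% less silicon
power_reduction = 1.0 - apu_watts / total_watts  # about 61% fewer watts
```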
Graphics and Media Processing Efficiency Improvements
Diagram (platform with a separate CPU chip), labels: CPU Chip, CPU Cores, UNB, MC, GPU, UVD, SB Functions, PCIe, DDR3 DIMM Memory; link bandwidths ~7 GB/sec, ~17 GB/sec, ~17 GB/sec
Bandwidth pinch points and latency hold back the GPU capabilities
3X bandwidth between GPU and memory
Even the same sized GPU is substantially more effective in this configuration
Eliminate latency and power associated with the extra chip crossing
Substantially smaller physical footprint
Graphics requires memory bandwidth to bring full capabilities to life
Diagram comparison: 2010 IGP-based Platform vs. 2011 APU-based Platform
Diagram (APU chip), labels: APU Chip, CPU Cores, GPU, UVD, UNB / MC, PCIe, DDR3 DIMM Memory; link bandwidths ~27 GB/sec, ~27 GB/sec
“Ontario” & “Zacate” Architecture
APU
> 2 x86 CPU Cores (40nm “Bobcat” core – 1 MB L2, 64-bit FPU)
> C6 and power gating
> Array of SIMD Engines
• DX11 graphics performance
• Industry leading 3D and graphics processing
> 3rd Generation Unified Video Decoder
> H.264, VC1, DivX/Xvid format
> DDR3 800-1066, 2 DIMMs, 64 bit channel
> BGA package
Display and I/O
> Two dedicated digital display interfaces
• Configurable externally as HDMI, DVI, and/or DisplayPort
• Also supports a single link LVDS for internal panels
> Integrated VGA
> 5x8 PCIe®
> “Hudson” Fusion Controller Hub
Summary
More realistic graphics
Perfect power management and energy efficiency
Used by all kinds of applications
Everywhere