Supercomputers and hybrid computing
Haroldo Fraga de Campos Velho
E-mail: [email protected]
www.lac.inpe.br/~haroldo
§ What is a “supercomputer”?
§ Demand for high-performance processing
§ Hybrid computing
§ Processors and co-processors
§ Abandoned alternatives
§ CPU + GPU (example with BRAMS)
§ CPU + FPGA (a data assimilation case)
§ Future of the computer/computing
§ Parallel processing + Computational Intelligence
§ Cooperation: COPDT-INPE and IEAv-DCTA
§ Cooperation: Chair of Meteorology – DECEA and UFRJ
Brief description…
Haroldo F. de Campos Velho – INPE Senior Researcher / Scientific Computing
RESEARCH AREAS:
• Inverse Problems (several applications, better methodologies)
• Data Assimilation
New method: based on artificial neural networks
• Atmospheric Turbulence Parameterization
Results: Taylor's theory for turbulence in clouds
Growth model for the convective boundary layer
Turbulence and cosmological dynamics
§ In simple terms, it is a machine with a large capacity for arithmetic calculations (*).
(*) Computing capacity is quantified by the number of floating-point (**) operations (multiplications) per second (“flops”). (**) Computers work with integer, fixed-point, and floating-point number representations.
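To make the "flops" figure concrete, here is a minimal sketch that times a dense matrix product and converts the elapsed time into floating-point operations per second. The helper name `estimate_flops`, the matrix size, and the operation count of about 2·n³ for an n×n product are assumptions for illustration; the number obtained depends on the machine and on the linear algebra library behind NumPy.

```python
import time
import numpy as np

def estimate_flops(n: int = 256, repeats: int = 3) -> float:
    """Estimate sustained flop/s from an n x n matrix product (about 2*n**3 operations)."""
    rng = np.random.default_rng(0)
    a = rng.random((n, n))
    b = rng.random((n, n))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b  # the timed floating-point work
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading from the timer on very small problems.
    return 2.0 * n**3 / max(best, 1e-12)

if __name__ == "__main__":
    print(f"about {estimate_flops() / 1e9:.2f} Gflop/s on this machine")
```

A supercomputer is rated the same way, only with a standardized benchmark and many processors working together.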
Supercomputer
Colossus (UK - 1943) https://en.wikipedia.org/wiki/Colossus_computer
ENIAC (USA - 1945) Photo: https://en.wikipedia.org/wiki/ENIAC
§ 1st generation (1945, ENIAC): vacuum tube
§ 2nd generation (IBM 1404): transistor
§ 3rd generation (IBM/360 – personal computers): integrated circuit
§ 4th generation: microprocessors (VLSI)
§ Vector processing: supercomputers
1976: launch year of the Cray-1 (horseshoe-shaped design)
Photo: https://en.wikipedia.org/wiki/Cray-1
1980: distributed processing, Connection Machine (CM-5)
Photo: https://en.wikipedia.org/wiki/Connection_Machine
Multi-processing machine with distributed memory
CPTEC-INPE: Cray XE6
1280 nodes
30720 cores
Supercomputer machines: multi-core
CPTEC-INPE: Computer systems
§ UNA (~800 processors, 1600 cores)
Another alternative for hybrid computing: CPU + GPU.
GPU computer cluster (NVIDIA Fermi)
GPU computer cluster (AMD FireStream)
Demand for high performance
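§ Mathematical model (hard equations: non-linear)
Movement equation (momentum)
$$\frac{du}{dt} - fv + \frac{1}{\rho}\frac{\partial p}{\partial x} = 0$$
$$\frac{dv}{dt} + fu + \frac{1}{\rho}\frac{\partial p}{\partial y} = 0$$
$$\frac{dh}{dt} + g + \frac{1}{\rho}\frac{\partial p}{\partial z} = 0$$
Continuity equation (mass)
$$\frac{\partial \rho}{\partial t} + \frac{\partial (\rho u)}{\partial x} + \frac{\partial (\rho v)}{\partial y} + \frac{\partial (\rho h)}{\partial z} = 0$$
$$p = f(T)\,\rho, \qquad p = g(\rho)\,T \;\Rightarrow\; \frac{f(T)}{T} = \frac{g(\rho)}{\rho} \equiv R \ (\text{const}) \;\Rightarrow\; p = \rho R T$$
Thermodynamic equation (energy)
$$C_v \frac{dT}{dt} + p\,\frac{d(1/\rho)}{dt} = \frac{dq}{dt}$$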
Why an “upgrade” of the computing system?
Santa Catarina (BR), November 2008
135 deaths / 78,000 homeless
Accumulated precipitation, 20 to 23 Nov 2008, 12Z (3 days): Eta 20 km
Accumulated precipitation, 20 to 23 Nov 2008, 12Z (3 days): Eta 5 km
Rio and Angra dos Reis, 31 Dec 2009 (24 h prediction): 5 km
Total precipitation (mm): Eta 20 km
24-hour precipitation: Eta 5 km
Hurricane Catarina: images from space
Photos: https://pt.wikipedia.org/wiki/Furacao_Catarina
Hurricane Catarina: different resolutions (global model): 210 km, 105 km, 63 km, 20 km
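A rough way to see why finer resolution demands a bigger machine is to estimate how the cost grows as the grid spacing shrinks. The sketch below is a back-of-the-envelope model, not a measurement: it assumes that the number of horizontal grid points grows with the square of the refinement, that the time step shrinks in proportion to the grid spacing (a CFL-type limit), and that the number of vertical levels stays fixed. The function name `relative_cost` is hypothetical.

```python
def relative_cost(dx_km: float, ref_km: float = 210.0) -> float:
    """Cost relative to the reference grid, assuming cost ~ (ref/dx)**3."""
    refinement = ref_km / dx_km
    horizontal_points = refinement ** 2  # two horizontal dimensions
    time_steps = refinement              # assumed CFL-type limit: time step shrinks with dx
    return horizontal_points * time_steps

if __name__ == "__main__":
    for dx in (210.0, 105.0, 63.0, 20.0):
        print(f"{dx:6.1f} km -> {relative_cost(dx):8.1f}x the cost of the 210 km run")
    # Regional case: going from a 20 km grid to a 5 km grid.
    print(f"20 km -> 5 km: {relative_cost(5.0, ref_km=20.0):.0f}x")
```

Under these assumptions, moving a regional model from 20 km to 5 km multiplies the work by 64, which is the kind of jump that motivates a system upgrade.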
Demand for high performance
§ Data Science
Heart disease
Solar physics
Plane waves
Techniques:
• clustering
• data summarization
• detecting anomalies
• analysing changes
• finding dependency networks
• learning classification rules
Hybrid computing
ECMWF forecast scores
Data assimilation
Very important inverse problem
Determining the initial condition
§ Temperature: ANN assimilation experiment
LETKF | Neural network | True
Results from Rosangela Cintra's PhD thesis (2011)
Data assimilation: an essential issue
Numerical experiment: LEKF and ANN
Atmospheric general circulation model (spectral model): 3D SPEEDY (Simplified Parameterizations, primitivE-Equation DYnamics)
Gaussian grid: 96 x 48 (horizontal) x 7 levels (vertical) = T30L7
Total grid points: 32,256
Total variables in the model: 133,632
Observations: (00, 06, 12, 18 UTC) – radiosondes, “WMO stations”
Observations: 12035 (00 and 12 UTC) = 415 x 4 x 7 + 415
Observations: 2075 (06 and 18 UTC) = 415 x 5 (only surface)
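The counts quoted for the SPEEDY experiment can be checked with a few lines of arithmetic. The grid and observation formulas come from the slide; the decomposition of the 133,632 model variables into four upper-air variables per level plus one surface variable is an assumption that happens to reproduce the quoted total.

```python
nlon, nlat, nlev = 96, 48, 7   # Gaussian grid T30L7
stations = 415                 # radiosonde stations

grid_points = nlon * nlat * nlev
# Assumption: 4 upper-air variables on every level plus 1 surface variable.
model_variables = nlon * nlat * (4 * nlev + 1)
obs_00_12 = stations * 4 * nlev + stations  # 00 and 12 UTC
obs_06_18 = stations * 5                    # 06 and 18 UTC, surface only

print(grid_points, model_variables, obs_00_12, obs_06_18)
```

The state to be estimated is therefore roughly ten times larger than the largest observation set, which is part of what makes assimilation a hard inverse problem.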
Execution time (hours:minutes:seconds)
LETKF method | 04:20:39
ANN method | 00:02:53
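The ratio between the two execution times can be worked out directly from the reported values. This is only arithmetic on the figures above; the helper `to_seconds` is a hypothetical name.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'hours:minutes:seconds' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return 3600 * hours + 60 * minutes + seconds

letkf = to_seconds("04:20:39")
ann = to_seconds("00:02:53")
speedup = letkf / ann
print(f"LETKF: {letkf} s, ANN: {ann} s, ratio: {speedup:.1f}x")
```

In this experiment the neural network completes the assimilation about 90 times faster than the LETKF.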
Data assimilation by NN: hardware components
FPGA
GPU vs FPGA:
1. GPU: receives more investment from the computing community.
2. FPGA: in the Axel project (a CPU + GPU + FPGA system), for N-body simulation, the FPGA is faster than the GPU.
3. FPGA uses from 2.7 to 293 times less energy than the GPU.
2010
2010
Hybrid computing with FPGA
Perceptron-NN for the Cray XD1
Shallow water 2D for ocean circulation
Process | Time (μs)
Software | 121709
CPU to FPGA | 181365
FPGA | 2
FPGA to CPU | 9455
FPGA (Total) | 209187
3 Nodes FPGA (2014)
3 Nodes FPGA (2015)
4 Nodes ARM (2015)
8 Nodes 2013 (1, 2, …, 7)
4 Nodes HP (storage)
5 Nodes ARM (2014)
Nodes (3) FPGA (2014): 2 Intel 12-core processors, 1 GPU K20, 1 Xeon Phi 60-core, 1 FPGA Virtex-7
Nodes (8) (2013): 2 Intel 10-core processors, 2 GPU K20, FPGA Virtex-6
Nodes (3) FPGA (2015): 2 Intel 12-core processors, 1 GPU K80, 1 Xeon Phi (Knights) 60-core, 1 FPGA Virtex-7
Nodes ARM (2014): 6 AppliedMicro 8-core (Calxeda: could not be purchased)
Nodes ARM (2015): 8 Cavium ThunderX 48-core
Cluster LACibrido
Cooperation: COPDT-INPE + IEAv-DCTA
§ Haroldo F. de Campos Velho (*) INPE: National Institute for Space Research – Brazil E-mail: [email protected]
§ Elcio H. Shiguemori DCTA: Department of Aerospace Science and Technology – Brazil E-mail: [email protected]
Autonomous drone navigation
Drone trajectory correction based on aerial images information
Planned | Inertial navigation | Image correction
Computer vision by image segmentation
Drone navigation without GNSS signal
§ Examples
Original image True Canny
RBF CNN MLP
Available for download: www.epacis.net/jcis/PDF_JCIS/JCIS11-art.01.pdf
MPCA: Multi-Particle Collision Algorithm
Finding an OPTIMAL neural network
§ Supervised neural network: Multi-Layer Perceptron (MLP)
MPCA solution:
# hidden layers | # neurons layer 1 | # neurons layer 2 | # neurons layer 3 | activation function | momentum rate | learning rate

Parameter | Value
Number of hidden layers | 1, 2, 3
Number of neurons in each layer | 1, ..., 32
Learning rate | 0, ..., 1
Momentum | 0, ..., 0.9
Activation function | Tanh, Log, Gauss
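The parameter table defines the search space in which the optimizer looks for a network configuration. As a minimal sketch, the code below encodes that space and draws one random candidate from it. This is plain random sampling, not MPCA: an actual MPCA run would perturb and select candidates according to a cost function. The dictionary keys and the function name `sample_candidate` are hypothetical.

```python
import random

# Search space taken from the parameter table; key names are illustrative.
SEARCH_SPACE = {
    "hidden_layers": (1, 2, 3),
    "neurons_per_layer": range(1, 33),      # 1, ..., 32
    "learning_rate": (0.0, 1.0),            # continuous interval
    "momentum": (0.0, 0.9),                 # continuous interval
    "activation": ("tanh", "log", "gauss"),
}

def sample_candidate(rng: random.Random) -> dict:
    """Draw one random MLP configuration from the search space."""
    n_layers = rng.choice(SEARCH_SPACE["hidden_layers"])
    return {
        "hidden_layers": n_layers,
        "neurons": [rng.choice(SEARCH_SPACE["neurons_per_layer"]) for _ in range(n_layers)],
        "learning_rate": rng.uniform(*SEARCH_SPACE["learning_rate"]),
        "momentum": rng.uniform(*SEARCH_SPACE["momentum"]),
        "activation": rng.choice(SEARCH_SPACE["activation"]),
    }

if __name__ == "__main__":
    print(sample_candidate(random.Random(0)))
```

Each candidate maps to one MLP that would be trained and scored; the optimizer's job is to decide which candidates to try next.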
UAV positioning algorithm: embedded system
1. Edge patterns
a) 24 are selected
b) MLP-NN classifies the other patterns
c) Classification table
Pre-processing | Neural network
Satellite image
Drone image
Correlation
UAV estimated position
Image segmentation and correlation
Drone positioning algorithm
Georeferenced image
Extracting Edges (NN)
UAV image acquisition
Extracting Edges (NN)
correlation
UAV position
Image from Google Earth
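The correlation step can be sketched as a search for the place in the georeferenced edge map that best matches the edge map seen by the drone. The code below is an illustrative brute-force version using normalized cross-correlation. It assumes both edge maps are 2D NumPy arrays with the same scale and orientation, and the function name `locate_patch` is hypothetical. It is not the embedded implementation described in the slides.

```python
import numpy as np

def locate_patch(reference: np.ndarray, patch: np.ndarray) -> tuple[int, int]:
    """Return (row, col) of the best normalized cross-correlation match."""
    ph, pw = patch.shape
    p = patch - patch.mean()
    p_norm = np.sqrt((p ** 2).sum())
    best_score, best_pos = -np.inf, (0, 0)
    for r in range(reference.shape[0] - ph + 1):
        for c in range(reference.shape[1] - pw + 1):
            window = reference[r:r + ph, c:c + pw]
            w = window - window.mean()
            denom = p_norm * np.sqrt((w ** 2).sum())
            if denom == 0.0:
                continue  # flat window or flat patch: correlation undefined
            score = float((p * w).sum() / denom)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    satellite_edges = rng.random((20, 20))
    drone_edges = satellite_edges[5:11, 7:13].copy()
    print(locate_patch(satellite_edges, drone_edges))
```

Because the reference image is georeferenced, the pixel offset of the best match translates into an estimated UAV position.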
1. Drone trajectory correction
Without GNSS signal: edge extraction
1. Trajectory correction: edge extraction
Edge extraction (method) | # corrected points with error < 10 m | CPU time (seconds)
Canny | 197 | 0.25
Sobel | 296 | 0.29
NN-software | 414 | 5.00 (20x slower)
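For readers unfamiliar with the classical operators in the table, here is a minimal sketch of Sobel edge extraction with NumPy. It computes the gradient magnitude over the interior of the image and thresholds it. The `threshold` parameter and the function name `sobel_edges` are assumptions for illustration, and this is not the code that produced the timings in the table.

```python
import numpy as np

def sobel_edges(image: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean edge map from the Sobel gradient magnitude (interior pixels only)."""
    kx = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])
    ky = kx.T
    rows, cols = image.shape
    gx = np.zeros((rows - 2, cols - 2))
    gy = np.zeros((rows - 2, cols - 2))
    # Accumulate the 3x3 weighted sums using shifted views of the image.
    for i in range(3):
        for j in range(3):
            window = image[i:i + rows - 2, j:j + cols - 2]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return np.hypot(gx, gy) > threshold

if __name__ == "__main__":
    step = np.zeros((5, 6))
    step[:, 3:] = 1.0  # vertical step edge between columns 2 and 3
    print(sobel_edges(step, threshold=1.0).astype(int))
```

The operator is cheap, which matches its low CPU time in the table; the neural network finds more usable points at a much higher cost.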
1. Trajectory correction: edge extraction
** Speeding up the processing:
a) Removing computation: replace the activation function with a look-up table (LUT)
b) Use hybrid computing: CPU + FPGA
CPU: Raspberry Pi B-1 FPGA: Xilinx Spartan 6 LX9
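Replacing the activation function by a table means the values are computed once and then only read back. The sketch below shows the idea for tanh with a nearest-entry table. The input range, the table size, and the names `TANH_TABLE` and `tanh_lut` are assumptions chosen for illustration; the actual embedded design may use a different range, resolution, or fixed-point format.

```python
import math

X_MAX = 4.0       # assumed input range [-X_MAX, X_MAX]; tanh is nearly flat beyond it
N_ENTRIES = 1024  # assumed table size

STEP = 2.0 * X_MAX / (N_ENTRIES - 1)
TANH_TABLE = [math.tanh(-X_MAX + i * STEP) for i in range(N_ENTRIES)]

def tanh_lut(x: float) -> float:
    """Approximate tanh(x) by the nearest table entry, clamping out-of-range inputs."""
    index = round((x + X_MAX) / STEP)
    index = max(0, min(N_ENTRIES - 1, index))
    return TANH_TABLE[index]

if __name__ == "__main__":
    worst = max(abs(tanh_lut(k / 100) - math.tanh(k / 100)) for k in range(-600, 601))
    print(f"largest error on [-6, 6]: {worst:.4f}")
```

The trade-off is memory and a small approximation error in exchange for removing the exponential evaluations from every neuron, which suits both a small CPU and an FPGA.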
Edge extraction (method) | # corrected points with error < 10 m | CPU time (seconds)
Canny | 197 | 0.25
Sobel | 296 | 0.29
NN-software | 414 | 5.00 (20x slower)
NN-FPGA_Raspberry | 414 | 0.58 (2x slower)
NN-FPGA_Zybo | 414 | 0.13 (2x faster)
Data fusion: combining two drone position estimates: INS signal + computer vision.
Method: neural network (emulating a Kalman filter and high-precision GPS)
Data fusion: INS signal + Computer vision
INS measurements: Positioning 1
Image acquisition, computer vision (CV): Positioning 2
NN fuser (Positioning 1 + Positioning 2): drone position
Bayesian filters
§ Bayesian strategy – filtering
1. Kalman filter
a) Linear stochastic process
b) Gaussian statistics
2. Particle filter
a) Applied to non-linear processes
b) Applied to non-Gaussian statistical models
c) Kernel of the particle filter:
$$\underbrace{p(w_n \mid Y_n)}_{\text{a posteriori}} \;\propto\; \underbrace{p(y_n \mid w_n)}_{\text{likelihood}}\;\underbrace{p(w_n \mid Y_{n-1})}_{\text{a priori}}$$
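The particle filter kernel, posterior proportional to likelihood times prior, can be illustrated with one cycle of a bootstrap particle filter. This is a minimal sketch for a scalar state: the random-walk process model, the Gaussian likelihood, the noise levels, and the function name `particle_filter_step` are all assumptions for illustration, not the filter used in the drone application.

```python
import math
import random

def particle_filter_step(particles, observation, rng, process_std=0.5, obs_std=1.0):
    """One predict / weight / resample cycle for a scalar random-walk state."""
    # A priori: propagate every particle with the (assumed) random-walk model.
    predicted = [w + rng.gauss(0.0, process_std) for w in particles]
    # Likelihood p(y_n | w_n): Gaussian observation error (an assumption here).
    weights = [math.exp(-0.5 * ((observation - w) / obs_std) ** 2) for w in predicted]
    if sum(weights) == 0.0:
        weights = [1.0] * len(predicted)  # degenerate case: fall back to uniform weights
    # A posteriori: resample in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))

if __name__ == "__main__":
    rng = random.Random(42)
    particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]
    for _ in range(5):
        particles = particle_filter_step(particles, observation=3.0, rng=rng)
    print(f"posterior mean: {sum(particles) / len(particles):.2f}")
```

Nothing in the cycle requires the likelihood to be Gaussian; swapping the weight formula is enough to change the statistical model.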
§ Non-extensive particle filter:
§ Tsallis’s non-extensive distribution
§ The choice of a Gaussian distribution can be justified by the central limit theorem.
§ However, there is another attractor in the space of distributions.
§ This is the Lévy-Gnedenko central limit theorem.
§ For the case of Tsallis’ distributions:
$$\rho_q(x) = \begin{cases} G(x), & q < 5/3 \\ L_\gamma(x), & q > 5/3 \end{cases}$$
§ Likelihood function: Tsallis' distribution (non-extensive formalism of thermodynamics)
§ where “q” is the non-extensive parameter, to be estimated by a secondary process
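As an illustration of a Tsallis-type likelihood, the sketch below evaluates the standard unnormalized q-Gaussian form, which reduces to the ordinary Gaussian as q approaches 1, has heavier tails for q > 1, and has compact support for q < 1. The exact expression used in the slides is not recoverable from the text, so this form, the scale parameter `beta`, and the function name `q_gaussian_likelihood` are assumptions.

```python
import math

def q_gaussian_likelihood(residual: float, q: float, beta: float = 0.5) -> float:
    """Unnormalized Tsallis (q-Gaussian) likelihood of an observation residual."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(-beta * residual ** 2)  # q -> 1 recovers the Gaussian
    base = 1.0 - (1.0 - q) * beta * residual ** 2
    if base <= 0.0:
        return 0.0                              # compact support when q < 1
    return base ** (1.0 / (1.0 - q))

if __name__ == "__main__":
    for q in (0.5, 1.0, 1.5):
        print(q, q_gaussian_likelihood(3.0, q))
```

In an adaptive filter, q would be updated alongside the state, so the weighting of outlying observations adjusts to the data.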
Adaptive Non-extensive Particle Filter
1. Drone trajectory correction
Data fusion by Non-Extensive Particle Filter (NEx-PF)
Data fusion by Non-Extensive Particle Filter: uncertainty quantification
Chair of Meteorology: DECEA and UFRJ
Chair of Aeronautical Meteorology UFRJ/UNIFA/DECEA
Committee members:
• Gutemberg Borges França, PhD (President) – IGEO-LMA-UFRJ
• Haroldo Fraga de Campos Velho, DSc – LAC-INPE
• Gilberto Fernando Fisch, DSc – IAE-COMAER
• Francisco Leite Albuquerque Neto, DSc – IGEO-LMA-UFRJ
• Francisco Pinheiro Gomes – Commander of CIMAER
Invited reviewers for the V CSCMA:
• Nelson Francisco F. Ebecken, DSc – COPPE-UFRJ
• Hugo Abi Karam, DSc – IGEO-UFRJ
• Wallace F. Menezes, DSc – IGEO-UFRJ
• Ana Maria Bueno Nunes, DSc – IGEO-UFRJ
• Juliana Anochi, DSc – CPTEC-INPE
• Audalio Rebelo Torres Junior, DSc – UFMA
Future of the computer
There are some bets for the future:
a) A photonic computer!
b) Quantum computing!
Thank you very much!!