Supercomputers and hybrid computing

Post on 27-May-2022


Haroldo Fraga de Campos Velho
E-mail: haroldo.camposvelho@inpe.br
Web page: www.lac.inpe.br/~haroldo

§  What is a “supercomputer”?

§  Demand for high-performance computing

§  Hybrid computing
§  Processors and co-processors
§  Abandoned alternatives
§  CPU + GPU (example with BRAMS)
§  CPU + FPGA (a data assimilation case)
§  Future of the computer/computing

§  Parallel processing + computational intelligence
§  Cooperation: COPDT-INPE and IEAv-DCTA
§  Cooperation: Chair of Meteorology, DECEA and UFRJ

Brief description…

Haroldo F. de Campos Velho, INPE: Senior Researcher / Scientific Computing

RESEARCH AREAS:

•  Inverse problems (several applications, better methodologies)

•  Data assimilation

New method: based on artificial neural networks

•  Atmospheric turbulence parameterization

Results: Taylor's theory for turbulence in clouds

Growth model for the convective boundary layer

Turbulence and cosmological dynamics

§  In simple terms, it is a machine with a large capacity for arithmetic computation (*).

(*) Computing capacity is quantified by the number of floating-point (**) operations (multiplications) per second (“flops”). (**) Computers work with integer, fixed-point, and floating-point number representations.
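As a rough illustration of how a flops figure is obtained, the sketch below counts the operations of a naive matrix product and divides by the measured wall-clock time. It is a minimal sketch assuming pure Python and square matrices; the names `matmul_flops`, `naive_matmul` and `measured_flops` are hypothetical. Note that the slide counts multiplications, while the common convention used here counts each multiplication and each addition as one floating-point operation.

```python
import time


def matmul_flops(n: int) -> int:
    """Operations in an n x n matrix product: n**3 multiplications + n**3 additions."""
    return 2 * n ** 3


def naive_matmul(a, b):
    """Textbook triple loop for square matrices stored as lists of lists."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += a[i][k] * b[k][j]  # one multiplication and one addition
            c[i][j] = acc
    return c


def measured_flops(n: int = 100) -> float:
    """Achieved floating-point operations per second for one n x n product."""
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    start = time.perf_counter()
    c = naive_matmul(a, b)
    elapsed = time.perf_counter() - start
    assert c[0][0] == 2.0 * n  # sanity check on the result
    return matmul_flops(n) / elapsed


if __name__ == "__main__":
    print(f"{measured_flops():.3e} flops (interpreted Python, single core)")
```

Interpreted Python reaches only a small fraction of a processor's peak; the point here is the operation accounting, not the number itself.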

Supercomputer

Colossus (UK - 1943) https://en.wikipedia.org/wiki/Colossus_computer

ENIAC (USA - 1945) Photo: https://en.wikipedia.org/wiki/ENIAC

§  1st generation (1945, ENIAC): vacuum tube

§  2nd generation (IBM 1404): transistor

§  3rd generation (IBM/360, personal computers): integrated circuit

§  4th generation: microprocessors (VLSI)

§  Vector processing: supercomputers


1976: launch of the Cray-1 (horseshoe-shaped design)

Photo: https://en.wikipedia.org/wiki/Cray-1


1980: distributed processing, Connection Machine (CM-5)

Photo: https://en.wikipedia.org/wiki/Connection_Machine

Multi-processing machine with distributed memory

CPTEC-INPE: Cray XE6

1280 nodes

30720 cores
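On a distributed-memory machine each core owns only part of the model domain. A minimal sketch of a one-dimensional block decomposition, assuming the work is split as evenly as possible; the function name `block_partition` is hypothetical. With the Cray XE6 figures above, 30720 cores over 1280 nodes gives 24 cores per node.

```python
CORES = 30720
NODES = 1280
CORES_PER_NODE = CORES // NODES  # 24 cores per node


def block_partition(n_points: int, n_ranks: int) -> list[tuple[int, int]]:
    """Split indices 0..n_points-1 into n_ranks contiguous [start, stop) blocks.

    The first (n_points % n_ranks) ranks receive one extra point, so block
    sizes differ by at most one.
    """
    base, extra = divmod(n_points, n_ranks)
    bounds = []
    start = 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


if __name__ == "__main__":
    # e.g. 1000 grid columns shared by the cores of one node
    blocks = block_partition(1000, CORES_PER_NODE)
    print(blocks[0], blocks[-1])
```

Each rank then works only on its own block and exchanges boundary values with its neighbours, which is where message passing enters.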

Supercomputer machines: multi-core

CPTEC-INPE: computer systems

§  UNA (~800 processors, 1600 cores)

Another alternative for hybrid computing

CPU + GPU.

GPU computer cluster (NVIDIA Fermi)

GPU computer cluster (AMD FireStream)

Demand for high performance

§  Mathematical model (hard equations: non-linear)

Equation of motion (momentum)

Continuity Equation (mass)

Thermodynamic equation (energy)

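$$\frac{du}{dt} - f v + \frac{1}{\rho_0}\frac{\partial p}{\partial x} = 0$$

$$\frac{dv}{dt} + f u + \frac{1}{\rho_0}\frac{\partial p}{\partial y} = 0$$

$$\frac{dw}{dt} + g + \frac{1}{\rho_0}\frac{\partial p}{\partial z} = 0$$

$$\frac{\partial \rho}{\partial t} + \frac{\partial (\rho u)}{\partial x} + \frac{\partial (\rho v)}{\partial y} + \frac{\partial (\rho w)}{\partial z} = 0$$

$$p = f(\rho, T) \;\Rightarrow\; p = \rho\, g(T), \quad g(T) \equiv R\,T \;\; (R = \text{const}) \;\Rightarrow\; p = \rho R T$$

$$C_v \frac{dT}{dt} + p\,\frac{d(1/\rho)}{dt} = \frac{dq}{dt}$$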
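The Coriolis terms in the first two momentum equations already show why these systems are integrated numerically in small time steps. A minimal sketch, assuming the pressure-gradient terms are dropped (so du/dt = f v and dv/dt = -f u, a pure inertial oscillation) and a constant Coriolis parameter f = 1e-4 s^-1; it advances the pair with the classical fourth-order Runge-Kutta scheme, and the result can be checked against the exact solution u = u0 cos(ft) + v0 sin(ft), v = v0 cos(ft) - u0 sin(ft). All names are hypothetical.

```python
import math

F_CORIOLIS = 1.0e-4  # s^-1, a typical mid-latitude value (assumption)


def tendency(u: float, v: float, f: float = F_CORIOLIS) -> tuple[float, float]:
    """Right-hand sides of du/dt - f v = 0 and dv/dt + f u = 0 (no pressure gradient)."""
    return f * v, -f * u


def rk4_step(u, v, dt, f=F_CORIOLIS):
    """Advance (u, v) by one classical fourth-order Runge-Kutta step."""
    k1u, k1v = tendency(u, v, f)
    k2u, k2v = tendency(u + 0.5 * dt * k1u, v + 0.5 * dt * k1v, f)
    k3u, k3v = tendency(u + 0.5 * dt * k2u, v + 0.5 * dt * k2v, f)
    k4u, k4v = tendency(u + dt * k3u, v + dt * k3v, f)
    return (u + dt / 6.0 * (k1u + 2 * k2u + 2 * k3u + k4u),
            v + dt / 6.0 * (k1v + 2 * k2v + 2 * k3v + k4v))


def integrate(u0, v0, dt, n_steps, f=F_CORIOLIS):
    """Apply n_steps RK4 steps starting from (u0, v0)."""
    u, v = u0, v0
    for _ in range(n_steps):
        u, v = rk4_step(u, v, dt, f)
    return u, v


if __name__ == "__main__":
    u, v = integrate(10.0, 0.0, dt=60.0, n_steps=1000)  # 10 m/s eastward, ~16.7 h
    ft = F_CORIOLIS * 60.0 * 1000
    print(u, 10.0 * math.cos(ft))  # numerical vs exact u
```

The wind vector rotates at rate f while its speed stays constant. Restoring the pressure-gradient terms and the advection hidden in d/dt is what makes the full system non-linear and expensive.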

Why upgrade the computing system?

Santa Catarina (BR), November 2008

135 deaths / 78,000 homeless

Accumulated precipitation, 20 to 23 Nov 2008 12Z (3 days): Eta 20 km

Accumulated precipitation, 20 to 23 Nov 2008 12Z (3 days): Eta 5 km

[Figure panels: Eta 20 km, Eta 5 km]
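The 20 km versus 5 km comparison also shows why resolution drives the demand for computing power. Under common back-of-the-envelope assumptions (explicit time stepping whose step must shrink with the grid spacing because of the CFL condition, and an unchanged number of vertical levels), refining the horizontal grid by a factor r multiplies the cost by about r^3. A minimal sketch; `relative_cost` is a hypothetical helper, not part of the Eta model.

```python
def relative_cost(dx_old_km: float, dx_new_km: float,
                  horizontal_dims: int = 2, cfl_limited: bool = True) -> float:
    """Approximate cost ratio of a run at dx_new_km relative to dx_old_km.

    Each horizontal dimension gains a factor r = dx_old / dx_new in points;
    if the time step is CFL-limited, the number of steps also grows by r.
    """
    r = dx_old_km / dx_new_km
    exponent = horizontal_dims + (1 if cfl_limited else 0)
    return r ** exponent


if __name__ == "__main__":
    print(relative_cost(20, 5))  # Eta 20 km -> 5 km: 64.0
```

Under these assumptions the 5 km run costs roughly 64 times the 20 km run over the same area and forecast length, before any extra vertical levels or physics.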

Rio and Angra dos Reis, 31 Dec 2009

(24 h prediction), 5 km

Precipitation total (mm): Eta 20 km

Precipitation for 24 hours: Eta 5 km

Hurricane Catarina: images from space

Photos: https://pt.wikipedia.org/wiki/Furacao_Catarina

Hurricane Catarina: different resolutions (global model)

[Figure panels: 210 km, 105 km, 63 km, 20 km]

Demand for high performance: data science

Heart disease

Solar physics

Plane waves

Techniques:

•  clustering
•  data summarization
•  detecting anomalies
•  analysing changes
•  finding dependency networks
•  learning classification rules
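Of the techniques listed, anomaly detection is the easiest to show in a few lines. A minimal sketch, assuming a simple z-score rule (a point is flagged when it lies more than `threshold` population standard deviations from the mean); practical pipelines use more robust detectors, and `zscore_anomalies` is a hypothetical name.

```python
import statistics


def zscore_anomalies(values: list[float], threshold: float = 3.0) -> list[int]:
    """Return the indices whose |x - mean| / std exceeds threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0.0:
        return []  # constant series: nothing stands out
    return [i for i, x in enumerate(values) if abs(x - mean) / std > threshold]


if __name__ == "__main__":
    series = [10.0] * 20 + [100.0]
    print(zscore_anomalies(series))  # [20]
```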

Hybrid computing

ECMWF forecast scores

Data assimilation

Very important inverse problem

Determining the initial condition
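Determining the initial condition means blending a model background with observations. The simplest instance is the scalar best linear unbiased estimate, the same form of update used inside Kalman-type filters such as LETKF. The sketch assumes a single variable with known background and observation error variances; it is not the neural-network method presented in these slides, and `analysis` is a hypothetical name.

```python
def analysis(x_b: float, y: float, var_b: float, var_o: float) -> tuple[float, float]:
    """Scalar analysis step.

    x_b: background (model forecast), y: observation,
    var_b / var_o: background and observation error variances.
    Returns the analysis x_a and its error variance.
    """
    gain = var_b / (var_b + var_o)   # weight given to the innovation y - x_b
    x_a = x_b + gain * (y - x_b)     # pull the background toward the observation
    var_a = (1.0 - gain) * var_b     # smaller than both var_b and var_o
    return x_a, var_a


if __name__ == "__main__":
    print(analysis(x_b=288.0, y=290.0, var_b=1.0, var_o=1.0))  # (289.0, 0.5), in K
```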


§  Temperature: ANN assimilation experiment

[Figure panels: LETKF, neural network, true]

Results from Rosangela Cintra PhD thesis (2011)

Data assimilation: an essential issue

Numerical experiment: LETKF and ANN

Atmospheric general circulation model (spectral model): 3D SPEEDY (Simplified Parameterizations, primitivE-Equation DYnamics)

Gaussian grid: 96 × 48 (horizontal) × 7 levels (vertical) = T30L7
Total grid points: 32,256
Total variables in the model: 133,632
Observations (00, 06, 12, 18 UTC): radiosondes, WMO (“OMM”) stations
Observations: 12,035 (00 and 12 UTC) = 415 × 4 × 7 + 415
Observations: 2,075 (06 and 18 UTC) = 415 × 5 (surface only)
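The sizes quoted for the T30L7 experiment can be reproduced with a little arithmetic. The sketch assumes (the slide does not say so explicitly) that the 133,632 model variables are four three-dimensional fields on 7 levels plus one surface field at each of the 96 × 48 horizontal points, which is what the numbers imply; the observation counts follow the formulas given on the slide.

```python
NLON, NLAT, NLEV = 96, 48, 7      # Gaussian grid T30L7
N_STATIONS = 415                  # radiosonde (WMO) stations

horizontal_points = NLON * NLAT                       # 4,608
grid_points = horizontal_points * NLEV                # 32,256
# assumption: 4 fields on every level + 1 surface field per column
model_variables = (4 * NLEV + 1) * horizontal_points  # 133,632
obs_00_12_utc = N_STATIONS * 4 * NLEV + N_STATIONS    # 12,035
obs_06_18_utc = N_STATIONS * 5                        # 2,075 (surface only)

if __name__ == "__main__":
    print(grid_points, model_variables, obs_00_12_utc, obs_06_18_utc)
```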

Execution time (hours:minutes:seconds)

LETKF method: 04:20:39

ANN method: 00:02:53
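Converting both times to seconds makes the gap explicit: roughly a 90-fold reduction in execution time for the ANN assimilation in this experiment. A small sketch; `to_seconds` is a hypothetical helper.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return 3600 * hours + 60 * minutes + seconds


letkf_s = to_seconds("04:20:39")  # 15,639 s
ann_s = to_seconds("00:02:53")    # 173 s
speedup = letkf_s / ann_s         # about 90.4

if __name__ == "__main__":
    print(f"LETKF {letkf_s} s, ANN {ann_s} s, speed-up {speedup:.1f}x")
```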

Data assimilation by NN: hardware components

FPGA

GPU vs FPGA:

1.  GPU: receives more investment from the computing community.

2.  FPGA: in the Axel project (a CPU + GPU + FPGA system), the FPGA was faster than the GPU for an N-body simulation.

3.  FPGA uses 2.7 to 293 times less energy than GPU.

2010

2010

Hybrid computing with FPGA

Perceptron-NN for the Cray XD1

Shallow water 2D for ocean circulation

Process          Time (μs)
Software         121709
CPU to FPGA      181365
FPGA             2
FPGA to CPU      9455
FPGA (total)     209187
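Ratios of the listed times show where the FPGA version loses: its total is about 1.72 times the software time, and the CPU-to-FPGA transfer alone accounts for roughly 87% of that total, so data movement rather than computation dominates. The component rows as extracted do not add up to the total, so the sketch below only computes ratios of values that are listed.

```python
times_us = {
    "software": 121709,
    "cpu_to_fpga": 181365,
    "fpga": 2,
    "fpga_to_cpu": 9455,
    "fpga_total": 209187,
}

# > 1 means the FPGA path is slower than pure software
slowdown = times_us["fpga_total"] / times_us["software"]
# share of the FPGA total spent sending data from the CPU to the FPGA
transfer_share = times_us["cpu_to_fpga"] / times_us["fpga_total"]

if __name__ == "__main__":
    print(f"slowdown {slowdown:.2f}x, CPU->FPGA transfer {transfer_share:.0%} of total")
```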

3 Nodes FPGA (2014)

3 Nodes FPGA (2015)

4 Nodes ARM (2015)

8 Nodes 2013 (1, 2, …, 7)

4 Nodes HP (storage)

5 Nodes ARM (2014)

Nodes (3) FPGA (2014): 2 proc. Intel 12-cores, 1 GPU K20, 1 Xeon Phi 60-cores, 1 FPGA Virtex-7

Nodes (8) (2013): 2 proc. Intel 10-cores, 2 GPU K20, FPGA Virtex-6

Nodes (3) FPGA (2015): 2 proc. Intel 12-cores, 1 GPU K80, 1 Xeon Phi (Knights) 60-core, 1 FPGA Virtex-7

Nodes ARM (2014): 6 AppliedMicro 8-core (Calxeda: could not be purchased)

Nodes ARM (2015): 8 Cavium ThunderX 48-cores

LACibrido cluster

Cooperation: COPDT-INPE + IEAv-DCTA

§  Haroldo F. de Campos Velho (*) INPE: National Institute for Space Research – Brazil E-mail: haroldo.camposvelho@inpe.br

§  Elcio H. Shiguemori DCTA: Department of Aerospace Science and Technology – Brazil E-mail: elcio@ieav.cta.br

Autonomous drone navigation

Drone trajectory correction based on aerial images information

[Figure: planned trajectory, inertial navigation, image correction]

Computer vision by image segmentation

Drone navigation without GNSS signal

§  Examples

[Figure panels: original image, true, Canny, RBF, CNN, MLP]

Available for download: www.epacis.net/jcis/PDF_JCIS/JCIS11-art.01.pdf

MPCA: Multi-Particle Collision Algorithm

Finding an OPTIMAL neural network

§  Supervised neural network: Multi-Layer Perceptron (MLP)

MPCA solution

[Figure: MPCA solution encoding: # hidden layers, # neurons in layers 1, 2 and 3, activation function, momentum, learning rate]

Parameter                          Value
Number of hidden layers            1, 2, 3
Number of neurons in each layer    1, ..., 32
Learning rate                      0, ..., 1
Momentum                           0, ..., 0.9
Activation function                Tanh, Log, Gauss
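Metaheuristics such as MPCA typically search over a numeric vector that is decoded into a network configuration. A minimal sketch of such a decoding for the ranges in this parameter table, assuming a hypothetical encoding of seven values in [0, 1] and reading “Log” as the logistic function; it illustrates the search space only and is not the MPCA implementation.

```python
ACTIVATIONS = ("tanh", "logistic", "gaussian")  # "Log" read as logistic (assumption)


def _index(u: float, n: int) -> int:
    """Map u in [0, 1] to an integer in 0..n-1."""
    return min(int(u * n), n - 1)


def decode(c: list[float]) -> dict:
    """Decode a 7-element candidate in [0, 1]^7 into MLP hyperparameters.

    Hypothetical layout: [layers, n1, n2, n3, learning rate, momentum, activation].
    """
    n_layers = 1 + _index(c[0], 3)                                 # 1, 2 or 3
    neurons = [1 + _index(c[1 + k], 32) for k in range(n_layers)]  # 1..32 each
    return {
        "hidden_layers": n_layers,
        "neurons": neurons,
        "learning_rate": float(c[4]),   # [0, 1]
        "momentum": 0.9 * float(c[5]),  # [0, 0.9]
        "activation": ACTIVATIONS[_index(c[6], 3)],
    }


if __name__ == "__main__":
    print(decode([0.5, 0.2, 0.9, 0.0, 0.1, 0.5, 0.7]))
```

The optimizer then only has to propose vectors in [0, 1]^7 and score each decoded network, for example by its validation error.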

UAV positioning algorithm: embedded system


1.  Edge patterns

a)  24 patterns are selected

b)  MLP-NN classifies the other patterns

c)  Classification table
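Because a binary edge neighbourhood has only a finite number of configurations, the classifier can be evaluated once for every pattern and stored in a classification table, so no neural-network arithmetic is needed at run time. The sketch assumes 3 × 3 binary neighbourhoods (2^9 = 512 patterns; the slide does not state the window size) and replaces the trained MLP with a hypothetical toy rule.

```python
from itertools import product


def toy_classifier(bits: tuple[int, ...]) -> int:
    """Hypothetical stand-in for the trained MLP.

    Marks an edge when the centre pixel is set and at least one of its
    four direct neighbours is not.
    """
    centre = bits[4]
    neighbours = (bits[1], bits[3], bits[5], bits[7])
    return int(centre == 1 and min(neighbours) == 0)


def build_table(classify) -> dict[tuple[int, ...], int]:
    """Evaluate classify() once for each of the 2**9 binary 3x3 patterns."""
    return {bits: classify(bits) for bits in product((0, 1), repeat=9)}


def classify_window(table: dict, window: list[list[int]]) -> int:
    """Look up a 3x3 window (row-major) instead of running the classifier."""
    return table[tuple(v for row in window for v in row)]


if __name__ == "__main__":
    table = build_table(toy_classifier)
    print(len(table), classify_window(table, [[0, 0, 0], [0, 1, 0], [0, 0, 0]]))
```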

UAV positioning algorithm: embedded system

[Diagram: satellite image and drone image; pre-processing; neural network; correlation; UAV estimated position]

Image segmentation and correlation

Drone positioning algorithm


[Diagram: georeferenced image (from Google Earth) and UAV image acquisition; extracting edges (NN) from each; correlation; UAV position]
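The correlation step can be sketched as template matching: slide the edge map extracted from the drone image over the edge map of the georeferenced image and keep the offset with the highest score. A minimal sketch, assuming both maps are binary and share the same scale and orientation, and using a plain (unnormalized) cross-correlation; `best_offset` is a hypothetical name.

```python
def best_offset(reference: list[list[int]], patch: list[list[int]]) -> tuple[int, int]:
    """Return the (row, col) in reference where the binary patch matches best."""
    H, W = len(reference), len(reference[0])
    h, w = len(patch), len(patch[0])
    best, best_score = (0, 0), -1
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            # number of edge pixels that coincide at this offset
            score = sum(reference[r + i][c + j] * patch[i][j]
                        for i in range(h) for j in range(w))
            if score > best_score:
                best, best_score = (r, c), score
    return best


if __name__ == "__main__":
    ref = [[0] * 6 for _ in range(6)]
    ref[2][3] = ref[3][3] = ref[3][4] = 1    # an L-shaped edge in the map
    print(best_offset(ref, [[1, 0], [1, 1]]))  # (2, 3)
```

Converting the best offset from pixels to metres with the georeferenced image's resolution then gives the UAV position estimate.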

UAV positioning algorithm: embedded system


1.  Drone trajectory correction

Without GNSS signal: edge extraction

UAV positioning algorithm: embedded system


1.  Trajectory correction: edge extraction

Edge extraction method   # corrected points (error < 10 m)   CPU time (s)
Canny                    197                                 0.25
Sobel                    296                                 0.29
NN-software              414                                 5.00 (20× slower)
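Sobel, one of the classical detectors in this comparison, estimates the image gradient with two 3 × 3 kernels and keeps the pixels whose gradient magnitude exceeds a threshold. A minimal pure-Python sketch for a grayscale image stored as a list of rows; the threshold is a free parameter, and this is a textbook version, not the implementation used in the experiments.

```python
def sobel_edges(img: list[list[float]], threshold: float) -> list[list[int]]:
    """Binary edge map from the Sobel gradient magnitude.

    Border pixels are left as 0 because their 3x3 neighbourhood is incomplete.
    """
    kx = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))   # horizontal gradient kernel
    ky = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))   # vertical gradient kernel
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx = gy = 0.0
            for di in range(3):
                for dj in range(3):
                    p = img[i + di - 1][j + dj - 1]
                    gx += kx[di][dj] * p
                    gy += ky[di][dj] * p
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[i][j] = 1
    return edges


if __name__ == "__main__":
    step = [[0, 0, 0, 10, 10, 10] for _ in range(5)]  # vertical intensity step
    for row in sobel_edges(step, threshold=20.0):
        print(row)
```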


UAV positioning algorithm: embedded system


1.  Trajectory correction: edge extraction

** Speeding up the processing:

a)  Removing computation: replacing the activation function with a lookup table (LUP)

b)  Use hybrid computing: CPU + FPGA

CPU: Raspberry Pi B-1; FPGA: Xilinx Spartan-6 LX9
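Replacing the activation function by a lookup table trades memory for arithmetic, which suits an FPGA well. A minimal sketch, assuming “LUP” denotes such a table, a tanh activation sampled at 1024 points on [-4, 4], and nearest-entry lookup with saturation outside that range; the class name `TanhLUT` is hypothetical.

```python
import math


class TanhLUT:
    """Nearest-entry lookup table approximating tanh on [x_min, x_max]."""

    def __init__(self, x_min: float = -4.0, x_max: float = 4.0, size: int = 1024):
        self.x_min, self.x_max = x_min, x_max
        self.step = (x_max - x_min) / (size - 1)
        # precomputed once; at run time only an index calculation remains
        self.table = [math.tanh(x_min + k * self.step) for k in range(size)]

    def __call__(self, x: float) -> float:
        if x <= self.x_min:
            return self.table[0]    # saturate below the range
        if x >= self.x_max:
            return self.table[-1]   # saturate above the range
        k = round((x - self.x_min) / self.step)
        return self.table[k]


if __name__ == "__main__":
    lut = TanhLUT()
    print(lut(0.5), math.tanh(0.5))
```

With these settings the error stays below about 0.004, half the table spacing; a finer table lowers the error at the cost of memory (block RAM, on an FPGA).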

UAV positioning algorithm: embedded system


1.  Trajectory correction: edge extraction

Edge extraction method   # corrected points (error < 10 m)   CPU time (s)
Canny                    197                                 0.25
Sobel                    296                                 0.29
NN-software              414                                 5.00 (20× slower)
NN-FPGA (Raspberry)      414                                 0.58 (2× slower)
NN-FPGA (Zybo)           414                                 0.13 (2× faster)

Data fusion: combining two drone position estimates, INS signal + computer vision. Method: neural network (emulating a Kalman filter and high-precision GPS)

Drone positioning algorithm

Data fusion: INS signal + Computer vision

Drone positioning algorithm


[Diagram: INS measurements → positioning 1; image acquisition → computer vision (CV) → positioning 2; positioning 1 + positioning 2 → NN fuser → drone position]
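For reference, the filter that the neural fuser is trained to emulate can be written in a few lines for a single coordinate. A minimal sketch, assuming a one-dimensional position, the INS velocity used in the prediction step, the computer-vision position used as the measurement, and known noise variances q and r; this is the classical Kalman filter, not the neural-network fuser itself, and the numbers in the usage block are hypothetical.

```python
def kf_step(x: float, p: float, v_ins: float, dt: float,
            z_cv: float, q: float, r: float) -> tuple[float, float]:
    """One predict/update cycle of a scalar Kalman filter.

    x, p   : current position estimate and its error variance
    v_ins  : velocity from the inertial navigation system (INS)
    z_cv   : position measured by computer vision
    q, r   : process-noise and measurement-noise variances
    """
    # predict: dead reckoning with the INS velocity
    x_pred = x + v_ins * dt
    p_pred = p + q
    # update: correct with the computer-vision fix
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z_cv - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new


if __name__ == "__main__":
    x, p = 0.0, 1.0
    for z in (2.1, 3.9, 6.2):  # hypothetical CV fixes, 1 s apart
        x, p = kf_step(x, p, v_ins=2.0, dt=1.0, z_cv=z, q=0.1, r=0.5)
        print(f"x = {x:.2f} m, var = {p:.3f}")
```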


§  Bayesian strategy: filtering

1.  Kalman filter
a)  Linear stochastic process
b)  Gaussian statistics

2.  Particle filter
a)  Applied to non-linear processes
b)  Applied to non-Gaussian statistical models
c)  Kernel of the particle filter:

Bayesian filters

$$\underbrace{p(w_n \mid Y_n)}_{\text{a posteriori}} \;\propto\; \underbrace{p(y_n \mid w_n)}_{\text{likelihood}}\; \underbrace{p(w_n \mid Y_{n-1})}_{\text{a priori}}$$
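The recursion above is what a particle filter approximates with weighted samples. A minimal bootstrap particle filter for a scalar state, assuming a random-walk transition model, a Gaussian likelihood (the non-extensive variant that follows would change only the weighting line) and multinomial resampling; the function name and default parameters are hypothetical.

```python
import math
import random


def bootstrap_pf(observations, n_particles=500, obs_std=0.5,
                 jitter_std=0.01, seed=0):
    """Return the posterior-mean estimate after each observation."""
    rng = random.Random(seed)
    # samples from the initial prior p(w_0)
    particles = [rng.gauss(0.0, 2.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # prediction: draw from p(w_n | w_{n-1}), here a random walk
        particles = [w + rng.gauss(0.0, jitter_std) for w in particles]
        # weighting: likelihood p(y_n | w_n), Gaussian here
        weights = [math.exp(-0.5 * ((y - w) / obs_std) ** 2) for w in particles]
        total = sum(weights)
        weights = [wt / total for wt in weights]
        estimates.append(sum(wt * w for wt, w in zip(weights, particles)))
        # resampling: equally weighted particles approximating p(w_n | Y_n)
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


if __name__ == "__main__":
    rng = random.Random(1)
    truth = 1.0
    ys = [truth + rng.gauss(0.0, 0.5) for _ in range(50)]
    print(f"final estimate {bootstrap_pf(ys)[-1]:.3f} (truth {truth})")
```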

§  Non-extensive particle filter:

§  Tsallis’s non-extensive distribution

§  The choice of a Gaussian distribution can be justified by the central limit theorem.

§  However, there is another attractor in the space of distributions.

§  This is the Lévy-Gnedenko central limit theorem.

§  For the case of Tsallis distributions:

Bayesian filters

$$\rho_q(x) = \begin{cases} G(x), & q < 5/3 \\ L_\gamma(x), & q > 5/3 \end{cases}$$

(G: Gaussian attractor; L_γ: Lévy distribution)

§  Likelihood function: Tsallis distribution (non-extensive formalism of thermodynamics)

§  where q is the non-extensive parameter, to be estimated by a secondary process.
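The Tsallis distribution is built from the q-exponential e_q(x) = [1 + (1 - q) x]_+^(1/(1 - q)), which reduces to exp(x) as q → 1. A minimal sketch of an unnormalized q-Gaussian kernel e_q(-β x²) that could replace the Gaussian weight in a particle filter; β, the function names and the way the kernel would be plugged in are assumptions, and the adaptive estimation of q is not shown.

```python
import math


def q_exponential(x: float, q: float) -> float:
    """Tsallis q-exponential [1 + (1 - q) x]_+ ** (1 / (1 - q)); exp(x) when q = 1."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        return 0.0  # the [.]_+ cut-off: outside the support
    return base ** (1.0 / (1.0 - q))


def q_gaussian_kernel(residual: float, q: float, beta: float = 1.0) -> float:
    """Unnormalized q-Gaussian e_q(-beta * residual**2).

    q < 1: compact support; q = 1: Gaussian; 1 < q < 3: power-law (heavy) tails.
    """
    return q_exponential(-beta * residual * residual, q)


if __name__ == "__main__":
    for q in (0.5, 1.0, 1.5):
        print(q, [round(q_gaussian_kernel(r, q), 4) for r in (0.0, 1.0, 2.0, 3.0)])
```

Heavy tails (q > 1) keep particles with large residuals from receiving vanishing weights, which is the usual motivation for such likelihoods when outliers are expected.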

Adaptive Non-extensive Particle Filter

UAV positioning algorithm: embedded system


1.  Drone trajectory correction

Data fusion by Non-Extensive Particle Filter (NEx-PF)

UAV positioning algorithm: embedded system


1.  Drone trajectory correction

Data fusion by Non-Extensive Particle Filter: uncertainty quantification

Chair of Meteorology: DECEA and UFRJ

Chair of Aeronautical Meteorology UFRJ/UNIFA/DECEA


Committee members:
§  Gutemberg Borges França, PhD (Chair), IGEO-LMA-UFRJ
§  Haroldo Fraga de Campos Velho, DSc, LAC-INPE
§  Gilberto Fernando Fisch, DSc, IAE-COMAER
§  Francisco Leite Albuquerque Neto, DSc, IGEO-LMA-UFRJ
§  Francisco Pinheiro Gomes, Commander of CIMAER

Invited evaluators for VCSCMA:
§  Nelson Francisco F. Ebecken, DSc, COPPE-UFRJ
§  Hugo Abi Karam, DSc, IGEO-UFRJ
§  Wallace F. Menezes, DSc, IGEO-UFRJ
§  Ana Maria Bueno Nunes, DSc, IGEO-UFRJ
§  Juliana Anochi, DSc, CPTEC-INPE
§  Audalio Rebelo Torres Junior, DSc, UFMA

The future of the computer

Some bets for the future:

a)  A photonic computer!

b)  Quantum computing!

Thank you very much!