tdc2012

A Case Study in Distributed Architecture for Real-Time Log Indexing, Storage and Analysis, by Juan Lopes

Transcript of tdc2012

Page 1: tdc2012

A Case Study in Distributed Architecture for Real-Time Log Indexing, Storage and Analysis

Juan Lopes

Page 2: tdc2012

COMPLEX EVENT PROCESSING

Page 3: tdc2012

TIME SERIES

REAL-TIME

Page 4: tdc2012

LOGS

Page 5: tdc2012
Page 6: tdc2012
Page 7: tdc2012

HUNDREDS OF SERVERS

Page 8: tdc2012

marvin@goldenheart ~ $ ssh root@deepthought
****WELCOME TO 1 OF YOUR 38,157,987 SERVERS. TRY THE VEAL. IT'S THE BEST IN THIS FARM.****

root@deepthought ~ $ tail -f /var/log.txt

HOW DO WE ACCESS THE LOGS?

Page 9: tdc2012

HOW DO WE "DEBUG"?

Page 10: tdc2012
Page 11: tdc2012

CENTRALIZE

INDEX

Page 12: tdc2012
Page 13: tdc2012

3TB / DAY

Page 14: tdc2012

3TB / DAY

10,000,000,000 MSGS / DAY

36 MB / SECOND
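A back-of-the-envelope check of the figures on this slide. It assumes binary units (1 TB = 2^40 bytes, 1 MB = 2^20 bytes), which is the reading under which 3 TB/day comes out at about 36 MB/s; with decimal units the same volume is closer to 34.7 MB/s. The class and method names are illustrative only.

```java
import java.util.Locale;

public class LogThroughput {
    // Binary units are an assumption (see the text above).
    static final double BYTES_PER_TB = Math.pow(2, 40);
    static final double BYTES_PER_MB = Math.pow(2, 20);
    static final double SECONDS_PER_DAY = 24 * 60 * 60;

    /** Sustained write rate needed to ingest tbPerDay terabytes per day. */
    static double mbPerSecond(double tbPerDay) {
        return tbPerDay * BYTES_PER_TB / SECONDS_PER_DAY / BYTES_PER_MB;
    }

    /** Average message rate for a given daily message count. */
    static double messagesPerSecond(double messagesPerDay) {
        return messagesPerDay / SECONDS_PER_DAY;
    }

    /** Average size of one message, in bytes. */
    static double bytesPerMessage(double tbPerDay, double messagesPerDay) {
        return tbPerDay * BYTES_PER_TB / messagesPerDay;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "%.1f MB/s%n", mbPerSecond(3));                // 36.4 MB/s
        System.out.printf(Locale.ROOT, "%.0f msgs/s%n", messagesPerSecond(1e10));     // 115741 msgs/s
        System.out.printf(Locale.ROOT, "%.0f bytes/msg%n", bytesPerMessage(3, 1e10)); // 330 bytes/msg
    }
}
```

In other words, the engine has to absorb more than a hundred thousand small (roughly 330-byte) messages every second, around the clock.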

Page 15: tdc2012

TWITTER

400,000,000 MSGS / DAY

IN JUNE 2012

Page 16: tdc2012

LOGGLY

Widely used

First choice for the cloud

Largest non-custom plan: 12GB/day

Price: $1,779/month

Page 17: tdc2012

GRAYLOG2

Open source

Self-hosted

Architecture with many moving parts: MongoDB, ElasticSearch, AMQP

Page 18: tdc2012

SPLUNK

Well known in the Big Data space

Aimed at the enterprise world

Lots of charts and reports

$6,000 one-time fee: 500MB/day

500MB < 3TB :(

Page 19: tdc2012
Page 20: tdc2012

JAVA

Page 21: tdc2012

HOTSPOT

Page 22: tdc2012

java.util.concurrent

Page 23: tdc2012

OVERVIEW

Index

Store

Interpret messages

Page 24: tdc2012

RFC 3164: SYSLOG

<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8

<priority = facility*8+severity><date/time><host><process><message>
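A minimal sketch of decoding an RFC 3164 line like the one above in plain Java, using a regular expression rather than whatever parser the engine actually uses. The PRI value is split back into facility (PRI / 8) and severity (PRI % 8), so <34> gives facility 4 (security/authorization, shown as AUTH) and severity 2 (CRITICAL). The upper-case names are conventional labels chosen for this sketch (the RFC defines numeric codes with descriptions), and the class and field names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SyslogParser {
    // Labels for facility codes 0..23 (conventional short names, not defined by the RFC).
    static final String[] FACILITIES = {
        "KERN", "USER", "MAIL", "DAEMON", "AUTH", "SYSLOG", "LPR", "NEWS",
        "UUCP", "CRON", "AUTHPRIV", "FTP", "NTP", "AUDIT", "ALERT", "CLOCK",
        "LOCAL0", "LOCAL1", "LOCAL2", "LOCAL3", "LOCAL4", "LOCAL5", "LOCAL6", "LOCAL7"
    };
    // Labels for severity codes 0..7.
    static final String[] SEVERITIES = {
        "EMERGENCY", "ALERT", "CRITICAL", "ERROR", "WARNING", "NOTICE", "INFO", "DEBUG"
    };

    // <PRI>Mmm dd hh:mm:ss HOST TAG[pid]: TEXT   (the day may be space-padded: "Oct  1")
    static final Pattern LINE = Pattern.compile(
        "^<(\\d{1,3})>([A-Z][a-z]{2} [ \\d]\\d \\d{2}:\\d{2}:\\d{2}) (\\S+) ([^:\\[\\s]+)(?:\\[\\d+\\])?: (.*)$");

    static final class SyslogMessage {
        final String facility, severity, timestamp, host, process, text;

        SyslogMessage(String facility, String severity, String timestamp,
                      String host, String process, String text) {
            this.facility = facility;
            this.severity = severity;
            this.timestamp = timestamp;
            this.host = host;
            this.process = process;
            this.text = text;
        }
    }

    /** Returns null when the line is not a well-formed RFC 3164 message. */
    static SyslogMessage parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) return null;
        int pri = Integer.parseInt(m.group(1));
        if (pri > 191) return null;                 // 23 * 8 + 7 is the largest valid PRI
        return new SyslogMessage(
            FACILITIES[pri / 8],                    // priority = facility * 8 + severity
            SEVERITIES[pri % 8],
            m.group(2), m.group(3), m.group(4), m.group(5));
    }

    public static void main(String[] args) {
        SyslogMessage msg = parse(
            "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8");
        System.out.println(msg.facility + " " + msg.severity + " " + msg.host + " " + msg.process);
        // AUTH CRITICAL mymachine su
    }
}
```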

Page 25: tdc2012

KEY: VALUE

<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8

message

facility AUTH

severity CRITICAL

host mymachine

process su

date 20121011

time 221415

text su, root, failed, for, lonvick, on, /dev/pts/8
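The key/value breakdown above can be sketched as a function that turns already-parsed header fields plus the free text into FIELD:VALUE terms. Two assumptions are labeled in the code: RFC 3164 timestamps carry no year, so the caller supplies it, and the free text is split on whitespace and single quotes. That tokenizer reproduces the tokens on the slide but is not necessarily the one the engine uses; the class name is hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class TermExtractor {
    static final DateTimeFormatter SYSLOG_TS =
        DateTimeFormatter.ofPattern("yyyy MMM d HH:mm:ss", Locale.ENGLISH);
    static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyyyMMdd");
    static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HHmmss");

    /**
     * fields:    already-parsed header fields, e.g. facility=AUTH, host=mymachine
     * timestamp: RFC 3164 timestamp such as "Oct 11 22:14:15"
     * year:      supplied by the caller, because RFC 3164 timestamps have no year (assumption)
     */
    static List<String> terms(Map<String, String> fields, String timestamp, int year, String text) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            out.add(e.getKey() + ":" + e.getValue());
        }
        // collapse the double space of a padded day ("Oct  1") before parsing
        String normalized = year + " " + timestamp.trim().replaceAll("\\s+", " ");
        LocalDateTime ts = LocalDateTime.parse(normalized, SYSLOG_TS);
        out.add("date:" + ts.format(DATE));
        out.add("time:" + ts.format(TIME));
        // hypothetical tokenizer: split on whitespace and single quotes, keep paths whole
        for (String token : text.split("[\\s']+")) {
            if (!token.isEmpty()) out.add("text:" + token);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("facility", "AUTH");
        fields.put("severity", "CRITICAL");
        fields.put("host", "mymachine");
        fields.put("process", "su");
        System.out.println(terms(fields, "Oct 11 22:14:15", 2012,
            "'su root' failed for lonvick on /dev/pts/8"));
    }
}
```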

Page 26: tdc2012

?

Page 27: tdc2012

MG4J, Egothor

Nutch, Oxyus

BDDBot, Zilverline

YaCy, Compass

Lius, Regain

Piscator, Hounder, HSearch

Page 28: tdc2012

<FIELD:CONTENT, DOC*>

TEXT:ABACAXI ➜ 1, 3, 9

TEXT:BANANA ➜ 2, 3, 10, 42

TEXT:CAJU ➜ 3, 11, 50
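A toy in-memory version of the <FIELD:CONTENT, DOC*> structure: each term maps to a sorted list of document ids, and an AND query is a linear merge of two sorted lists. This only illustrates the data structure; Lucene's on-disk postings are compressed and far more elaborate. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public class InvertedIndex {
    // "FIELD:CONTENT" term -> ids of the documents containing it, in increasing order
    private final Map<String, List<Integer>> postings = new HashMap<>();

    /** Documents must be added with increasing ids so every posting list stays sorted. */
    public void add(int docId, Collection<String> terms) {
        for (String term : new LinkedHashSet<>(terms)) {   // each doc listed once per term
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
        }
    }

    public List<Integer> get(String term) {
        return postings.getOrDefault(term, Collections.emptyList());
    }

    /** AND query: linear merge of two sorted posting lists. */
    public List<Integer> intersect(String a, String b) {
        List<Integer> x = get(a), y = get(b), out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < x.size() && j < y.size()) {
            int cmp = Integer.compare(x.get(i), y.get(j));
            if (cmp == 0) {
                out.add(x.get(i));
                i++;
                j++;
            } else if (cmp < 0) {
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // rebuild the three posting lists from the slide
        InvertedIndex idx = new InvertedIndex();
        idx.add(1, List.of("TEXT:ABACAXI"));
        idx.add(2, List.of("TEXT:BANANA"));
        idx.add(3, List.of("TEXT:ABACAXI", "TEXT:BANANA", "TEXT:CAJU"));
        idx.add(9, List.of("TEXT:ABACAXI"));
        idx.add(10, List.of("TEXT:BANANA"));
        idx.add(11, List.of("TEXT:CAJU"));
        idx.add(42, List.of("TEXT:BANANA"));
        idx.add(50, List.of("TEXT:CAJU"));
        System.out.println(idx.intersect("TEXT:ABACAXI", "TEXT:BANANA")); // [3]
    }
}
```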

Page 29: tdc2012

LOW ENTROPY

Page 30: tdc2012

<10% unique terms

lower per-message overhead

MESSAGE BAG

Page 31: tdc2012

Buffer

Index

Store

Interpret

OVERVIEW
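The buffering stage in this overview can be sketched with java.util.concurrent: receiver threads put raw messages into a bounded ArrayBlockingQueue, and an indexing thread drains them in batches with drainTo, so indexing works on groups of messages instead of one at a time. The batch size, the timeout and the MessageBuffer class are assumptions for illustration, not details from the talk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class MessageBuffer {
    private final BlockingQueue<String> queue;
    private final int batchSize;

    public MessageBuffer(int capacity, int batchSize) {
        this.queue = new ArrayBlockingQueue<>(capacity); // bounded: receivers block when full
        this.batchSize = batchSize;
    }

    /** Called by receiver threads; blocks while the buffer is full. */
    public boolean offer(String rawMessage) {
        try {
            queue.put(rawMessage);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /**
     * Called in a loop by the indexing thread: waits up to timeoutMs for one message,
     * then takes whatever else is already queued (up to batchSize) and hands the batch
     * to the indexer. Returns the batch size, or 0 on timeout or interrupt.
     */
    public int drainBatch(long timeoutMs, Consumer<List<String>> indexer) {
        try {
            String first = queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (first == null) return 0;
            List<String> batch = new ArrayList<>(batchSize);
            batch.add(first);
            queue.drainTo(batch, batchSize - 1);
            indexer.accept(batch);
            return batch.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return 0;
        }
    }

    public static void main(String[] args) {
        MessageBuffer buffer = new MessageBuffer(1000, 4);
        for (int i = 0; i < 10; i++) buffer.offer("msg" + i);
        while (buffer.drainBatch(10, batch -> System.out.println("indexing " + batch)) > 0) { }
    }
}
```

The bounded queue gives natural back-pressure: if indexing falls behind, receivers slow down instead of exhausting memory.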

Page 32: tdc2012

<DOC, FREQ, POSITION*>

1, 4 ➜ 5, 6, 10, 20

3, 1 ➜ 40

9, 4 ➜ 6, 7, 8, 9
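With positions stored, a phrase query can be answered from the postings alone: for two terms, keep the documents where some position of the second term is exactly one past a position of the first. The sketch uses the postings from the slide for the first term (doc 1 at 5, 6, 10, 20; doc 3 at 40; doc 9 at 6, 7, 8, 9); the second term's postings are made up. FREQ is implicit as the length of each position array. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class PhraseMatcher {
    /**
     * Each argument is a positional posting list: doc id -> sorted positions of the term.
     * FREQ from <DOC, FREQ, POSITION*> is simply positions.length.
     * Returns the docs (in increasing order) where `second` appears right after `first`.
     */
    static List<Integer> phrase(SortedMap<Integer, int[]> first, SortedMap<Integer, int[]> second) {
        List<Integer> docs = new ArrayList<>();
        for (Map.Entry<Integer, int[]> e : first.entrySet()) {   // iterates in doc order
            int[] next = second.get(e.getKey());
            if (next == null) continue;                           // second term absent from doc
            for (int p : e.getValue()) {
                if (Arrays.binarySearch(next, p + 1) >= 0) {      // positions are sorted
                    docs.add(e.getKey());
                    break;
                }
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        SortedMap<Integer, int[]> first = new TreeMap<>();   // postings from the slide
        first.put(1, new int[] {5, 6, 10, 20});
        first.put(3, new int[] {40});
        first.put(9, new int[] {6, 7, 8, 9});
        SortedMap<Integer, int[]> second = new TreeMap<>();  // made-up postings
        second.put(1, new int[] {11});
        second.put(3, new int[] {7});
        second.put(9, new int[] {3});
        System.out.println(phrase(first, second)); // [1]
    }
}
```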

Page 33: tdc2012

SCORES DON'T MATTER

Page 34: tdc2012

INDEXING: Field, Document, IndexWriter

SEARCH: QueryParser, Query, IndexSearcher

NORMAL

Page 35: tdc2012

INDEXING: TokenStream, Document, IndexWriter

SEARCH: TermPositions, FieldCache, IndexReader

HARDCORE

Page 36: tdc2012

BLAME IT ON THE WIDESCREEN

Page 37: tdc2012

Jersey (REST API), Backbone.js, CometD

WEB INTERFACE

Page 38: tdc2012

Jersey (REST API), Backbone.js, CometD

WEB INTERFACE

engine browser

"app:apache http 404"?

"OK. listen: /comet/1234568790abcdef"

Page 39: tdc2012

BLAME IT ON THE WIDESCREEN

Page 40: tdc2012

Sheer nerve, HttpClient, CometD

COMMAND-LINE INTERFACE

/intelie/lognit-cli

Page 41: tdc2012

REALTIME (AKA TAIL -F)

EVENTS

subscriber
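A minimal in-process sketch of the realtime (tail -f) path: each subscriber registers the set of terms its query requires, and every incoming event is pushed to the subscribers whose terms it contains. It uses CopyOnWriteArrayList because publishing happens on every event while subscriptions change rarely. The term-set query model, the term names and the RealtimeHub class are assumptions; the talk delivers matches to clients over CometD, which is not shown here.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class RealtimeHub {
    private static final class Subscription {
        final Set<String> requiredTerms;
        final Consumer<String> listener;

        Subscription(Set<String> requiredTerms, Consumer<String> listener) {
            this.requiredTerms = requiredTerms;
            this.listener = listener;
        }
    }

    // Publishing (every event) iterates without locking; subscribing (rare) pays for a copy.
    private final CopyOnWriteArrayList<Subscription> subscriptions = new CopyOnWriteArrayList<>();

    /** Registers a live query; running the returned Runnable cancels it. */
    public Runnable subscribe(Set<String> requiredTerms, Consumer<String> listener) {
        Subscription s = new Subscription(new HashSet<>(requiredTerms), listener);
        subscriptions.add(s);
        return () -> subscriptions.remove(s);
    }

    /** Called once per incoming event, with the terms extracted from it. */
    public void publish(String event, Set<String> eventTerms) {
        for (Subscription s : subscriptions) {
            if (eventTerms.containsAll(s.requiredTerms)) {
                s.listener.accept(event);
            }
        }
    }

    public static void main(String[] args) {
        RealtimeHub hub = new RealtimeHub();
        Runnable cancel = hub.subscribe(Set.of("app:apache", "text:http", "text:404"),
            event -> System.out.println("push: " + event));
        hub.publish("GET /x 404", Set.of("app:apache", "text:http", "text:404", "text:GET"));
        hub.publish("GET / 200", Set.of("app:apache", "text:http", "text:200")); // not pushed
        cancel.run();
    }
}
```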

Page 42: tdc2012

LIGHTWEIGHT TERM TRIE

ABRAÇO, ABRIGO, CHOCOLATE

<ROOT>
    ABR
        AÇO
        IGO
    CHOCOLATE
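A compressed (radix) trie, in which edges carry whole substrings, reproduces the shape on the slide: ABRAÇO and ABRIGO share the edge ABR, which splits into AÇO and IGO, while CHOCOLATE hangs directly off the root. What makes the engine's trie "lightweight" and exactly where it is used are not described; this sketch only shows insertion with edge splitting and exact lookup, and the class name is hypothetical.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeMap;

public class RadixTrie {
    private static final class Node {
        // child edges keyed by label; no two labels of one node share a first character
        final TreeMap<String, Node> children = new TreeMap<>();
        boolean terminal; // true if a stored term ends here
    }

    private final Node root = new Node();

    public void insert(String term) {
        Node node = root;
        String rest = term;
        while (!rest.isEmpty()) {
            String label = edgeStartingWith(node, rest.charAt(0));
            if (label == null) {                        // nothing shared: add a leaf edge
                Node leaf = new Node();
                leaf.terminal = true;
                node.children.put(rest, leaf);
                return;
            }
            int common = commonPrefixLength(label, rest);
            Node child = node.children.get(label);
            if (common < label.length()) {              // split the edge where the terms diverge
                Node middle = new Node();
                node.children.remove(label);
                node.children.put(label.substring(0, common), middle);
                middle.children.put(label.substring(common), child);
                child = middle;
            }
            node = child;
            rest = rest.substring(common);
        }
        node.terminal = true;
    }

    public boolean contains(String term) {
        Node node = root;
        String rest = term;
        while (!rest.isEmpty()) {
            String label = edgeStartingWith(node, rest.charAt(0));
            if (label == null || !rest.startsWith(label)) return false;
            node = node.children.get(label);
            rest = rest.substring(label.length());
        }
        return node.terminal;
    }

    /** Labels of the edges leaving the root, to inspect the shape of the trie. */
    public Set<String> rootLabels() {
        return Collections.unmodifiableSet(root.children.keySet());
    }

    // The smallest label >= "c" is the only candidate that can start with c.
    private static String edgeStartingWith(Node node, char c) {
        String key = node.children.ceilingKey(String.valueOf(c));
        return key != null && key.charAt(0) == c ? key : null;
    }

    private static int commonPrefixLength(String a, String b) {
        int i = 0;
        while (i < a.length() && i < b.length() && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }

    public static void main(String[] args) {
        RadixTrie trie = new RadixTrie();
        trie.insert("ABRA\u00c7O");
        trie.insert("ABRIGO");
        trie.insert("CHOCOLATE");
        System.out.println(trie.rootLabels());      // [ABR, CHOCOLATE]
        System.out.println(trie.contains("ABR"));   // false: only a shared prefix
    }
}
```

Sharing prefixes keeps the structure small when many terms start alike, which fits the low-entropy vocabulary of log messages.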

Page 43: tdc2012

AGGREGATION (AKA WC -L)

EVENTS

Page 44: tdc2012

http

~10,000 events / second

Page 45: tdc2012

http => count()

1 event / second

Page 46: tdc2012

http => count() by host

~100 events / second

Page 47: tdc2012

http => count() by host

every 30 seconds

~100 events / 30 seconds

Page 48: tdc2012

http => avg(cputime#) by host

every 30 seconds

~100 events / 30 seconds
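The windowed form can be sketched as a per-host running sum and count that is emitted and reset once per window, which is why the output drops from thousands of events per second to one row per host per window. The WindowedAvgByHost class is hypothetical; in practice flush() could be triggered by a ScheduledExecutorService (for example scheduleAtFixedRate with a 30-second period), which the sketch leaves out so that it stays deterministic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class WindowedAvgByHost {
    // host -> {sum of cputime, number of events} for the window in progress
    private Map<String, double[]> window = new HashMap<>();

    /** Called for every event that passed the "http" filter. */
    public synchronized void accept(String host, double cputime) {
        double[] acc = window.computeIfAbsent(host, h -> new double[2]);
        acc[0] += cputime;
        acc[1] += 1;
    }

    /** Called once per window (e.g. every 30 s): one average per host, then reset. */
    public synchronized Map<String, Double> flush() {
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, double[]> e : window.entrySet()) {
            result.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        window = new HashMap<>();
        return result;
    }

    public static void main(String[] args) {
        WindowedAvgByHost agg = new WindowedAvgByHost();
        agg.accept("foo", 1.0);
        agg.accept("foo", 3.0);
        agg.accept("bar", 0.5);
        System.out.println(agg.flush()); // {bar=0.5, foo=2.0}
        System.out.println(agg.flush()); // {} (new window, no events yet)
    }
}
```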

Page 49: tdc2012

BLAME IT ON THE WIDESCREEN

Page 50: tdc2012

WE NEED

TO SCALE

Page 51: tdc2012

read rate: MODERATE

write rate: VERY HIGH

dependency between data items: LOW

Page 52: tdc2012

SHARDING

LoadBalancer

engine

engine

engine

UDP/TCP 514
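Because log messages barely depend on each other, any engine can index any message, so the load balancer only has to spread the write load. The slide does not say which policy the LoadBalancer uses; the sketch below assumes simple round-robin over a fixed list of engines, with an AtomicLong so concurrent receiver threads can pick engines without locking. A real deployment could just as well use a network-level balancer in front of port 514; this only illustrates the distribution idea, and the class name is hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class RoundRobinBalancer<T> {
    private final List<T> engines;
    private final AtomicLong counter = new AtomicLong();

    public RoundRobinBalancer(List<T> engines) {
        if (engines.isEmpty()) throw new IllegalArgumentException("no engines");
        this.engines = List.copyOf(engines);
    }

    /** Thread-safe: concurrent receivers each get the next engine in turn. */
    public T pick() {
        long n = counter.getAndIncrement();
        return engines.get((int) Math.floorMod(n, (long) engines.size()));
    }

    public static void main(String[] args) {
        RoundRobinBalancer<String> lb =
            new RoundRobinBalancer<>(List.of("engine1", "engine2", "engine3"));
        for (int i = 0; i < 4; i++) System.out.println(lb.pick()); // engine1, engine2, engine3, engine1
    }
}
```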

Page 53: tdc2012

Cluster

engine

engine engine

Page 54: tdc2012


Page 55: tdc2012

Cluster

engine

engine engine

HTTP

user

WebServer

Broker

Page 56: tdc2012

Cluster

engine

engine engine

HTTP

user

WebServer

Page 57: tdc2012

Cluster

HTTP

user

Multicast

engine

engine

engine

Page 58: tdc2012
Page 59: tdc2012

MULTICAST

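import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.jgroups.JChannel;
import org.jgroups.Message;
import org.jgroups.ReceiverAdapter;

// Minimal multicast chat (JGroups 3.x/4.x API)
public class Chat {
    public static void main(String[] args) throws Exception {
        JChannel channel = new JChannel(); // default protocol stack
        channel.setReceiver(new ReceiverAdapter() {
            @Override
            public void receive(Message msg) {
                // called for every message sent to the group
                System.out.println(msg.getSrc() + ": " + msg.getObject());
            }
        });

        channel.connect("meuCanalDeChat"); // join (or create) the group with this name

        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = reader.readLine()) != null) { // stop at end of input
            channel.send(null, line); // null destination: send to every member
        }
        channel.close();
    }
}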

Page 60: tdc2012

CONFIGURABLE STACK

Page 61: tdc2012

EVERYTHING IS DISTRIBUTED

Page 62: tdc2012

last 10 "http_status:404"

10

10

10

10

user

engine

engine

engine

mergesort, take 10

SEARCH

Page 63: tdc2012

engine

engine

engine

last 10 "http_status:404"

before {id:84324814}

10

10

10

10

user

mergesort, take 10

SEARCH
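The search fan-out on these two slides can be sketched as follows: each engine returns its own newest k matches (optionally only those older than a cursor id, as in before {id:84324814}), and the coordinator k-way merges the per-engine lists with a priority queue and keeps the first k. Asking each engine for only k results is enough, because the global top k can contain at most k results from any single engine. The sketch assumes ids grow over time, so "newest" means "largest id"; class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class DistributedLast {
    /** Engine side: newest k matching ids strictly older than `before` (newest first). */
    static List<Long> lastBefore(List<Long> matchingIdsAscending, long before, int k) {
        List<Long> out = new ArrayList<>(k);
        for (int i = matchingIdsAscending.size() - 1; i >= 0 && out.size() < k; i--) {
            long id = matchingIdsAscending.get(i);
            if (id < before) out.add(id);
        }
        return out;
    }

    /** Coordinator side: k-way merge of per-engine lists (each newest first), take k. */
    static List<Long> mergeTake(List<List<Long>> perEngine, int k) {
        // heap entries: {id, engine index, position in that engine's list}, largest id first
        PriorityQueue<long[]> heap = new PriorityQueue<>((a, b) -> Long.compare(b[0], a[0]));
        for (int e = 0; e < perEngine.size(); e++) {
            if (!perEngine.get(e).isEmpty()) heap.add(new long[] {perEngine.get(e).get(0), e, 0});
        }
        List<Long> out = new ArrayList<>(k);
        while (!heap.isEmpty() && out.size() < k) {
            long[] top = heap.poll();
            out.add(top[0]);
            int e = (int) top[1], next = (int) top[2] + 1;
            List<Long> list = perEngine.get(e);
            if (next < list.size()) heap.add(new long[] {list.get(next), e, next});
        }
        return out;
    }

    public static void main(String[] args) {
        // made-up ids of the messages matching the query on each of three engines
        List<List<Long>> engines = List.of(
            List.of(1L, 4L, 7L, 10L), List.of(2L, 5L, 8L), List.of(3L, 6L, 9L, 12L));
        int k = 3;
        long before = Long.MAX_VALUE;                 // first page: no cursor
        for (int page = 0; page < 2; page++) {
            List<List<Long>> partials = new ArrayList<>();
            for (List<Long> ids : engines) partials.add(lastBefore(ids, before, k));
            List<Long> result = mergeTake(partials, k);
            System.out.println(result);               // [12, 10, 9] then [8, 7, 6]
            before = result.get(result.size() - 1);   // cursor for the next page
        }
    }
}
```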

Page 64: tdc2012

http 200 => count() by host

host    count
foo     1234
bar     2345
baz     3456

AGGREGATION

Page 65: tdc2012

count() + count() + count()

engine engine engine

AGGREGATION

Page 66: tdc2012

http 200 => avg(time) by host

host    avg_time
foo     0.888889
bar     0.224568
baz     5.623424

AGGREGATION

Page 67: tdc2012

avg(time) + avg(time) + avg(time)

?

engine engine engine

AGGREGATION

Page 68: tdc2012

sum(time) + sum(time) + sum(time)

count(time) + count(time) + count(time)

AGGREGATION

engine engine engine
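Counts can simply be added across engines, but averages cannot, which is the point of these slides: each engine returns (sum, count) and the coordinator divides only after adding them up. The sketch below uses made-up numbers: one engine saw times 1, 1 and 1, another saw a single 10. The true average is 13 / 4 = 3.25, while averaging the two per-engine averages would give 5.5. The class and method names are hypothetical.

```java
import java.util.List;

public class PartialAvg {
    /** What each engine sends back for avg(time): its sum and count, not its average. */
    static final class Partial {
        final double sum;
        final long count;

        Partial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        static Partial of(double... values) {
            double sum = 0;
            for (double v : values) sum += v;
            return new Partial(sum, values.length);
        }
    }

    /** Coordinator: sums and counts add up across engines (the count alone answers count()). */
    static Partial combine(List<Partial> parts) {
        double sum = 0;
        long count = 0;
        for (Partial p : parts) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    /** Correct global average: divide only after combining. */
    static double avg(List<Partial> parts) {
        Partial total = combine(parts);
        return total.sum / total.count;
    }

    /** The tempting but wrong approach: averaging the per-engine averages. */
    static double averageOfAverages(List<Partial> parts) {
        double acc = 0;
        for (Partial p : parts) acc += p.sum / p.count;
        return acc / parts.size();
    }

    public static void main(String[] args) {
        List<Partial> parts = List.of(Partial.of(1, 1, 1), Partial.of(10));
        System.out.println(avg(parts));               // 3.25
        System.out.println(averageOfAverages(parts)); // 5.5
    }
}
```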

Page 69: tdc2012


Page 70: tdc2012

AND IN PRODUCTION?

Page 71: tdc2012

3.5 BILLION MESSAGES

1TB OF ORIGINAL DATA

180GB OF INDEX

3 SERVERS (LOAD < 0.2)

Page 72: tdc2012

ONE LAST THING

Page 73: tdc2012

1700+ TESTS

99% LINES COVERED

Page 74: tdc2012

THANK YOU!

/juanplopes

@juanplopes

intelie.com.br