Phylogenetic Inference
Construction of Phylogenetic Trees II
Ana Margarida Sousa
Instituto Gulbenkian de Ciência, Evolutionary Biology Group
[Figure: true tree relating taxa A, B, C, D, E, F, G and wt]
[Figure: physical restriction maps of taxa A to G, showing the sites for the enzymes BclI, Sau3AI, BglII and BamHI]
Data I - RESTRICTION DATA
7 53 1
A         11011100100000111010000001110110101000001110000001110
B         11111010100100111010000001110100101000101110000001010
C         00011000110011111010000001101100101000001100000011010
D         00011000100000011010000011110111101110001100000011010
E         10000001101001111111110100100100111000111101010110100
F         10000110101000011110111111100110101001111101101001011
G         10010000100000111110010100100100101000011101001001010
Data II - NUCLEOTIDE SEQUENCES
7 553
A         GGAACATCTGCGTAGACAATACTGCTAACAGTTACTGGCTCTCTCGTGTATCTAAAACGATTCCGGCACTGGAACACTTAAACGGGTTTGATGTTCGCTGGAAGCGTCTACTGAACGATGACCGTTGCTTCTACAAAGATGGCTTTATGCTTGATGGGGAACTCATGATCAAGGGCGTAGACTTTAACACAGGGTCCGGCCTACTGCGTACTAAATGGACTGACACGAAGAACCAAGAGTTTCATGAAGAGTTATTCGTTGAACCAATCCGTAAGAAAGATAAAGTTCCCTTTAAGCTGCACACTGGACACCTTCACATAAAACTGTACGCTATCCTCCCGCTGCACATCGTGGAGTCTGAAGAAGATTGTGATGTCATGACGTTGCTCATGCAGGAACACGTTAAGAACATGCTGCCTCTGCTACAGGAATACTTCCCTGAAATCAAATGGCAAGCGGCTGAATCTTACGAGGTCTACGATATGATAGAATTACAGCAATTGTACGAGCAGAAGCGAGCAGAAGGCCATGAGGGTCTCATTGTGAAAGACCC
B         GGAACATCTGCGTAGACAATACTGCTAACAGTTACTGGCTCTCTCGTGTATCTAAAACGATTCCGGCACTGGAACACTTAAACGGGTTTGATGTTCGCTGGAAGCGTCTACTGAACGATGACCGTTGCTTCTACAAAGATGGCTTTATGCTTGATGGAGAACTCATGATCAAGAGCGTAGACTTTAACACAGGGTCCGGCCTACTGCGTACTAAATGGACTGACACGAAGAACCAAGAGTTTCATGAAGAGTTATTCGTTGAACCAATCCGTAAGAAAGATAAAGTTCCCTTTAAGCTGCACACTGGACACCTTCACATAAAACTGTACGCTATCCTCCCGCTGCACATCGTGGAGTCTGAAGAAGACTGTGATGTCATGACGTTGCTCATGCAGGAACACGTTAAGAACATGTTGCCTCTGCTACAGGAATACTTCCCTGAAATCAAATGGCAAGCGGTTGAATCTTACGAGGTCTACGATATGGTAGAATTACAGCAATTGTACGAGCAGAAGCGAGCAGAAGGCCATGAGGGTCTCATTGTGAAAGACCC
C         GGAACATCTGCGTAGACAATACTGCTAACAGTTACTGGCTCTCTCGTGTATCTAAAACGATTCCGGCACTGGAACACTTAAACGGGTTTGATGTTCGTTGGAAGCGTTTACTGAACGATGACCGTTGCTTCTACAAAGATGGCTTTATGCTTGATGGGGAATTCATGATCAAGGGCGTAGACTTTAACACAGGGTCCGGCCTACTGCGTACTAAATGGACTGACACGAAGAACCAAGAGTTTCATGAAGAGTTATTCGTTGAACCAATCCGTAAGAAAGATAAAGTTCCCTTTAAGCTGCACACTGGACACCTTCACATAAAACTGTACGCTATCCTCCCGCTGCACATCGTGGAGTCTGAAGAAGACTGTGATGTCATGACGTTGCTCATGCAGGAACACGTTAAGAACATGCTGCCTCTGCTACAGGAATACTTCCCTGAAATCAAATGGCAAGCGGCTGAATCTTACGAGATCTACGATATGGTAGAATTATAGCAATTGTACGAGCAGAAGCGAGCAGAAGGCCATGAGGGTCTCATTGTGAAAGACCC
D         GGAACATCTGCGTAGACAATACTGCTAACAGTTACTGGCTCTCTCGTGTATCTAAAACGATTCCGGCACTGGAACACTTAAACGGGTTTGATGTTCGCTGGAAGCGTTTACTGAACGATGACCGTTGCTTCTACAAAGATGGCTTTATGCTTGATGGGGAACTCATGATCAAGGGCGTAGACTTTAACACAGGGTCCGGCCTACTGCGTACTAAATGGACTGACACGAAGAACCAAGAGTTTCATGAAGAGTTATTCGTTGAACCAATCCGTAAGAAAGATAAAGTTCCCTTTAAGCTGCACACTGGACACCTTCACATAAAACTGTACGCTATCCTCCCGCTGCACATCGTGGAGTCTGAAGAAGACTGTGATGTCATGACGTTGCTCATGCAGGAACACGTTAAGAACATGCTGTCTCTGCTACAGGAATACTTCCCTGAAATCAAATGGCAAGCGACTGAATCTTACGAGGTCTACGATATGGTAGAATTACAGCAATTGTACGAGCAGAAGCGAGCAGAAGGCCATGAGGGTCTCATTGTGAAAGACCC
E         GGAACATCTGCGTAGACAATACTGCTAACAGTTACTGGCTCTCTCGTGTATCTAAAACGATTCCGGCACTGGAACACTTAAACGGGTTTGATGTTCGCTGGAAGCGTCTACTGAACGATGACCGTTGTTTCTACAAAGATGGCTTTATGCTTGATGGGGAACTCATGATCAAGGACGTAGATTTTAACACAGGGTCCGACCTACTGCGTACTAAATGGACTGACACGAAGAACCAAGAGTTTCATGAAGAGTTATTCGTTGAACCAATCCGTAAGAAAGATAAAGTTCCCTTTAAGCTGCACACTGGACACCTTCACATAAAACTGTACGCTATCCTCCCGCTGCACATCGTGGAGTCTGAAGAAGACTGTGATGTCATGACGTTGCTCATGCAGGAACACGTTAAGAACATGCTGCCTCTACTACAGGAATACTTTCCTGAAATCAAATGGCAAGCGGCTGAATCTTACGAGGTCTACGATATGGTAGAATTACAGCAATTGTACGAACAAAAGCGAGCAGAAGGCCATGAGGGTTTCATTGTGAAAGACCC
F         GGAACATCTGCGTAGACAATACTGCTAACAGTTATTGGCTCTCTCGTGTATCTAAAACGATTCCGGCACTGGAACACTTAAACGGGTTTGATGTTCGCTGGAAGCGTCTACTGAACGATGACCGTTGTTTCTACAAAGATGGCTTTATGCTTGATGGGGAATTCATGATCAAGGGCGTAGATTTTAACACAGGGTCCGACCTACTGCGTACTAAATGGACTGACACGAAGAACCAAGAGTTTCATGAAGAGTTATTCGTTGAACCAATCCGTAAGAAAGATAAAGTTCCCTTTAAGCTGCACACTGGACACCTTCACATAAAACTGTACGCTATCCTCCCGCTGCACATCGTGGAGTCTGAAGAAGACTGTGATGTCATGACGTTGCTCATGCAGGAACACGTTAAGAACATGCTGCCTCTACTACAGGAATATTTTCCTGAAATCAAATGGCAAGCGGCTGAATCTTACGAGGTCTACGATATGGTAGAATTACAGCAATTGTACGAGCAAAAGCGAGCAGAAGGCCATGAGGGTCTCATTGTGAAAGACCC
G         GGAACATCTGCGTAGACAATACTGCTAACAGTTACTGGCTCTCTCGTGTATCTAAAACGATTCCGGCACTGGAACACTTAAACGGGTTTGATGTTCGCTGGAAGCGTCTACTGAACGATGACCGTTGTTTCTACAAAGATGGCTTTATGCTTGATGGGGAATTCATGATCAAGGGCGTAGATTTTAACACAGGGTCCGACCTACTGCGTACTAAATGGACTGACACGAAGAACCAAGAGTTTCATGAAGAGTTATTCGTTGAACCAATCCGTAAGAAAGATAAAGTTCCCTTTAAGCTGCACACTGGACACCTTCACATAAAACTGTACGCTATCCTCCCGCTGCACATCGTGGAGTCTGAAGAAGACTGTGATGTCATGACGTTGCTCATGCAGGAACACGTTAAGAACATGCTGCCTCTACTACAGGAATACTTTCCTGAAATCAAATGGCAAGCGGCTGAATCTTACGAGGTCTACGATATGGTAGAATTACAGCAATTGTACGAGCAAAAGCGAGCAGAAGGCCATGAGGGTCTCATTGTGAAAGACCC
Physical maps (restriction data) -> data matrix (0/1)
Data matrix (0/1) -> distance matrix -> UPGMA, NJ, ME
Data matrix (0/1) -> MP, ML
Nucleotide sequences (alignment) -> distance matrix -> UPGMA, NJ, ME
Nucleotide sequences (alignment) -> MP, ML
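The distance-based route (data matrix, then distance matrix, then UPGMA) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `mismatch_fraction` is a hypothetical stand-in that counts differing sites, not the evolutionary model that restdist or dnadist apply, and `upgma` returns the topology alone, without branch lengths.

```python
from itertools import combinations


def mismatch_fraction(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length rows differ."""
    if len(a) != len(b):
        raise ValueError("rows must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(rows: dict) -> dict:
    """Pairwise distances, keyed by the unordered pair of taxon names."""
    return {
        frozenset((p, q)): mismatch_fraction(rows[p], rows[q])
        for p, q in combinations(rows, 2)
    }


def upgma(rows: dict) -> str:
    """Cluster taxa by UPGMA and return the topology as a Newick string."""
    dist = distance_matrix(rows)
    # Clusters are identified by their Newick label; sizes counts their leaves.
    sizes = {name: 1 for name in rows}
    while len(sizes) > 1:
        # Join the pair of clusters with the smallest distance.
        pair = min(dist, key=dist.get)
        p, q = sorted(pair)
        merged = f"({p},{q})"
        # Distance to the new cluster is the size-weighted average of the
        # distances to its two members.
        for other in sizes:
            if other in pair:
                continue
            d_p = dist[frozenset((p, other))]
            d_q = dist[frozenset((q, other))]
            total = sizes[p] + sizes[q]
            dist[frozenset((merged, other))] = (sizes[p] * d_p + sizes[q] * d_q) / total
        # Drop every entry that still refers to the two joined clusters.
        dist = {k: v for k, v in dist.items() if p not in k and q not in k}
        sizes[merged] = sizes.pop(p) + sizes.pop(q)
    return next(iter(sizes)) + ";"
```

For example, `upgma({"A": "0000", "B": "0001", "C": "1111", "D": "1110"})` groups A with B and C with D. The same functions accept nucleotide strings, since they only compare characters position by position.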
Boolean data (0/1)
Method                 Program
Distance calculation   restdist
UPGMA                  neighbor
NJ                     neighbor
ME                     fitch
MP                     pars
ML                     restml

Sequence data
Method                 Program
Distance calculation   dnadist
UPGMA                  neighbor
NJ                     neighbor
ME                     fitch
MP                     dnapars
ML                     dnaml
BOOTSTRAP
Data matrix (0/1) -> generate 100 pseudo-replicates -> 100 distance matrices -> 100 NJ trees -> majority-rule consensus tree
Nucleotide sequences (alignment) -> generate 100 pseudo-replicates -> 100 NJ trees -> majority-rule consensus tree

Method              Program
Pseudo-replicates   seqboot
Consensus tree      consense
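The two ends of the bootstrap procedure can be illustrated with a short Python sketch. It assumes the usual nonparametric bootstrap (columns resampled with replacement) and a simplified tree representation in which each tree is a set of clades and each clade a frozenset of taxon names. Both function names are hypothetical; this is not how seqboot or consense are implemented.

```python
import random
from collections import Counter


def bootstrap_replicate(rows, rng):
    """Build one pseudo-replicate by resampling columns with replacement.

    The replicate has the same taxa and the same number of columns as the
    original matrix or alignment.
    """
    n_sites = len(next(iter(rows.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in picks) for name, seq in rows.items()}


def majority_rule_clades(trees):
    """Return the clades found in more than half of the trees.

    The result maps each retained clade to its support, i.e. the fraction
    of trees that contain it.
    """
    counts = Counter(clade for tree in trees for clade in set(tree))
    return {
        clade: n / len(trees)
        for clade, n in counts.items()
        if n > len(trees) / 2
    }
```

Calling `bootstrap_replicate(rows, random.Random(1))` 100 times, building one NJ tree per replicate and passing the resulting clade sets to `majority_rule_clades` would mirror the workflow above; tree building itself is left out of the sketch.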
Steps for using any of the programs in the PHYLIP package
1. Copy the input file (.txt format) to the folder that contains the executable you are going to use (e.g. restdist.exe).
2. Double-click the executable to start the program.
3. Type the name of the input file (do not forget the ".txt" extension).
4. Change the options as needed, following the menu.
5. Type 'y'. An 'outfile' and/or a 'treefile' is generated automatically.
6. Move these files to another folder and rename them, since every program writes to the same default file names.
7. Open the 'treefile' with TreeView to inspect the resulting tree.
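Preparing the input file for step 1 can also be scripted. The sketch below writes a matrix in the sequential layout used by the data shown earlier: a header with the number of taxa and characters, then one line per taxon with the name padded to 10 characters. The function name `write_phylip` and its `extra_header` parameter are hypothetical; `extra_header` carries the third header field, which the PHYLIP restriction-site programs read as the number of enzymes.

```python
from pathlib import Path


def write_phylip(rows, path, extra_header=""):
    """Write a name -> character-string mapping as a PHYLIP-style input file."""
    lengths = {len(data) for data in rows.values()}
    if len(lengths) != 1:
        raise ValueError("all rows must have the same number of characters")
    # Header: number of taxa, number of characters, optional extra field.
    header = f"{len(rows)} {lengths.pop()} {extra_header}".rstrip()
    lines = [header]
    for name, data in rows.items():
        if len(name) > 10:
            raise ValueError(f"name longer than 10 characters: {name}")
        # Taxon names occupy the first 10 columns of each line.
        lines.append(name.ljust(10) + data)
    Path(path).write_text("\n".join(lines) + "\n")
```

For the restriction data, `write_phylip(rows, "restriction.txt", extra_header="1")` would produce a header of the form `7 53 1`; the file name here is only an example.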
Bayesian inference using the MrBayes program (Mrbayes.exe)
Mixed data: restriction data + sequence data
#NEXUS
begin data;
dimensions ntax=14 nchar=5128;
format datatype=mixed (Restriction:1-304,DNA:305-5128) interleave=yes gap=- missing=?;
matrix
A 0000100010100?01001000000000001000100001000001011000000000000001010010000000000010000011100000000110000000000000001000010100000100000100000000001000000000000000000000000001010001000000010001010000000001000010100000000100010000000001000001000101000010000010001000001000100000000010100101010000010100000100
B 0100010000000?0?001101001000001110000100100001001000000000000001011110000010110000000000101000010001100000010000010000001000000010000101000100000100001100000010001000011001000000100100000001100000000000000001100010000101100000001001010101000000000001000100000100000010100001000000100001000000010011101000
C 0010010000000?0?001101101000001110000100000100001000000000000001111010000010111000011000101000010001000000000000010000001000000010000101000100000100001100000010000000000111000001100100000001100000000000010001100010000101100000000001000101000000000001000100000100000010100001000100100001000000010000101000
B11P10 TAAAAATCTGAGTGACTATCTCACAGTGTACGGAC-CTAAAGTTCCCCCA
B13P10 TAAAAATCTGAGTGATTATCTCACAGTGTACGGAC-CTAAAGTTCCCCCA
B14P10 TAAAAATCTGAGTGATTATCTCACAGTGTACGGAC-CTAAAGTTCCCCCA
[ 4810 4820 ]
[ * * ]
a5P10 TAGGGGGTACCTAAAGCCCAGCCA
a7P10 TAGGGGGTACCTAAAGCCCAGCCA
a8P10 TAGGGGGTACCTAAAGCCCAGCCA
a9P10 TAGGGGGTACCTAAAACCCAGCCA
a11P10 TAGGGGGTACCTAAAGCCCAGCCA
a13P10 TAGGGGGTACCTAAAGCCCAGCCA
a14P10 TAGGGGGTACCTAAAGCCCAGCCA
B3P10 TAGGGGGTACCTAAAGCCCAGCCA
B7P10 TAGGGGGTACCTAAAGCCCAGCCA
B9P10 TAGGGGGTACCTAAAGCCCAGCCA
B10P10 TAGGGGGTACCTAAAGCCCAGCCA
B11P10 TAGGGGGTACCTAAAACCCAGTCA
B13P10 TAGGGGGTACCTAAAACCCAGTCA
B14P10 TAGGGGGTACCTAAAACCCAGTCA
;
end;
begin mrbayes;
delete 1 4 6 7 12 13 14;
charset Restriction=1-304;
charset DNA=305-5128;
partition Names=2: Restriction, DNA;
set partition=Names;
lset applyto=(2) nst=6 rates=gamma;
unlink shape=(all) pinvar=(all) statefreq=(all) revmat=(all);
prset ratepr=variable;
mcmcp ngen=1000 printfreq=100 samplefreq=100 nchains=4 savebrlens=yes filename=Allenz+Allseqs0;
mcmc;
end;
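The character ranges in the `charset` and `partition` commands follow directly from the block lengths: 304 restriction characters followed by 4824 nucleotide positions give 1-304 and 305-5128. A small Python sketch of that bookkeeping is shown below; the helper name `partition_commands` is hypothetical, and it only formats text, it does not validate anything against MrBayes.

```python
def partition_commands(blocks, partition_name="Names"):
    """Build charset/partition lines for consecutive data blocks.

    blocks is a list of (name, length) pairs in the order in which the
    blocks appear in the matrix.
    """
    lines = []
    start = 1
    for name, length in blocks:
        end = start + length - 1
        lines.append(f"charset {name}={start}-{end};")
        # The next block begins right after this one ends.
        start = end + 1
    names = ", ".join(name for name, _ in blocks)
    lines.append(f"partition {partition_name}={len(blocks)}: {names};")
    lines.append(f"set partition={partition_name};")
    return lines
```

`partition_commands([("Restriction", 304), ("DNA", 4824)])` reproduces the four corresponding lines of the block above, which makes it easy to check that the ranges still add up to `nchar` when a data block changes size.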
Steps for using the MrBayes program (Mrbayes.exe)
1. Save the input file in the same location as the executable.
2. Start the program.
3. Type the command 'execute' followed by the name of the input file (do not forget the '.txt' extension).
4. Increase the number of generations to 1 000 000.
5. Check whether, after this number of generations, the standard deviation between chains (reported by MrBayes as the average standard deviation of split frequencies) is ≤ 0.01.
6. If it is, you can stop the program.
7. Type the command 'sump burnin = 2500' (summarizes the parameter values).
8. Type the command 'sumt burnin = 2500'.
9. Check the result by opening the file with the '.con' extension in TreeView.
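The burn-in value in steps 7 and 8 is easier to judge once it is related to the number of samples actually stored. The sketch below assumes that `burnin` counts sampled states rather than generations, and it ignores the sample taken at generation 0, so the counts are approximate. The function name `burnin_summary` is hypothetical.

```python
def burnin_summary(ngen, samplefreq, burnin):
    """Return (number of samples, fraction discarded as burn-in)."""
    n_samples = ngen // samplefreq
    if burnin >= n_samples:
        raise ValueError("burn-in discards every sample; run more generations")
    return n_samples, burnin / n_samples
```

With the settings above, `burnin_summary(1_000_000, 100, 2500)` gives 10 000 samples with 25% discarded. With the `ngen=1000` written in the input file there would be only about 10 samples, which is why step 4 increases the number of generations before summarizing.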