Tópicos Especiais em Aprendizagem Reinaldo Bianchi Centro Universitário da FEI 2012.


  • Slide 1
  • Tópicos Especiais em Aprendizagem Reinaldo Bianchi Centro Universitário da FEI 2012
  • Slide 2
  • 4th Lecture, Part B
  • Slide 3
  • The K-means algorithm
  • Slide 4
  • K-Means: a very well-known algorithm for clustering patterns. It is used when the number of clusters can be specified: choose the desired number of clusters, then choose the cluster centres and members so as to minimise the error. This cannot be done by exhaustive search: there are too many parameters.
  • Slide 5
  • K-Means algorithm: fix the cluster centres; assign each point to the nearest cluster; recompute each cluster centre as the mean of the points it represents; repeat until the centres stop moving.
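A minimal sketch of this loop in Python with NumPy (the course demo uses Matlab; the function name, random seeding and parameters here are illustrative assumptions, not code from the slides):

    import numpy as np

    def kmeans(points, k, n_iter=100, seed=0):
        """Plain K-means: assign to the nearest centre, recompute means, repeat."""
        rng = np.random.default_rng(seed)
        # Fix the initial centres by picking k distinct input points at random.
        centers = points[rng.choice(len(points), size=k, replace=False)]
        for _ in range(n_iter):
            # Assign every point to its nearest centre (Euclidean distance).
            dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Recompute each centre as the mean of the points it represents
            # (assumes no cluster ends up empty).
            new_centers = np.array([points[labels == j].mean(axis=0) for j in range(k)])
            # Stop when the centres no longer move.
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        return centers, labels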
  • Slide 6
  • K-Means can be used with any attribute for which a distance can be computed.
  • Slide 7
  • Clustering: the partitioning approach is a typical cluster-analysis approach that works by partitioning the data set. It iteratively constructs a partition of the data set so as to produce several non-empty clusters (usually with the number of clusters given in advance). In principle, the partition is obtained by minimising the sum of squared distances within each cluster.
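Written out, the criterion mentioned above, the within-cluster sum of squared distances, takes the standard form (the notation is mine, not from the slides):

    J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \| x_i - \mu_j \|^2

where C_j is the j-th cluster and \mu_j is its centre (mean point); K-means seeks the partition of the data into K clusters that minimises J.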
  • Slide 8
  • Clustering: given K, find a partition into K clusters that optimises the chosen partitioning criterion. Global optimum: exhaustively enumerate all partitions. Heuristic method: the K-means algorithm (MacQueen, 1967), in which each cluster is represented by its centre and the algorithm converges to stable cluster centres.
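In practice this heuristic is usually run through a library implementation rather than enumerated or coded by hand. A brief usage sketch with scikit-learn (the library choice, parameter values and data are illustrative assumptions, not part of the slides):

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)  # any numeric data
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

    print(km.labels_)            # cluster index assigned to each object
    print(km.cluster_centers_)   # stable cluster centres after convergence
    print(km.inertia_)           # within-cluster sum of squared distances (J above)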
  • Slide 9
  • Algorithm: given the number of clusters K, the K-means algorithm is carried out in three steps after initialisation (set the seed points): 1) assign each object to the cluster with the nearest seed point; 2) recompute the seed points as the centroids of the clusters of the current partition (the centroid is the centre, i.e., the mean point, of the cluster); 3) go back to step 1, and stop when there are no new assignments.
  • Slide 10
  • Example: suppose we have 4 types of medicine, each with two attributes: a weight index and a pH index. Our goal is to group these objects into K=2 groups of medicines.
  • Slide 11
  • Example (the four medicines plotted as points A, B, C, D):

        Medicine   Weight   pH-Index
        A          1        1
        B          2        1
        C          4        3
        D          5        4
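A small self-contained script that reproduces this example numerically. It assumes A and B are used as the initial seed points, which the slides do not state explicitly:

    import numpy as np

    # Data from the example slide: weight and pH-index of medicines A-D.
    points = np.array([[1.0, 1.0], [2.0, 1.0], [4.0, 3.0], [5.0, 4.0]])
    names = ["A", "B", "C", "D"]
    # Assumption: A and B are taken as the initial seed points.
    centers = points[:2].copy()

    while True:
        # Assign each medicine to the nearest centre (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Recompute each centre as the mean of its members.
        new_centers = np.array([points[assign == k].mean(axis=0) for k in range(2)])
        if np.allclose(new_centers, centers):
            break  # no centre moved: the partition is stable
        centers = new_centers

    for name, k in zip(names, assign):
        print(name, "-> cluster", k)
    print("centres:", centers)

With these seeds the procedure converges to the clusters {A, B} and {C, D}, with centres (1.5, 1) and (4.5, 3.5).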
  • Slide 12
  • Step 1: Use initial seed points for partitioning. Assign each object to the cluster with the nearest seed point, using the Euclidean distance.
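The nearest-seed assignment uses the Euclidean distance between an object x and a seed point c (standard formula, notation mine):

    d(x, c) = \sqrt{\sum_{j=1}^{m} (x_j - c_j)^2}

where m is the number of attributes (here m = 2: weight and pH-index). For example, the distance from C = (4, 3) to seed A = (1, 1) is sqrt(3^2 + 2^2) = sqrt(13), about 3.61.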
  • Slide 13
  • Step 2: Compute the new centroids of the current partition. Knowing the members of each cluster, we now compute the new centroid of each group based on these new memberships.
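The new centroid of each group is simply the component-wise mean of its current members (standard formula, notation mine):

    \mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i

For instance, if a cluster currently holds B, C and D, its new centroid is ((2+4+5)/3, (1+3+4)/3) = (11/3, 8/3), roughly (3.67, 2.67).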
  • Slide 14
  • Step 2: Renew the membership based on the new centroids. Compute the distance of all objects to the new centroids and reassign each object to its nearest centroid.
  • Slide 15
  • Step 3: Repeat the first two steps until convergence. Knowing the members of each cluster, we compute the new centroid of each group based on these new memberships.
  • Slide 16
  • Repeat the first two steps until convergence. Compute the distance of all objects to the new centroids; stop because there are no new assignments.
  • Slide 17
  • K-means Demo: 1. The user sets the number of clusters they would like (e.g. K=5).
  • Slide 18
  • K-means Demo: 1. The user sets the number of clusters they would like (e.g. K=5). 2. Randomly guess K cluster centre locations.
  • Slide 19
  • K-means Demo: 1. The user sets the number of clusters they would like (e.g. K=5). 2. Randomly guess K cluster centre locations. 3. Each data point finds out which centre it is closest to (thus each centre owns a set of data points).
  • Slide 20
  • K-means Demo: 1. The user sets the number of clusters they would like (e.g. K=5). 2. Randomly guess K cluster centre locations. 3. Each data point finds out which centre it is closest to (thus each centre owns a set of data points). 4. Each centre finds the centroid of the points it owns.
  • Slide 21
  • K-means Demo: 1. The user sets the number of clusters they would like (e.g. K=5). 2. Randomly guess K cluster centre locations. 3. Each data point finds out which centre it is closest to (thus each centre owns a set of data points). 4. Each centre finds the centroid of the points it owns. 5. ...and jumps there.
  • Slide 22
  • K-means Demo: 1. The user sets the number of clusters they would like (e.g. K=5). 2. Randomly guess K cluster centre locations. 3. Each data point finds out which centre it is closest to (thus each centre owns a set of data points). 4. Each centre finds the centroid of the points it owns. 5. ...and jumps there. 6. Repeat until terminated!
  • Slide 23
  • Example: K-means in Matlab
  • Slide 24
  • Example: K-means on the iPad
  • Slide 25
  • Relevant issues: efficient in computation, O(tKn), where n is the number of objects, K is the number of clusters, and t is the number of iterations. Normally, K, t << n.