smile.clustering.CentroidClustering<double[],double[]>

smile.clustering.GMeans

All Implemented Interfaces: Serializable, Comparable<CentroidClustering<double[],double[]>>

public class GMeans extends CentroidClustering<double[],double[]>

G-Means clustering algorithm, an extension of K-Means that tries to determine the number of clusters automatically by means of a normality test. The algorithm is based on a statistical test of the hypothesis that a subset of the data follows a Gaussian distribution. G-Means runs k-means with increasing k in a hierarchical fashion until the test accepts the hypothesis that the data assigned to each k-means center are Gaussian.
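
The normality test at the heart of this description can be sketched in plain Java. The sketch below follows the procedure described by Hamerly and Elkan: the points of one cluster are projected onto a direction (in the paper, the vector joining the two candidate child centers), standardized, and scored with the corrected Anderson-Darling statistic. It is an illustration under stated assumptions, not Smile's implementation: the class and method names (`GaussianCheck`, `project`, `andersonDarling`, `looksGaussian`) are hypothetical, the normal CDF uses a textbook approximation of erf, and the critical value 1.8692 is the one quoted in the paper for significance level 0.0001.

```java
import java.util.Arrays;

/** Illustrative sketch only; names are hypothetical and this is not Smile's code. */
class GaussianCheck {
    /** Critical value assumed from Hamerly and Elkan (significance level 0.0001). */
    static final double CRITICAL = 1.8692;

    /** Standard normal CDF via the Abramowitz-Stegun 7.1.26 approximation of erf. */
    static double normalCdf(double z) {
        double x = Math.abs(z) / Math.sqrt(2.0);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        double erf = 1.0 - poly * Math.exp(-x * x);
        return z >= 0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
    }

    /** Projects each row onto direction v: x'_i = <x_i, v> / ||v||^2. */
    static double[] project(double[][] points, double[] v) {
        double norm2 = 0.0;
        for (double c : v) norm2 += c * c;
        double[] out = new double[points.length];
        for (int i = 0; i < points.length; i++) {
            double dot = 0.0;
            for (int j = 0; j < v.length; j++) dot += points[i][j] * v[j];
            out[i] = dot / norm2;
        }
        return out;
    }

    /** Anderson-Darling statistic, corrected for mean and variance estimated from x. */
    static double andersonDarling(double[] x) {
        int n = x.length;
        double mean = Arrays.stream(x).average().orElse(0.0);
        double ss = 0.0;
        for (double xi : x) ss += (xi - mean) * (xi - mean);
        double sd = Math.sqrt(ss / (n - 1));

        // CDF of each standardized value, clamped so the logarithms stay finite.
        double[] z = new double[n];
        for (int i = 0; i < n; i++) {
            double f = normalCdf((x[i] - mean) / sd);
            z[i] = Math.min(Math.max(f, 1e-12), 1.0 - 1e-12);
        }
        Arrays.sort(z);

        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += (2 * i + 1) * (Math.log(z[i]) + Math.log(1.0 - z[n - 1 - i]));
        }
        double a2 = -n - sum / n;
        return a2 * (1.0 + 4.0 / n - 25.0 / ((double) n * n));
    }

    /** Accepts the Gaussian hypothesis when the statistic does not exceed CRITICAL. */
    static boolean looksGaussian(double[] x) {
        return andersonDarling(x) <= CRITICAL;
    }

    public static void main(String[] args) {
        double[][] cluster = {{0.0, 1.0}, {1.0, 1.5}, {2.0, 0.5}, {3.0, 1.2}, {4.0, 0.8}};
        double[] direction = {1.0, 0.0};
        System.out.println(andersonDarling(project(cluster, direction)));
    }
}
```

In the algorithm as described above, a cluster whose projected data fail this check would be split into two and k-means re-run; a cluster that passes keeps its single center.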

References

G. Hamerly and C. Elkan. Learning the k in k-means. NIPS, 2003.

Field Summary

Fields inherited from class smile.clustering.CentroidClustering
centroids, distortion

Fields inherited from class smile.clustering.PartitionClustering
k, OUTLIER, size, y

Constructor Summary

| Constructor | Description |
|---|---|
| GMeans(double distortion, double[][] centroids, int[] y) | Constructor. |


Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| protected double | distance(double[] x, double[] y) | The distance function. |
| static GMeans | fit(double[][] data, int kmax) | Clusters data, with the number of clusters determined automatically by the G-Means algorithm. |
| static GMeans | fit(double[][] data, int kmax, int maxIter, double tol) | Clusters data, with the number of clusters determined automatically by the G-Means algorithm. |

Methods inherited from class smile.clustering.CentroidClustering
compareTo, predict, toString

Methods inherited from class smile.clustering.PartitionClustering
run, seed

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Constructor Details
- GMeans
  
  public GMeans(double distortion, double[][] centroids, int[] y)
  
  Constructor.
  
  Parameters:
  
  distortion - the total distortion.
  
  centroids - the centroids of each cluster.
  
  y - the cluster labels.
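
How the three constructor arguments relate to one another can be illustrated with a small self-contained sketch. It assumes that the total distortion is the sum of squared Euclidean distances from each observation to the centroid of its assigned cluster, which is the usual k-means objective; this page does not spell the definition out. The names `DistortionSketch` and `distortion` are hypothetical and not part of Smile.

```java
/** Illustrative sketch only; the names here are hypothetical, not Smile API. */
class DistortionSketch {
    /**
     * Sums the squared Euclidean distance from every observation to the
     * centroid of the cluster it is labeled with (assumed definition).
     */
    static double distortion(double[][] data, double[][] centroids, int[] y) {
        double total = 0.0;
        for (int i = 0; i < data.length; i++) {
            double[] c = centroids[y[i]];   // centroid of the assigned cluster
            for (int j = 0; j < c.length; j++) {
                double diff = data[i][j] - c[j];
                total += diff * diff;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {2, 0}, {10, 10}};
        double[][] centroids = {{1, 0}, {10, 10}};
        int[] y = {0, 0, 1};                // cluster label of each row of data
        System.out.println(distortion(data, centroids, y));
    }
}
```

Under that assumption, `centroids` has one row per cluster, `y` has one entry per observation, and each label in `y` indexes a row of `centroids`.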
Method Details
- distance
  
  protected double distance(double[] x, double[] y)
  
  Description copied from class: CentroidClustering
  
  The distance function.
  
  Specified by:
  
  distance in class CentroidClustering<double[],double[]>
  
  Parameters:
  
  x - an observation.
  
  y - the other observation.
  
  Returns:
  
  the distance.
- fit
  
  public static GMeans fit(double[][] data, int kmax)
  
  Clusters data, with the number of clusters determined automatically by the G-Means algorithm.
  
  Parameters:
  
  data - the input data, in which each row is an observation.
  
  kmax - the maximum number of clusters.
  
  Returns:
  
  the model.
- fit
  
  public static GMeans fit(double[][] data, int kmax, int maxIter, double tol)
  
  Clusters data, with the number of clusters determined automatically by the G-Means algorithm.
  
  Parameters:
  
  data - the input data, in which each row is an observation.
  
  kmax - the maximum number of clusters.
  
  maxIter - the maximum number of iterations for k-means.
  
  tol - the tolerance of the k-means convergence test.
  
  Returns:
  
  the model.
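
The roles of the distance function and of the `maxIter` and `tol` parameters documented above can be illustrated with a minimal k-means (Lloyd) loop. This is a sketch under stated assumptions rather than Smile's code: it assumes squared Euclidean distance, and it assumes the convergence test stops iterating once the decrease in total distortion between two passes is at most `tol`. The names `LloydSketch` and `run` are hypothetical.

```java
/** Illustrative sketch only; names and stopping rule are assumptions, not Smile's code. */
class LloydSketch {
    /** Squared Euclidean distance, assumed here as the clustering distance. */
    static double distance(double[] x, double[] y) {
        double sum = 0.0;
        for (int j = 0; j < x.length; j++) {
            double diff = x[j] - y[j];
            sum += diff * diff;
        }
        return sum;
    }

    /**
     * Refines centroids in place. Returns the distortion measured during the
     * last assignment step, that is, before the final centroid update.
     */
    static double run(double[][] data, double[][] centroids, int maxIter, double tol) {
        int k = centroids.length;
        int d = data[0].length;
        int[] y = new int[data.length];
        double previous = Double.MAX_VALUE;
        double distortion = Double.MAX_VALUE;

        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: label each observation with its nearest centroid.
            distortion = 0.0;
            for (int i = 0; i < data.length; i++) {
                double best = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double dist = distance(data[i], centroids[c]);
                    if (dist < best) {
                        best = dist;
                        y[i] = c;
                    }
                }
                distortion += best;
            }

            // Update step: move each non-empty cluster's centroid to its mean.
            double[][] sum = new double[k][d];
            int[] size = new int[k];
            for (int i = 0; i < data.length; i++) {
                size[y[i]]++;
                for (int j = 0; j < d; j++) sum[y[i]][j] += data[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (size[c] == 0) continue;
                for (int j = 0; j < d; j++) centroids[c][j] = sum[c][j] / size[c];
            }

            // Assumed convergence test: stop when the improvement is at most tol.
            if (previous - distortion <= tol) break;
            previous = distortion;
        }
        return distortion;
    }

    public static void main(String[] args) {
        double[][] data = {{0}, {1}, {10}, {11}};
        double[][] centroids = {{0}, {1}};
        System.out.println(run(data, centroids, 100, 1e-4));
    }
}
```

A larger `tol` or a smaller `maxIter` ends the loop earlier at the cost of a less refined partition; in G-Means this inner loop would be re-run each time the number of centers grows, up to `kmax`.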
