Package smile.clustering
Class GMeans
java.lang.Object
smile.clustering.PartitionClustering
smile.clustering.CentroidClustering<double[],double[]>
smile.clustering.GMeans
- All Implemented Interfaces:
Serializable, Comparable<CentroidClustering<double[],double[]>>
G-Means clustering algorithm, an extension of K-Means that tries to
determine the number of clusters automatically using a normality test.
The G-means algorithm is based on a statistical test for the hypothesis
that a subset of data follows a Gaussian distribution. G-means runs
k-means with increasing k in a hierarchical fashion until the test accepts
the hypothesis that the data assigned to each k-means center are Gaussian.
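The per-cluster Gaussian check described above can be sketched as follows. This is a minimal illustration, not Smile's implementation: the class GaussianSplitSketch and its methods are hypothetical names, the normal CDF uses the Abramowitz-Stegun erf approximation, and the test follows the procedure in the referenced Hamerly and Elkan paper (project the cluster onto the direction between two candidate child centers, standardize, then compare the corrected Anderson-Darling statistic with the critical value 1.8692 for significance level 0.0001).

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Smile: sketches the G-means split test. */
final class GaussianSplitSketch {
    /** Critical value for significance level 0.0001 (Hamerly and Elkan). */
    static final double CRITICAL_VALUE = 1.8692;

    /** Standard normal CDF via the Abramowitz-Stegun 7.1.26 erf approximation. */
    static double normalCdf(double z) {
        double x = Math.abs(z) / Math.sqrt(2.0);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        double erf = 1.0 - poly * Math.exp(-x * x);
        return z >= 0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
    }

    /** Projects each row of data onto the direction v = c1 - c2. */
    static double[] project(double[][] data, double[] c1, double[] c2) {
        int d = c1.length;
        double[] v = new double[d];
        double norm2 = 0.0;
        for (int j = 0; j < d; j++) {
            v[j] = c1[j] - c2[j];
            norm2 += v[j] * v[j];
        }
        double[] p = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            double dot = 0.0;
            for (int j = 0; j < d; j++) dot += data[i][j] * v[j];
            p[i] = dot / norm2;
        }
        return p;
    }

    /** Corrected Anderson-Darling statistic with estimated mean and variance. */
    static double andersonDarling(double[] x) {
        int n = x.length;
        double mean = Arrays.stream(x).average().orElse(0.0);
        double ss = 0.0;
        for (double xi : x) ss += (xi - mean) * (xi - mean);
        double sd = Math.sqrt(ss / (n - 1));
        double[] z = new double[n];
        for (int i = 0; i < n; i++) z[i] = (x[i] - mean) / sd;
        Arrays.sort(z);
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            // Clamp the CDF values to keep the logarithms finite in the far tails.
            double lo = Math.min(Math.max(normalCdf(z[i]), 1e-12), 1.0 - 1e-12);
            double hi = Math.min(Math.max(normalCdf(z[n - 1 - i]), 1e-12), 1.0 - 1e-12);
            sum += (2 * i + 1) * (Math.log(lo) + Math.log(1.0 - hi));
        }
        double a2 = -n - sum / n;
        return a2 * (1.0 + 4.0 / n - 25.0 / ((double) n * n));
    }

    /** True if the Gaussian hypothesis is not rejected, so the cluster is kept unsplit. */
    static boolean looksGaussian(double[] projected) {
        return andersonDarling(projected) <= CRITICAL_VALUE;
    }
}
```

In this sketch a cluster whose projection fails looksGaussian would be replaced by its two child centers, and the procedure would repeat until every cluster passes or the cluster limit is reached.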
References
- G. Hamerly and C. Elkan. Learning the k in k-means. NIPS, 2003.
Field Summary
Fields inherited from class smile.clustering.CentroidClustering
centroids, distortion
Fields inherited from class smile.clustering.PartitionClustering
k, OUTLIER, size, y
Constructor Summary
Constructors
GMeans(double distortion, double[][] centroids, int[] y) Constructor.
Method Summary
Modifier and Type | Method | Description
protected double | distance(double[] x, double[] y) | The distance function.
static GMeans | fit(double[][] data, int kmax) | Clusters data, with the number of clusters determined automatically by the G-Means algorithm.
static GMeans | fit(double[][] data, int kmax, int maxIter, double tol) | Clusters data, with the number of clusters determined automatically by the G-Means algorithm.
Methods inherited from class smile.clustering.CentroidClustering
compareTo, predict, toString
Methods inherited from class smile.clustering.PartitionClustering
run, seed
-
Constructor Details
-
GMeans
public GMeans(double distortion, double[][] centroids, int[] y)
Constructor.
- Parameters:
distortion - the total distortion.
centroids - the centroids of each cluster.
y - the cluster labels.
-
-
Method Details
-
distance
protected double distance(double[] x, double[] y)
Description copied from class: CentroidClustering
The distance function.
- Specified by:
distance in class CentroidClustering<double[],double[]>
- Parameters:
x - an observation.
y - the other observation.
- Returns:
- the distance.
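This page does not say which metric distance computes. As an illustration only, the sketch below assumes squared Euclidean distance, the usual choice for k-means style algorithms, and shows how such a function can assign an observation to its nearest centroid. The class NearestCentroidSketch and its methods are hypothetical and not part of Smile.

```java
/** Hypothetical helper, not part of Smile. */
final class NearestCentroidSketch {
    /** Squared Euclidean distance; an assumed metric, not confirmed by this page. */
    static double squaredDistance(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double diff = x[i] - y[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Returns the index of the centroid closest to x. */
    static int nearest(double[] x, double[][] centroids) {
        int best = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double dist = squaredDistance(x, centroids[c]);
            if (dist < bestDistance) {
                bestDistance = dist;
                best = c;
            }
        }
        return best;
    }
}
```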
-
fit
public static GMeans fit(double[][] data, int kmax)
Clusters data, with the number of clusters determined automatically by the G-Means algorithm.
- Parameters:
data - the input data, in which each row is an observation.
kmax - the maximum number of clusters.
- Returns:
- the model.
-
fit
public static GMeans fit(double[][] data, int kmax, int maxIter, double tol)
Clusters data, with the number of clusters determined automatically by the G-Means algorithm.
- Parameters:
data - the input data, in which each row is an observation.
kmax - the maximum number of clusters.
maxIter - the maximum number of iterations for k-means.
tol - the tolerance of the k-means convergence test.
- Returns:
- the model.
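To illustrate how parameters like maxIter and tol typically bound a k-means run, here is a minimal sketch of Lloyd iterations. It is not Smile's code: the class LloydSketch and the method refine are hypothetical, the metric is assumed to be squared Euclidean distance, and the convergence rule (stop once the drop in distortion is at most tol) is an assumption about what the tolerance means.

```java
/** Hypothetical helper, not part of Smile: Lloyd iterations bounded by maxIter and tol. */
final class LloydSketch {
    /**
     * Refines the given centroids in place and returns the distortion
     * measured in the last assignment step.
     */
    static double refine(double[][] data, double[][] centroids, int maxIter, double tol) {
        int k = centroids.length;
        int d = data[0].length;
        int[] y = new int[data.length];
        double previous = Double.MAX_VALUE;
        double distortion = Double.MAX_VALUE;

        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: label each observation with its nearest centroid.
            distortion = 0.0;
            for (int i = 0; i < data.length; i++) {
                double best = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double dist = 0.0;
                    for (int j = 0; j < d; j++) {
                        double diff = data[i][j] - centroids[c][j];
                        dist += diff * diff;
                    }
                    if (dist < best) {
                        best = dist;
                        y[i] = c;
                    }
                }
                distortion += best;
            }

            // Update step: move each non-empty centroid to the mean of its members.
            double[][] sums = new double[k][d];
            int[] counts = new int[k];
            for (int i = 0; i < data.length; i++) {
                counts[y[i]]++;
                for (int j = 0; j < d; j++) sums[y[i]][j] += data[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue;
                for (int j = 0; j < d; j++) centroids[c][j] = sums[c][j] / counts[c];
            }

            // Assumed convergence test: stop once the improvement is at most tol.
            if (previous - distortion <= tol) break;
            previous = distortion;
        }
        return distortion;
    }
}
```

A smaller tol lets the loop run until the distortion has nearly stopped improving, while maxIter caps the cost when convergence is slow.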
-