Class BIRCH
- All Implemented Interfaces:
Serializable,VectorQuantizer
BIRCH has several advantages. For example, each clustering decision is made without scanning all data points and currently existing clusters. It exploits the observation that data space is not usually uniformly occupied and not every data point is equally important. It makes full use of available memory to derive the finest possible sub-clusters while minimizing I/O costs. It is also an incremental method that does not require the whole data set in advance.
This implementation produces a clustering in three steps. First step builds a CF (clustering feature) tree by a single scan of database. The second step clusters the leaves of CF tree by hierarchical clustering. Then the user can use the learned model to classify input data in the final step. In total, we scan the database twice.
References
- Tian Zhang, Raghu Ramakrishnan, and Miron Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases. SIGMOD, 1996.
- See Also:
-
Field Summary
FieldsModifier and TypeFieldDescriptionfinal intThe branching factor of non-leaf nodes.final intThe dimensionality of data.final intThe number of CF entries in the leaf nodes.final doubleTHe maximum radius of a sub-cluster.Fields inherited from interface smile.vq.VectorQuantizer
OUTLIER -
Constructor Summary
Constructors -
Method Summary
-
Field Details
-
B
public final int BThe branching factor of non-leaf nodes. -
L
public final int LThe number of CF entries in the leaf nodes. -
T
public final double TTHe maximum radius of a sub-cluster. -
d
public final int dThe dimensionality of data.
-
-
Constructor Details
-
BIRCH
public BIRCH(int d, int B, int L, double T) Constructor.- Parameters:
d- the dimensionality of data.B- the branching factor of non-leaf nodes, i.e. the maximum number of children nodes.L- the number entries in the leaf nodes.T- the maximum radius of a sub-cluster.
-
-
Method Details
-
update
public void update(double[] x) Description copied from interface:VectorQuantizerUpdate the codebook with a new observation.- Specified by:
updatein interfaceVectorQuantizer- Parameters:
x- a new observation.
-
quantize
public double[] quantize(double[] x) Description copied from interface:VectorQuantizerQuantize a new observation. Returns Optional.empty if the observation is noise.- Specified by:
quantizein interfaceVectorQuantizer- Parameters:
x- a new observation.- Returns:
- the quantized vector.
-
centroids
public double[][] centroids()Returns the cluster centroids of leaf nodes.- Returns:
- the cluster centroids of leaf nodes.
-