Centroid Based Clustering

Description	In Centroid Based Clustering, each cluster is represented by a central vector. Objects are assigned to clusters so that the squared distance between each object and the central vector of its cluster is minimized.
Why to use	To group textual data, once converted to numerical form, into clusters of similar items.
When to use	When the number of clusters is known. When clusters are expected to be of roughly equal size.
When not to use	When the data is labeled. When the number of clusters is not known. When clusters are expected to differ in size.
Prerequisites	Input data should be of text type and should not contain special characters or numbers.
Input	Textual data
Output	Data divided into clusters
Statistical Methods used	K-means with random initialization
Limitations	The number of clusters must be known in advance. Not very robust to outliers. Does not work well with non-convex cluster shapes. Tends to produce clusters of similar size.
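The Prerequisites and Input rows imply that text is cleaned and turned into numbers before any centroid can be computed. The sketch below shows one simple way to do that, assuming a plain bag-of-words (term-count) representation; this page does not describe Rubiscape's actual preprocessing, and the function names `clean_text` and `to_count_vectors` are hypothetical.

```python
import re
from collections import Counter


def clean_text(text: str) -> str:
    """Lowercase the text and replace every character that is not an ASCII
    letter or whitespace (digits, punctuation, symbols) with a space."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def to_count_vectors(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Hypothetical helper: turn documents into term-count vectors
    over a shared, alphabetically sorted vocabulary."""
    tokenized = [clean_text(d).split() for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # One coordinate per vocabulary term: how often it occurs in this document.
        vectors.append([float(counts[term]) for term in vocab])
    return vocab, vectors


if __name__ == "__main__":
    docs = ["Cats chase mice!", "Dogs chase cats 24/7.", "Stocks rose 3%."]
    vocab, vectors = to_count_vectors(docs)
    print(vocab)    # ['cats', 'chase', 'dogs', 'mice', 'rose', 'stocks']
    print(vectors)  # [[1.0, 1.0, 0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]]
```

Each document becomes a point in a space with one dimension per term, which is the kind of numerical input centroid-based clustering operates on. Weighting schemes such as TF-IDF are a common alternative to raw counts.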

In centroid-based clustering, each cluster is represented by a central vector, which is not necessarily one of the points in the dataset. Each data point is assigned to the cluster whose central vector is nearest, so that its squared distance from that central vector is minimized.
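The assignment rule can be sketched in a few lines, assuming points and central vectors are plain lists of numbers and squared Euclidean distance is the measure. The names `squared_distance` and `assign_to_centroids` are illustrative, not part of any Rubiscape API.

```python
def squared_distance(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_centroids(points: list[list[float]], centroids: list[list[float]]) -> list[int]:
    """For each point, return the index of the central vector with the
    smallest squared distance (ties go to the lower index)."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]


if __name__ == "__main__":
    points = [[0.0, 0.0], [1.0, 0.0], [9.0, 9.0], [10.0, 10.0]]
    # Neither central vector is itself a point in the dataset.
    centroids = [[0.5, 0.0], [9.5, 9.5]]
    print(assign_to_centroids(points, centroids))  # [0, 0, 1, 1]
```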

The k-means algorithm is the most widely used centroid-based clustering algorithm. It divides the dataset into k pre-defined, distinct, non-overlapping clusters. Each data point is assigned to a cluster such that the sum of squared distances between the data points and the arithmetic mean of their cluster (the centroid) is minimized. Low variation within a cluster means greater homogeneity of the data points in that cluster. In practice, k-means converges to a local minimum that depends on the initial choice of centroids.
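A minimal sketch of k-means with random initialization (Lloyd's iteration) follows, assuming the data is already numerical, for example term-count vectors. It alternates two steps: assign each point to its nearest centroid, then move each centroid to the arithmetic mean of its members, and stops when assignments no longer change. The function names, the `seed` and `max_iter` parameters, and the rule of keeping a centroid in place when its cluster empties are illustrative choices, not a description of how Rubiscape implements the algorithm.

```python
import random


def squared_distance(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: list[list[float]]) -> list[float]:
    """Component-wise arithmetic mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(points: list[list[float]], k: int, max_iter: int = 100, seed: int = 0):
    """Return (centroids, labels) found by Lloyd-style k-means."""
    rng = random.Random(seed)
    # Random initialization: use k distinct data points as the starting centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the centroid with the smallest squared distance.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: a (local) minimum has been reached
        labels = new_labels
        # Update step: move each centroid to the arithmetic mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if its cluster is empty
                centroids[j] = mean(members)
    return centroids, labels


def within_cluster_sse(points, centroids, labels) -> float:
    """The quantity k-means minimizes: total squared distance of points to their centroids."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


if __name__ == "__main__":
    data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    centroids, labels = kmeans(data, k=2)
    print(labels, centroids, within_cluster_sse(data, centroids, labels))
```

Because the starting centroids are drawn at random, different seeds can lead to different local minima on less well-separated data; running several seeds and keeping the result with the lowest `within_cluster_sse` is a common mitigation.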
