Centroid Based Clustering

Description

In Centroid Based Clustering, each cluster is represented by a central vector (its centroid). Each object is assigned to the cluster whose central vector is nearest, that is, the one that minimizes the squared distance between the object and the central vector.
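The assignment rule can be illustrated with a minimal plain-Python sketch. The helpers `squared_distance` and `assign_to_nearest` are hypothetical names chosen for illustration, the vectors are assumed to be already numeric, and this is not Rubiscape's implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_nearest(points, centroids):
    """For each point, return the index of the centroid with the smallest squared distance."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]


points = [(0.0, 0.0), (0.5, 1.0), (9.0, 9.5), (10.0, 10.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
print(assign_to_nearest(points, centroids))  # [0, 0, 1, 1]
```

The two points near the origin go to centroid 0, and the two points near (10, 10) go to centroid 1.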

Why to use

To group textual data into clusters of similar items, after the text has been converted to its numerical form.

When to use

  • When the number of clusters is known.
  • When clusters are expected to be of roughly equal size.

When not to use

  • When data is labeled.
  • When the number of clusters is not known.
  • When clusters are expected to vary widely in size.

Prerequisites

Input data should be of text type and should not contain special characters or numbers.
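If the raw text still contains special characters or numbers, one way to remove them is with a regular expression. This is a minimal sketch using a hypothetical `clean_text` helper; it assumes English text written in ASCII letters (accented or non-Latin letters would also be stripped), and it is not the preprocessing Rubiscape itself applies.

```python
import re


def clean_text(text):
    """Lowercase text and keep only ASCII letters and single spaces (hypothetical helper)."""
    text = text.lower()
    # Replace every character that is not a-z or whitespace (digits, punctuation, symbols) with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse runs of whitespace into one space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


print(clean_text("Order #42 shipped!! Delivery in 3-5 days."))  # order shipped delivery in days
```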

Input

Textual Data

Output

Data divided into clusters

Statistical Methods used

  • K-means
  • Random Initialization
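Random Initialization means the starting centroids are chosen at random. One common variant, assumed here, picks k distinct data points from the dataset as the initial centroids. The `random_init` helper and its `seed` parameter are hypothetical, added so that runs can be reproduced.

```python
import random


def random_init(points, k, seed=None):
    """Pick k distinct data points at random to serve as the initial centroids."""
    if k > len(points):
        raise ValueError("k cannot exceed the number of data points")
    rng = random.Random(seed)  # a seeded generator makes the choice reproducible
    return rng.sample(points, k)


data = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.0)]
print(random_init(data, 2, seed=0))
```

Because the starting centroids are random, different seeds can lead k-means to different final clusters.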

Limitations

  • The number of clusters needs to be known.
  • Sensitive to outliers.
  • Performs poorly on clusters with non-convex shapes.
  • Tends to produce clusters of similar size.
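The sensitivity to outliers follows from the centroid being an arithmetic mean: a single extreme value can pull it far away from every other member of the cluster. A small one-dimensional illustration (the values are made up for the example):

```python
def mean(values):
    """Arithmetic mean, which is how a one-dimensional centroid is computed."""
    return sum(values) / len(values)


cluster = [2.0, 3.0, 4.0]
print(mean(cluster))            # 3.0
print(mean(cluster + [100.0]))  # 27.25, far from 2, 3 and 4
```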

In Centroid-based clustering, each cluster is represented by a central vector. The central vector is not necessarily one of the points in the dataset. Each data point is assigned to the cluster whose central vector is closest, so that its squared distance from that central vector is minimized.
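To see why the central vector need not be a data point, compute the mean of four points at the corners of a square. The `centroid` helper below is a hypothetical name for the per-dimension arithmetic mean.

```python
def centroid(points):
    """Arithmetic mean of equal-length vectors, computed dimension by dimension."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
c = centroid(square)
print(c)            # (1.0, 1.0)
print(c in square)  # False: the centre of the square is not one of the data points
```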

The k-means algorithm is the most widely used centroid-based clustering algorithm. It divides the dataset into k pre-defined, distinct, non-overlapping clusters. Each data point is assigned to a cluster such that the sum of squared distances between the data points and their cluster's centroid (the arithmetic mean of all data points in that cluster) is minimized. Minimizing this within-cluster variation makes the data points within each cluster more homogeneous.
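k-means is usually computed with an iterative procedure often called Lloyd's algorithm. It alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points. The loop stops when the assignments no longer change. Below is a minimal plain-Python sketch of that procedure with random initialization. The names `kmeans`, `max_iter`, `seed`, and the returned `inertia` (the within-cluster sum of squared distances) are illustrative assumptions, not Rubiscape's API.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans(points, k, max_iter=100, seed=None):
    """Sketch of Lloyd's algorithm; returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initialization: k distinct data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to the centroid with the smallest squared distance.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = mean_vector(members)
    # Objective that k-means tries to minimize: total within-cluster squared distance.
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


pts = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels, centroids, inertia = kmeans(pts, k=2, seed=1)
print(labels, round(inertia, 4))
```

Each iteration can only keep or lower the inertia, but the procedure stops at a local minimum. Different random starting centroids can therefore give different clusterings on less clearly separated data, which is why implementations often run it several times and keep the result with the lowest inertia.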
