Connectivity Based Clustering

Description

Connectivity Based Clustering builds clusters on the notion that data points (vectors) lying close to each other in space are more similar to each other than to data points lying farther away.

Why to use

To form clusters of textual data.

When to use

When the number of clusters is not known.

When not to use

  • When data is labeled.
  • When the number of clusters is known.
  • When the dataset is very large.

Prerequisites

Input data must be text and must not contain special characters or numbers.
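
The prerequisite above can be met with a small cleaning step before clustering. The following is only a sketch: the function name `clean_text` is hypothetical, and it assumes that plain ASCII letters and spaces are the only characters to keep, so accented or non-Latin letters would also be removed.

```python
import re

def clean_text(text):
    """Keep only ASCII letters and single spaces (an assumption, see above)."""
    # Replace every character that is not a letter or whitespace with a space.
    letters_only = re.sub(r"[^A-Za-z\s]", " ", text)
    # Collapse the runs of whitespace left behind and trim the ends.
    return re.sub(r"\s+", " ", letters_only).strip()

cleaned = clean_text("Order #42: shipped on 3/14!")
print(cleaned)
```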

Input

Textual Data

Output

Data divided into clusters

Statistical Methods used

  • Linkage Metric
  • Linkage Criterion
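
As an illustration of the two terms, the sketch below assumes their usual meaning in hierarchical clustering: the metric measures the distance between two points, and the criterion turns those point distances into a distance between two clusters. The function names and the sample points are hypothetical and are not taken from Rubiscape.

```python
from math import dist  # Euclidean distance between two points

def manhattan(p, q):
    """Manhattan metric: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def cluster_distance(a, b, metric, criterion):
    """Distance between clusters a and b under a metric and a criterion."""
    pairwise = [metric(p, q) for p in a for q in b]
    if criterion == "single":      # closest pair of points
        return min(pairwise)
    if criterion == "complete":    # farthest pair of points
        return max(pairwise)
    if criterion == "average":     # mean over all pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown criterion: {criterion}")

a = [(0, 0), (0, 2)]
b = [(3, 0), (3, 4)]
for criterion in ("single", "complete", "average"):
    print(criterion,
          cluster_distance(a, b, dist, criterion),
          cluster_distance(a, b, manhattan, criterion))
```

The same two clusters can be near or far depending on the choice, so both settings influence which clusters end up joined.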

Limitations

  • Does not scale well to very large data sets.
  • Does not work with missing data.
  • Computation can take much longer than with more efficient algorithms such as k-means, because the distances between clusters have to be compared again at every step.

Connectivity-based clustering is also called hierarchical clustering because it builds clusters in a hierarchy. It treats data points that are close to each other as more similar than data points that are far apart.

In its bottom-up form, the algorithm starts by assigning each data point to a cluster of its own. The two nearest clusters are then merged into a single cluster, and this step repeats until only one cluster remains.
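
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not Rubiscape's implementation: it assumes Euclidean distance and single linkage, and the `max_distance` parameter is a hypothetical stopping rule added so that the merging can stop before everything collapses into one cluster.

```python
from itertools import combinations
from math import dist

def single_linkage(a, b):
    """Distance between two clusters: their closest pair of points."""
    return min(dist(p, q) for p in a for q in b)

def agglomerate(points, max_distance):
    """Merge the two nearest clusters until none lie within max_distance."""
    clusters = [[p] for p in points]   # every point starts in its own cluster
    merges = []                        # distance at which each merge happened
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        (i, j), d = min(
            (((i, j), single_linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]   # merge j into i
        del clusters[j]                           # j > i, so i stays valid
        merges.append(d)
    return clusters, merges

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, merges = agglomerate(points, max_distance=2.0)
print(clusters)
```

Passing `float("inf")` as `max_distance` lets the loop run to the end, leaving a single cluster as described above. The sequence of merge distances is what a dendrogram would display.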

There are two approaches to this model. In the first (agglomerative, or bottom-up) approach, each data point starts in its own cluster, and the clusters closest to each other are merged first.

In the second (divisive, or top-down) approach, all data points start in a single large cluster, which is then split step by step so that the points lying farthest apart are separated first. Rubiscape uses this approach.
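
The top-down approach can be sketched as follows. This is one simple way to split clusters, chosen for illustration only; it is an assumption and not a description of how Rubiscape performs the split. The widest cluster is divided around its two most distant points, and the hypothetical `max_diameter` parameter decides when to stop.

```python
from itertools import combinations
from math import dist

def diameter(cluster):
    """Largest distance between any two points in the cluster."""
    return max((dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)

def split(cluster):
    """Split a cluster in two around its two most distant points."""
    seed_a, seed_b = max(combinations(cluster, 2), key=lambda pair: dist(*pair))
    left = [p for p in cluster if dist(p, seed_a) <= dist(p, seed_b)]
    right = [p for p in cluster if dist(p, seed_a) > dist(p, seed_b)]
    return left, right

def divide(points, max_diameter):
    """Start with one cluster and split the widest one until all are compact."""
    clusters = [list(points)]
    while True:
        widest = max(clusters, key=diameter)
        if diameter(widest) <= max_diameter:
            return clusters
        clusters.remove(widest)
        clusters.extend(split(widest))

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters = divide(points, max_diameter=2.0)
print(clusters)
```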
