Density Based Clustering

Description

Density-based clustering groups the given data into clusters on the idea that a cluster is a contiguous region of high point density in the data space, separated from other clusters by contiguous regions of low density.

Why to use

It works well for separating areas with a high density of observations from areas where observations are sparse. A density-based algorithm such as DBSCAN can also find clusters of arbitrary shape.

When to use

  • When the number of clusters is not known.
  • When the data contains noise or outliers.
  • To separate regions of high point density from regions of low point density.

When not to use

When the number of clusters is known.

Prerequisites

Input data should be of text type and should not contain special characters or numbers.

Input

Textual Data

Output

Data divided into clusters.

Statistical Methods used

  • Ball tree
  • K-d tree
  • Brute force
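
The three entries above are commonly used as strategies for finding the neighbours of each point, which is the step a density-based method repeats most often. As a rough illustration, the brute-force strategy can be sketched as below. The function name `radius_neighbors_brute`, the radius parameter `eps`, and the sample points are hypothetical, and the sketch assumes the data has already been converted to numeric vectors. Ball trees and k-d trees answer the same query, typically faster on large, low-dimensional data.

```python
import math

def radius_neighbors_brute(points, index, eps):
    """Return indices of all points within eps of points[index].

    Brute force: compares the query point against every point,
    so the point itself is always included in the result.
    """
    center = points[index]
    return [j for j, p in enumerate(points) if math.dist(center, p) <= eps]

# Hypothetical sample: three nearby points and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0)]
neighbors = radius_neighbors_brute(points, 0, eps=0.5)
```

Each query costs one distance computation per data point, which is why tree-based strategies are usually preferred once the data set grows.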

Limitations

It does not work well in the case of high-dimensional data or with clusters of varying densities.


Density-based clustering is an unsupervised learning method. It identifies clusters as regions of high point density that are separated from one another by regions of low point density. Points that fall in these low-density regions are treated as noise or outliers.
In density-based clustering, core samples (points that lie in high-density regions) are identified first, and clusters are grown outward from them. The method is best suited to data whose clusters have comparable density. Clusters found this way can be of any shape, unlike in the k-means method, where clusters are assumed to be convex.
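
The idea of identifying core samples and growing clusters from them can be illustrated with a minimal DBSCAN-style sketch. This is not a reference implementation: the names `dbscan`, `eps` (neighbourhood radius), `min_samples` (points required for a core sample), and the sample data are assumptions made for the example, and the data is assumed to be numeric vectors already.

```python
import math

NOISE = -1

def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def region_query(i):
        # Brute-force search for all points within eps of point i.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        # Point i is a core sample: start a new cluster and grow it.
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core sample
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also core: keep expanding
        cluster_id += 1
    return labels

# Hypothetical data: two dense groups and one isolated point.
points = [
    (0.0, 0.0), (0.0, 0.3), (0.3, 0.0), (0.3, 0.3),
    (5.0, 5.0), (5.0, 5.3), (5.3, 5.0), (5.3, 5.3),
    (10.0, 0.0),
]
labels = dbscan(points, eps=0.5, min_samples=3)
# labels holds two clusters (0 and 1) and one noise point (-1)
```

The number of clusters is never passed in; it falls out of `eps` and `min_samples`. Because a single `eps` is shared by all clusters, this kind of method struggles when cluster densities differ widely.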