DBSCAN

Description

  • DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise.
  • It is an unsupervised machine learning algorithm that groups data points lying in high-density regions into clusters and separates them from points in low-density regions, which it treats as noise.
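
The density-based grouping described above can be illustrated with a minimal sketch in plain Python. This is not the platform's implementation. The parameter names `eps` (neighborhood radius) and `min_pts` (minimum neighbors for a dense region), the helper `region_query`, and the use of -1 as the noise label are assumptions made for illustration.

```python
from math import dist

NOISE = -1  # assumed label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not dense
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is dense too, keep expanding
    return labels


# Two tight groups of four points each, plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The two dense groups become clusters 0 and 1 without the number of clusters being supplied, and the isolated point is labeled as noise.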

Why to use

To find clusters of densely packed data points and to flag isolated points as noise.

When to use

When you want to group data points into clusters based on their density.

When not to use

For textual data

Prerequisites

  • The number of clusters need not be specified.
  • The selected variables need to be scaled before clustering.
  • The number of data points in a cluster should be greater than or equal to the number of dimensions (selected variables).
  • There should be at least two variables/features that can be selected as independent variables.
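
Scaling matters because density is measured with distances, so a variable with a large numeric range would otherwise dominate. One common way to scale is z-score standardization, sketched below in plain Python. The function name `standardize` is hypothetical, and the choice of z-scores over other scaling methods is an assumption, since the prerequisite above does not name a method.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column: subtract its mean, divide by its std deviation."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    # A constant column has zero deviation; divide by 1.0 to avoid an error.
    stds = [pstdev(c) or 1.0 for c in columns]
    return [
        tuple((x - m) / s for x, m, s in zip(row, means, stds))
        for row in rows
    ]


# Second variable is 1000 times larger than the first before scaling.
rows = [(1.0, 1000.0), (2.0, 2000.0), (3.0, 3000.0)]
scaled = standardize(rows)
print(scaled)
```

After scaling, both variables have mean 0 and unit standard deviation, so each contributes comparably to the distance between two points.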

Input

Any numerical dataset containing unlabeled data

Output

  • Clustered Data with anomalies (noise points)
  • Cluster Plot
  • Silhouette Coefficient
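
As a rough illustration of the Silhouette Coefficient listed above, the sketch below computes it in plain Python from points and their cluster labels. For each point, `a` is the mean distance to the other members of its own cluster, `b` is the mean distance to the nearest other cluster, and the score is (b - a) / max(a, b). Excluding noise points (assumed label -1) from the calculation is an assumption; the source does not state how noise is treated.

```python
from math import dist
from statistics import mean


def silhouette_coefficient(points, labels, noise=-1):
    """Mean silhouette over clustered points; noise points are ignored."""
    clusters = {}
    for p, lab in zip(points, labels):
        if lab != noise:
            clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for lab, members in clusters.items():
        for idx, p in enumerate(members):
            if len(members) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            # a: mean distance to the other members of p's own cluster
            a = mean(dist(p, q) for k, q in enumerate(members) if k != idx)
            # b: mean distance to the members of the nearest other cluster
            b = min(
                mean(dist(p, q) for q in other)
                for other_lab, other in clusters.items()
                if other_lab != lab
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1), (50, 50)]
labels = [0, 0, 1, 1, -1]  # last point is noise
score = silhouette_coefficient(points, labels)
print(round(score, 3))
```

Values close to 1 indicate compact, well-separated clusters; values near 0 indicate overlapping clusters.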

Statistical Methods used

Limitations

  • Does not work well with high dimensional data
  • Choosing an epsilon value can be difficult
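
A commonly used heuristic for choosing epsilon, not described in this article and shown here only as a sketch, is to sort every point's distance to its k-th nearest neighbor and look for a sharp jump in the sorted values. The function name `k_distances` and the parameter `k` are assumptions for illustration.

```python
from math import dist


def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest neighbor."""
    result = []
    for i, p in enumerate(points):
        others = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return sorted(result)


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
distances = k_distances(points, k=3)
print([round(d, 2) for d in distances])
```

In this toy data the first eight values are about 1.41 and the last is far larger, so an epsilon slightly above 1.41 would keep the two dense groups together and leave the isolated point as noise.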