DBSCAN |
Description | - DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise.
- It is an unsupervised ML algorithm that separates high-density regions (clusters) from low-density regions, whose points are treated as noise.
|
Why to use | To create data point clusters based on density. |
When to use | When you want to group data points into clusters based on their density. | When not to use | For textual data |
Prerequisites | - The number of clusters need not be specified.
- The selected variables need to be scaled before clustering.
- The number of data points in a cluster should be greater than or equal to the number of dimensions (features) in the data.
- There should be at least two variables/features that can be selected as independent variables.
|
Input | Any numerical dataset containing unlabeled data | Output | - Clustered Data with anomalies (noise points)
- Cluster Plot
- Silhouette Coefficient
|
Statistical Methods used | – | Limitations | - Does not work well with high-dimensional data
- Choosing a suitable epsilon (neighborhood radius) value can be difficult
|
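
The prerequisites and output listed above can be illustrated with a minimal sketch in plain Python. It is not the implementation behind this algorithm card; it only shows the general idea of scaling the variables, growing clusters from dense neighborhoods, and labeling isolated points as noise. The function names (`standardize`, `region_query`, `dbscan`), the sample data, and the parameter values `eps=0.5` and `min_pts=3` are all assumptions chosen for illustration.

```python
import math
from statistics import mean, pstdev

NOISE = -1  # label for points that belong to no cluster


def standardize(rows):
    """Scale every feature (column) to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # avoid dividing by zero
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stds))
        for row in rows
    ]


def region_query(points, index, eps):
    """Indices of all points within eps of points[index], itself included."""
    return [
        j for j, other in enumerate(points)
        if math.dist(points[index], other) <= eps
    ]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, 2, ... for clusters, NOISE otherwise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not core
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # expand only through core points
                queue.extend(j_neighbors)
    return labels


# Hypothetical sample data: two dense groups and one isolated point.
raw = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (1.1, 1.2),
    (8.0, 8.0), (8.2, 7.9), (7.9, 8.1), (8.1, 8.2),
    (4.5, 15.0),
]

scaled = standardize(raw)  # variables are scaled before clustering
labels = dbscan(scaled, eps=0.5, min_pts=3)  # min_pts >= number of features (2)
print(labels)  # → [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Note that the number of clusters is never passed in: it follows from `eps` and `min_pts`. The result is sensitive to `eps`, which is the difficulty noted under Limitations: too small a value turns every point into noise, and too large a value merges everything into one cluster.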