Stratified Sampling

Stratified Sampling

Stratified Sampling

Description

It divides the population in homogeneous subpopulations or strata, depending on characteristics. Each member should be assigned exactly only one stratum.

Why to use

To obtain a sample that best represents the entire population

When to use

  • Imbalanced data
  • Reducing the overall variance
  • Ensuring diversity in the sample

When not to use

  • Unable to classify every member of the population in a unique strata

Prerequisites

  • Unique strata
  • Every member of the population should be assigned a strata

Input

  • Any dataset containing categorical variables.

Output

  • A sample best representing the population.

Statistical Methods Used

Limitations

  • Members are classified into multiple categories.

    • Related Articles

    • Sampling

      Sampling is a technique used in Statistical Analysis in which a fixed number of data points are selected from a large dataset. This selection of smaller subsets helps to perform analysis faster and at a low computational cost. There are different ...
    • Sampling

      Sampling is a technique used in Statistical Analysis in which a fixed number of data points are selected from a large dataset. This selection of smaller subsets helps to perform analysis faster and at a low computational cost. There are different ...
    • Stratified Sampling

      Stratified Sampling Description It divides the population in homogeneous subpopulations or strata, depending on characteristics. Each member should be assigned exactly only one stratum. Why to use To obtain a sample that best represents the entire ...
    • Statistical Concepts

      Accuracy Accuracy (of classification) of a predictive model is the ratio of the total number of correct predictions made by the model to the total predictions made. Thus, Accuracy = (TP + TN) / (TP + TN + FP + FN) Where, TP, TN, FP, and FN indicate ...
    • Decision Tree Regression

      Decision Tree Regression Description Decision Tree Regressor builds a regression model in the form of a tree structure where each leaf node represented a class or a decision. Why to use When you want to predict a value depending on single or multiple ...