Stratified Sampling

Stratified Sampling

Stratified Sampling

Description

It divides the population in homogeneous subpopulations or strata, depending on characteristics. Each member should be assigned exactly only one stratum.

Why to use

To obtain a sample that best represents the entire population

When to use

  • Imbalanced data
  • Reducing the overall variance
  • Ensuring diversity in the sample

When not to use

  • Unable to classify every member of the population in a unique strata

Prerequisites

  • Unique strata
  • Every member of the population should be assigned a strata

Input

  • Any dataset containing categorical variables.

Output

  • A sample best representing the population.

Statistical Methods Used

Limitations

  • Members are classified into multiple categories.

    • Related Articles

    • Sampling

      Sampling is a technique used in Statistical Analysis in which a fixed number of data points are selected from a large dataset. This selection of smaller subsets helps to perform analysis faster and at a low computational cost. There are different ...
    • Sampling

      Sampling is a technique used in Statistical Analysis in which a fixed number of data points are selected from a large dataset. This selection of smaller subsets helps to perform analysis faster and at a low computational cost. There are different ...
    • Stratified Sampling

      Stratified Sampling Description It divides the population in homogeneous subpopulations or strata, depending on characteristics. Each member should be assigned exactly only one stratum. Why to use To obtain a sample that best represents the entire ...
    • Statistical Concepts

      Accuracy Accuracy (of classification) of a predictive model is the ratio of the total number of correct predictions made by the model to the total predictions made. Thus, Accuracy = (TP + TN) / (TP + TN + FP + FN) Where, TP, TN, FP, and FN indicate ...
    • Random Forest Regression

      Random Forest Regression Description Random Forest Regression is an ensemble learning method that combines multiple decision trees to create a powerful predictive model for continuous target variables. It utilizes random feature selection to improve ...