Outlier Detection

Description

Outlier Detection reveals the extreme values that deviate from the rest of the data in a real-world dataset.

Why to use

Numerical Analysis – Data Preparation 

When to use

When some values in the data deviate significantly from the rest of the data.

When not to use

On textual data.
When there are no outliers in the data.

Prerequisites

It should be used on numerical data. 

Input

Dataset with extreme values. 

Output

Dataset with the extreme values either removed or imputed (replaced) with the mean, median, or mode.
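The two output options, removal and imputation, can be sketched in plain Python with the standard `statistics` module. This is a minimal sketch, not RubiStudio's implementation: it assumes the outliers have already been flagged by some detection rule and are passed in as a list of booleans, and the function name `treat_outliers` and its `strategy` values are hypothetical.

```python
from statistics import mean, median, mode


def treat_outliers(values, is_outlier, strategy="remove"):
    """Remove flagged outliers or replace them with a statistic of the other values.

    values     -- list of numbers
    is_outlier -- list of booleans, same length as values (True = outlier)
    strategy   -- "remove", "mean", "median", or "mode" (hypothetical option names)
    """
    if len(values) != len(is_outlier):
        raise ValueError("values and is_outlier must have the same length")

    # Non-outlier values; the fill statistic is computed from these only,
    # so the extreme values cannot distort it.
    inliers = [v for v, flag in zip(values, is_outlier) if not flag]

    if strategy == "remove":
        return inliers

    stats = {"mean": mean, "median": median, "mode": mode}
    if strategy not in stats:
        raise ValueError(f"unknown strategy: {strategy!r}")
    fill = stats[strategy](inliers)

    # Keep inliers unchanged and impute every flagged position.
    return [fill if flag else v for v, flag in zip(values, is_outlier)]
```

For example, with `values = [10, 12, 11, 12, 13, 1000]` and only the last value flagged, `treat_outliers(values, flags, "median")` returns `[10, 12, 11, 12, 13, 12]`. Computing the fill value from the inliers only is a design choice of this sketch; a mean taken over the full column would be pulled toward the outlier it is meant to replace.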

Statistical Methods used

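  • Detection: values outside the 1.5 × IQR range (below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR)
  • Detection: values outside the 5th–95th percentile range
  • Detection: values outside the 2nd–98th percentile range
  • Detection: values more than 3 standard deviations from the mean
  • Imputation: mean
  • Imputation: median
  • Imputation: mode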
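The detection rules in this list can be sketched with Python's standard `statistics` module (`quantiles` needs Python 3.8+). This is a minimal sketch, not RubiStudio's implementation: the function names are hypothetical, percentiles use the `"inclusive"` interpolation method, and the standard-deviation rule uses the population standard deviation. Other tools may interpolate or estimate differently, so their bounds can differ slightly.

```python
from statistics import mean, pstdev, quantiles


def iqr_bounds(values, k=1.5):
    """Bounds of the k x IQR rule: (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def percentile_bounds(values, lower=5, upper=95):
    """Bounds given by two percentiles, e.g. 5th/95th or 2nd/98th."""
    if not 1 <= lower < upper <= 99:
        raise ValueError("percentiles must satisfy 1 <= lower < upper <= 99")
    # quantiles(n=100) returns the 1st..99th percentiles; index p - 1 is the p-th.
    cuts = quantiles(values, n=100, method="inclusive")
    return cuts[lower - 1], cuts[upper - 1]


def std_bounds(values, n_std=3):
    """Bounds of mean +/- n_std population standard deviations."""
    mu, sigma = mean(values), pstdev(values)
    return mu - n_std * sigma, mu + n_std * sigma


def flag_outliers(values, bounds):
    """True for every value strictly outside the (low, high) bounds."""
    low, high = bounds
    return [v < low or v > high for v in values]
```

On the sample `[10, 12, 11, 12, 13, 1000]`, the 1.5 × IQR bounds are (9.0, 15.0), so only 1000 is flagged. The 3-standard-deviation rule flags nothing on the same sample: the extreme value inflates the standard deviation itself, and with the population standard deviation no value can lie more than √(n − 1) standard deviations from the mean, which is at most 3 when n ≤ 10.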

Limitations

-

An outlier is a data value that is unlike the rest of the data: it is rare or distinct and does not fit the overall pattern.
Outliers can arise in many ways. For example:

  • In consumer data for an e-commerce site, a few customers may buy products in very large quantities.
  • In mortality data, very few people live beyond 100 years of age.

Many algorithms, including a number of scikit-learn estimators, can give distorted results when outliers are present, because they assume that values fall within a typical range. It is therefore recommended to use Outlier Detection to identify these rare, extreme values. Detecting and correcting outliers helps produce a more consistent dataset.
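Many estimators rely on means and variances, so a simple way to see the effect of a single extreme value is to compare the mean and the median of a small sample before and after one outlier is added. The numbers below are illustrative only, not taken from a real dataset.

```python
from statistics import mean, median

# Illustrative sample; the appended 1000 plays the role of an outlier.
sample = [10, 12, 11, 12, 13]
with_outlier = sample + [1000]

# The mean is pulled far toward the single extreme value ...
print("mean:  ", mean(sample), "->", mean(with_outlier))
# ... while the median, which depends only on the middle of the sorted data, stays put.
print("median:", median(sample), "->", median(with_outlier))
```

Here the mean jumps from 11.6 to about 176.3, while the median stays at 12, which is why the median is often preferred for imputation when outliers are present.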
Several outlier detection methods are available. A few of them are listed below.

  • Standard Deviation Method
  • Interquartile Range Method
  • Automatic Outlier Detection
