What are Anomalies?

Get an introduction to anomalies in a dataset, and understand how the mean and standard deviation are used to identify them.

Introduction

An anomaly in a data series is a significant deviation from some reasonable value. For example, look at this series of numbers. Which number stands out?

2, 3, 5, 2, 3, 12, 5, 3, 4

The number that stands out in this series is 12.

Scatter plot for the series

This is intuitive to a human, but computer programs do not have that intuition…

Mathematical foundation

To find the anomaly in the series, we first need to define a reasonable value, and then define how far from that value a number must be before we consider it a significant deviation. A good place to start for the reasonable value is the mean of the series.

The mean is ~4.33.
Next, we need to define the deviation. Let’s use the standard deviation.

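The standard deviation is the square root of the variance, which is the average squared distance from the mean. In this case, it is ~3.08. This figure is the sample standard deviation, which divides the sum of squared distances by n - 1 = 8 rather than by n = 9.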
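As a quick check of the two numbers above, here is a minimal Python sketch. It assumes the series is held in a plain list (the name `series` is illustrative and not from the lesson). It uses `statistics.stdev` from the standard library, which computes the sample standard deviation; that variant is the one that yields 3.08 for this series.

```python
import statistics

# The series from the introduction; the variable name is an assumption.
series = [2, 3, 5, 2, 3, 12, 5, 3, 4]

mean = statistics.mean(series)     # the "reasonable" value
stddev = statistics.stdev(series)  # sample standard deviation (divides by n - 1)

print(round(mean, 2), round(stddev, 2))  # 4.33 3.08
```

For comparison, `statistics.pstdev(series)` divides by n instead and gives roughly 2.91.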

Now that we have defined a “reasonable” value and a deviation, we can define a range of acceptable values.

The range we defined is one standard deviation on either side of the mean, roughly 1.25 to 7.42. Any value outside this range is considered an anomaly.

Checking the series against this range, we find that 12 is the only value outside the range of acceptable values, which identifies it as an anomaly.
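The range check described above can be sketched in Python as follows. This is one possible illustration under stated assumptions, not the lesson's own query: the helper name `find_anomalies` and its `threshold` parameter are hypothetical, and the sample standard deviation is used to match the 3.08 figure.

```python
import statistics

def find_anomalies(values, threshold=1.0):
    """Return the values lying more than `threshold` sample standard
    deviations away from the mean. Name and signature are illustrative."""
    mean = statistics.mean(values)
    stddev = statistics.stdev(values)
    lower = mean - threshold * stddev  # lower bound of the acceptable range
    upper = mean + threshold * stddev  # upper bound of the acceptable range
    return [v for v in values if v < lower or v > upper]

series = [2, 3, 5, 2, 3, 12, 5, 3, 4]
print(find_anomalies(series))  # [12]
```

Raising `threshold` widens the acceptable range. At three standard deviations, for instance, nothing in this series would be flagged.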
