Amazon Lookout for Metrics uses machine learning (ML) to automatically detect and diagnose anomalies (outliers from the norm) in business and operational data, such as a sudden dip in sales revenue or customer acquisition rates. In a couple of clicks, you can connect Amazon Lookout for Metrics to popular data stores like Amazon S3, Amazon Redshift, and Amazon Relational Database Service (RDS), as well as third-party SaaS applications such as Salesforce, ServiceNow, Zendesk, and Marketo, and start monitoring metrics that are important to your business.
That said, it is still a computer system, and it will respond according to the specific inputs it is given. This is a short guide to help you think through the data you provide to Lookout for Metrics and to better understand what its results mean, using a few real-world-esque scenarios.
If you are going to be looking out for metrics, it helps to first know what they are. Within Lookout for Metrics there are 3 components to your datasets; together they shape your metrics.
They are:

Timestamp
- This is required. Every entry must begin with a timestamp of when the remaining columns were relevant or occurred.

Dimensions
- These are categorical columns; you can have up to 5 of them. Keep in mind they are combined to refer to a specific entity. For example, if your dimensions are `location` and `repair_type`, your data could look like this:
timestamp | location | repair_type |
---|---|---|
01/10/2022 10:00:00 | 123 Interesting Ave | oil_change |
01/10/2022 10:00:00 | 123 Interesting Ave | tire_rotation |
01/10/2022 10:00:00 | 745 Interesting Ave | oil_change |
01/10/2022 10:00:00 | 745 Interesting Ave | tire_rotation |
From this dataset we have 2 dimensions (`location` and `repair_type`), and when we start to think about the total number of possible metrics (full calculation to come) we can see there are 2 distinct locations and 2 distinct repair types.
Measures
- These are the numerical columns where real observable values are placed. These numbers are bound to a specific unique combination of dimensions. You can also have up to 5 of these columns. Now let's expand our earlier dataset with 2 additional numerical columns, `total` and `fixed`:
timestamp | location | repair_type | total | fixed |
---|---|---|---|---|
01/10/2022 10:00:00 | 123 Interesting Ave | oil_change | 10 | 8 |
01/10/2022 10:00:00 | 123 Interesting Ave | tire_rotation | 10 | 10 |
01/10/2022 10:00:00 | 745 Interesting Ave | oil_change | 10 | 10 |
01/10/2022 10:00:00 | 745 Interesting Ave | tire_rotation | 10 | 7 |
With this dataset, Lookout for Metrics has 8 metrics. How did we get this number?

A metric is a unique combination of categorical entries and 1 numerical value.

The formula to calculate the total number of metrics is: Unique(dimension1) * Unique(dimension2) * Number of measures. So in this case that is 2 * 2 * 2, or 8.
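As a sanity check before creating a Detector, you can compute that count yourself. A minimal sketch, assuming pandas and a hypothetical `repairs.csv` shaped like the table above:

```python
import pandas as pd

# Hypothetical file with columns: timestamp, location, repair_type, total, fixed
df = pd.read_csv("repairs.csv")

dimensions = ["location", "repair_type"]
measures = ["total", "fixed"]

# Unique(dimension1) * Unique(dimension2) * Number of measures
metric_count = len(measures)
for dim in dimensions:
    metric_count *= df[dim].nunique()

print(metric_count)  # 2 * 2 * 2 = 8 for the sample data
```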
At present Lookout for Metrics supports a maximum of 50,000 metrics per Detector, which is the trained model assigned to a particular set of data. If you want to track more than 50,000 metrics, simply segment your data into multiple Detectors.
Like all great things powered by Machine Learning: It depends!
Specifically, here it depends on how your data is structured and how your data is aggregated. Structure refers to the way we shape a dataset: the number of dimensions and the number of measures provided. For example, take the same dataset as earlier:
timestamp | location | repair_type | total | fixed |
---|---|---|---|---|
01/10/2022 10:00:00 | 123 Interesting Ave | oil_change | 10 | 8 |
01/10/2022 10:00:00 | 123 Interesting Ave | tire_rotation | 10 | 10 |
01/10/2022 10:00:00 | 745 Interesting Ave | oil_change | 10 | 10 |
01/10/2022 10:00:00 | 745 Interesting Ave | tire_rotation | 10 | 7 |
This would identify the following types of issues:
- If the `total` number of `oil_change` type events at `123 Interesting Ave` is anomalous
- If the `total` number of `tire_rotation` type events at `123 Interesting Ave` is anomalous
- If the `fixed` number of `oil_change` type events at `123 Interesting Ave` is anomalous
- If the `fixed` number of `tire_rotation` type events at `123 Interesting Ave` is anomalous
- If the `total` number of `oil_change` type events at `745 Interesting Ave` is anomalous
- If the `total` number of `tire_rotation` type events at `745 Interesting Ave` is anomalous
- If the `fixed` number of `oil_change` type events at `745 Interesting Ave` is anomalous
- If the `fixed` number of `tire_rotation` type events at `745 Interesting Ave` is anomalous
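Those checks are just the cross product of the dimension values and the measures. A small sketch, with values hardcoded from the sample table, that enumerates the same combinations:

```python
from itertools import product

locations = ["123 Interesting Ave", "745 Interesting Ave"]
measures = ["total", "fixed"]
repair_types = ["oil_change", "tire_rotation"]

# Each (location, measure, repair_type) combination is one metric,
# and each is monitored for anomalies independently.
for location, measure, repair_type in product(locations, measures, repair_types):
    print(f"Is the {measure} number of {repair_type} events at {location} anomalous?")
```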
That's 8 different things, mapping to the 8 metrics identified earlier. Additionally, with the causation features, IF there were a relationship defined by reliable patterns in the data between the values of the `total` and `fixed` columns for a particular `location` and `repair_type`, then Lookout for Metrics could report how one anomaly may impact the other. Also, if there were a reliable pattern between locations, those anomalies could be linked in a cause and effect relationship as well. Finally, anomalies that look similar in the same time period could be grouped together and shown on the same page at the same time.
YES!
With the simplified dataset earlier:
timestamp | location | repair_type |
---|---|---|
01/10/2022 10:00:00 | 123 Interesting Ave | oil_change |
01/10/2022 10:00:00 | 123 Interesting Ave | tire_rotation |
01/10/2022 10:00:00 | 745 Interesting Ave | oil_change |
01/10/2022 10:00:00 | 745 Interesting Ave | tire_rotation |
What if we changed that to:
timestamp | location | repair_type |
---|---|---|
01/10/2022 10:00:00 | 123 Interesting Ave | oil_change |
01/10/2022 10:00:00 | 123 Interesting Ave | excess_mileage |
01/10/2022 10:00:00 | 745 Interesting Ave | oil_change |
01/10/2022 10:00:00 | 745 Interesting Ave | excess_mileage |
So far that seems fine... but what happens when we fill in data:
timestamp | location | repair_type | total | fixed |
---|---|---|---|---|
01/10/2022 10:00:00 | 123 Interesting Ave | oil_change | 10 | 8 |
01/10/2022 10:00:00 | 123 Interesting Ave | excess_mileage | 10 | 10 |
01/10/2022 10:00:00 | 745 Interesting Ave | oil_change | 10 | 10 |
01/10/2022 10:00:00 | 745 Interesting Ave | excess_mileage | 10 | 7 |
Now we see the problem arises when we add in the measures. Specifically, what exactly would we mean by the `total` of `excess_mileage`? It could be the number of vehicles seen with higher mileage than at their last service. That could be OK if we just took it as a regular reading, but in the context of the measure `fixed`, what could go there? In this case it does not make sense. Are we stating that we did some service to alleviate it? That might just show up under the relevant other service types. Lookout for Metrics requires all columns to be filled or the datapoint will be ignored, so here we can see that the structure we define may restrict what kind of data we can insert. As an exercise in cleaning it up, this might work:
timestamp | location | repair_type | total | fixed | out_of_interval |
---|---|---|---|---|---|
01/10/2022 10:00:00 | 123 Interesting Ave | oil_change | 10 | 8 | 6 |
01/10/2022 10:00:00 | 123 Interesting Ave | tire_rotation | 10 | 10 | 5 |
01/10/2022 10:00:00 | 745 Interesting Ave | oil_change | 10 | 10 | 2 |
01/10/2022 10:00:00 | 745 Interesting Ave | tire_rotation | 10 | 7 | 7 |
Here we now have an `out_of_interval` count for the total number of vehicles that are out of their service interval. We might expect to see patterns where a higher `out_of_interval` value indicates a lower volume of `fixed` items, due to the complexity of any additional maintenance. It could also have downstream impacts if it creates congestion inside the garage.
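Because incomplete datapoints are silently ignored, it can pay to check for gaps before ingestion. A minimal sketch, again assuming pandas and the hypothetical `repairs.csv`:

```python
import pandas as pd

df = pd.read_csv("repairs.csv")

required = ["timestamp", "location", "repair_type", "total", "fixed", "out_of_interval"]

# Rows missing any required column would be ignored by the service,
# so surface them here rather than losing them silently.
incomplete = df[df[required].isna().any(axis=1)]
if not incomplete.empty:
    print(f"Dropping {len(incomplete)} incomplete rows:")
    print(incomplete)

df = df.dropna(subset=required)
```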
Lookout for Metrics detects anomalies only on structured, time series data. Anomalies are also detected against a regular interval (5 minutes, 10 minutes, 1 hour, or 1 day). If your data contains only a single entry for a given metric within your interval, then aggregation has no impact whatsoever.
To use the same dataset again:
timestamp | location | repair_type | total | fixed |
---|---|---|---|---|
01/10/2022 10:00:00 | 123 Interesting Ave | oil_change | 10 | 8 |
01/10/2022 10:00:00 | 123 Interesting Ave | tire_rotation | 10 | 10 |
01/10/2022 10:00:00 | 745 Interesting Ave | oil_change | 10 | 10 |
01/10/2022 10:00:00 | 745 Interesting Ave | tire_rotation | 10 | 7 |
Here we have exactly 1 entry for each metric in a given hour; if the detector is hourly, then no aggregation is needed.
Lookout for Metrics supports 2 aggregation functions:
- Sum - Adds all the values together within the interval.
- Average - Takes the average of the values within the interval.
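If you want to preview what either function will do to your data, you can reproduce the aggregation locally. A sketch of both, assuming pandas, an hourly interval, and the hypothetical `repairs.csv` from before:

```python
import pandas as pd

df = pd.read_csv("repairs.csv", parse_dates=["timestamp"])

# Snap each timestamp down to the start of its hourly interval
df["timestamp"] = df["timestamp"].dt.floor("1h")

keys = ["timestamp", "location", "repair_type"]
measures = ["total", "fixed"]

summed = df.groupby(keys, as_index=False)[measures].sum()
averaged = df.groupby(keys, as_index=False)[measures].mean()
```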
To expand now on the existing dataset:
timestamp | location | repair_type | total | fixed |
---|---|---|---|---|
01/10/2022 10:00:00 | 123 Interesting Ave | oil_change | 10 | 8 |
01/10/2022 10:05:00 | 123 Interesting Ave | oil_change | 10 | 8 |
01/10/2022 10:00:00 | 123 Interesting Ave | tire_rotation | 10 | 10 |
01/10/2022 10:05:00 | 123 Interesting Ave | tire_rotation | 10 | 10 |
01/10/2022 10:00:00 | 745 Interesting Ave | oil_change | 10 | 10 |
01/10/2022 10:05:00 | 745 Interesting Ave | oil_change | 10 | 10 |
01/10/2022 10:00:00 | 745 Interesting Ave | tire_rotation | 10 | 7 |
01/10/2022 10:05:00 | 745 Interesting Ave | tire_rotation | 10 | 7 |
Here we can see that the number of metrics has not changed; however, the number of entries in our dataset has doubled: there's an entry at the start of the hour and another at minute 5. If we selected Sum for our dataset, that would yield this:
timestamp | location | repair_type | total | fixed |
---|---|---|---|---|
01/10/2022 10:00:00 | 123 Interesting Ave | oil_change | 20 | 16 |
01/10/2022 10:00:00 | 123 Interesting Ave | tire_rotation | 20 | 20 |
01/10/2022 10:00:00 | 745 Interesting Ave | oil_change | 20 | 20 |
01/10/2022 10:00:00 | 745 Interesting Ave | tire_rotation | 20 | 14 |
Each numerical value has doubled (because the entries were exactly the same AND there are 2 of them).
Sum is useful when you want to pay attention to the specific total value of a metric!
For Average, the dataset would look like this:
timestamp | location | repair_type | total | fixed |
---|---|---|---|---|
01/10/2022 10:00:00 | 123 Interesting Ave | oil_change | 10 | 8 |
01/10/2022 10:00:00 | 123 Interesting Ave | tire_rotation | 10 | 10 |
01/10/2022 10:00:00 | 745 Interesting Ave | oil_change | 10 | 10 |
01/10/2022 10:00:00 | 745 Interesting Ave | tire_rotation | 10 | 7 |
This is EXACTLY the same as the first dataset, because our secondary records were the EXACT same as well, and there were only 2 data points per aggregation window. If, however, we had 3 data points per interval:
timestamp | location | repair_type | total | fixed |
---|---|---|---|---|
01/10/2022 10:00:00 | 123 Interesting Ave | oil_change | 10 | 8 |
01/10/2022 10:05:00 | 123 Interesting Ave | oil_change | 10 | 8 |
01/10/2022 10:30:00 | 123 Interesting Ave | oil_change | 10 | 8 |
01/10/2022 10:00:00 | 123 Interesting Ave | tire_rotation | 10 | 10 |
01/10/2022 10:05:00 | 123 Interesting Ave | tire_rotation | 10 | 10 |
01/10/2022 10:30:00 | 123 Interesting Ave | tire_rotation | 10 | 10 |
01/10/2022 10:00:00 | 745 Interesting Ave | oil_change | 10 | 10 |
01/10/2022 10:05:00 | 745 Interesting Ave | oil_change | 10 | 10 |
01/10/2022 10:30:00 | 745 Interesting Ave | oil_change | 10 | 10 |
01/10/2022 10:00:00 | 745 Interesting Ave | tire_rotation | 10 | 7 |
01/10/2022 10:05:00 | 745 Interesting Ave | tire_rotation | 10 | 7 |
01/10/2022 10:30:00 | 745 Interesting Ave | tire_rotation | 10 | 7 |
Interestingly, the Average table looks like this:
timestamp | location | repair_type | total | fixed |
---|---|---|---|---|
01/10/2022 10:00:00 | 123 Interesting Ave | oil_change | 10 | 8 |
01/10/2022 10:00:00 | 123 Interesting Ave | tire_rotation | 10 | 10 |
01/10/2022 10:00:00 | 745 Interesting Ave | oil_change | 10 | 10 |
01/10/2022 10:00:00 | 745 Interesting Ave | tire_rotation | 10 | 7 |
But the SUM table would look like this:
timestamp | location | repair_type | total | fixed |
---|---|---|---|---|
01/10/2022 10:00:00 | 123 Interesting Ave | oil_change | 30 | 24 |
01/10/2022 10:00:00 | 123 Interesting Ave | tire_rotation | 30 | 30 |
01/10/2022 10:00:00 | 745 Interesting Ave | oil_change | 30 | 30 |
01/10/2022 10:00:00 | 745 Interesting Ave | tire_rotation | 30 | 21 |
What if the values were not so uniform?
timestamp | location | repair_type | total | fixed |
---|---|---|---|---|
01/10/2022 10:00:00 | 123 Interesting Ave | oil_change | 5 | 4 |
01/10/2022 10:05:00 | 123 Interesting Ave | oil_change | 17 | 10 |
01/10/2022 10:30:00 | 123 Interesting Ave | oil_change | 2 | 1 |
01/10/2022 10:00:00 | 123 Interesting Ave | tire_rotation | 7 | 7 |
01/10/2022 10:05:00 | 123 Interesting Ave | tire_rotation | 7 | 5 |
01/10/2022 10:30:00 | 123 Interesting Ave | tire_rotation | 10 | 8 |
01/10/2022 10:00:00 | 745 Interesting Ave | oil_change | 2 | 2 |
01/10/2022 10:05:00 | 745 Interesting Ave | oil_change | 3 | 2 |
01/10/2022 10:30:00 | 745 Interesting Ave | oil_change | 10 | 9 |
01/10/2022 10:00:00 | 745 Interesting Ave | tire_rotation | 4 | 4 |
01/10/2022 10:05:00 | 745 Interesting Ave | tire_rotation | 10 | 7 |
01/10/2022 10:30:00 | 745 Interesting Ave | tire_rotation | 8 | 5 |
Then the Average table looks like:
timestamp | location | repair_type | total | fixed |
---|---|---|---|---|
01/10/2022 10:00:00 | 123 Interesting Ave | oil_change | 8 | 5 |
01/10/2022 10:00:00 | 123 Interesting Ave | tire_rotation | 8 | 6.67 |
01/10/2022 10:00:00 | 745 Interesting Ave | oil_change | 5 | 4.33 |
01/10/2022 10:00:00 | 745 Interesting Ave | tire_rotation | 7.33 | 5.33 |
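As a quick spot check on one cell: the `fixed` values for `tire_rotation` at `123 Interesting Ave` were 7, 5, and 8.

```python
values = [7, 5, 8]  # fixed counts for tire_rotation at 123 Interesting Ave

print(sum(values))                # 20 - the Sum table's value
print(round(sum(values) / 3, 2))  # 6.67 - the Average table's value
```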
With the SUM table:
timestamp | location | repair_type | total | fixed |
---|---|---|---|---|
01/10/2022 10:00:00 | 123 Interesting Ave | oil_change | 24 | 15 |
01/10/2022 10:00:00 | 123 Interesting Ave | tire_rotation | 24 | 20 |
01/10/2022 10:00:00 | 745 Interesting Ave | oil_change | 15 | 13 |
01/10/2022 10:00:00 | 745 Interesting Ave | tire_rotation | 22 | 16 |
Depending on your data:
Average is problematic if there are spikes within the interval that you want to be aware of!
Average is great if spikes within the interval are normal and you want to iron them out!
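That trade-off shows up directly in the non-uniform data above: the `total` for `oil_change` at `123 Interesting Ave` spikes to 17 at 10:05, but the hourly average flattens it. A tiny illustration with those values:

```python
readings = [5, 17, 2]  # total for oil_change at 123 Interesting Ave within one hour

print(sum(readings))                  # 24 - the spike still contributes fully to the sum
print(sum(readings) / len(readings))  # 8.0 - the spike is ironed out by averaging
```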