From 55f4197fb9361d013e948f8b4669843d233ced07 Mon Sep 17 00:00:00 2001
From: Michel Hollands
Date: Fri, 7 Aug 2020 11:15:36 +0100
Subject: [PATCH 1/7] Add playbook entry for sample with repeated timestamp

---
 cortex-mixin/docs/playbooks.md | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/cortex-mixin/docs/playbooks.md b/cortex-mixin/docs/playbooks.md
index fb408b9d..46c5a6b0 100644
--- a/cortex-mixin/docs/playbooks.md
+++ b/cortex-mixin/docs/playbooks.md
@@ -1,6 +1,6 @@
 # Playbooks
 
-This document contains playbooks, or at least a checklist of what to look for, for alerts in the cortex-mixin. This document assumes that you are running a Cortex cluster:
+This document contains playbooks, or at least a checklist of what to look for, for alerts in the cortex-mixin and logs from Cortex. This document assumes that you are running a Cortex cluster:
 
 1. Using this mixin config
 2. Using GCS as object store (but similar procedures apply to other backends)
@@ -362,3 +362,16 @@ A PVC can be manually deleted by an operator. When a PVC claim is deleted, what
 
 - `Retain`: the volume will not be deleted until the PV resource will be manually deleted from Kubernetes
 - `Delete`: the volume will be automatically deleted
+
+
+## Log lines
+
+### Log line containing 'sample with repeated timestamp but different value'
+
+This means a sample with the same timestamp as an existing one was received with a different value. The number of occurrences is recorded in the `prometheus_target_scrapes_sample_out_of_order_total` metric.
+
+Possible reasons for this are:
+- Multiple agents are scraping the same app without deduplication in place. Check the IP addresses mentioned in the log line for the agent that returned the deplicate sample. Change the labels of each sample generated per agent so they are unique.
+- Incorrect relabelling rules can cause a label to be dropped from a sample so that multiple samples have the same labels. If these samples were collected at the same time they will cause this error.
+- The exporter being scraped sets the same timestamp on every scrape. Note that exporters should generally not set timestamps.
+- Prometheus scrapes at the millisecond level. If the scrapes are done very quickly the same sample could be returned. This is very unlikely.

From b3b7ce9e58755157422ae5f5663c9fd09f75bc04 Mon Sep 17 00:00:00 2001
From: MichelHollands <42814411+MichelHollands@users.noreply.github.com>
Date: Fri, 7 Aug 2020 11:59:59 +0100
Subject: [PATCH 2/7] Update cortex-mixin/docs/playbooks.md

Co-authored-by: Jack Baldry
---
 cortex-mixin/docs/playbooks.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cortex-mixin/docs/playbooks.md b/cortex-mixin/docs/playbooks.md
index 46c5a6b0..087a3b5e 100644
--- a/cortex-mixin/docs/playbooks.md
+++ b/cortex-mixin/docs/playbooks.md
@@ -371,7 +371,7 @@ A PVC can be manually deleted by an operator. When a PVC claim is deleted, what
 This means a sample with the same timestamp as an existing one was received with a different value. The number of occurrences is recorded in the `prometheus_target_scrapes_sample_out_of_order_total` metric.
 
 Possible reasons for this are:
-- Multiple agents are scraping the same app without deduplication in place. Check the IP addresses mentioned in the log line for the agent that returned the deplicate sample. Change the labels of each sample generated per agent so they are unique.
+- Multiple agents are scraping the same app without deduplication in place. Check the IP addresses mentioned in the log line for the agent that returned the duplicate sample. Change the labels of each sample generated per agent so they are unique.
 - Incorrect relabelling rules can cause a label to be dropped from a sample so that multiple samples have the same labels. If these samples were collected at the same time they will cause this error.
 - The exporter being scraped sets the same timestamp on every scrape. Note that exporters should generally not set timestamps.
 - Prometheus scrapes at the millisecond level. If the scrapes are done very quickly the same sample could be returned. This is very unlikely.

From f6c0e7dc7cf3bf734dfedd460445ec5e2922ca81 Mon Sep 17 00:00:00 2001
From: MichelHollands <42814411+MichelHollands@users.noreply.github.com>
Date: Mon, 10 Aug 2020 09:02:03 +0100
Subject: [PATCH 3/7] Update cortex-mixin/docs/playbooks.md

Co-authored-by: Marco Pracucci
---
 cortex-mixin/docs/playbooks.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cortex-mixin/docs/playbooks.md b/cortex-mixin/docs/playbooks.md
index 087a3b5e..dfa0f00a 100644
--- a/cortex-mixin/docs/playbooks.md
+++ b/cortex-mixin/docs/playbooks.md
@@ -368,7 +368,7 @@ A PVC can be manually deleted by an operator. When a PVC claim is deleted, what
 
 ### Log line containing 'sample with repeated timestamp but different value'
 
-This means a sample with the same timestamp as an existing one was received with a different value. The number of occurrences is recorded in the `prometheus_target_scrapes_sample_out_of_order_total` metric.
+This means a sample with the same timestamp as the latest one was received with a different value. The number of occurrences is recorded in the `cortex_discarded_samples_total` metric with the label `reason="new-value-for-timestamp"`.
 
 Possible reasons for this are:
 - Multiple agents are scraping the same app without deduplication in place. Check the IP addresses mentioned in the log line for the agent that returned the duplicate sample. Change the labels of each sample generated per agent so they are unique.

From a8f5e8ae36d8cb91fab4889316604aba71008803 Mon Sep 17 00:00:00 2001
From: MichelHollands <42814411+MichelHollands@users.noreply.github.com>
Date: Mon, 10 Aug 2020 09:18:33 +0100
Subject: [PATCH 4/7] Update cortex-mixin/docs/playbooks.md

Co-authored-by: Marco Pracucci
---
 cortex-mixin/docs/playbooks.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cortex-mixin/docs/playbooks.md b/cortex-mixin/docs/playbooks.md
index dfa0f00a..773aa16e 100644
--- a/cortex-mixin/docs/playbooks.md
+++ b/cortex-mixin/docs/playbooks.md
@@ -371,7 +371,7 @@ A PVC can be manually deleted by an operator. When a PVC claim is deleted, what
 This means a sample with the same timestamp as the latest one was received with a different value. The number of occurrences is recorded in the `cortex_discarded_samples_total` metric with the label `reason="new-value-for-timestamp"`.
 
 Possible reasons for this are:
-- Multiple agents are scraping the same app without deduplication in place. Check the IP addresses mentioned in the log line for the agent that returned the duplicate sample. Change the labels of each sample generated per agent so they are unique.
+- Multiple Prometheus servers / Grafana agents are scraping the same target without deduplication in place. Check the IP addresses mentioned in the log line for the agent that returned the duplicate sample. Change the labels of each sample generated per agent so they are unique.
 - Incorrect relabelling rules can cause a label to be dropped from a sample so that multiple samples have the same labels. If these samples were collected at the same time they will cause this error.
 - The exporter being scraped sets the same timestamp on every scrape. Note that exporters should generally not set timestamps.
 - Prometheus scrapes at the millisecond level. If the scrapes are done very quickly the same sample could be returned. This is very unlikely.

From f6c69e0384aafc31f7eaae7e964a3559d14c49f5 Mon Sep 17 00:00:00 2001
From: Michel Hollands
Date: Mon, 10 Aug 2020 09:26:06 +0100
Subject: [PATCH 5/7] Address review comments

---
 cortex-mixin/docs/playbooks.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/cortex-mixin/docs/playbooks.md b/cortex-mixin/docs/playbooks.md
index 773aa16e..570c9e8d 100644
--- a/cortex-mixin/docs/playbooks.md
+++ b/cortex-mixin/docs/playbooks.md
@@ -372,6 +372,5 @@ This means a sample with the same timestamp as the latest one was received with
 
 Possible reasons for this are:
 - Multiple Prometheus servers / Grafana agents are scraping the same target without deduplication in place. Check the IP addresses mentioned in the log line for the agent that returned the duplicate sample. Change the labels of each sample generated per agent so they are unique.
-- Incorrect relabelling rules can cause a label to be dropped from a sample so that multiple samples have the same labels. If these samples were collected at the same time they will cause this error.
+- Incorrect relabelling rules can cause a label to be dropped from a sample so that multiple samples have the same labels. If these samples were collected from the same target they will have the same timestamp. An example is dropping the `cpu` label when there are multiple cpus.
 - The exporter being scraped sets the same timestamp on every scrape. Note that exporters should generally not set timestamps.
-- Prometheus scrapes at the millisecond level. If the scrapes are done very quickly the same sample could be returned. This is very unlikely.

From 65b9aca21ddf9712b2f6ae0f08461a714d08ab1f Mon Sep 17 00:00:00 2001
From: MichelHollands <42814411+MichelHollands@users.noreply.github.com>
Date: Mon, 10 Aug 2020 15:11:00 +0100
Subject: [PATCH 6/7] Update cortex-mixin/docs/playbooks.md

Co-authored-by: Marco Pracucci
---
 cortex-mixin/docs/playbooks.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cortex-mixin/docs/playbooks.md b/cortex-mixin/docs/playbooks.md
index 570c9e8d..fa0f3979 100644
--- a/cortex-mixin/docs/playbooks.md
+++ b/cortex-mixin/docs/playbooks.md
@@ -372,5 +372,5 @@ This means a sample with the same timestamp as the latest one was received with
 
 Possible reasons for this are:
 - Multiple Prometheus servers / Grafana agents are scraping the same target without deduplication in place. Check the IP addresses mentioned in the log line for the agent that returned the duplicate sample. Change the labels of each sample generated per agent so they are unique.
-- Incorrect relabelling rules can cause a label to be dropped from a sample so that multiple samples have the same labels. If these samples were collected from the same target they will have the same timestamp. An example is dropping the `cpu` label when there are multiple cpus.
+- Incorrect relabelling rules can cause a label to be dropped from a series so that multiple series have the same labels. If these series were collected from the same target they will have the same timestamp.
 - The exporter being scraped sets the same timestamp on every scrape. Note that exporters should generally not set timestamps.

From 6c9e1a6cd6140126f41be56e7a6dfd354b29c4aa Mon Sep 17 00:00:00 2001
From: Michel Hollands
Date: Mon, 10 Aug 2020 15:39:04 +0100
Subject: [PATCH 7/7] Remove unlikely reason for duplicate sample

---
 cortex-mixin/docs/playbooks.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/cortex-mixin/docs/playbooks.md b/cortex-mixin/docs/playbooks.md
index fa0f3979..e5358605 100644
--- a/cortex-mixin/docs/playbooks.md
+++ b/cortex-mixin/docs/playbooks.md
@@ -371,6 +371,5 @@ A PVC can be manually deleted by an operator. When a PVC claim is deleted, what
 This means a sample with the same timestamp as the latest one was received with a different value. The number of occurrences is recorded in the `cortex_discarded_samples_total` metric with the label `reason="new-value-for-timestamp"`.
 
 Possible reasons for this are:
-- Multiple Prometheus servers / Grafana agents are scraping the same target without deduplication in place. Check the IP addresses mentioned in the log line for the agent that returned the duplicate sample. Change the labels of each sample generated per agent so they are unique.
 - Incorrect relabelling rules can cause a label to be dropped from a series so that multiple series have the same labels. If these series were collected from the same target they will have the same timestamp.
 - The exporter being scraped sets the same timestamp on every scrape. Note that exporters should generally not set timestamps.
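
The final playbook entry names two causes: relabelling that drops the label distinguishing two series, and an exporter that reuses one timestamp across scrapes. The Go sketch below models both against a simplified per-series check. The types and helpers (`sample`, `lastSample`, `seriesKey`, `appendSample`, `dropLabel`) and the `queue_length` metric are hypothetical illustrations, not Cortex's ingester code. The check is only an approximation: it ignores an exact duplicate, rejects a new value for the latest timestamp, and treats an older timestamp as out of order.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is one scraped value for a label set, with a millisecond timestamp.
type sample struct {
	labels map[string]string
	ts     int64
	value  float64
}

// lastSample is the newest timestamp and value accepted for a series.
type lastSample struct {
	ts    int64
	value float64
}

// seriesKey renders a label set canonically (sorted by label name), so equal
// label sets map to the same series.
func seriesKey(labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)
	parts := make([]string, 0, len(names))
	for _, name := range names {
		parts = append(parts, name+"="+labels[name])
	}
	return "{" + strings.Join(parts, ",") + "}"
}

// appendSample is a simplified (hypothetical) model of the ingestion check:
// an older timestamp is out of order, the same timestamp with the same value
// is an ignored duplicate, and the same timestamp with a new value is rejected.
func appendSample(head map[string]lastSample, s sample) error {
	key := seriesKey(s.labels)
	if last, ok := head[key]; ok {
		if s.ts < last.ts {
			return fmt.Errorf("sample out of order for %s", key)
		}
		if s.ts == last.ts {
			if s.value == last.value {
				return nil // exact duplicate: nothing new to store
			}
			return fmt.Errorf("sample with repeated timestamp but different value for %s", key)
		}
	}
	head[key] = lastSample{ts: s.ts, value: s.value}
	return nil
}

// dropLabel stands in for a relabelling rule that removes one label.
func dropLabel(labels map[string]string, name string) map[string]string {
	out := make(map[string]string, len(labels))
	for k, v := range labels {
		if k != name {
			out[k] = v
		}
	}
	return out
}

func main() {
	head := map[string]lastSample{}
	const scrapeTs int64 = 1596791736000

	// Case 1: one scrape of one target yields two series that differ only in
	// "cpu", so both carry the same timestamp. Dropping "cpu" merges them.
	perCPU := []sample{
		{labels: map[string]string{"__name__": "node_cpu_seconds_total", "cpu": "0"}, ts: scrapeTs, value: 120.5},
		{labels: map[string]string{"__name__": "node_cpu_seconds_total", "cpu": "1"}, ts: scrapeTs, value: 98.25},
	}
	for _, s := range perCPU {
		s.labels = dropLabel(s.labels, "cpu")
		if err := appendSample(head, s); err != nil {
			fmt.Println("rejected:", err)
		}
	}

	// Case 2: an exporter stamps every scrape with the same timestamp, so the
	// next scrape brings a new value for an already-used timestamp.
	fixed := map[string]string{"__name__": "queue_length"}
	for _, v := range []float64{3, 4} {
		if err := appendSample(head, sample{labels: fixed, ts: scrapeTs, value: v}); err != nil {
			fmt.Println("rejected:", err)
		}
	}
}
```

Running it prints two `rejected:` lines, one per case. In both cases the fix sits on the sending side: keep the label that keeps the series distinct, or stop the exporter from setting explicit timestamps.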