compactor went into tailspin #3914

@bboreham

Describe the bug

Consul was down when the compactor started, and the compactor never recovered:

level=info ts=2021-03-04T20:23:01.350819279Z caller=main.go:188 msg="Starting Cortex" version="(version=1.6.0, branch=master, revision=56f794d)"
level=info ts=2021-03-04T20:23:01.35627101Z caller=module_service.go:59 msg=initialising module=server
level=info ts=2021-03-04T20:23:01.358032071Z caller=module_service.go:59 msg=initialising module=compactor
level=info ts=2021-03-04T20:23:01.351964058Z caller=server.go:229 http=[::]:80 grpc=[::]:9095 msg="server listening on addresses"
level=info ts=2021-03-04T20:23:01.356491008Z caller=module_service.go:59 msg=initialising module=memberlist-kv
level=info ts=2021-03-04T20:23:01.36340712Z caller=compactor.go:373 component=compactor msg="waiting until compactor is ACTIVE in the ring"
level=info ts=2021-03-04T20:23:01.366220882Z caller=lifecycler.go:527 msg="not loading tokens from file, tokens file path is empty"
level=error ts=2021-03-04T20:23:01.391200499Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.394262836Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.391467754Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.399955867Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.397642587Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.402454173Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.404896477Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.407444107Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.410901197Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.413247689Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.4157827Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:02.848432818Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:06.895736201Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:14.397161217Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:25.078769837Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:49.041899802Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:24:35.634427793Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:25:35.127795679Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:26:37.176005708Z caller=client.go:234 msg="error getting path" key=compactor err="Unexpected response code: 500"
level=error ts=2021-03-04T20:27:42.508689928Z caller=client.go:234 msg="error getting path" key=compactor err="Unexpected response code: 500"
level=error ts=2021-03-04T20:28:35.350979964Z caller=client.go:234 msg="error getting path" key=compactor err="Unexpected response code: 500"
level=error ts=2021-03-04T20:29:41.117751261Z caller=client.go:234 msg="error getting path" key=compactor err="Unexpected response code: 500"
level=error ts=2021-03-04T20:30:39.87687265Z caller=client.go:234 msg="error getting path" key=compactor err="Unexpected response code: 500"
level=info ts=2021-03-04T20:31:36.036454597Z caller=client.go:247 msg="value is nil" key=compactor index=22
level=info ts=2021-03-04T20:31:36.930199939Z caller=client.go:247 msg="value is nil" key=compactor index=24
level=info ts=2021-03-04T20:31:37.932395975Z caller=client.go:247 msg="value is nil" key=compactor index=25
level=info ts=2021-03-04T20:31:41.887641219Z caller=client.go:247 msg="value is nil" key=compactor index=28
level=info ts=2021-03-04T20:31:41.941646251Z caller=client.go:247 msg="value is nil" key=compactor index=29
level=info ts=2021-03-04T20:31:46.926493102Z caller=client.go:247 msg="value is nil" key=compactor index=31
level=info ts=2021-03-04T20:31:46.967013886Z caller=client.go:247 msg="value is nil" key=compactor index=32
...
level=info ts=2021-03-05T10:12:35.156472897Z caller=client.go:247 msg="value is nil" key=compactor index=78733
level=info ts=2021-03-05T10:12:35.455228914Z caller=client.go:247 msg="value is nil" key=compactor index=78734
level=info ts=2021-03-05T10:12:36.157313862Z caller=client.go:247 msg="value is nil" key=compactor index=78736
level=info ts=2021-03-05T10:12:37.157305834Z caller=client.go:247 msg="value is nil" key=compactor index=78738

We have compactor sharding turned on:

        - -compactor.ring.consul.hostname=consul.cortex.svc.cluster.local:8500
        - -compactor.ring.prefix=
        - -compactor.ring.store=consul
        - -compactor.sharding-enabled=true

Expected behavior
I think the compactor should exit with an error in this situation. Crashlooping would make the fault more obvious to the operator, and in my case it would have managed to talk to Consul after a few restarts.
