Curious Trino retry behaviour #22989

@dwolfeu

(This is a repost of a question on SO.)

Short version

After retrying, queries hang on status FINISHING for five minutes.

Long version

The following is an extract from the values.yaml that we are using for version 0.17.0 of the Trino chart (see fault-tolerant execution):

image:
  tag: 423
additionalConfigProperties:
  - retry-policy=QUERY
  - query.remote-task.max-error-duration=1s

As the value of tag indicates, we are using release 423 of Trino.

I start some queries and then manually delete some pods. Once the time set by query.remote-task.max-error-duration has elapsed (one second in this case, but I have tried other values), the status of the queries changes to BLOCKED. A few seconds later the queries resume (status RUNNING), and after some more time they reach FINISHING. So far so good. But this is where it gets a little strange: the status stays at FINISHING until five minutes (300 seconds) after it changed to BLOCKED. I have tried this several times with many different queries and the behaviour is consistent, so I suspect a config setting is responsible, but I don't know which one. I have tried changing the value of query.client.timeout (see docs), since it is the only property I could find with a default value of 5 minutes, but it made no difference.

The time spent on status FINISHING just seems like wasted time to me and so I would like to get to the bottom of this issue.
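
To make the timing easier to check, the observation above can be written as a small helper that turns a list of (seconds, status) samples into time spent per status. This is only a sketch: the function names are made up for illustration, the samples are assumed to be sorted by time, and how the samples are collected (for example by watching the web UI) is left open.

```python
from itertools import groupby


def state_durations(samples):
    """Collapse time-ordered (seconds, status) samples into runs.

    Returns a list of (status, start, duration) tuples. The final run has
    no known end, so its duration is None.
    """
    # Keep the timestamp of the first sample in each consecutive run.
    runs = [
        (status, next(group)[0])
        for status, group in groupby(samples, key=lambda s: s[1])
    ]
    result = []
    for i, (status, start) in enumerate(runs):
        end = runs[i + 1][1] if i + 1 < len(runs) else None
        result.append((status, start, None if end is None else end - start))
    return result


def seconds_between(samples, from_status, to_status):
    """Seconds from the first sample in from_status to the first in to_status."""
    first_seen = {}
    for t, status in samples:
        first_seen.setdefault(status, t)
    return first_seen[to_status] - first_seen[from_status]
```

With samples recorded for one query, seconds_between(samples, "BLOCKED", "FINISHED") would be expected to come out at roughly 300 if the behaviour described above holds, assuming the query reports FINISHED once it leaves FINISHING.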
