Skip to content

PaymentRetrier continues retrying after an Event::PaymentPathFailed with all_paths_failed set is exposed to user #1164

@TheBlueMatt

Description

@TheBlueMatt

I saw multiple Event::PaymentPathFailed with all_paths_failed set exposed via the CLI on the sample node yesterday all for the same payment. I believe what happens is:

  • send an MPP payment
  • have a few failures
  • have two failure in a row (eg from the same commitment transaction update), the second of which has all_paths_failed set and finishes off our retry count
  • when the first is processed, we happily retry the payment, and now have a pending path
  • then we process the second Event::PaymentPathFailed, seeing that we're out of retries, we give it to the user.
  • when the other path fails we now re-insert the pending payment, giving it a fresh retry counter and go around again (but only with part of the full payment amount).

That last part is important - a well-behaved recipient should never fulfill the payment at this point, they only ever have a subset of the amount, so doing so would be "giving us free money". But, if the recipient does fulfill it, the user would see a Event::PaymentPathFailed with all_paths_failed set followed by a Event::PaymentSent, possible causing payment tracking code on the user end to get confused or consider a sent payment failed!

This is particularly nontrivial to address - adding a retry counter in PaymentRetrier doesn't seem to work as on startup we may hit a similar ordering of events and still be left confused. Its unclear to me how even listing the pending payments (#1157) can solve this - we don't currently have a on-startup hook for PaymentRetrier (and I'd ideally prefer not to add one), and asking the ChannelManager about pending payment parts for a just-failed payment isn't helpful cause you can have further events for the same payment in the event queue as well.

Some things that do work (I think):

  • Adding an on-startup hook to PaymentRetrier plus tracking pending payment paths there,
  • Tracking retry count in ChannelManager (even if we don't persist it) - this makes deciding whether to retry a decision that's basically on the ChannelManager, even if it doesn't know the target retry count. I think this is "the simple answer".

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

Status

Done

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions