Skip to content

Conversation

@pfalcon
Copy link
Contributor

@pfalcon pfalcon commented May 10, 2017

When we receive a packet with the sequence we already seen (and
processed), the most likely cause of it is that our ACK was lost,
and peer has to retransmit that packet. Then, we should just ACK
it, because otherwise peer will retransmit it again and again,
falling into exponential backoff and hosing the entire TCP
connection.

@pfalcon
Copy link
Contributor Author

pfalcon commented May 10, 2017

This is quick and dirty patch for GH-3562 . Not suitable for merging without consideration.

@jukkar jukkar self-requested a review May 10, 2017 11:32
@pfalcon
Copy link
Contributor Author

pfalcon commented May 10, 2017

@andyross: Can you please look into this patch and the underlying issue GH-3562 . The code for:

+               /* Don't try to reorder packets.  If it doesn't match
+                * the next segment exactly, drop and wait for
+                * retransmit
+                */

was introduced by you in 3604b2c.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add a force option into send_ack() call so we could avoid tweaking the sent_ack variable.

Also noticed that the send_ack() is inline, we should remove that although compiler is probably smart enough to do it itself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to set the date_len, can't we just send an ACK that is using currently active ack values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just parallels what's done in "normal ACK" branch: https://github.com/zephyrproject-rtos/zephyr/blob/master/subsys/net/ip/net_context.c#L827 . Apparently if I added it, it didn't work w/o it, but I don't remember exactly now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we just send an ACK that is using currently active ack values?

And yep, we can't do this (if taken literally) - we need to ACK what peer asked us for, not what we currently expect from it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yep, we can't do this (if taken literally) - we need to ACK what peer asked us for, not what we currently expect from it.

I thought that sending ACK to some value would automatically acknowledge all the previous values. RFC 793 in chapter 3.3. Sequence Numbers says "The acknowledgment mechanism employed is cumulative so that an acknowledgment of sequence number X indicates that all octets up to but not including X have been received." https://tools.ietf.org/html/rfc793#section-3.3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that sending ACK to some value would automatically acknowledge all the previous values.

Then the explanation of what happens is less clear. Were you able to reproduce the original issue (against archive.ubuntu.com or another site) and confirm that this patch makes a difference? We can try to just resend our current ack value (without trying to "rewind" it back), and see if that still works. I'll try that a bit later unless you beat on on that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So (sorry, late to the game)... I think I understand the point here. We don't retransmit an ACK if we already sent it once, and that's wrong.

Nonetheless Jukka is exactly right: the ACK must always be for the maximum sequence number we have received (without "holes" of course). It's legal and expected that the stack will "skip" ACK values, we don't have to ACK every single packet (sort of the whole point of TCP, really).

Seems like the notion in the fix is correct: if we get a old/retransmitted packet we should always synchronously send an ACK (the correct ACK, not a repetition of whatever sequence number we just received!) on the assumption that the original got droped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andyross : Thanks for confirming all that. My biggest concern was that maybe after your commit I dug out, you made another commit to handle that case, which later was patched into regression, and I now just trying to reinvent the wheel (as you can see, I do the homework of reading git log, but tracing complete change history is hard, and there're indeed big code move-arounds, e.g. 29b7d9c).

So, I'm confirming that resending the current ack seq works, but otherwise leaving the structure of this patch as it is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, confirmed to work per above, refactored, retested, will push new version soon.

@jukkar jukkar added the net label May 12, 2017
@pfalcon pfalcon force-pushed the net-tcp-lost-peer-ack branch from f3d8f24 to c3622ad Compare May 13, 2017 19:04
@pfalcon pfalcon changed the title WIP net: tcp: Handle retransmitted packets from peer. net: tcp: Handle retransmitted packets from peer. May 13, 2017
@pfalcon
Copy link
Contributor Author

pfalcon commented May 13, 2017

The patch has been reworking per the inline comments: the latest ack number is sent per RFC793, send_ack() was update to take "force" param instead of munging with cached value. This should be ready for final review (tested also).

Copy link
Member

@jukkar jukkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need still a bit more work with this issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could also remove the "inline" as that is not really needed here. Probably gcc will do it automatically but still. This could be a separate patch thou.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even per my, usually strict criteria of "don't make unrelated changes", un-inlining it would fit with this patch (like, we call it from more places - don't want to inline any longer). So, will update this patch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens with this current check if the send_ack has wrapped in which case the check will give incorrect answer.
I wonder if it would be good idea at this point to fix the sequence check issue you mentioned in https://jira.zephyrproject.org/browse/ZEP-2120
So we could have a generic function that would check if the seq is properly greater than some other seq, and then call that function here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check above is exactly coded to give correct answer even if seq numbers wrapped. But yes, ideally instead of that more or less complex check, there would be a semantically clearer macro/incline function call. Didn't have time to do that on weekend, so coded independently like above, will work on ZEP-2120 now.

@pfalcon
Copy link
Contributor Author

pfalcon commented May 16, 2017

The patch updated to use net_tcp_seq_cmp(). @jukkar : Let me know if I should introduce net_tcp_seq_lesser() instead (or swap args, or whatever). Otherwise, this should be ready for merge.

jukkar
jukkar previously approved these changes May 17, 2017
When we receive a packet with the sequence we already seen (and
processed), the most likely cause of it is that our ACK was lost,
and peer has to retransmit that packet. Then, we should just ACK
it, because otherwise peer will retransmit it again and again,
falling into exponential backoff and hosing the entire TCP
connection.

This makes changes to send_ack(), adding a flag to force sending
an ACK regardless of its cached status, and remove inline modifier,
as the function is big and called from many places.

Signed-off-by: Paul Sokolovsky <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants