[RFC] Split large TCP segments #1330
Conversation
Devil's advocacy: this is a lot of code that, in a practical sense, mostly just duplicates functionality that already has to be implemented in the IP fragmentation layer. Many apps may have an argument that they don't need this (they split their writes already, or are willing to pay the somewhat higher packet loss rates involved in fragmentation), yet everyone has to pay for it. Might be worth making this a Kconfig option so someone who wants "lean TCP" can get it.
Currently:
But indeed, this PR is only applicable to TCP, and we probably do not need to have both IP and TCP fragmentation enabled at the same time. So we could have this one enabled via a Kconfig option (sketched below).
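A minimal sketch of what such gating could look like in net_context.c, assuming a hypothetical `CONFIG_NET_TCP_SPLIT_SEGMENTS` symbol and hypothetical helper names (none of these exist in the tree):

```c
/* Hypothetical sketch: gate the split path behind a Kconfig symbol so a
 * "lean TCP" build can compile it out. CONFIG_NET_TCP_SPLIT_SEGMENTS,
 * tcp_split_and_queue() and tcp_queue_as_is() are assumed names, not
 * existing Zephyr APIs. */
#include <net/net_context.h>
#include <net/net_pkt.h>

/* Hypothetical helpers standing in for the PR's machinery. */
static int tcp_split_and_queue(struct net_context *context,
			       struct net_pkt *pkt);
static int tcp_queue_as_is(struct net_context *context, struct net_pkt *pkt);

static int queue_tcp_data(struct net_context *context, struct net_pkt *pkt)
{
#if defined(CONFIG_NET_TCP_SPLIT_SEGMENTS)
	/* Split oversized packets into MSS-sized pieces before queuing. */
	return tcp_split_and_queue(context, pkt);
#else
	/* Lean build: queue as-is; the app must respect the MSS itself. */
	return tcp_queue_as_is(context, pkt);
#endif
}
```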
I'm a bit late to this, but the "net: pkt: Allocate ll header when cloning" patch could well go into 1.9, as it's a clear bugfix.
I don't think I saw many such apps; actually, I'm not sure I saw any at all (of the native API; BSD Sockets do split).

Well, I definitely tested it when I developed the sockets prototype in MicroPython. That testcase is now gone, because the sockets impl doesn't allow sending more data than a packet can fit (but as your patch shows, indeed, for TCP the MSS should be used, not the MTU). Just taking the HTTP server example and making it send not a few chars but some real page (say, 10K) should show the issue.

Btw, one of the examples I wanted to add for BSD Sockets was a really-working HTTP server, serving a real page like the above. But putting that under ApacheBench deadlocked it very fast. BSD Sockets currently send large data in parallel, without waiting. In other words, the Zephyr IP stack deadlocks when send buffers are exhausted. I'm now pondering what to do about it: burden the socket structure with a semaphore to implement sequential sending (sketched below), or try to debug this deadlock issue. As you understand, the latter can be a real Pandora's box.
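For reference, the "semaphore for sequential sending" idea might look roughly like this; a sketch only, with `struct sock_ctx` and `do_send()` as hypothetical names:

```c
/* Sketch of serializing sends with a per-socket semaphore so parallel
 * writers cannot exhaust the send buffers. The struct layout and
 * do_send() are hypothetical, not Zephyr API. */
#include <kernel.h>
#include <stddef.h>

struct sock_ctx {
	struct k_sem tx_lock;	/* allows one in-flight send at a time */
	/* ... rest of the per-socket state ... */
};

/* Hypothetical stand-in for the actual send path. */
static int do_send(struct sock_ctx *sc, const void *buf, size_t len);

static void sock_ctx_init(struct sock_ctx *sc)
{
	k_sem_init(&sc->tx_lock, 1, 1);	/* binary semaphore, initially free */
}

static int sock_send_serialized(struct sock_ctx *sc,
				const void *buf, size_t len)
{
	int ret;

	k_sem_take(&sc->tx_lock, K_FOREVER);	/* wait for previous send */
	ret = do_send(sc, buf, len);		/* hypothetical actual send */
	k_sem_give(&sc->tx_lock);

	return ret;
}
```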
And back to the patch: it shows well how the background has skewed. Ever since @tbursztyka's refactor, net_pkt was really a network packet. Now, with this patch, it suddenly becomes either a real network packet or a not-so-real network packet, more of a general-purpose network buffer. Confusion, extra code, ineffective resource allocation, and associated bugs will ensue. I still think the best way to handle this issue is simply to not let a user app put more data into a packet than it can hold. Then any user app should be prepared for a short write and send the rest in new packet(s) (see the sketch below). Simple. Easy. Effective.
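Concretely, the short-write contract could look like the toy function below, where `seg_buf`/`seg_cap` stand in for the payload area of a single outgoing segment (all names illustrative, not Zephyr API):

```c
/* Toy illustration of the "short write" contract: never accept more data
 * than fits in one segment, and return how much was actually taken so
 * the caller can loop. */
#include <stddef.h>
#include <string.h>

static size_t short_write(unsigned char *seg_buf, size_t seg_used,
			  size_t seg_cap, const void *data, size_t len)
{
	size_t room = seg_cap - seg_used;

	if (len > room) {
		len = room;	/* short write: take only what fits */
	}
	memcpy(seg_buf + seg_used, data, len);

	return len;		/* caller sends the rest in new packet(s) */
}
```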
Btw, @andyross' comment echoes this idea. There's no point in having fragmentation on 2 levels. We already have packet fragmentation; here it makes sense to try something else.
I do not understand this comment. The net_pkt is still a network packet; the PR does not change this in any way. Let's drop this one, I thought this is what you wanted because of https://jira.zephyrproject.org/browse/ZEP-1998, we can then close that Jira item too.
@jukkar: First of all, why close this? "net: pkt: Allocate ll header when cloning" is a bugfix, and "net: pkt: Allow cloning just the net_pkt without any data" is apparently useful; can you please resubmit them separately?
I would define a "valid network packet" as "a packet which can be sent through the interface it's destined to". The native Zephyr IP stack allows a user to create an invalid packet (with 10K, 100K, 1M of data attached). Previously, it was kind of a grey area what to do about that, kind of "the API doesn't care about such a case, it's the app's chore to not do that, even though the app doesn't have reasonable criteria for what max pkt size is safe". This patch kind of legitimizes invalid elephant packets, and throws additional resources at "fixing" them, while leaving many related questions open (or underspecified, at least per the description of this patch). E.g., what the "sent" callback will get as its arguments, how many times it will be fired, etc.
Sorry, but we can't close that ticket, as it's not resolved. Let's just not rush to resolve it with exactly this patch, and consider alternatives instead. And there are alternatives; e.g., I'm a proponent of the option that oversized packets should not be allowed to be created in the first place. And it's no longer just abstract thinking: it was implemented in the BSD Sockets layer, and it "works well" in some sense of that word, and Sockets are now in the mainline (the application side of this model is the usual send loop, sketched below). It's on my TODO to send an RFC to the mailing list noting that Sockets are now in the mainline and what implications that should have (e.g. should we push for resolving outstanding IP stack issues in "different" ways, or adopt ideas used in the Sockets implementation after all). I plan to send it after the 1.9 release, once more pressing issues are handled (like the Sockets performance regression due to the k_poll patch). So for now, I'd suggest to reopen this ticket, add an "[RFC]" prefix, and let it hang around, perhaps attracting more comments.
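For context, the application side of the short-write model is the classic POSIX-style send loop, e.g.:

```c
/* Classic send loop an app would use under the short-write model: keep
 * calling send() until everything has been accepted. POSIX-style sketch. */
#include <sys/types.h>
#include <sys/socket.h>

static int send_all(int sock, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t out = send(sock, buf, len, 0);

		if (out < 0) {
			return -1;	/* propagate the error to the caller */
		}
		buf += out;		/* advance past the accepted bytes */
		len -= out;
	}

	return 0;
}
```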
@jukkar: So, I hope you don't mind if I reopen this PR and mark it as RFC.
I started to use this PR in #980, as I have large files there that need to be sent over HTTP. There seems to be some issue in the frdm ethernet driver or somewhere in L2: without this segment-split support, either there is a bus fault or the device just restarts.
Perhaps "without controlling MTU size", and there was GH-3439 long ago warning of various issues due to this. I hope this PR is still just a one way to solve it, another way is used in BSD Sockets code. |
Instead of trying to send a packet that is larger than MSS, split it into suitable pieces and queue the pieces individually.

Jira: ZEP-1998

Signed-off-by: Jukka Rissanen <[email protected]>
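The shape of that approach, heavily simplified; `queue_segment()` is a hypothetical stand-in for the actual net_context.c machinery in this PR:

```c
/* Simplified shape of the commit above: walk the payload in MSS-sized
 * steps and queue each piece as its own segment. */
#include <stddef.h>

/* Hypothetical stand-in for the real queuing code. */
static int queue_segment(const unsigned char *data, size_t len);

static int split_and_queue(const unsigned char *data, size_t total,
			   size_t mss)
{
	size_t off = 0;

	while (off < total) {
		size_t chunk = (total - off < mss) ? (total - off) : mss;
		int ret = queue_segment(data + off, chunk);

		if (ret < 0) {
			return ret;	/* stop and report the first failure */
		}
		off += chunk;
	}

	return 0;
}
```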
Let's revisit this one again. I have been using this PR in #980, and it makes the application logic much simpler and less error prone (as the application does not need to send packets in a specific size). Anyway, this is now rebased against the latest master.
rest looks good to me
int fit_len, appdata_len;
new_len += frag->len;

you could remove this empty line
Yeah, let's do it. It was on my TODO for some time to write a mailing list RFC on the situation, so now I did:
This comment is exaggerating the case. What I am proposing here of course adds more code to net_context.c, but it makes the application logic easier and less error prone.
Hmm, that is reaching too far. I still would like to see this merged.
But why do we need this if we agreed to use the "short write" approach instead? What are the (remaining) use cases for this? And note that these approaches conflict: with the "short write" approach, you simply will never get an oversized packet, so there will be nothing to split. (#119 may not be fully there yet, i.e. it might not apply the size check on all paths, but the idea would be to elaborate it and make that the case.)
Closing this as things regarding TCP will be done differently. |
This is related to the https://jira.zephyrproject.org/browse/ZEP-1998 issue. When sending a large TCP segment, we should only send packets of at most the MTU size (for the TCP payload, this effectively means the MSS; see the arithmetic below).
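For the arithmetic behind MSS vs. MTU (this sketch assumes IPv4 and TCP without options):

```c
/* The TCP payload that fits in one link-layer frame is the MTU minus the
 * IP and TCP headers. Assumes IPv4 with no IP or TCP options. */
#define IPV4_HDR_LEN 20		/* bytes, no IP options */
#define TCP_HDR_LEN  20		/* bytes, no TCP options */

static inline int tcp_mss_from_mtu(int mtu)
{
	return mtu - IPV4_HDR_LEN - TCP_HDR_LEN;	/* e.g. 1500 -> 1460 */
}
```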
This PR seems to do something sane, but still requires more testing. @pfalcon, did you have some test for sending large packets?
The first two patches fix some issues with net_pkt_clone(); the last patch adds splitting functionality to net_context.c when sending TCP data. This is probably too risky to apply to v1.9 at this point, as it requires more testing.