
Conversation

@jukkar
Member

@jukkar jukkar commented Sep 1, 2017

This is related to the https://jira.zephyrproject.org/browse/ZEP-1998 issue. When sending a large TCP segment, we should only send packets up to the maximum MTU size.
This PR seems to do something sane, but it still requires more testing. @pfalcon, did you have a test for sending large packets?
The first two patches fix some issues with net_pkt_clone(); the last patch adds the splitting functionality to net_context.c when sending TCP data. This is probably too risky to apply to v1.9 at this point, as it requires more testing.

@jukkar jukkar added the net label Sep 1, 2017
@jukkar jukkar self-assigned this Sep 1, 2017
@jukkar jukkar requested a review from pfalcon September 1, 2017 14:17
@jukkar jukkar requested a review from tbursztyka as a code owner September 1, 2017 14:17
@andyross
Contributor

andyross commented Sep 1, 2017

Devil's advocacy: this is a lot of code that, in a practical sense, mostly just duplicates functionality that already has to be implemented in the IP fragmentation layer. Many apps may have an argument that they don't need this (they split their writes already, or are willing to pay the somewhat higher packet loss rates involved in fragmentation), yet everyone has to pay for it.

Might be worth making this a Kconfig option so someone who wants "lean TCP" can get it.

@jukkar
Member Author

jukkar commented Sep 1, 2017

this is a lot of code that, in a practical sense, mostly just duplicates functionality that already has to be implemented in the IP fragmentation layer

Currently:

  • We have only IPv6 fragmentation implemented.
  • IPv6 fragmentation is turned off by default. The reason is that it requires a lot of memory, as we need to keep track of packets that may need to be resent. The same memory usage issue also applies to TCP.

But indeed, this PR is only applicable to TCP, and we probably do not need both IP and TCP fragmentation enabled at the same time. So we could have this one enabled via a Kconfig option.
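Such a switch could look like the following Kconfig fragment. This is only an illustrative sketch; the symbol name and help text are not taken from the PR:

```
config NET_TCP_SPLIT_SEGMENTS
	bool "Split TCP segments larger than MSS"
	depends on NET_TCP
	help
	  Split oversized TCP segments into MSS-sized pieces before
	  queueing them for transmission, instead of relying on IP
	  fragmentation.
```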

@jukkar jukkar added this to the v1.10 milestone Sep 4, 2017
@pfalcon
Contributor

pfalcon commented Sep 4, 2017

I'm a bit late on this, but the "net: pkt: Allocate ll header when cloning" patch could well go into 1.9, as it's a clear bugfix.

@pfalcon
Contributor

pfalcon commented Sep 4, 2017

Many apps may have an argument that they don't need this (they split their writes already

I don't think I've seen many such apps; actually, I'm not sure I've seen any at all (using the native API; BSD Sockets do split).

@pfalcon, did you have a test for sending large packets?

Well, I definitely tested it when I developed the sockets prototype in MicroPython. That testcase is now gone, because the sockets implementation doesn't allow sending more data than a packet can hold (though as your patch shows, for TCP the MSS should indeed be used, not the MTU).

Just taking the HTTP server example and making it send not a few chars but a real page (say, 10K) should show the issue.

Btw, one of the examples I wanted to add for BSD Sockets was a really working HTTP server, serving a real page like the above. But putting that under ApacheBench deadlocked it very fast. BSD Sockets currently send large data in parallel, without waiting. In other words, the Zephyr IP stack deadlocks when send buffers are exhausted. I'm now pondering what to do about it: burden the socket structure with a semaphore to implement sequential sending, or try to debug this deadlock issue. As you understand, the latter can be a real Pandora's box.

@pfalcon
Contributor

pfalcon commented Sep 4, 2017

And back to the patch: it shows well how the background assumptions have skewed. Ever since @tbursztyka's refactor, a net_pkt was really a network packet. Now, with this patch, it suddenly becomes either a real network packet or a not-so-real network packet, more of a general-purpose network buffer. Confusion, extra code, inefficient resource allocation, and the associated bugs will ensue.

I still think the best way to handle this issue is simply to not let a user app put more data into a packet than it can hold. Then any user app should be prepared for a short write and send the rest in new packet(s). Simple. Easy. Effective.

@pfalcon
Contributor

pfalcon commented Sep 4, 2017

Simple. Easy. Effective.

Btw, @andyross' comment echoes this idea. There's no point in having fragmentation at two levels. We already have packet fragmentation; here it makes sense to try something else.

@jukkar
Member Author

jukkar commented Sep 5, 2017

Now, with this patch, it suddenly becomes either a real network packet or a not-so-real network packet, more of a general-purpose network buffer

I do not understand this comment. net_pkt is still a network packet; the PR does not change this in any way.

Let's drop this one. I thought this was what you wanted because of https://jira.zephyrproject.org/browse/ZEP-1998; we can then close that Jira item too.

@jukkar jukkar closed this Sep 5, 2017
@pfalcon pfalcon mentioned this pull request Sep 5, 2017
@pfalcon
Contributor

pfalcon commented Sep 6, 2017

@jukkar: First of all, why close this? "net: pkt: Allocate ll header when cloning" is a bugfix, and "net: pkt: Allow cloning just the net_pkt without any data" is apparently useful; can you please resubmit them separately?

I do not understand this comment. net_pkt is still a network packet; the PR does not change this in any way.

I would define a "valid network packet" as "a packet which can be sent through the interface it's destined to". The native Zephyr IP stack allows a user to create an invalid packet (with 10K, 100K, 1M of data attached). Previously, it was kind of a grey area what to do about that, along the lines of "the API doesn't care about such a case; it's the app's chore to not do that, even though the app has no reasonable criteria for what max packet size is safe". This patch kind of legitimizes invalid elephant packets and throws additional resources at "fixing" them, while leaving many related questions open (or underspecified, at least per the description of this patch), e.g. what arguments the "sent" callback will get, how many times it will be fired, etc.

Let's drop this one. I thought this was what you wanted because of GH-3439; we can then close that Jira item too.

Sorry, but we can't close that ticket, as it's not resolved. Let's just not rush to resolve it with exactly this patch, and consider alternatives instead. And there are alternatives; e.g., I'm a proponent of the option that oversized packets should not be allowed to be created in the first place. And it's no longer just abstract thinking: it was implemented in the BSD Sockets layer, it "works well" in some sense of that phrase, and Sockets are now in the mainline. It's on my TODO to send an RFC to the mailing list noting that Sockets are now in the mainline and what implications that should have (e.g. should we push for resolving outstanding IP stack issues in "different" ways, or adopt the ideas used in the Sockets implementation after all). I plan to send it after the 1.9 release, once more pressing issues are handled (like the Sockets performance regression due to the k_poll patch).

So for now, I'd suggest reopening this ticket, adding an "[RFC]" prefix, and letting it hang around, perhaps attracting more comments.

@pfalcon
Contributor

pfalcon commented Sep 8, 2017

@jukkar: So, I hope you don't mind if I reopen this PR and mark as RFC.

@pfalcon pfalcon reopened this Sep 8, 2017
@pfalcon pfalcon changed the title Split large TCP segments [RFC] Split large TCP segments Sep 8, 2017
@jukkar
Member Author

jukkar commented Sep 15, 2017

I started to use this PR in #980, as I have large files there that need to be sent over HTTP. There seems to be some issue in the frdm Ethernet driver, or somewhere in L2: without this segment split support, there is either a bus fault or the device just restarts.

@pfalcon
Contributor

pfalcon commented Sep 15, 2017

without this segment split support

Perhaps "without controlling the MTU size"; GH-3439 warned about various issues due to this long ago.

I hope this PR is still just one way to solve it; another way is used in the BSD Sockets code.

@nashif nashif modified the milestones: v1.10, v1.10.0 Oct 3, 2017
Instead of trying to send a packet that is larger than MSS,
split it into suitable pieces and queue the pieces individually.

Jira: ZEP-1998

Signed-off-by: Jukka Rissanen <[email protected]>
@jukkar
Member Author

jukkar commented Oct 9, 2017

Let's revisit this once again. I have been using this PR in #980, and it makes the application logic much simpler and less error prone (as the application does not need to send packets of a specific size). Anyway, this is now rebased against the latest master.

Contributor

@tbursztyka tbursztyka left a comment

rest looks good to me

int fit_len, appdata_len;

new_len += frag->len;

Contributor

you could remove this empty line

@pfalcon
Contributor

pfalcon commented Oct 10, 2017

Let's revisit this once again.

Yeah, let's do it. It had been on my TODO for some time to write a mailing list RFC on the situation, so now I did:
https://lists.zephyrproject.org/pipermail/zephyr-devel/2017-October/008210.html . It tries to pinpoint the situation: there is a choice between reusing solutions that are already in the mainline (e.g. in the BSD Sockets layer), and potentially growing the native API (originally the thinnest, most efficient layer) to become more and more complex (potentially more complex than the sockets API).

@jukkar
Member Author

jukkar commented Oct 11, 2017

a choice between reusing solutions that are already in the mainline, and potentially growing the native API (originally the thinnest, most efficient layer) to become more and more complex (potentially more complex than the sockets API).

This comment exaggerates the case. What I am proposing here is of course adding more code to net_context.c, but it makes the application logic easier and less error prone.

Contributor

@pfalcon pfalcon left a comment

I hope it's fair to say that we reached agreement to proceed with #119 instead, which #4243 relies on. So, let me -1 this.

@jukkar
Member Author

jukkar commented Nov 10, 2017

I hope it's fair to say that we reached agreement to proceed with #119 instead

Hmm, that is reaching too far. I still would like to see this merged.

@jukkar jukkar removed this from the v1.10.0 milestone Nov 10, 2017
@pfalcon
Contributor

pfalcon commented Nov 10, 2017

Hmm, that is reaching too far. I still would like to see this merged.

But why do we need this if we agreed to use the "short write" approach instead? What are the remaining use cases for this? And note that these approaches conflict: with the "short write" approach, you simply never get an oversized packet, so there is nothing to split. (#119 may not be fully there yet, i.e., it might not apply the size check on all paths, but the idea would be to elaborate it and make that the case.)

@nashif nashif added RFC Request For Comments: want input from the community and removed RFC Request For Comments: want input from the community labels Nov 18, 2017
@jukkar
Member Author

jukkar commented Mar 21, 2018

Closing this as things regarding TCP will be done differently.

@jukkar jukkar closed this Mar 21, 2018
@jukkar jukkar deleted the tcp-mtu branch February 29, 2024 08:25