-
Notifications
You must be signed in to change notification settings - Fork 8.2k
net_pkt buffer management and API changes #11775
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
net_pkt buffer management and API changes #11775
Conversation
|
All checks are passing now. Review history of this comment for details about previous failed status. |
Codecov Report
@@ Coverage Diff @@
## master #11775 +/- ##
=========================================
Coverage ? 54.6%
=========================================
Files ? 235
Lines ? 26570
Branches ? 6374
=========================================
Hits ? 14509
Misses ? 9404
Partials ? 2657
Continue to review full report at Codecov.
|
|
@tbursztyka, how this relates to #11374 ? |
|
@pfalcon There are 2 relations and 1 big reason And 1 big and obvious reason: Optimizing buffer usage. This ll reserve thingy was thus mixed in L3, which architecturally speaking was bad already. But it was even worse in net_buf allocation mechanism: all struct net_buf had it's size shrunk by the ll_reserve amount. So if your frag size is 128bytes, used to built a big ethernet frame you would loose 14 bytes on all fragments but the first one. Side note: the "frag" size was also, due to this weird L3/L2 mix-up tight to the bearer which had the smallest MTU. And that was again a weird thing, since buffer management is memory management: it should not have any relationship with what bearers is used or else. Theoretically you should be able to go as low as 1 byte per-net_buf and all should work just the same. (That would not make any sense in practice obviously, having a descriptor such as a struct net_buf n times bigger than the buffer it points to) Last but not least: doing that split, offers other optimizations possibilities like in ethernet: you could have a dedicated net_buf for the ethernet header part, allocated on stack, with just 14 bytes as buffer etc... On 15.4 it will be also possible then to avoid modifying the original packet (6lo and 15.4 fragmentation were doing so before), and thus avoid - when TCP is used - to copy the whole packet in TCP for possible needed retries etc... Also, an additional reason of doing this PR is to hide all the ugly buffer details to most of the code. We have these ugly stuff everywhere: Instead we should not do any of it anywhere, you just write/read stuff from your packet. (and in case of network headers, you just access a header pointer directly). |
9120458 to
880e9bb
Compare
jukkar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lot of commits but fortunately most of them were small and readable.
subsys/net/ip/net_pkt.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo s/reserver/reserve/ in commit "net/pkt: Let's ignore ll reserve and use 0 instead"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, the code above will check ll headers. I wonder if things change after ll reserve has been removed. I can try to run this sample to see if any other changes are needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same with this one.
include/net/net_pkt.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about making the function name a bit shorter so I propose we change this to net_pkt_alloc()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true. I had that name at first, I don't know why I went for the plain allocate word.
include/net/net_pkt.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we call this net_pkt_tx_allocate() in order to follow the same syntax as the rx allocator one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the rx stuff is an in-between name. Idea would be to have only 1 allocator for all pkt (thus one slab, etc...). However, we will still be able to know (for stats, net mem in shell etc...) if a pkt was allocated for tx or rx, there will be a bit in net_pkt to give that info and so on (net_recv_data will set it for rx, tx will be default). That's not going to be part of that PR though,
subsys/net/ip/net_pkt.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about getting max_len like this
size_t max_len = net_if_get_mtu(net_pkt_iface(pkt));
if (IS_ENABLED(CONFIG_NET_IPV6) && family == AF_INET6) {
max_len = max(max_len, NET_IPV6_MTU);
} else if (IS_ENABLED(CONFIG_NET_IPV4) && family == AF_INET) {
max_len = max(max_len, NET_IPV4_MTU);
}
then we do not need the AF_UNSPEC branch at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed
subsys/net/ip/Kconfig
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Offering this kind of option is imho dangerous, someone will misuse it anyway. How much memory we are saving with this, what does ram_report say if we enable/disable it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for udp/ipv4 setup, even on 128bytes frag size this will be just fine option (it is dangerous at the same level as wrongly teawking frag size: user has to know what he is doing). It saves stack actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, it probably save something. Just wondering do you have numbers about the savings.
subsys/net/ip/ipv4.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to have next_header_proto parameter here and in finalize function, it looks like one should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean we would inform ipv4 header about the protocol only at finalize? why not indeed.
include/net/net_pkt.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we change the name to net_pkt_read_u8_new() in order to be consistent. Same with the other value reader helpers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will be easier to remove the _new when necessary yes
|
@tbursztyka: I appreciate the detailed write up in #11775 (comment). My original question re: relation to #11374 was simpler actually, to confirm that #11374 precedes this patch, instead of e.g. be superseded by this one, or a case when they largely unrelated. So, I just assume that #11374 is the next in queue, and not anything else. |
880e9bb to
6c4b053
Compare
db872af to
c965644
Compare
|
In one of recent commits:
"is being" |
7574f71 to
a372fa9
Compare
Reworking the logic to reduce the amount of variables. Also taking the opportunity to normalize drop messages. Signed-off-by: Tomasz Bursztyka <[email protected]>
Reworking the logic to reduce the amount of variables. This part was heavier to change as it was not accessing the headers directly but instead was read parts by parts. Signed-off-by: Tomasz Bursztyka <[email protected]>
Reworking the logic to reduce the amount of variables. Introducing a generic struct to acces the common part of MLD queries, instead of accessing part by part. Also, returning NET_OK in case parsing went fine. We send an MLD report anyway, so it's not a good idea to count the message as being dropped in statistics. Signed-off-by: Tomasz Bursztyka <[email protected]>
All of these are unused anywhere now, thus can be safely removed. Signed-off-by: Tomasz Bursztyka <[email protected]>
Also, return a verdict instead of a pointer to net_pkt. It's simpler as it will be up to net_send_data()'s caller to unref the net_pkt in case of NET_DROP: less places where net_pkt can be unref. Signed-off-by: Tomasz Bursztyka <[email protected]>
Most of the code had to be reworked due to the new API: it's more logical to do everything sequentially (first headers, then MLD part) than the contrary with inserting headers at the end. Using get_data/set_data as well it makes the code clearer. Signed-off-by: Tomasz Bursztyka <[email protected]>
Now using the new API. Signed-off-by: Tomasz Bursztyka <[email protected]>
This optimizes the memory quite a bit since we do not need to clone nor split the original packet at any time. Signed-off-by: Tomasz Bursztyka <[email protected]>
Only the allocator needed to be changed here. Signed-off-by: Tomasz Bursztyka <[email protected]>
Only the allocators needed to be changed there. Signed-off-by: Tomasz Bursztyka <[email protected]>
Allocation and a minor writing logic needed to be changed. Signed-off-by: Tomasz Bursztyka <[email protected]>
This will be used by Ethernet L2 to set the right PTYPE in its header. Signed-off-by: Tomasz Bursztyka <[email protected]>
Since the rework of L2/L3 split, only L2 has access to its header. Thus up to Ethernet one to set LLDP PTYPE. Signed-off-by: Tomasz Bursztyka <[email protected]>
Minor changes: - allocator needed to be changed - and writing the lldp_pdu as well Signed-off-by: Tomasz Bursztyka <[email protected]>
These will be specifically needed in TCP, as well as being used in context internally. Signed-off-by: Tomasz Bursztyka <[email protected]>
Using new functions for net_context, ipv4 and ipv6 as well. Signed-off-by: Tomasz Bursztyka <[email protected]>
As these were parsed already by IPv4/6 input functions let's use them. Applying the change on trivial UDP usage. TCP usage will have its own commit. Signed-off-by: Tomasz Bursztyka <[email protected]>
This proovse to drastically reduce runtime overhead as it does not need to parse IP nor TCP header all over again in a lot of places. Signed-off-by: Tomasz Bursztyka <[email protected]>
Only next to be removed functions like net_tcp_set_checksum() are left untouched. All the rest is switched. Adding net_tcp_finalize() to follow the same logic as for UDP and else. Signed-off-by: Tomasz Bursztyka <[email protected]>
The test counts the '\0' of the data copied, though it does not belong to the packet. Since the test by-passes net_ipv6_input, which would update the packet length according to what is in the IPv6 header, the checksum calculation fails. Before, checksum calculation used to grab the length from IPv6 header, but that changed to use net_pkt_get_len(): that one must provide the real lenth. Signed-off-by: Tomasz Bursztyka <[email protected]>
Let's enable usage of TCP through net_context on its new sendto function. Signed-off-by: Tomasz Bursztyka <[email protected]>
Context was missing, and anyway the familiy can be grabbed from the packet. Signed-off-by: Tomasz Bursztyka <[email protected]>
This goes through basic allocation, atomic read/write, data getter/setter and cloning/copying. Signed-off-by: Tomasz Bursztyka <[email protected]>
Now that net_pkt are accessed through a common r/w API, using below a net_pkt_cursor, let's have an option that will reset this cursor once the net_pkt is freed. Result is instead of segfaulting on r/w access, these operations will bail out properly. Subsequent, and logical (unless you have a leak which is another issue) net_pkt_unref will tell you who/where the pkt was freed. Without it, you will get a segfault for instance, but that won't tell you the exact reason. This options can help you then. Signed-off-by: Tomasz Bursztyka <[email protected]>
Though these are currently used by the core only, it will be then used by net_context as well. This one of the steps to get rid of net_pkt's appdata/appdatalen attributes. Also normalizing all ip/proto parameters name to ip_hdr and proto_hdr. Signed-off-by: Tomasz Bursztyka <[email protected]>
If status is 0, both ip_hdr and proto_hdr will own a pointer to the relevant IP and Protocol headers. In order to know which of ipv4/ipv6 and udp/tcp one will need to use respectively net_pkt_family(pkt) and net_context_get_ip_proto(context). Having access to those headers directly, many callbacks will not need to parse the packet again no get the src/dst addresses or the src/dst ports. This will be change after this commit. Signed-off-by: Tomasz Bursztyka <[email protected]>
81ccc3b to
76fc11e
Compare
This is in a early stage, some stuff will change in these patches still.
In order not to break anything (besides some test due to ROM/RAM being bloated for the time being), the API is made in parallel to the existing one (thus the bloat). Currently, as the patches can tell you, only part of ipv4/Ethernet are moved towards this new API. The good thing with this approach is that old API works on packet generated with the new API and vice versa so changes can be made iteratively.