net_pkt buffer management and API changes #11775

tbursztyka · 2018-11-30T15:42:41Z

This is in a early stage, some stuff will change in these patches still.

In order not to break anything (besides some test due to ROM/RAM being bloated for the time being), the API is made in parallel to the existing one (thus the bloat). Currently, as the patches can tell you, only part of ipv4/Ethernet are moved towards this new API. The good thing with this approach is that old API works on packet generated with the new API and vice versa so changes can be made iteratively.

zephyrbot · 2018-11-30T16:07:30Z

All checks are passing now.

Review history of this comment for details about previous failed status.
Note that some checks might have not completed yet.

codecov-io · 2018-11-30T16:36:26Z

Codecov Report

❗ No coverage uploaded for pull request base (master@941f435). Click here to learn what that means.
The diff coverage is 51.62%.

@@            Coverage Diff            @@
##             master   #11775   +/-   ##
=========================================
  Coverage          ?    54.6%           
=========================================
  Files             ?      235           
  Lines             ?    26570           
  Branches          ?     6374           
=========================================
  Hits              ?    14509           
  Misses            ?     9404           
  Partials          ?     2657

Impacted Files	Coverage Δ
subsys/net/ip/net_private.h	`67.74% <ø> (ø)`
subsys/net/lib/dns/resolve.c	`33.57% <ø> (ø)`
include/net/net_ip.h	`68.13% <ø> (ø)`
include/net/net_context.h	`78.84% <ø> (ø)`
subsys/net/ip/dhcpv4.c	`6.14% <0%> (ø)`
subsys/net/ip/tcp_internal.h	`45.71% <0%> (ø)`
drivers/ethernet/eth_native_posix.c	`30.3% <0%> (ø)`
subsys/net/ip/net_core.c	`63.87% <100%> (ø)`
include/net/net_pkt.h	`85.08% <100%> (ø)`
subsys/net/ip/net_if.c	`60.4% <100%> (ø)`
... and 17 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 941f435...76fc11e. Read the comment docs.

pfalcon · 2018-12-03T07:07:59Z

@tbursztyka, how this relates to #11374 ?

tbursztyka · 2018-12-03T08:46:06Z

@pfalcon There are 2 relations and 1 big reason
1 indirect relation: In current buffer management, it's up to L3 to allocate L2 buffer space. That could be fine if all bearers had a fixed amount of space required. But it's not. 802.15.4 has a variable buffer space requirement, depending on many parameters which I am not listing here, such buffer space cannot be predicted until you know exactly where the packet is meant to go. Which information is known only after IPv6 ND process. And thus forces you to update this space (it might shrink or grow).
So properly splitting L2 to L3 fixes this. Once a packet reaches L2, L3 has all the info where it is supposed to go (it ran ND already etc..). So L2 will be able to compute, once, the exact amount of buffer required and do its job. For this new buffer management, it was necessary to fix this to simplify the whole process a lot. L3 has to take care about L3 only.
1 direct relation: the compatibility of old vs new API on complete packet (received or freshly created ones for tx). Due to this ll buffer space being allocated on L3, and because then L3 did not care about it, it required to "hide" this space via net_buf_pull(). But then on l2 and in each and every drivers, it of course needed to retrieve that space (via the pkt->frags->data - net_pkt_ll_reserve(pkt). And only the first "frag" had its reserved space used relevantly, so it need to be handled specifically (some ethernet drivers where having comments describing this). And that would have been an issue for using new API net_pkt_read/net_pkt_write or else, since these do not know if you need to access this "hidden space" or not.

And 1 big and obvious reason: Optimizing buffer usage. This ll reserve thingy was thus mixed in L3, which architecturally speaking was bad already. But it was even worse in net_buf allocation mechanism: all struct net_buf had it's size shrunk by the ll_reserve amount. So if your frag size is 128bytes, used to built a big ethernet frame you would loose 14 bytes on all fragments but the first one.

Side note: the "frag" size was also, due to this weird L3/L2 mix-up tight to the bearer which had the smallest MTU. And that was again a weird thing, since buffer management is memory management: it should not have any relationship with what bearers is used or else. Theoretically you should be able to go as low as 1 byte per-net_buf and all should work just the same. (That would not make any sense in practice obviously, having a descriptor such as a struct net_buf n times bigger than the buffer it points to)

Last but not least: doing that split, offers other optimizations possibilities like in ethernet: you could have a dedicated net_buf for the ethernet header part, allocated on stack, with just 14 bytes as buffer etc... On 15.4 it will be also possible then to avoid modifying the original packet (6lo and 15.4 fragmentation were doing so before), and thus avoid - when TCP is used - to copy the whole packet in TCP for possible needed retries etc...

Also, an additional reason of doing this PR is to hide all the ugly buffer details to most of the code. We have these ugly stuff everywhere:
frag = pkt->frags;
while (frags) {
...
frag = frag->frags;
}

Instead we should not do any of it anywhere, you just write/read stuff from your packet. (and in case of network headers, you just access a header pointer directly).

jukkar

Lot of commits but fortunately most of them were small and readable.

jukkar · 2018-12-04T10:38:03Z

subsys/net/ip/net_pkt.c

typo s/reserver/reserve/ in commit "net/pkt: Let's ignore ll reserve and use 0 instead"

jukkar · 2018-12-04T10:57:27Z

samples/net/throughput_server/src/server.c

Hmm, the code above will check ll headers. I wonder if things change after ll reserve has been removed. I can try to run this sample to see if any other changes are needed.

jukkar · 2018-12-04T11:03:40Z

samples/net/ws_echo_server/src/main.c

Same with this one.

jukkar · 2018-12-04T11:08:25Z

include/net/net_pkt.h

What about making the function name a bit shorter so I propose we change this to net_pkt_alloc()

true. I had that name at first, I don't know why I went for the plain allocate word.

jukkar · 2018-12-04T11:10:17Z

include/net/net_pkt.h

Shouldn't we call this net_pkt_tx_allocate() in order to follow the same syntax as the rx allocator one?

the rx stuff is an in-between name. Idea would be to have only 1 allocator for all pkt (thus one slab, etc...). However, we will still be able to know (for stats, net mem in shell etc...) if a pkt was allocated for tx or rx, there will be a bit in net_pkt to give that info and so on (net_recv_data will set it for rx, tx will be default). That's not going to be part of that PR though,

jukkar · 2018-12-04T11:15:14Z

subsys/net/ip/net_pkt.c

what about getting max_len like this

size_t max_len = net_if_get_mtu(net_pkt_iface(pkt)); if (IS_ENABLED(CONFIG_NET_IPV6) && family == AF_INET6) { max_len = max(max_len, NET_IPV6_MTU); } else if (IS_ENABLED(CONFIG_NET_IPV4) && family == AF_INET) { max_len = max(max_len, NET_IPV4_MTU); }

then we do not need the AF_UNSPEC branch at all.

jukkar · 2018-12-04T11:24:20Z

subsys/net/ip/Kconfig

Offering this kind of option is imho dangerous, someone will misuse it anyway. How much memory we are saving with this, what does ram_report say if we enable/disable it?

for udp/ipv4 setup, even on 128bytes frag size this will be just fine option (it is dangerous at the same level as wrongly teawking frag size: user has to know what he is doing). It saves stack actually.

Sure, it probably save something. Just wondering do you have numbers about the savings.

jukkar · 2018-12-04T11:29:01Z

subsys/net/ip/ipv4.h

Why do we need to have next_header_proto parameter here and in finalize function, it looks like one should be enough.

you mean we would inform ipv4 header about the protocol only at finalize? why not indeed.

jukkar · 2018-12-04T11:39:33Z

include/net/net_pkt.h

I suggest we change the name to net_pkt_read_u8_new() in order to be consistent. Same with the other value reader helpers.

will be easier to remove the _new when necessary yes

pfalcon · 2018-12-05T11:51:50Z

@tbursztyka: I appreciate the detailed write up in #11775 (comment).

My original question re: relation to #11374 was simpler actually, to confirm that #11374 precedes this patch, instead of e.g. be superseded by this one, or a case when they largely unrelated. So, I just assume that #11374 is the next in queue, and not anything else.

tbursztyka · 2018-12-05T11:58:19Z

@pfalcon yes PR #11374 is the first in queue.

pfalcon · 2018-12-06T11:17:41Z

In one of recent commits:

c965644: net/core: Each and every received packet are bein set to overwrite

"is being"

Reworking the logic to reduce the amount of variables. Also taking the opportunity to normalize drop messages. Signed-off-by: Tomasz Bursztyka <[email protected]>

Reworking the logic to reduce the amount of variables. This part was heavier to change as it was not accessing the headers directly but instead was read parts by parts. Signed-off-by: Tomasz Bursztyka <[email protected]>

Reworking the logic to reduce the amount of variables. Introducing a generic struct to acces the common part of MLD queries, instead of accessing part by part. Also, returning NET_OK in case parsing went fine. We send an MLD report anyway, so it's not a good idea to count the message as being dropped in statistics. Signed-off-by: Tomasz Bursztyka <[email protected]>

All of these are unused anywhere now, thus can be safely removed. Signed-off-by: Tomasz Bursztyka <[email protected]>

Also, return a verdict instead of a pointer to net_pkt. It's simpler as it will be up to net_send_data()'s caller to unref the net_pkt in case of NET_DROP: less places where net_pkt can be unref. Signed-off-by: Tomasz Bursztyka <[email protected]>

Most of the code had to be reworked due to the new API: it's more logical to do everything sequentially (first headers, then MLD part) than the contrary with inserting headers at the end. Using get_data/set_data as well it makes the code clearer. Signed-off-by: Tomasz Bursztyka <[email protected]>

Now using the new API. Signed-off-by: Tomasz Bursztyka <[email protected]>

This optimizes the memory quite a bit since we do not need to clone nor split the original packet at any time. Signed-off-by: Tomasz Bursztyka <[email protected]>

Only the allocator needed to be changed here. Signed-off-by: Tomasz Bursztyka <[email protected]>

Only the allocators needed to be changed there. Signed-off-by: Tomasz Bursztyka <[email protected]>

Allocation and a minor writing logic needed to be changed. Signed-off-by: Tomasz Bursztyka <[email protected]>

This will be used by Ethernet L2 to set the right PTYPE in its header. Signed-off-by: Tomasz Bursztyka <[email protected]>

Since the rework of L2/L3 split, only L2 has access to its header. Thus up to Ethernet one to set LLDP PTYPE. Signed-off-by: Tomasz Bursztyka <[email protected]>

Minor changes: - allocator needed to be changed - and writing the lldp_pdu as well Signed-off-by: Tomasz Bursztyka <[email protected]>

These will be specifically needed in TCP, as well as being used in context internally. Signed-off-by: Tomasz Bursztyka <[email protected]>

Using new functions for net_context, ipv4 and ipv6 as well. Signed-off-by: Tomasz Bursztyka <[email protected]>

As these were parsed already by IPv4/6 input functions let's use them. Applying the change on trivial UDP usage. TCP usage will have its own commit. Signed-off-by: Tomasz Bursztyka <[email protected]>

This proovse to drastically reduce runtime overhead as it does not need to parse IP nor TCP header all over again in a lot of places. Signed-off-by: Tomasz Bursztyka <[email protected]>

Only next to be removed functions like net_tcp_set_checksum() are left untouched. All the rest is switched. Adding net_tcp_finalize() to follow the same logic as for UDP and else. Signed-off-by: Tomasz Bursztyka <[email protected]>

The test counts the '\0' of the data copied, though it does not belong to the packet. Since the test by-passes net_ipv6_input, which would update the packet length according to what is in the IPv6 header, the checksum calculation fails. Before, checksum calculation used to grab the length from IPv6 header, but that changed to use net_pkt_get_len(): that one must provide the real lenth. Signed-off-by: Tomasz Bursztyka <[email protected]>

Let's enable usage of TCP through net_context on its new sendto function. Signed-off-by: Tomasz Bursztyka <[email protected]>

Context was missing, and anyway the familiy can be grabbed from the packet. Signed-off-by: Tomasz Bursztyka <[email protected]>

This goes through basic allocation, atomic read/write, data getter/setter and cloning/copying. Signed-off-by: Tomasz Bursztyka <[email protected]>

Now that net_pkt are accessed through a common r/w API, using below a net_pkt_cursor, let's have an option that will reset this cursor once the net_pkt is freed. Result is instead of segfaulting on r/w access, these operations will bail out properly. Subsequent, and logical (unless you have a leak which is another issue) net_pkt_unref will tell you who/where the pkt was freed. Without it, you will get a segfault for instance, but that won't tell you the exact reason. This options can help you then. Signed-off-by: Tomasz Bursztyka <[email protected]>

Though these are currently used by the core only, it will be then used by net_context as well. This one of the steps to get rid of net_pkt's appdata/appdatalen attributes. Also normalizing all ip/proto parameters name to ip_hdr and proto_hdr. Signed-off-by: Tomasz Bursztyka <[email protected]>

If status is 0, both ip_hdr and proto_hdr will own a pointer to the relevant IP and Protocol headers. In order to know which of ipv4/ipv6 and udp/tcp one will need to use respectively net_pkt_family(pkt) and net_context_get_ip_proto(context). Having access to those headers directly, many callbacks will not need to parse the packet again no get the src/dst addresses or the src/dst ports. This will be change after this commit. Signed-off-by: Tomasz Bursztyka <[email protected]>

tbursztyka requested review from jukkar and rveerama1 November 30, 2018 15:42

tbursztyka requested review from finikorg, jfischer-no, loicpoulain and pfalcon as code owners November 30, 2018 15:42

tbursztyka removed request for finikorg, jfischer-no and loicpoulain November 30, 2018 15:43

tbursztyka added area: Drivers area: Networking area: API Changes to public APIs area: Ethernet labels Nov 30, 2018

tbursztyka force-pushed the net_pkt_api_changes branch from 9120458 to 880e9bb Compare December 3, 2018 09:19

jukkar requested changes Dec 4, 2018

View reviewed changes

tbursztyka force-pushed the net_pkt_api_changes branch from 880e9bb to 6c4b053 Compare December 5, 2018 13:03

tbursztyka mentioned this pull request Dec 5, 2018

Remove link layer reserve concept from network stack #11374

Merged

tbursztyka force-pushed the net_pkt_api_changes branch 3 times, most recently from db872af to c965644 Compare December 6, 2018 09:50

tbursztyka force-pushed the net_pkt_api_changes branch 3 times, most recently from 7574f71 to a372fa9 Compare December 7, 2018 10:11

Tomasz Bursztyka added 26 commits February 1, 2019 12:04

net/ipv6: Switch NS msg handler to new net_pkt API

8b26a74

Reworking the logic to reduce the amount of variables. Also taking the opportunity to normalize drop messages. Signed-off-by: Tomasz Bursztyka <[email protected]>

net/ipv6: Switch RA msg handler to new net_pkt API

046e08c

Reworking the logic to reduce the amount of variables. This part was heavier to change as it was not accessing the headers directly but instead was read parts by parts. Signed-off-by: Tomasz Bursztyka <[email protected]>

net/icmpv6: Remove various useless get/set hdr functions

dd976c2

All of these are unused anywhere now, thus can be safely removed. Signed-off-by: Tomasz Bursztyka <[email protected]>

net/ipv6: Re-enable support for fragmented packet at input

fb9c76d

Now using the new API. Signed-off-by: Tomasz Bursztyka <[email protected]>

net/ipv6: Switch sending fragmented IPv6 packet to new API

510f769

This optimizes the memory quite a bit since we do not need to clone nor split the original packet at any time. Signed-off-by: Tomasz Bursztyka <[email protected]>

net/ipv4: Move autoconf to new net_pkt API

3dad861

Only the allocator needed to be changed here. Signed-off-by: Tomasz Bursztyka <[email protected]>

net/arp: Switch ARP to new net_pkt API

27d464e

Only the allocators needed to be changed there. Signed-off-by: Tomasz Bursztyka <[email protected]>

net/gptp: Switch GPTP to new net_pkt API

b580d6a

Allocation and a minor writing logic needed to be changed. Signed-off-by: Tomasz Bursztyka <[email protected]>

net/pkt: Add a marker to identify LLDP message

dd15518

This will be used by Ethernet L2 to set the right PTYPE in its header. Signed-off-by: Tomasz Bursztyka <[email protected]>

net/lldp: Fix where Ethernet PTYPE is set for LLDP message

74ae9c6

Since the rework of L2/L3 split, only L2 has access to its header. Thus up to Ethernet one to set LLDP PTYPE. Signed-off-by: Tomasz Bursztyka <[email protected]>

net/lldp: Switch LLDP to new net_pkt API

22cbf3f

Minor changes: - allocator needed to be changed - and writing the lldp_pdu as well Signed-off-by: Tomasz Bursztyka <[email protected]>

net/context: Expose new functions to create ipv4/6 packet from context

f8eae38

These will be specifically needed in TCP, as well as being used in context internally. Signed-off-by: Tomasz Bursztyka <[email protected]>

net/tcp: Switch TCP segment preparation to new net_pkt API

c775fd8

Using new functions for net_context, ipv4 and ipv6 as well. Signed-off-by: Tomasz Bursztyka <[email protected]>

net/conn: Change callback signature to pass TCP/IP headers

7035c8a

As these were parsed already by IPv4/6 input functions let's use them. Applying the change on trivial UDP usage. TCP usage will have its own commit. Signed-off-by: Tomasz Bursztyka <[email protected]>

net/tcp: Apply connection callback signature change

2cbf1d0

This proovse to drastically reduce runtime overhead as it does not need to parse IP nor TCP header all over again in a lot of places. Signed-off-by: Tomasz Bursztyka <[email protected]>

net/tcp: Switch rest of TCP to new net_pkt API

0143e4c

Only next to be removed functions like net_tcp_set_checksum() are left untouched. All the rest is switched. Adding net_tcp_finalize() to follow the same logic as for UDP and else. Signed-off-by: Tomasz Bursztyka <[email protected]>

net/context: Enable support for TCP on new sendto function

ff9d543

Let's enable usage of TCP through net_context on its new sendto function. Signed-off-by: Tomasz Bursztyka <[email protected]>

net/tcp: Fix net_tcp_print_recv/send_info macros

e0b5340

Context was missing, and anyway the familiy can be grabbed from the packet. Signed-off-by: Tomasz Bursztyka <[email protected]>

test/net: Quick sample explaining/testing the new net_pkt buffer API

4e0eba9

This goes through basic allocation, atomic read/write, data getter/setter and cloning/copying. Signed-off-by: Tomasz Bursztyka <[email protected]>

tbursztyka force-pushed the net_pkt_api_changes branch from 81ccc3b to 76fc11e Compare February 1, 2019 11:04

jukkar merged commit 4b78a25 into zephyrproject-rtos:master Feb 1, 2019

rmstoi mentioned this pull request Feb 1, 2019

net: Zephyr drops IPv4 packet with extended MAC frame size #12982

Closed

tbursztyka deleted the net_pkt_api_changes branch February 5, 2019 14:48

net_pkt buffer management and API changes #11775

net_pkt buffer management and API changes #11775

Uh oh!

Conversation

tbursztyka commented Nov 30, 2018

Uh oh!

zephyrbot commented Nov 30, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov-io commented Nov 30, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

pfalcon commented Dec 3, 2018

Uh oh!

tbursztyka commented Dec 3, 2018

Uh oh!

jukkar left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pfalcon commented Dec 5, 2018

Uh oh!

tbursztyka commented Dec 5, 2018

Uh oh!

pfalcon commented Dec 6, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

zephyrbot commented Nov 30, 2018 •

edited

Loading

codecov-io commented Nov 30, 2018 •

edited

Loading