Skip to content

Conversation

@tbursztyka
Copy link
Contributor

This is in a early stage, some stuff will change in these patches still.

In order not to break anything (besides some test due to ROM/RAM being bloated for the time being), the API is made in parallel to the existing one (thus the bloat). Currently, as the patches can tell you, only part of ipv4/Ethernet are moved towards this new API. The good thing with this approach is that old API works on packet generated with the new API and vice versa so changes can be made iteratively.

@zephyrbot
Copy link

zephyrbot commented Nov 30, 2018

All checks are passing now.

Review history of this comment for details about previous failed status.
Note that some checks might have not completed yet.

@codecov-io
Copy link

codecov-io commented Nov 30, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@941f435). Click here to learn what that means.
The diff coverage is 51.62%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #11775   +/-   ##
=========================================
  Coverage          ?    54.6%           
=========================================
  Files             ?      235           
  Lines             ?    26570           
  Branches          ?     6374           
=========================================
  Hits              ?    14509           
  Misses            ?     9404           
  Partials          ?     2657
Impacted Files Coverage Δ
subsys/net/ip/net_private.h 67.74% <ø> (ø)
subsys/net/lib/dns/resolve.c 33.57% <ø> (ø)
include/net/net_ip.h 68.13% <ø> (ø)
include/net/net_context.h 78.84% <ø> (ø)
subsys/net/ip/dhcpv4.c 6.14% <0%> (ø)
subsys/net/ip/tcp_internal.h 45.71% <0%> (ø)
drivers/ethernet/eth_native_posix.c 30.3% <0%> (ø)
subsys/net/ip/net_core.c 63.87% <100%> (ø)
include/net/net_pkt.h 85.08% <100%> (ø)
subsys/net/ip/net_if.c 60.4% <100%> (ø)
... and 17 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 941f435...76fc11e. Read the comment docs.

@pfalcon
Copy link
Contributor

pfalcon commented Dec 3, 2018

@tbursztyka, how this relates to #11374 ?

@tbursztyka
Copy link
Contributor Author

@pfalcon There are 2 relations and 1 big reason
1 indirect relation: In current buffer management, it's up to L3 to allocate L2 buffer space. That could be fine if all bearers had a fixed amount of space required. But it's not. 802.15.4 has a variable buffer space requirement, depending on many parameters which I am not listing here, such buffer space cannot be predicted until you know exactly where the packet is meant to go. Which information is known only after IPv6 ND process. And thus forces you to update this space (it might shrink or grow).
So properly splitting L2 to L3 fixes this. Once a packet reaches L2, L3 has all the info where it is supposed to go (it ran ND already etc..). So L2 will be able to compute, once, the exact amount of buffer required and do its job. For this new buffer management, it was necessary to fix this to simplify the whole process a lot. L3 has to take care about L3 only.
1 direct relation: the compatibility of old vs new API on complete packet (received or freshly created ones for tx). Due to this ll buffer space being allocated on L3, and because then L3 did not care about it, it required to "hide" this space via net_buf_pull(). But then on l2 and in each and every drivers, it of course needed to retrieve that space (via the pkt->frags->data - net_pkt_ll_reserve(pkt). And only the first "frag" had its reserved space used relevantly, so it need to be handled specifically (some ethernet drivers where having comments describing this). And that would have been an issue for using new API net_pkt_read/net_pkt_write or else, since these do not know if you need to access this "hidden space" or not.

And 1 big and obvious reason: Optimizing buffer usage. This ll reserve thingy was thus mixed in L3, which architecturally speaking was bad already. But it was even worse in net_buf allocation mechanism: all struct net_buf had it's size shrunk by the ll_reserve amount. So if your frag size is 128bytes, used to built a big ethernet frame you would loose 14 bytes on all fragments but the first one.

Side note: the "frag" size was also, due to this weird L3/L2 mix-up tight to the bearer which had the smallest MTU. And that was again a weird thing, since buffer management is memory management: it should not have any relationship with what bearers is used or else. Theoretically you should be able to go as low as 1 byte per-net_buf and all should work just the same. (That would not make any sense in practice obviously, having a descriptor such as a struct net_buf n times bigger than the buffer it points to)

Last but not least: doing that split, offers other optimizations possibilities like in ethernet: you could have a dedicated net_buf for the ethernet header part, allocated on stack, with just 14 bytes as buffer etc... On 15.4 it will be also possible then to avoid modifying the original packet (6lo and 15.4 fragmentation were doing so before), and thus avoid - when TCP is used - to copy the whole packet in TCP for possible needed retries etc...

Also, an additional reason of doing this PR is to hide all the ugly buffer details to most of the code. We have these ugly stuff everywhere:
frag = pkt->frags;
while (frags) {
...
frag = frag->frags;
}

Instead we should not do any of it anywhere, you just write/read stuff from your packet. (and in case of network headers, you just access a header pointer directly).

Copy link
Member

@jukkar jukkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lot of commits but fortunately most of them were small and readable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo s/reserver/reserve/ in commit "net/pkt: Let's ignore ll reserve and use 0 instead"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the code above will check ll headers. I wonder if things change after ll reserve has been removed. I can try to run this sample to see if any other changes are needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with this one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about making the function name a bit shorter so I propose we change this to net_pkt_alloc()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true. I had that name at first, I don't know why I went for the plain allocate word.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we call this net_pkt_tx_allocate() in order to follow the same syntax as the rx allocator one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the rx stuff is an in-between name. Idea would be to have only 1 allocator for all pkt (thus one slab, etc...). However, we will still be able to know (for stats, net mem in shell etc...) if a pkt was allocated for tx or rx, there will be a bit in net_pkt to give that info and so on (net_recv_data will set it for rx, tx will be default). That's not going to be part of that PR though,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about getting max_len like this

	size_t max_len = net_if_get_mtu(net_pkt_iface(pkt));

	if (IS_ENABLED(CONFIG_NET_IPV6) && family == AF_INET6) {
		max_len = max(max_len, NET_IPV6_MTU);
	} else if (IS_ENABLED(CONFIG_NET_IPV4) && family == AF_INET) {
		max_len = max(max_len, NET_IPV4_MTU);
	} 

then we do not need the AF_UNSPEC branch at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Offering this kind of option is imho dangerous, someone will misuse it anyway. How much memory we are saving with this, what does ram_report say if we enable/disable it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for udp/ipv4 setup, even on 128bytes frag size this will be just fine option (it is dangerous at the same level as wrongly teawking frag size: user has to know what he is doing). It saves stack actually.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it probably save something. Just wondering do you have numbers about the savings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to have next_header_proto parameter here and in finalize function, it looks like one should be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean we would inform ipv4 header about the protocol only at finalize? why not indeed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we change the name to net_pkt_read_u8_new() in order to be consistent. Same with the other value reader helpers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will be easier to remove the _new when necessary yes

@pfalcon
Copy link
Contributor

pfalcon commented Dec 5, 2018

@tbursztyka: I appreciate the detailed write up in #11775 (comment).

My original question re: relation to #11374 was simpler actually, to confirm that #11374 precedes this patch, instead of e.g. be superseded by this one, or a case when they largely unrelated. So, I just assume that #11374 is the next in queue, and not anything else.

@tbursztyka
Copy link
Contributor Author

@pfalcon yes PR #11374 is the first in queue.

@pfalcon
Copy link
Contributor

pfalcon commented Dec 6, 2018

In one of recent commits:

c965644: net/core: Each and every received packet are bein set to overwrite

"is being"

@tbursztyka tbursztyka force-pushed the net_pkt_api_changes branch 3 times, most recently from 7574f71 to a372fa9 Compare December 7, 2018 10:11
Tomasz Bursztyka added 26 commits February 1, 2019 12:04
Reworking the logic to reduce the amount of variables.

Also taking the opportunity to normalize drop messages.

Signed-off-by: Tomasz Bursztyka <[email protected]>
Reworking the logic to reduce the amount of variables.
This part was heavier to change as it was not accessing the headers
directly but instead was read parts by parts.

Signed-off-by: Tomasz Bursztyka <[email protected]>
Reworking the logic to reduce the amount of variables.
Introducing a generic struct to acces the common part of MLD queries,
instead of accessing part by part.

Also, returning NET_OK in case parsing went fine. We send an MLD report
anyway, so it's not a good idea to count the message as being dropped in
statistics.

Signed-off-by: Tomasz Bursztyka <[email protected]>
All of these are unused anywhere now, thus can be safely removed.

Signed-off-by: Tomasz Bursztyka <[email protected]>
Also, return a verdict instead of a pointer to net_pkt. It's simpler as
it will be up to net_send_data()'s caller to unref the net_pkt in case
of NET_DROP: less places where net_pkt can be unref.

Signed-off-by: Tomasz Bursztyka <[email protected]>
Most of the code had to be reworked due to the new API: it's more
logical to do everything sequentially (first headers, then MLD part)
than the contrary with inserting headers at the end.

Using get_data/set_data as well it makes the code clearer.

Signed-off-by: Tomasz Bursztyka <[email protected]>
Now using the new API.

Signed-off-by: Tomasz Bursztyka <[email protected]>
This optimizes the memory quite a bit since we do not need to clone nor
split the original packet at any time.

Signed-off-by: Tomasz Bursztyka <[email protected]>
Only the allocator needed to be changed here.

Signed-off-by: Tomasz Bursztyka <[email protected]>
Only the allocators needed to be changed there.

Signed-off-by: Tomasz Bursztyka <[email protected]>
Allocation and a minor writing logic needed to be changed.

Signed-off-by: Tomasz Bursztyka <[email protected]>
This will be used by Ethernet L2 to set the right PTYPE in its header.

Signed-off-by: Tomasz Bursztyka <[email protected]>
Since the rework of L2/L3 split, only L2 has access to its header. Thus
up to Ethernet one to set LLDP PTYPE.

Signed-off-by: Tomasz Bursztyka <[email protected]>
Minor changes:
- allocator needed to be changed
- and writing the lldp_pdu as well

Signed-off-by: Tomasz Bursztyka <[email protected]>
These will be specifically needed in TCP, as well as being used in
context internally.

Signed-off-by: Tomasz Bursztyka <[email protected]>
Using new functions for net_context, ipv4 and ipv6 as well.

Signed-off-by: Tomasz Bursztyka <[email protected]>
As these were parsed already by IPv4/6 input functions let's use them.

Applying the change on trivial UDP usage. TCP usage will have its own
commit.

Signed-off-by: Tomasz Bursztyka <[email protected]>
This proovse to drastically reduce runtime overhead as it does not need
to parse IP nor TCP header all over again in a lot of places.

Signed-off-by: Tomasz Bursztyka <[email protected]>
Only next to be removed functions like net_tcp_set_checksum() are left
untouched. All the rest is switched.

Adding net_tcp_finalize() to follow the same logic as for UDP and else.

Signed-off-by: Tomasz Bursztyka <[email protected]>
The test counts the '\0' of the data copied, though it does not belong
to the packet.

Since the test by-passes net_ipv6_input, which would update the packet
length according to what is in the IPv6 header, the checksum calculation
fails.

Before, checksum calculation used to grab the length from IPv6 header,
but that changed to use net_pkt_get_len(): that one must provide the
real lenth.

Signed-off-by: Tomasz Bursztyka <[email protected]>
Let's enable usage of TCP through net_context on its new sendto
function.

Signed-off-by: Tomasz Bursztyka <[email protected]>
Context was missing, and anyway the familiy can be grabbed from the
packet.

Signed-off-by: Tomasz Bursztyka <[email protected]>
This goes through basic allocation, atomic read/write, data
getter/setter and cloning/copying.

Signed-off-by: Tomasz Bursztyka <[email protected]>
Now that net_pkt are accessed through a common r/w API, using below a
net_pkt_cursor, let's have an option that will reset this cursor once
the net_pkt is freed.

Result is instead of segfaulting on r/w access, these operations will
bail out properly. Subsequent, and logical (unless you have a leak
which is another issue) net_pkt_unref will tell you who/where the pkt
was freed. Without it, you will get a segfault for instance, but that
won't tell you the exact reason. This options can help you then.

Signed-off-by: Tomasz Bursztyka <[email protected]>
Though these are currently used by the core only, it will be then used
by net_context as well. This one of the steps to get rid of net_pkt's
appdata/appdatalen attributes.

Also normalizing all ip/proto parameters name to ip_hdr and proto_hdr.

Signed-off-by: Tomasz Bursztyka <[email protected]>
If status is 0, both ip_hdr and proto_hdr will own a pointer to the
relevant IP and Protocol headers. In order to know which of ipv4/ipv6
and udp/tcp one will need to use respectively net_pkt_family(pkt) and
net_context_get_ip_proto(context).

Having access to those headers directly, many callbacks will not need
to parse the packet again no get the src/dst addresses or the src/dst
ports. This will be change after this commit.

Signed-off-by: Tomasz Bursztyka <[email protected]>
@jukkar jukkar merged commit 4b78a25 into zephyrproject-rtos:master Feb 1, 2019
@tbursztyka tbursztyka deleted the net_pkt_api_changes branch February 5, 2019 14:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants