Skip to content

Conversation

@mike-scott
Copy link
Contributor

@mike-scott mike-scott commented Sep 6, 2018

This is a WIP PR so that some key stake holders can discuss if this approach is feasible.

After attempting to convert the LwM2M subsys lib to socket API, I realized that many devices don't have the hardware resources necessary to run in a secure state. Previously, this was possible under the net-app APIs. To that end, I don't believe we can make a direct cut over and should support net-app APIs until a better solution is presented.

Goals:

  • Keep support for net-app while adding support for socket API via Kconfig option (socket API could be the default if desired)

Discussion / Thought process:

  • net_buf provides a well supported method for building protocol data that needs buffers which can be appended / inserted to. They can be configured to use a data pool in a memory efficient way.
  • net_pkt APIs have several functions which operate primarily on the buffer fragments contained in the packet. Several of these would be useful to have in the net_buf APIs, but a net_buf_context structure is needed for some of them (read as "insert) to hold information such as a buffer allocator function, etc. (NOTE: I'm currently using "appendable_net_buf" as the name of the context structure... open to ideas here.)
  • Once we have the needed net_buf APIs supported (insert, etc), we can drop net_pkt specific behavior in the CoAP lib and the backend formatter functions in the LwM2M lib and instead only use net_buf_context(s).
  • For a subsys like LwM2M where the actual connection is managed, we can add a network layer API abstraction, so that both socket and net-app can live together in harmony.
  • Other subsys like MQTT and HTTP could also use this method.
  • Later when update BSD sockets implementation to use optimized networking buffers #7590 (sockets to use net_bufs) the protocols will already be in a state to support it.

WHAT WORKS:

  • Select LwM2M network layer via the following Kconfig:
    LWM2M_NET_LAYER_NET_APP=y
    or
    CONFIG_LWM2M_NET_LAYER_SOCKET=y
  • Tested samples/net/lwm2m on qemu_x86 target and swapped between net-app and socket layers using the overlay-samples.conf.

Here are some size comparisons of the LwM2M client sample built for BLENano2 (nRF52832)

[NET-APP]:
Current master branch (w/o PR):
Memory region         Used Size  Region Size  %age Used
           FLASH:      198060 B       512 KB     37.78%
            SRAM:       43908 B        64 KB     67.00%
        IDT_LIST:         136 B         2 KB      6.64%

NEW (w/ PR):
Memory region         Used Size  Region Size  %age Used
           FLASH:      197960 B       512 KB     37.76%
            SRAM:       44228 B        64 KB     67.49%
        IDT_LIST:         136 B         2 KB      6.64%

[SOCKETS]
Memory region         Used Size  Region Size  %age Used
           FLASH:      206092 B       512 KB     39.31%
            SRAM:       50436 B        64 KB     76.96%
        IDT_LIST:         136 B         2 KB      6.64%

TODO:

  • Add socket DNS support
  • Add socket DTLS support
  • Split this up into normal patches for actual submission
  • Simplify tests/net/lib/coap by removing net_pkt usage and IP headers

Signed-off-by: Michael Scott [email protected]

@mike-scott
Copy link
Contributor Author

@GAnthony Would it be possible to test the LwM2M client sample on your HW which requires the socket APIs to see if this PR is headed in the right direction?

@mike-scott
Copy link
Contributor Author

@nashif @jukkar @tbursztyka @laperie @pfalcon Please let me know if this is a direction we can go for socket API support. If so, I'll continue to break this down into acceptable upstream patches.

@mike-scott mike-scott added RFC Request For Comments: want input from the community area: Networking labels Sep 6, 2018
@mike-scott mike-scott added this to the v1.14.0 milestone Sep 6, 2018
@mike-scott mike-scott self-assigned this Sep 6, 2018
@mike-scott
Copy link
Contributor Author

Updated all the samples / tests I could find using CoAP APIs. Let's check sanity now.

@rveerama1
Copy link
Contributor

@mike-scott I am also working on CoAP over sockets support from past couple of days. I did't know that you went very far. My approach is quite different than yours. I just modified only net_pkt related stuff from CoAP library. But your approach is changing a lot in net_buf, which is not required IMO. My changes are at initial stage. I will create WIP/RFC PR by EOD.

@rlubos
Copy link
Contributor

rlubos commented Sep 7, 2018

I've briefly glanced through the changes, and without diving into details I have one general remark/question. Is it still a good approach to use net_buf/net_pkt to create CoAP/LWM2M messages as we aim for socket API now? For sockets, it brings additional overhead, as we need to lineraze the data, just to have it packed into net_bufs again at socket layer.

@pfalcon
Copy link
Contributor

pfalcon commented Sep 7, 2018

Is it still a good approach to use net_buf/net_pkt to create CoAP/LWM2M messages as we aim for socket API now?

It's definitely not. A litmus test for doing it right would be that the development actually happens on Linux, as more comfortable, and then everything just works on both Zephyr and Linux. (That's how for example #5985 was done.)

@rveerama1
Copy link
Contributor

@mike-scott This is my branch (https://github.com/rveerama1/zephyr/commits/coap_sock) and commit is rveerama1@9f0647e. This is exactly same as current CoAP without net_pkt stuff in CoAP library. As I said just started few days back, it's work in progress. coap-client sampel don't even compile :). Just for reference about my approach.

@mike-scott
Copy link
Contributor Author

@rveerama1 Your method seems to add a 2nd CoAP library for socket support? Are we sure we want to maintain and test 2 libs? This seems prone to bit rot.

@jukkar
Copy link
Member

jukkar commented Sep 7, 2018

Are we sure we want to maintain and test 2 libs?

I think the plan is to deprecate the older net_buf based one. The grand master plan is to provide only socket based interfaces to applications. Internally net_buf's will be used later too.

@pfalcon
Copy link
Contributor

pfalcon commented Sep 7, 2018

Internally net_buf's will be used later too.

But those may be reasonably different net_buf's after #7578. Churn is everywhere! Sockets are the only safe heaven ;-).

@mike-scott
Copy link
Contributor Author

mike-scott commented Sep 7, 2018

@rveerama1 : Did you compare the RAM / flash resource usage before / after your PR?

@rlubos

I've briefly glanced through the changes, and without diving into details I have one general remark/question. Is it still a good approach to use net_buf/net_pkt to create CoAP/LWM2M messages as we aim for socket API now? For sockets, it brings additional overhead, as we need to lineraze the data, just to have it packed into net_bufs again at socket layer.

To clarify: The PR moves net_pkt usage into net-app only code.
So we're discussing net_buf usage. One could say that using flat buffers will actually require even more resources than using net_buf. Also, net_buf isn't going anywhere. It will be used in the future as it supports read / write / append / insert APIs in a nice dynamic way.

@pfalcon

It's definitely not. A litmus test for doing it right would be that the development actually happens on Linux, as more comfortable, and then everything just works on both Zephyr and Linux. (That's how for example #5985 was done.)

I think the keyword you said is "comfortable" and not "doing it right". The reality is: it won't be like Linux programming. Zephyr is an RTOS aimed at small HW. Have you seen how careless Linux programmers are with resources? While it might provide a general direction, your statement is filled with unicorns and fairy tales.

There is a very concerning trend of heavier resource usage currently happening in Zephyr. I hope we keep in mind that currently you can run a bluetooth stack, IP-stack, and an LwM2M client using DTLS all on HW with 64K RAM and 512K flash in a way where you can have room for 2 copies of the application in order to perform OTA updates in a secure manner.

Very quickly we wont:

  • Sockets currently require flat buffers. LwM2M and other protocols may have a hard time estimating buffer sizes which present a growth problem.
  • Sockets currently require NewLibC
  • Logger is being introduced which currently has an unknown effect on RAM / flash usage
  • Userspace support adds additional overhead

There has to be a way that we can still support the "simple" RTOS use-cases.

I illustrated the resource jump in my PR description and I chose a fairly conservative approach for sharing multi-use net_buf behind the scenes (many packets for instance are often less than 128 bytes). If I was to only using max MTU flat buffers the RAM usage would be even higher. Ignoring the fact that we currently have no read / write / append / insert APIs for flat buffers which would almost certainly need to be developed.

If you look at companies introducing products into the market, they are already looking for ways to cut cost. The last thing they are going to do is "buy bigger" so they can use Zephyr. They'll just go elsewhere.

@jukkar

I think the plan is to deprecate the older net_buf based one. The grand master plan is to provide only socket based interfaces to applications. Internally net_buf's will be used later too.

I think you meant to say net-app APIs are being depreciated, and that net_buf is being optimized/redesigned. This PR moves away from being net_pkt dependent and stays with net_bufs which will continue to be used in the future.

There is confusion between "applications" and "internal network library code" atm that probably needs to be ironed out.

@pfalcon

But those may be reasonably different net_buf's after #7578. Churn is everywhere! Sockets are the only safe heaven ;-).

Not so fast: #7590

@mike-scott
Copy link
Contributor Author

@rveerama1
I'm not sure your approach is going to work well with:

/* CoAP Message */
static u8_t message[MAX_COAP_MSG_LEN];
static u8_t response[MAX_COAP_MSG_LEN];

This means the CoAP APIs are not thread safe?

@rveerama1
Copy link
Contributor

rveerama1 commented Sep 10, 2018

@rveerama1 : Did you compare the RAM / flash resource usage before / after your PR?

Like I mentioned already, sample doesn't even compile. Changes are at early stage to compare those details.

@jukkar
Copy link
Member

jukkar commented Sep 10, 2018

This means the CoAP APIs are not thread safe?

Note that those static buffers are only in the sample application, the library itself is just using what ever buffer that is given to it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-paste errors/confusion (?), missing opportunity to improve the API. E.g.:

offset Offset of input buffer

which "input buffer" and "offset of"??

pos Pointer to position of offset

Position of offset? What's that, in plain words?

Why these are still 2 params, if at least 95% of the code has those 2 params as pos, &pos?

after reading 2 bytes

Not 2, 1 here.

value Value is returned

Value is returned?

@pfalcon
Copy link
Contributor

pfalcon commented Sep 10, 2018

@mike-scott re: #9832 (comment)

I don't know where to start ;-). Let go quick over some statements.

Have you seen how careless Linux programmers are with resources?

Yup ;-). But that doesn't mean that POSIX API is resource-bloat - it's carefully thought out and designed to balance well overheads, usability, and underlying technology abstraction.

There is a very concerning trend of heavier resource usage currently happening in Zephyr.

We can even have a case study why ;-). So, some time ago one programmer decided to add a framework for "modem drivers". It went as low-level as handling IRQs from UART. He was pointed that there's already an API in Zephyr which is very close to his needs, and abstracts from the need to deal with UART IRQ, but he decided to duplicate functionality anyway. So, any wonder that Zephyr code size bloats up, and RAM of course too, because nobody works on optimization, everyone just adds their "original designs".

P.S. Just had a look at your code again and paid attention that you put to a k_pipe from ISR. But see a Note at the end of http://docs.zephyrproject.org/latest/kernel/data_passing/pipes.html#concepts section.

on HW with 64K RAM and 512K flash

That's too bloated IMHO, we should target for much smaller ;-).

Sockets currently require flat buffers.

Socket API forever requires flat buffers. It's one of the biggest features. (No, it's not - it's just the obvious, baseline requirement.)

LwM2M and other protocols may have a hard time estimating buffer sizes which present a growth problem.

Really? I thought that LwM2M is UDP based, so you can't send more than MTU at once.

Sockets currently require NewLibC

Let's fix it if you're concerned.

Logger is being introduced which currently has an unknown effect on RAM / flash usage
Userspace support adds additional overhead

I'm with you on these ;-)

I illustrated the resource jump in my PR description and I chose a fairly conservative approach for sharing multi-use net_buf behind the scenes

I see it a bit differently - as if you can't let those net_buf's go, and thus it precludes you to start from clean page and implement new exciting optimizations ;-).

Just as examples:

(many packets for instance are often less than 128 bytes). If I was to only using max MTU flat buffers the RAM usage would be even higher.

Keep max MTU buffers only for LwM2M which can go to MTU. For less than 128 bytes messages, keep buffers not larger than 128 bytes.

we currently have no read / write / append / insert APIs for flat buffers which would almost certainly need to be developed.

*ptr++ = val;

You can go up from there. Yeah, just imagine - you e.g. can make a message overflow check once, and if it passes, just write data directly, without any error checking. Compare that to calling 100 times some function with gazillion of params, result of each call requiring a check. Do you see where bloat comes from?

If you look at companies introducing products into the market, they are already looking for ways to cut cost. The last thing they are going to do is "buy bigger" so they can use Zephyr. They'll just go elsewhere.

Yeah, they go elsewhere, to RTOS which offer well-known, time-proven APIs. That's the way to cut cost - on R&D and support/maintenance. As for BOM, it doesn't have to be bigger. Again, compare direct pointer access vs calling some godawful functions with error checking.

@pfalcon
Copy link
Contributor

pfalcon commented Sep 10, 2018

@mike-scott : And now some general comments.

It's a big achievement that we all came to conclusion that BSD Sockets API is a way forward in Zephyr. At the same time, there should be absolutely no worries about native (net_context-based) API and its wrappers like net_app. They aren't going anywhere until sockets-based solutions are proven to be at least not worse.

With that in mind, I would imagine that someone starting to work with sockets, does that out of excitement of using well-known API and simplifying the design that offers. However, with your code - and that's just my personal opinion - I don't see that approach. Instead I see an attempt to marry approach from old adhoc API with sockets. While one of the possibilities, I (again, personally), don't see a point in doing it like that - sockets allow to exercise new approaches, and old net_app doesn't go away. I may be wrong. After all, you know LwM2M and its architecture, in Zephyr and overall, so maybe it makes sense.

But note that trying to marry native API and sockets, you already introduce new entities like struct appendable_net_buf, so you're already doing a bunch of refactoring. But maybe refactoring for canonical sockets would make more sense?

Anyway, let me challenge myself in these conclusions in try to advocate your approach - in the next message.

@pfalcon
Copy link
Contributor

pfalcon commented Sep 10, 2018

@mike-scott, So, summing up, I "reject" approach in this PR, in a sense that if you tell me that's the right approach, I don't believe you ;-). Note that I may be wrong.

But let me try to find independent, external references for such an approach which could have some similarity with approach used here.

Some time ago I stumbled upon this doc https://github.com/sustrik/bsd-socket-revamp/blob/master/source.txt from Martin Sustrik, a guy behind ZeroMQ and its rewrite nanomsg. The title is clickbait, of course it's not about "revamping" BSD Sockets API, it's perfect as it is. It's about designing new, "modern" networking API supposed to be of the same level as BSD Sockets API.

So, as you can see, he advocates usage of linked list of buffers for network I/O operations: https://github.com/sustrik/bsd-socket-revamp/blob/master/source.txt#L216 . Bingo!

Of course, there're big differences too. First of all, no talking about fixed-size buffers. So, @tbursztyka's refactor is what will be faithful implementation if this idea in Zephyr. Back to Sustrik's API, there's of course no "compact" function (which is hilarious one anyway), nor net_buf_read_u8 and friends with 4 params and return value which can signify error - such of course can be implemented, though many apps will get along without them. Nor there's net_buf_insert_bytes(), because presence of such a function signifies a design problem (protocol messages are constructed from beginning to end), but again, with linked lists it's possible and trivial.

So, much approach towards this? It's simple - we start with plain ol' good BSD Sockets and implement everything in their native manner. Then maybe we implement novelties like Sustrik's API - especially if he really submits it as an IETF RFC, etc. Implement on userspace level I mean. Because on kernel space we'll have it with @tbursztyka's refactor.

@mike-scott
Copy link
Contributor Author

@pfalcon

Nor there's net_buf_insert_bytes(), because presence of such a function signifies a design problem (protocol messages are constructed from beginning to end),

I do as a matter of fact need the ability to insert bytes into whatever memory structure is used for building the LwM2M protocol payload. Spoiler alert: it's not a design flaw. One example is how TLV formatting works, in the OMA TLV spec for LwM2M. There are "header" object entries such as "object instance" entry. It lands before a series of "resource" entries. This "object instance" entry includes the total length of the following resource data entries and that length value can be 1 or 2 bytes depending on how long it needs to be. Without literally doing everything twice, that needs to be inserted after the resource data is added and the length is known:
https://github.com/zephyrproject-rtos/zephyr/blob/master/subsys/net/lib/lwm2m/lwm2m_rw_oma_tlv.c#L130

As part of the migration from net_app APIs to socket APIs, let's
stop referencing the net_pkt fragments throughout the LwM2M library.

Establish a msg_data flat buffer inside lwm2m_message and use that
instead.

NOTE: As a part of this change we remove the COAP_NET_PKT setting.
The COAP library reverts to COAP_SOCK behavior.

This doesn't mean we use sockets in LwM2M (yet), it only means we
use the socket-compatible COAP library which parses flat buffers
instead of net_pkt fragments.

Signed-off-by: Michael Scott <[email protected]>
net_app contexts save the remote address and we use this during
observe notifications and pending handling.  If we move to another
network layer such as sockets, then the remote address becomes
harder to reference.  Let's save it as a part of the client
context.

Signed-off-by: Michael Scott <[email protected]>
The JSON formatter is currently not enabled for incoming WRITE
operations.  To update the code in the formatter and not litter
the input context with extra data, let's allow formatters to
store their own user data.

Signed-off-by: Michael Scott <[email protected]>
Update the parsing functions for JSON used by the JSON data
formatter and enable it in the LwM2M engine.

Signed-off-by: Michael Scott <[email protected]>
For bootstrap support, we need to store connection credentials
in the security object.  This way the client can start a connection
at index 0 and after bootstrapping, move to the next connection.

Let's add the needed fields and a config item to set the key length.

Signed-off-by: Michael Scott <[email protected]>
In order to support bootstrap mode, we need to store server data
in the security / server objects.  Once the connection to the
bootstrap server is made, it will clear these objects and add
new server connection data.

Signed-off-by: Michael Scott <[email protected]>
@mike-scott
Copy link
Contributor Author

Rebased on todays master branch and removed merged patches

@jukkar
Copy link
Member

jukkar commented Jan 31, 2019

ping @dbkinder please re-review the changes

Copy link
Contributor

@pfalcon pfalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving on the idea that this needs to get into 1.14 anyway.

@mike-scott mike-scott added the dev-review To be discussed in dev-review meeting label Jan 31, 2019
Copy link
Contributor Author

@mike-scott mike-scott Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot I had added some stub DNS_RESOLVER code. This doesn't work. I'll remove the stub code from the socket addition patch and then add a new patch which implements DNS_RESOLVER.

@carlescufi carlescufi dismissed dbkinder’s stale review January 31, 2019 16:41

Not applicable anymore due to PR split

@pabigot pabigot removed the dev-review To be discussed in dev-review meeting label Jan 31, 2019
Now that the security data can be loaded into and used from the
security / server objects, we can add support for LwM2M bootstrap.

This is a mode where initially a connection can be made to a server
which can update several LwM2M (including security and server
data) and then trigger a "bootstrap complete".  Once this happens
the client will start it's connection process over but now with
the new information.

Signed-off-by: Michael Scott <[email protected]>
We can save some resources by removing the periodic service thread
and replacing it by queuing the services to the work queue.

Before (reel_board using BT + DTLS)
Memory region         Used Size  Region Size  %age Used
           FLASH:      289464 B         1 MB     27.61%
            SRAM:       75620 B       256 KB     28.85%
        IDT_LIST:         136 B         2 KB      6.64%

After
Memory region         Used Size  Region Size  %age Used
           FLASH:      289576 B         1 MB     27.62%
            SRAM:       74596 B       256 KB     28.46%
        IDT_LIST:         136 B         2 KB      6.64%

Signed-off-by: Michael Scott <[email protected]>
This commit resets the firmware status to IDLE after a bad
download attempt.  Previously, the firmware object would stay
in an odd state and any further attempts to download firmware
would return an error.

Signed-off-by: Michael Scott <[email protected]>
@mike-scott
Copy link
Contributor Author

Updated the commit with the following:

  • Quite a few bug-fixes centered around cleaning up the sockets / connections when things go wrong
  • Added DNS_RESOLVER support which uses socket-based getaddrinfo call
  • Cleaned up IPv6 host parsing (sample adds brackets around IPs with colons in them)
  • Use the same URL parser logic for firmware downloads (deleted the duplicate code)
  • Removed a bunch of dead fields from the RD client structure

If we need to unblock the other net_app removal PRs, this can be merged now and I'll continue to run tests and shake out bugs over the next 2 weeks.

@mike-scott
Copy link
Contributor Author

For reference: bootstrap support, new JSON write operation parsing and the buf utilities make up most of the code additions (+600ish lines)

@mike-scott
Copy link
Contributor Author

Pushed a small bugfix for proxy firmware download

This commit removes the net_app layer from the LwM2M library and
replaces it with BSD-sockets APIs.

Signed-off-by: Michael Scott <[email protected]>
The LwM2M library has moved from the network application library
APIs to BSD socket APIs.  Let's make the needed changes in the
LwM2M sample to follow those changes.

Signed-off-by: Michael Scott <[email protected]>
Previously, the net_app layer handled DNS support as a part of
network initialization.  With the move to BSD-socket APIs,
we need to add support for DNS to the LwM2M library.

Signed-off-by: Michael Scott <[email protected]>
@mike-scott
Copy link
Contributor Author

Re-pushed to fix a line over 80 char warning.

@nashif nashif merged commit a79087c into zephyrproject-rtos:master Feb 1, 2019
@mike-scott mike-scott deleted the master-lwm2m-socket branch February 1, 2019 07:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants