Skip to content

Conversation

@pfalcon
Copy link
Contributor

@pfalcon pfalcon commented Oct 8, 2018

The table allows to wrap read/write (i.e. POSIX-compatible) semantics
of any I/O object in POSIX-compatible fd (file descriptor) handling.
Intended I/O objects include files, sockets, special devices, etc.

The table table itself consists of (underlying obj*, function table*)
pairs, where function table provides entries for read(), write, and
generalized ioctl(), where generalized ioctl handles all other operations,
up and including closing of the underlying I/O object.

Fixes: #7405

Signed-off-by: Paul Sokolovsky [email protected]

@pfalcon pfalcon added the area: POSIX POSIX API Library label Oct 8, 2018
@codecov-io
Copy link

codecov-io commented Oct 8, 2018

Codecov Report

Merging #10429 into master will decrease coverage by 0.12%.
The diff coverage is 31.85%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10429      +/-   ##
==========================================
- Coverage   53.26%   53.13%   -0.13%     
==========================================
  Files         214      215       +1     
  Lines       25920    26040     +120     
  Branches     5715     5739      +24     
==========================================
+ Hits        13806    13837      +31     
- Misses       9791     9867      +76     
- Partials     2323     2336      +13
Impacted Files Coverage Δ
include/net/socket.h 100% <ø> (ø) ⬆️
lib/fdtable.c 26.08% <26.08%> (ø)
subsys/net/lib/sockets/sockets.c 38.98% <37.87%> (-2.83%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d850b3a...f0724d6. Read the comment docs.

@pfalcon
Copy link
Contributor Author

pfalcon commented Oct 16, 2018

@mike-scott: So, here's the PR for generic file descriptor support which we discussed. Unfortunately, there's no conversion of sockets, because I tried to follow the way we discussed - make fdtable optional, and keep the older net-ctx-as-int support.

Well, it already becomes #ifdef hell in my working copy, and not going to work without factoring out multiplexing functions for different config paths. And that's too much a change, especially that nobody clearly said they think it's a requirement. So, I'm going to make fdtable mandatory for sockets. The overhead is 8 bytes per socket, statically allocated as array.

@pfalcon
Copy link
Contributor Author

pfalcon commented Oct 16, 2018

recheck

@pfalcon
Copy link
Contributor Author

pfalcon commented Oct 17, 2018

@rlubos, @GAnthony: Per above comment, we're switching sockets to the real file descriptors. Please start thinking how that affects TLS sockets and offloading. (Because I'm already swamped with unbreaking how very fragile "POSIX subsystem" works and would likely need to rework hasty "syscalls" of the userspace impl.)

Of good news, this adds another offloading point, e.g. for fcntl implementation to enable/disable non-blocking mode (not sure how/if this was handled before).

From the commit message:

net: lib: sockets: Switch to use fdtable

Previously the "socket file descriptors" were just net_context
pointers cast to int. For full POSIX compatibility and support
of generic operations line read/write/close/fcntl/ioctl, the
real file descriptors should be supported, as implemented by
fdtable mini-subsys.

Socket implementation already has userspace vs flatspace dichotomy,
and adding to that ptr-fds vs real-fds dichotomy (4 possible cases)
is just too cumbersome. So, switch sockets to real fd's regardless
if full POSIX subsystem is enabled or not.

@lpereira
Copy link
Member

lpereira commented Oct 24, 2018

Yeah, it can become a bloat, but it can also be implemented without a lot of fuss. For instance, descriptions could be implemented as kernel objects: we already have a permission model and a way to allocate them dynamically; they would just have to be reference counted, but more on that later. File descriptors would be per thread, if they need to use them (e.g. if they have been created with pthread_create(), a space in their stack can be carved for this table) -- and it's essentially just an array of pointers to the file descrpitions themselves.

Any operation on a file descriptor would do something like this (bounds-checking omitted for brevity!)... this is of course the userland side of things, the kernel side of things would be the syscalls that take in a struct file_description:

int read(int fd, void *buf, size_t count)
{
    struct file_description *desc = _current->fds[fd];

    return fd_read(desc, buf, count);
}

Closing a file descriptor would work like this:

int close(int fd)
{
    struct file_description *desc = _current->fds[fd];
    int err;
 
    if (desc == NULL) {
        return errno = EBADF, -1;
    }

    err = fd_close(desc);
    if (err < 0) {
        return errno = -err, -1;
    }

    _current->fds[fd] = NULL;
    return 0;
}

Opening a file could work like this (pretty much like anything else that works with file descrpitors too):

int open(const char *pathname, int flags)
{
    int fd;

    fd = find_fd();
    if (fd < 0) {
        return errno = -fd, -1;
    }

    _current->fds[fd] = fd_open(pathname, flags, &errno);
    if (!_current->fds[fd])
        return -1;

    return fd;
}

In this case, fd_open() and fd_close() would add the calling thread to the permission bitmap, as usual.

This would allow you to easily implement something like dup() and dup2(), among other things (such as poll() -- which operates at the file description level -- or even file descriptor passing).

dup() is then just the matter of:

int dup(int fd)
{
    struct file_description *desc = _current->fds[fd];
    struct file_description *new;
    int fd = find_fd();

    if (fd < 0) {
          return errno = -fd, -1;
    }

    _current->fds[fd] = fd_ref(desc, &errno);
    if (!_current->fds[fd]) {
          return -1;
    }

    return fd;
}

(Or just implement dup2() and dup() in terms of dup2().)

You've noticed fd_ref() there: Each file description will need a reference count per thread; this is important, because you can have many file descriptors pointing to the same file description, from the same or from different threads, and using the trick of counting the bits from the permission bitmap to serve as a refcount won't work in this case.

Issuing a fd_close(desc) syscall would drop the reference from the calling thread in the descriptor; once the sum of all references drop to zero, then the VFS method of closing the description can be called, and the kernel object can be disposed of.

What I like about this approach is that threads that are not pthreads won't pay the price of having to maintain their own file descriptor table, and that this fits very well the permission and syscall model we already have in Zephyr. It's more bloated than your approach, sure, but there are reasons for this; if we want Zephyr to claim compatibility with POSIX, this kind of thing need to be implemented properly.

Copy link
Contributor

@andrewboie andrewboie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @lpereira analysis and would like to see this implemented; right now this is just for sockets but we want to introduce the use of file descriptors OS-wide eventually

@pfalcon
Copy link
Contributor Author

pfalcon commented Oct 24, 2018

right now this is just for sockets but we want to introduce the use of file descriptors OS-wide eventually

What do you mean "this is just for sockets"?? Please see the title - this is "generic file descriptor" table, OS level, and this patch converts 2 older users of adhoc "file descriptors" - "POSIX fs" and sockets. "tty" device #10765 might be next in queue. Like, add an open() dispatcher and be able to open("/dev/serial/UART0") and read()/write() it. This all is possible with this patch.

I'm happy to hear the news that there's always something more to do around the corner. This patches fully implements the requirements which were set forth for it, and while I was developing it, I fixed a whole bunch of goofs done before me (took 80% of time and effort). So again, this should be pretty good progress for the POSIX subsys and Zephyr as a whole, and I'm sure people after me will do even greater things to it (hopefully taking care about code/RAM sizes that greatness costs).

@andrewboie
Copy link
Contributor

The concern I am having here is that if this gets widely adopted in the kernel with the current set of data structures, then introducing file descriptions @lpereira suggested later could involve quite a bit of work. If you can show that won't be an issue then I can lift my -1.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice these fs_table methods are directly calling the Zephyr's zsock APIs.

If they instead call the BSD socket.h APIs (POSIXNAMES), then open()/read()/write() etc would also work with the new TLS and socket offload features.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this. So, the conceptual problem here is that we now implement our own POSIX API with POSIX names, and names like recv() now belong to POSIX API. socket.c is supposed to implement those calls. If implementation calls recv(), which it's supposed to implement, we get (conceptual) infinite loop.

I don't know if it's real or not, but at least I don't think that there should be a patch doing that right in this PR. Once that tested to work, might go as a temporary solution.

Otherwise, I told that trying to implement offloading on top of the native socket API using defines (or inlines) is cheating. The proper layering should be:

(optionally) socket -> zsock_socket() --> net_ctx
                                      |
                                      |-> offload
                                      |
                                      \-> TLS

Because now that becomes

(optionally) socket -> zsock_socket() -> fdtable -> net_ctx

And until you guys update to use fdtable, read() and write() will be out of reach for your (pseudo)sockets. And they are "pseudo" because as far as I can tell, syscalls (i.e. proper userspace vs kernel seperation and implementation) never worked for your sockets either.

@pfalcon
Copy link
Contributor Author

pfalcon commented Oct 25, 2018

@andrewboie

The concern I am having here is that if this gets widely adopted in the kernel with the current set of data structures, then introducing file descriptions @lpereira suggested later could involve quite a bit of work. If you can show that won't be an issue then I can lift my -1.

I'm perplexed receiving review comments like this. Something generic like that can be said about any PR. It's also 100% true, developing project further is always quite a bit of work.

So, given the generic nature of the concern, I can assure that it's never too late to make a right, well-thought change. And on contrary, it's rarely a bad thing to avoid underthought and hasty changes.

My take-away from your comment is that you urge me to reply to @lpereira, and the reply "This implements the requirements set forth in #7405 [under RFC for half a year], and is complete." doesn't cut. So, let me try.

@pfalcon
Copy link
Contributor Author

pfalcon commented Oct 25, 2018

@lpereira :

Yeah, it can become a bloat, but it can also be implemented without a lot of fuss.

I'm sorry, but even the intro makes me concerned. We should not implement "bloat" because we can, and there should be a fuss when this happens. Instead, we should do reasonable homework on avoiding bloat. In that regard, this patch implements tickets #7405, which was under consideration, and solicited feedback, since May.

File descriptors would be per thread

Why suddenly?

Any operation on a file descriptor would do something like this

Right, so what you see in this patch is exactly like that. As Zephyr is effectively a single-process system, "file description table" coincides with "file descriptor table", and that combined table is called "file descriptor table" in this patch, simply because "file descriptor" is a more well known term.

this is of course the userland side of things, the kernel side of things would be the syscalls that take in a struct file_description:

Sorry, I don't see this happening. As discussed with @andrewboie above, syscalls would be normal POSIX syscalls, taking the file descriptors.

This would allow you to easily implement something like dup() and dup2(), among other things (such as poll() -- which operates at the file description level -- or even file descriptor passing).

It's possible to implement these things "easily" with this patch (because again, it is just what you talk about, modulo some divergences, the root of which is at this point of discussion is not clear and we'll need to discuss below).

You've noticed fd_ref() there: Each file description will need a reference count

Within the current scope, the scope of #7405, they don't need to be. As soon (well, subject to usual planning and process) as usecases for dup() and friends are provided, they will be implemented.

What I like about this approach is that threads that are not pthreads won't pay the price of having to maintain their own file descriptor table

Here's something which hopefully both of us will like even more: no thread, be it pthread one or not, will have "to maintain their own file descriptor table". Because that has nothing to do with POSIX. I'm not sure if it's a mistake or worse, but all your comments seem to be grounded on this assumption of "per thread file descriptor tables". Such "original designs" would need to be ticketed and go thru detailed peer review first, before planned for implementation (no haste at any stage). All that is definitely out of scope of this PR.

Let me summarize what this patch does implement:

  • The indirection table for users of kernel structures, representing devices, to decouple users of direct memory references to kernel structures.
  • This indirection also works across userspace/kernel split. (And works without, which is not less important!)
  • This indirection also has basic POSIX-like semantics (*).
  • This indirection is also one-level, due to the fact that Zephyr is single-process OS.
  • As soon as it acquires multi-process support (and let's please not setup another strawman here), a second level of indirection will be added to maintain per-process FD tables.

(*) The current description of CONFIG_POSIX_API: "Enable mostly-standards-compliant implementations of various POSIX (IEEE 1003.1) APIs." (emphasis mine)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a bit field to me, why not use BIT() macro here for all of these defines?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is bitfield in the sense that values are ORed otgether. But the values are specific numerical values, afaik coming from iBCS, (https://en.wikipedia.org/wiki/Intel_Binary_Compatibility_Standard). I don't think it's good idea to "obfuscate" them using BIT(). (If it was just some local define without any external meaning assigned, there wouldn't be any objections.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems much easier to read and understand

#define ZSOCK_POLLIN BIT(0)
#define ZSOCK_POLLOUT BIT(2)
#define ZSOCK_POLLERR BIT(3)
#define ZSOCK_POLLHUP BIT(4)

etc

The actual values of these defines do not seem to matter according to http://pubs.opengroup.org/onlinepubs/009695399/basedefs/poll.h.html so there is no need to leave holes there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual values of these defines do not seem to matter according to

They don't, but we have a convention to follow values used in Linux, which in turn appears to follow iBCS.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not need to copy the values from Linux in this case and using hex numbers just looks ugly here. Using BIT() macro makes lot of sense and is more readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, but:

  1. I'm the maintainer of this subsystem, so let me kindly say that we need values compatible with Linux here (just because we need them everywhere). If you'd like to challenge that, please open a ticket titled "Let's make Zephyr deliberately incompatible with existing well-known ABIs", and let's go thru the usual discussion, etc. process on it.
  2. The new values added just follow the syntax used for values which are already there. Why, when in Handle IPv4 broadcast addresses #10816 (comment) , you're asked why you don't use established naming conventions, and you reply "because I just follow local naming of functions around" - the reply is "OK" (well, not sure about other reviewers, but I'm definitely ok with that, with a notice "let's fix it eventually"). Why do you keep pushing for the changes which has nothing to do with the matter of this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you really suggesting that applications would define POLLIN itself to use some arbitrary binary number from Linux? Of course applications would not do that, but they would use the value that the header file specifies. And as Posix does not specify that the POLLIN should have some specific value, there is no ABI issue here.

Copy link
Contributor Author

@pfalcon pfalcon Oct 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I'm just telling that when a programmer prints flags values as a number (happens all the time), looks at them in the debugger, they will see the same values in Zephyr, Linux, and hopefully in other sane OSes. Helps avoid confusion greatly.

Note that I'm not saying about being able to run Linux/etc. binaries on Zephyr (the original usecase behind iBCS). However, some other people may. E.g. @lpereira talks about "when Zephyr will have processes", I know that dynaloading is in the planning, then some guy may make a nice weekend hack showing Zephyr running a simple ELF from Linux. And you want to kill all that fun! ;-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to argue over this thing, we can leave the numbers here if we want. What you describe about matching linux number and zephyr number does not make much sense imho, there are lot of number handling in linux and zephyr (for similar functionality/apis) and trying to match the values where it is not strictly needed is pointless.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you describe about matching linux number and zephyr number does not make much sense imho

That's because you're more of lower level networking programmer, while I'm more of higher level networking. So, I appreciate being able to look at it from my PoV, and trusting my word that it's really helpful in our time of pervasive multitasking and switching among multiple systems all the time. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we follow ugly, unnatural rules of defining variables not where they are used, but possibly hundreds of lines away. I wonder, what MISRA thinks of this errorprone way.

Disagree here, it is very convenient if variables are found only in one place instead of scattered all over the function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This empty line should be removed, as we are checking variable after setting it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one case is where I disagree (and I prefer to write code with much less empty line than Zephyr in general). In this case, it's not really "checking var after setting it". It's effectively a generalized "switch statement", where first case happens to be checking value of ctx. I.e. with lingo with pattern/expression matching it would be:

ctx = ...

genswitch {
case ctx == NULL:
...

case pfd->events & ZSOCK_POLLOUT:
...

case pfd->events & ZSOCK_POLLIN:
...
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically you have

ctx = sock_to_net_ctx(pfd->fd);
if (ctx == NULL) {
	pfd->revents = ZSOCK_POLLNVAL;
	ret++;
	continue;
}
...

so only one if() here which is dealing with ctx variable, the other if() are irrelevant here as they are not dealing with this variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This empty line should be removed, as we are checking variable after setting it.
so only one if() here which is dealing with ctx variable

Ok, fixed.

lib/fdtable.c Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defines would be more proper, we should avoid using magic constants as they are just confusing when trying to read code.

lib/fdtable.c Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why this is needed, shouldn't we have a system which "works" as expected regardless of what board we are using?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BOARD_NATIVE_POSIX is very, very special. Just consider what happens here: we define part of POSIX implementation, but native_posix runs on POSIX implementation underneath. And both layers leaks into one another, it's literally a minefield where step aside leads to kaboom. I did a few fixed in that regard, and there're many more to do. So, just things will always be there, just over time, we can find a way to brush it up.

lib/fdtable.c Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This empty line can be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

lib/fdtable.c Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather see a define used here instead of magic constant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

lib/fdtable.c Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove empty line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is relation of this value and for example CONFIG_NET_MAX_CONTEXTS value. I wonder if we could tie them together somehow.

Copy link
Contributor Author

@pfalcon pfalcon Oct 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is relation of this value and for example CONFIG_NET_MAX_CONTEXTS value. I wonder if we could tie them together somehow.

Just the same as on POSIX_MAX_OPEN_FILES below. Tie - no. Better document/cross-reference these relations - yes, over time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I was not really suggesting we start to tweak this now, just pondering this issue and how to solve it.

@SebastianBoe SebastianBoe removed their request for review October 26, 2018 07:59
@pfalcon
Copy link
Contributor Author

pfalcon commented Oct 26, 2018

Pushed cosmetic fixes based on @jukkar's review.

@pfalcon pfalcon force-pushed the fdtable branch 2 times, most recently from 40c74d8 to 993afe5 Compare October 29, 2018 15:53
@pfalcon
Copy link
Contributor Author

pfalcon commented Oct 29, 2018

@ramakrishnapallala: Please review, as this touches code developed by you (fs.c).

Copy link
Contributor

@andrewboie andrewboie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for responding to @lpereira comments, I think there are further refinements we can do down the road but I'm find with merging this.

The table allows to wrap read/write (i.e. POSIX-compatible) semantics
of any I/O object in POSIX-compatible fd (file descriptor) handling.
Intended I/O objects include files, sockets, special devices, etc.

The table table itself consists of (underlying obj*, function table*)
pairs, where function table provides entries for read(), write, and
generalized ioctl(), where generalized ioctl handles all other
operations, up to and including closing of the underlying I/O object.

Fixes: zephyrproject-rtos#7405

Signed-off-by: Paul Sokolovsky <[email protected]>
All the handling of POSIX file descriptors is now done by fdtable.c.
fs.c still manages its own table of file structures of the underlying
fs lib.

Signed-off-by: Paul Sokolovsky <[email protected]>
read/write/etc. are defined in case CONFIG_POSIX_API is defined, and
we shouldn't provide duplicates.

Signed-off-by: Paul Sokolovsky <[email protected]>
So that client apps can refer to them, and then can be implemented on
Zephyr side as needed.

Signed-off-by: Paul Sokolovsky <[email protected]>
Previously the "socket file descriptors" were just net_context
pointers cast to int. For full POSIX compatibility and support
of generic operations line read/write/close/fcntl/ioctl, the
real file descriptors should be supported, as implemented by
fdtable mini-subsys.

Socket implementation already has userspace vs flatspace dichotomy,
and adding to that ptr-fds vs real-fds dichotomy (4 possible cases)
is just too cumbersome. So, switch sockets to real fd's regardless
if full POSIX subsystem is enabled or not.

Signed-off-by: Paul Sokolovsky <[email protected]>
These tests require much bigger number than default 4.

Signed-off-by: Paul Sokolovsky <[email protected]>
This is simplistic implementation which just redirects to (likewise
simplistic) implementation in lib/libc/newlib/libc-hooks.c. This
should be replaced with bindings to "real console", but what should
be a "real console" is so far discussed, at the RFC stage.

This implementation goes into the fdtable.c itself to keep all those
things nicely static. (This is again likely will change when we have
"real console", but again, it's so far not clear where it would
belong, so at least avoid creating random files to be deleted later).

Signed-off-by: Paul Sokolovsky <[email protected]>
If we don't have Newlib, the more or less POSIX library, it's unclear
how to deal with POSIX stdin/stdout/stderr at all.

Signed-off-by: Paul Sokolovsky <[email protected]>
@pfalcon
Copy link
Contributor Author

pfalcon commented Oct 31, 2018

Rebased/pushed stray empty line fix suggested by @jukkar .

Copy link
Member

@carlescufi carlescufi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition, thanks @pfalcon

@carlescufi carlescufi merged commit 79ea613 into zephyrproject-rtos:master Nov 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: POSIX POSIX API Library

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants