-
Notifications
You must be signed in to change notification settings - Fork 8.2k
lib: posix: Implement generic file descriptor table #10429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #10429 +/- ##
==========================================
- Coverage 53.26% 53.13% -0.13%
==========================================
Files 214 215 +1
Lines 25920 26040 +120
Branches 5715 5739 +24
==========================================
+ Hits 13806 13837 +31
- Misses 9791 9867 +76
- Partials 2323 2336 +13
Continue to review full report at Codecov.
|
125a344 to
f37936f
Compare
2741bff to
9794a1d
Compare
|
@mike-scott: So, here's the PR for generic file descriptor support which we discussed. Unfortunately, there's no conversion of sockets, because I tried to follow the way we discussed - make fdtable optional, and keep the older net-ctx-as-int support. Well, it already becomes #ifdef hell in my working copy, and not going to work without factoring out multiplexing functions for different config paths. And that's too much a change, especially that nobody clearly said they think it's a requirement. So, I'm going to make fdtable mandatory for sockets. The overhead is 8 bytes per socket, statically allocated as array. |
512f5b7 to
aeef076
Compare
|
recheck |
|
@rlubos, @GAnthony: Per above comment, we're switching sockets to the real file descriptors. Please start thinking how that affects TLS sockets and offloading. (Because I'm already swamped with unbreaking how very fragile "POSIX subsystem" works and would likely need to rework hasty "syscalls" of the userspace impl.) Of good news, this adds another offloading point, e.g. for fcntl implementation to enable/disable non-blocking mode (not sure how/if this was handled before). From the commit message: |
|
Yeah, it can become a bloat, but it can also be implemented without a lot of fuss. For instance, descriptions could be implemented as kernel objects: we already have a permission model and a way to allocate them dynamically; they would just have to be reference counted, but more on that later. File descriptors would be per thread, if they need to use them (e.g. if they have been created with pthread_create(), a space in their stack can be carved for this table) -- and it's essentially just an array of pointers to the file descrpitions themselves. Any operation on a file descriptor would do something like this (bounds-checking omitted for brevity!)... this is of course the userland side of things, the kernel side of things would be the syscalls that take in a int read(int fd, void *buf, size_t count)
{
struct file_description *desc = _current->fds[fd];
return fd_read(desc, buf, count);
}Closing a file descriptor would work like this: int close(int fd)
{
struct file_description *desc = _current->fds[fd];
int err;
if (desc == NULL) {
return errno = EBADF, -1;
}
err = fd_close(desc);
if (err < 0) {
return errno = -err, -1;
}
_current->fds[fd] = NULL;
return 0;
}Opening a file could work like this (pretty much like anything else that works with file descrpitors too): int open(const char *pathname, int flags)
{
int fd;
fd = find_fd();
if (fd < 0) {
return errno = -fd, -1;
}
_current->fds[fd] = fd_open(pathname, flags, &errno);
if (!_current->fds[fd])
return -1;
return fd;
}In this case, This would allow you to easily implement something like
int dup(int fd)
{
struct file_description *desc = _current->fds[fd];
struct file_description *new;
int fd = find_fd();
if (fd < 0) {
return errno = -fd, -1;
}
_current->fds[fd] = fd_ref(desc, &errno);
if (!_current->fds[fd]) {
return -1;
}
return fd;
}(Or just implement You've noticed Issuing a What I like about this approach is that threads that are not pthreads won't pay the price of having to maintain their own file descriptor table, and that this fits very well the permission and syscall model we already have in Zephyr. It's more bloated than your approach, sure, but there are reasons for this; if we want Zephyr to claim compatibility with POSIX, this kind of thing need to be implemented properly. |
andrewboie
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @lpereira analysis and would like to see this implemented; right now this is just for sockets but we want to introduce the use of file descriptors OS-wide eventually
What do you mean "this is just for sockets"?? Please see the title - this is "generic file descriptor" table, OS level, and this patch converts 2 older users of adhoc "file descriptors" - "POSIX fs" and sockets. "tty" device #10765 might be next in queue. Like, add an open() dispatcher and be able to I'm happy to hear the news that there's always something more to do around the corner. This patches fully implements the requirements which were set forth for it, and while I was developing it, I fixed a whole bunch of goofs done before me (took 80% of time and effort). So again, this should be pretty good progress for the POSIX subsys and Zephyr as a whole, and I'm sure people after me will do even greater things to it (hopefully taking care about code/RAM sizes that greatness costs). |
|
The concern I am having here is that if this gets widely adopted in the kernel with the current set of data structures, then introducing file descriptions @lpereira suggested later could involve quite a bit of work. If you can show that won't be an issue then I can lift my -1. |
subsys/net/lib/sockets/sockets.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I notice these fs_table methods are directly calling the Zephyr's zsock APIs.
If they instead call the BSD socket.h APIs (POSIXNAMES), then open()/read()/write() etc would also work with the new TLS and socket offload features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this. So, the conceptual problem here is that we now implement our own POSIX API with POSIX names, and names like recv() now belong to POSIX API. socket.c is supposed to implement those calls. If implementation calls recv(), which it's supposed to implement, we get (conceptual) infinite loop.
I don't know if it's real or not, but at least I don't think that there should be a patch doing that right in this PR. Once that tested to work, might go as a temporary solution.
Otherwise, I told that trying to implement offloading on top of the native socket API using defines (or inlines) is cheating. The proper layering should be:
(optionally) socket -> zsock_socket() --> net_ctx
|
|-> offload
|
\-> TLS
Because now that becomes
(optionally) socket -> zsock_socket() -> fdtable -> net_ctx
And until you guys update to use fdtable, read() and write() will be out of reach for your (pseudo)sockets. And they are "pseudo" because as far as I can tell, syscalls (i.e. proper userspace vs kernel seperation and implementation) never worked for your sockets either.
I'm perplexed receiving review comments like this. Something generic like that can be said about any PR. It's also 100% true, developing project further is always quite a bit of work. So, given the generic nature of the concern, I can assure that it's never too late to make a right, well-thought change. And on contrary, it's rarely a bad thing to avoid underthought and hasty changes. My take-away from your comment is that you urge me to reply to @lpereira, and the reply "This implements the requirements set forth in #7405 [under RFC for half a year], and is complete." doesn't cut. So, let me try. |
I'm sorry, but even the intro makes me concerned. We should not implement "bloat" because we can, and there should be a fuss when this happens. Instead, we should do reasonable homework on avoiding bloat. In that regard, this patch implements tickets #7405, which was under consideration, and solicited feedback, since May.
Why suddenly?
Right, so what you see in this patch is exactly like that. As Zephyr is effectively a single-process system, "file description table" coincides with "file descriptor table", and that combined table is called "file descriptor table" in this patch, simply because "file descriptor" is a more well known term.
Sorry, I don't see this happening. As discussed with @andrewboie above, syscalls would be normal POSIX syscalls, taking the file descriptors.
It's possible to implement these things "easily" with this patch (because again, it is just what you talk about, modulo some divergences, the root of which is at this point of discussion is not clear and we'll need to discuss below).
Within the current scope, the scope of #7405, they don't need to be. As soon (well, subject to usual planning and process) as usecases for dup() and friends are provided, they will be implemented.
Here's something which hopefully both of us will like even more: no thread, be it pthread one or not, will have "to maintain their own file descriptor table". Because that has nothing to do with POSIX. I'm not sure if it's a mistake or worse, but all your comments seem to be grounded on this assumption of "per thread file descriptor tables". Such "original designs" would need to be ticketed and go thru detailed peer review first, before planned for implementation (no haste at any stage). All that is definitely out of scope of this PR. Let me summarize what this patch does implement:
(*) The current description of CONFIG_POSIX_API: "Enable mostly-standards-compliant implementations of various POSIX (IEEE 1003.1) APIs." (emphasis mine) |
include/net/socket.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a bit field to me, why not use BIT() macro here for all of these defines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is bitfield in the sense that values are ORed otgether. But the values are specific numerical values, afaik coming from iBCS, (https://en.wikipedia.org/wiki/Intel_Binary_Compatibility_Standard). I don't think it's good idea to "obfuscate" them using BIT(). (If it was just some local define without any external meaning assigned, there wouldn't be any objections.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems much easier to read and understand
#define ZSOCK_POLLIN BIT(0)
#define ZSOCK_POLLOUT BIT(2)
#define ZSOCK_POLLERR BIT(3)
#define ZSOCK_POLLHUP BIT(4)
etc
The actual values of these defines do not seem to matter according to http://pubs.opengroup.org/onlinepubs/009695399/basedefs/poll.h.html so there is no need to leave holes there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The actual values of these defines do not seem to matter according to
They don't, but we have a convention to follow values used in Linux, which in turn appears to follow iBCS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not need to copy the values from Linux in this case and using hex numbers just looks ugly here. Using BIT() macro makes lot of sense and is more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, but:
- I'm the maintainer of this subsystem, so let me kindly say that we need values compatible with Linux here (just because we need them everywhere). If you'd like to challenge that, please open a ticket titled "Let's make Zephyr deliberately incompatible with existing well-known ABIs", and let's go thru the usual discussion, etc. process on it.
- The new values added just follow the syntax used for values which are already there. Why, when in Handle IPv4 broadcast addresses #10816 (comment) , you're asked why you don't use established naming conventions, and you reply "because I just follow local naming of functions around" - the reply is "OK" (well, not sure about other reviewers, but I'm definitely ok with that, with a notice "let's fix it eventually"). Why do you keep pushing for the changes which has nothing to do with the matter of this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you really suggesting that applications would define POLLIN itself to use some arbitrary binary number from Linux? Of course applications would not do that, but they would use the value that the header file specifies. And as Posix does not specify that the POLLIN should have some specific value, there is no ABI issue here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I'm just telling that when a programmer prints flags values as a number (happens all the time), looks at them in the debugger, they will see the same values in Zephyr, Linux, and hopefully in other sane OSes. Helps avoid confusion greatly.
Note that I'm not saying about being able to run Linux/etc. binaries on Zephyr (the original usecase behind iBCS). However, some other people may. E.g. @lpereira talks about "when Zephyr will have processes", I know that dynaloading is in the planning, then some guy may make a nice weekend hack showing Zephyr running a simple ELF from Linux. And you want to kill all that fun! ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to argue over this thing, we can leave the numbers here if we want. What you describe about matching linux number and zephyr number does not make much sense imho, there are lot of number handling in linux and zephyr (for similar functionality/apis) and trying to match the values where it is not strictly needed is pointless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you describe about matching linux number and zephyr number does not make much sense imho
That's because you're more of lower level networking programmer, while I'm more of higher level networking. So, I appreciate being able to look at it from my PoV, and trusting my word that it's really helpful in our time of pervasive multitasking and switching among multiple systems all the time. Thanks!
subsys/net/lib/sockets/sockets.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we follow ugly, unnatural rules of defining variables not where they are used, but possibly hundreds of lines away. I wonder, what MISRA thinks of this errorprone way.
Disagree here, it is very convenient if variables are found only in one place instead of scattered all over the function.
subsys/net/lib/sockets/sockets.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This empty line should be removed, as we are checking variable after setting it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one case is where I disagree (and I prefer to write code with much less empty line than Zephyr in general). In this case, it's not really "checking var after setting it". It's effectively a generalized "switch statement", where first case happens to be checking value of ctx. I.e. with lingo with pattern/expression matching it would be:
ctx = ...
genswitch {
case ctx == NULL:
...
case pfd->events & ZSOCK_POLLOUT:
...
case pfd->events & ZSOCK_POLLIN:
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically you have
ctx = sock_to_net_ctx(pfd->fd);
if (ctx == NULL) {
pfd->revents = ZSOCK_POLLNVAL;
ret++;
continue;
}
...
so only one if() here which is dealing with ctx variable, the other if() are irrelevant here as they are not dealing with this variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This empty line should be removed, as we are checking variable after setting it.
so only one if() here which is dealing with ctx variable
Ok, fixed.
lib/fdtable.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defines would be more proper, we should avoid using magic constants as they are just confusing when trying to read code.
lib/fdtable.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why this is needed, shouldn't we have a system which "works" as expected regardless of what board we are using?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BOARD_NATIVE_POSIX is very, very special. Just consider what happens here: we define part of POSIX implementation, but native_posix runs on POSIX implementation underneath. And both layers leaks into one another, it's literally a minefield where step aside leads to kaboom. I did a few fixed in that regard, and there're many more to do. So, just things will always be there, just over time, we can find a way to brush it up.
lib/fdtable.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This empty line can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
lib/fdtable.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather see a define used here instead of magic constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
lib/fdtable.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove empty line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
lib/posix/Kconfig
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is relation of this value and for example CONFIG_NET_MAX_CONTEXTS value. I wonder if we could tie them together somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is relation of this value and for example CONFIG_NET_MAX_CONTEXTS value. I wonder if we could tie them together somehow.
Just the same as on POSIX_MAX_OPEN_FILES below. Tie - no. Better document/cross-reference these relations - yes, over time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I was not really suggesting we start to tweak this now, just pondering this issue and how to solve it.
|
Pushed cosmetic fixes based on @jukkar's review. |
40c74d8 to
993afe5
Compare
|
@ramakrishnapallala: Please review, as this touches code developed by you (fs.c). |
andrewboie
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for responding to @lpereira comments, I think there are further refinements we can do down the road but I'm find with merging this.
The table allows to wrap read/write (i.e. POSIX-compatible) semantics of any I/O object in POSIX-compatible fd (file descriptor) handling. Intended I/O objects include files, sockets, special devices, etc. The table table itself consists of (underlying obj*, function table*) pairs, where function table provides entries for read(), write, and generalized ioctl(), where generalized ioctl handles all other operations, up to and including closing of the underlying I/O object. Fixes: zephyrproject-rtos#7405 Signed-off-by: Paul Sokolovsky <[email protected]>
All the handling of POSIX file descriptors is now done by fdtable.c. fs.c still manages its own table of file structures of the underlying fs lib. Signed-off-by: Paul Sokolovsky <[email protected]>
read/write/etc. are defined in case CONFIG_POSIX_API is defined, and we shouldn't provide duplicates. Signed-off-by: Paul Sokolovsky <[email protected]>
So that client apps can refer to them, and then can be implemented on Zephyr side as needed. Signed-off-by: Paul Sokolovsky <[email protected]>
Previously the "socket file descriptors" were just net_context pointers cast to int. For full POSIX compatibility and support of generic operations line read/write/close/fcntl/ioctl, the real file descriptors should be supported, as implemented by fdtable mini-subsys. Socket implementation already has userspace vs flatspace dichotomy, and adding to that ptr-fds vs real-fds dichotomy (4 possible cases) is just too cumbersome. So, switch sockets to real fd's regardless if full POSIX subsystem is enabled or not. Signed-off-by: Paul Sokolovsky <[email protected]>
These tests require much bigger number than default 4. Signed-off-by: Paul Sokolovsky <[email protected]>
This is simplistic implementation which just redirects to (likewise simplistic) implementation in lib/libc/newlib/libc-hooks.c. This should be replaced with bindings to "real console", but what should be a "real console" is so far discussed, at the RFC stage. This implementation goes into the fdtable.c itself to keep all those things nicely static. (This is again likely will change when we have "real console", but again, it's so far not clear where it would belong, so at least avoid creating random files to be deleted later). Signed-off-by: Paul Sokolovsky <[email protected]>
If we don't have Newlib, the more or less POSIX library, it's unclear how to deal with POSIX stdin/stdout/stderr at all. Signed-off-by: Paul Sokolovsky <[email protected]>
|
Rebased/pushed stray empty line fix suggested by @jukkar . |
carlescufi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice addition, thanks @pfalcon
The table allows to wrap read/write (i.e. POSIX-compatible) semantics
of any I/O object in POSIX-compatible fd (file descriptor) handling.
Intended I/O objects include files, sockets, special devices, etc.
The table table itself consists of (underlying obj*, function table*)
pairs, where function table provides entries for read(), write, and
generalized ioctl(), where generalized ioctl handles all other operations,
up and including closing of the underlying I/O object.
Fixes: #7405
Signed-off-by: Paul Sokolovsky [email protected]