Skip to content

Conversation

atrosinenko
Copy link
Contributor

This pull request is based on the PR #4378 by @cynecx. I am not a JavaScript developer and I am not familiar with the Emscripten FS layer, though, the majority of PIPEFS code was already written by @cynecx and even reviewed there to some extent. The problem with that PR was the lack of tests. I have added some tests and the tests revealed some problems with the original implementation that I have fixed.

There still may be some performance issue with the (common?) use case with writing a few bytes, and then reading these bytes when implementing events, since every read() call seems to pop an almost empty 8kb buffer, and write() will allocate a new one, so It may generate some excessive garbage.

Another peculiarity of this code is that the read end of the pipe always behaves as if it is non-blocking -- is it currently possible to block the read operation at all?

assert(buffer instanceof ArrayBuffer || ArrayBuffer.isView(buffer));
var data = new Uint8Array(buffer);
data = buffer.subarray(offset, offset + length);
var data = buffer.subarray(offset, offset + length);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous code (and similar code in write) looks like an incorrectly written conversion to the byte-sized buffer. Is it safe to assume that the buffer is always already byte-sized?

@atrosinenko
Copy link
Contributor Author

Unfortunately my source code comment is not shown anyway, so is it safe to assume that the buffer argument to read and write operations is always byte-sized (an Uint8Array)?

@cynecx
Copy link
Contributor

cynecx commented Feb 11, 2017

I think the input buffer type (for read) must be a type of ArrayBuffer and not somekind of view, otherwise the whole read function wouldn't actually read something as we would read into a newly created buffer.

new Uint8Array(buffer); // Where buffer instanceof ArrayBuffer => The resulting Uint8Array is a view on an existing buffer.
new Uint8Array(buffer); // Where buffer instanceof Uint8Array || ArrayBuffer.isView(buffer) => The resulting Uint8Array is a copy

I think we should change this line:

assert(buffer instanceof ArrayBuffer || ArrayBuffer.isView(buffer));

to

assert(buffer instanceof ArrayBuffer);
var data = new Uint8Array(buffer); // Create a view of an ArrayBuffer

However, if we also want to accept typed views and ArrayBuffer we should do something like:

assert(buffer instanceof ArrayBuffer || ArrayBuffer.isView(buffer));
buffer = new Uint8Array(
    ArrayBuffer.isView(buffer) ? buffer.buffer : buffer);

toRemove++;
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could add something like this to prevent deallocating all buffers and leave one bucket alive:

if (toRemove && toRemove == pipe.buckets.length) {
    toRemove--;
    pipe.buckets[toRemove].offset = 0;
    pipe.buckets[toRemove].roffset = 0;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @cynecx, integrated this code too :)

@atrosinenko
Copy link
Contributor Author

@cynecx changing the line

assert(buffer instanceof ArrayBuffer || ArrayBuffer.isView(buffer));

to

assert(buffer instanceof ArrayBuffer);

in either read or write makes assert fail.

@atrosinenko
Copy link
Contributor Author

Meanwhile, I ran some tests and at least test_freetype (test_core.default) and test_poppler (test_core.default) are failing. But when I switch to the incoming branch (70d3702), they fail too.

@cynecx
Copy link
Contributor

cynecx commented Feb 12, 2017

@atrosinenko Could you console.log() the buffer?

@atrosinenko
Copy link
Contributor Author

When ran under Node, console.log() tries to print a lot in this test (since read and write are called many times), but when I add something like console.log(buffer.byteLength) just before asserts, it prints 16777216. I think, we should wait until @kripken's comment about FS design.

@juj juj added the filesystem label Feb 13, 2017
@@ -0,0 +1,208 @@
mergeInto(LibraryManager.library, {
$PIPEFS__postset: '__ATINIT__.push(function() { PIPEFS.root = FS.mount(PIPEFS, {}, null); });',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this is actually a bit suspect: I don't think we have any type of ordering guaranteed in our JS dependencies, so calling FS.mount() here as a postset to PIPEFS might not guarantee that FS.staticInit() would have been called before. I think if it does, then it's due to 'f' < 'p' alphabetically by accident?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, it may be worth looking at sockfs -- it seems to have the same dependency.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good observation.

Reading closer, it looks like FS.init.initialized is not needed to be true in order to be able to FS.mount() (FS does that itself in FS.staticInit() as well) However FS.staticInit() must have been called before FS.mount() can. I think if this does not hold, then errors will naturally manifest by the absence of FS.nameTable, so things should be good here.

$PIPEFS: {
BUCKET_BUFFER_SIZE: 1024 * 8, // 8KiB Buffer
mount: function (mount) {
return FS.createNode(null, '/', {{{ cDefine('S_IFDIR') }}} | 511 /* 0777 */, 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line looks odd, it's mounting the root directory of the whole filesystem to be PIPEFS - does this replace MEMFS altogether?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the same code is used in sockfs -- is it currently supported?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SOCKFS implements BSD sockets support via a WebSockets backend, it is currently supported. Reading closer, this is not mounting PIPEFS to the root of the filesystem, this is calling FS.createNode (and not FS.mount) to create the filesystem root directory. This should be redundant, the root directory should exist, and this actually would create multiple versions of that node. Does PIPEFS.mount get called from anywhere? It can be removed if not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PIPEFS.mount is called from $PIPEFS__postset through FS.mount. But it looks like this createNode creates kind of pseudo-root directory for PIPEFS not linked to the FS.root. Maybe it is OK?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, missed that line. Yeah, that is a bit odd since it creates multiple nodes that have the identical path /. Would instead here return FS.lookupPath('/'); work instead of return FS.createNode(...);? That would avoid creating duplicate root nodes. If it doesn't work for some reason, then that's ok if we document this peculiar behavior to understand why. (I know we do have the same proprty in the existing SOCKFS, would be good to figure that out too)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, that it creates identical root nodes: when null is passed as parent to createNode, as far as I understand, it creates a node that is parent of itself. And since all nodes have sequential inode numbers, then this pseudo-root node should have different parent inode compared to the "real" root directory. Looks like the only way new node is registered in createNode is through FS.hashAddNode(...) and all lookups through FS.nameTable are performed with comparison by name and parent id. Though, it seems that any call to FS.unmount will remove nodes, that are not linked to FS.root. Meanwhile FS.mount has special treatment for pseudo-mounts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I probably misunderstood FS.unmount() -- it should not remove such pseudo mounts.


pipe.buckets.forEach(function (bucket) {
currentLength += bucket.offset - bucket.roffset;
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see there is a comment below about optimising to not generate extra garbage. In case that is interesting, this forEach construct also generates temporary garbage by creating an iterator object and a temporary function that gets nuked afterwards. The second iteration over the buckets below is using a for (var i = 0; i < pipe.buckets.length; i++) { construct. That is a nice and garbage-free way to iterate over an array.


while (toRemove--) {
pipe.buckets.shift();
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A naive implementation of .shift() in a browser could be of linear complexity, so this while loop can end up being quadratic complexity. Calling pipe.buckets.splice(0, toRemove); would achieve the same with better performance?

data = data.subarray(freeBytesInCurrBuffer, data.byteLength);
}

var numBuckets = ~~(data.byteLength / PIPEFS.BUCKET_BUFFER_SIZE);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting to see ~~ to truncate down to integer. Generally x|0 is used throughout the codebase to achieve this, though I suppose these end up being functionally identical?

'library_fs.js',
'library_memfs.js',
'library_tty.js',
'library_pipefs.js',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I presume the fs < pipefs order does end up coming from here. In that case, probably worth to add a note here that pipefs static init depends on fs static init being called before, so the ordering in this list is fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, moving library_pipefs.js above the library_fs.js here seems not to change the order in which postsets are emitted in the resulting js. Reversing __deps to $FS -> $PIPEFS seems not to change it too. So it works, but no one know how. :(

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is likely that these names go to a dictionary later on, so the alphabetical ordering will be restored later even if they are not present now. However I think no need to worry if it works, since issues of not having initialized the filesystem should be apparent if that happens. (For curiosity, you could try renaming the file to library_aipefs.js or something similar and see if that causes issues)

puts("success");

return 0;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work with the test! Btw, in the upcoming ASMFS filesystem, these kinds of tests are super valuable because they help make sure that the constructs will be multithreading capable.

@atrosinenko
Copy link
Contributor Author

Fixed some review issues as suggested.

@atrosinenko
Copy link
Contributor Author

What further actions are expected from me on this PR?

@kripken
Copy link
Member

kripken commented Mar 6, 2017

(@juj has been at a conference, GDC, but hopefully he'll have time to reply soon)

@atrosinenko
Copy link
Contributor Author

Excuse me for the delay. @juj , what further actions are expected from me on this PR?

I have run random500 tests and it found an error in test_unistd_misc. It should definitely be adjusted to reflect presence of pipe syscall, but it tests how does pipe(0) work too. It should be possible to return EFAULT in that particular case, but does it test something really useful, since NULL is not the only possible invalid pointer?

@atrosinenko
Copy link
Contributor Author

@kripken @juj What should I fix in this PR?

@atrosinenko
Copy link
Contributor Author

Hello? Does anybody hear me? :)

@juj
Copy link
Collaborator

juj commented Jun 6, 2017

Very sorry for the delay. This does look good, see the question about the mount function.

Also looking at the test/unistd/misc.c test, the expected results file test/unistd/misc.out assumes that pipes are always failing, i.e.:

pipe(good): -1, errno: 38
pipe(bad): -1, errno: 38

We should update that test to reflect the new implementation status, should it be as follows?

pipe(good): 0, errno: 0
pipe(bad): -1, errno: 38

@atrosinenko
Copy link
Contributor Author

Considering this test, I think we may either ignore NULL, remove the second line and consider such cases as Undefined Behavior, but it may be worth to really return error in this particular case to facilitate debug. Maybe it is even required by some standard...

I hope I prepare some sane version of the PR this weekend.

Make poll() behave differently for the read end and write end of the pipe.
Remove constructing of `Uint8Array`s that are disposed immediately.
Suppose buffer is always byte-sized.
Fix the `read` operation so it returns EAGAIN if there is no data to read as
if the read end of a pipe is always in a non-blocking mode.
The current PIPEFS `read()` implementation seems to produce excessive garbage
with use cases such as write several bytes, read everything, write, read...,
since it everytime throws away an almost empty 8Kb buffer.

This commit addes a fix proposed by @cynecx.
@atrosinenko
Copy link
Contributor Author

Rebased PR onto fresh incoming and fixed test_unistd_misc. Now it passes all the tests that are run on my installation with python tests/runner.py. See my reply on PIPEFS.mount. Meanwhile, is it OK to suppose buffers passed to read and write are always byte-sized?

@atrosinenko
Copy link
Contributor Author

I have added a comment clarifying my current view of situation with PIPEFS.mount(...).

I tested the following program:

#include <stdio.h>
#include <assert.h>
#include <sys/types.h>
#include <dirent.h>
#include <unistd.h>
#include <errno.h>

void ls()
{
  DIR *dir = opendir("/");
  struct dirent *ent;
  int total = 0;

  assert(dir != NULL);
  printf("####\n");
  while(errno = 0, ent = readdir(dir))
  {
    printf("%s\n", ent->d_name);
	total++;
  }
  assert(errno == 0);
  printf("Total: %d\n\n", total);
  assert(closedir(dir) == 0);
}

int main()
{
  ls();
  fopen("/test.txt", "w");
  ls();
  int fd[2];
  assert(pipe(fd) == 0);
  ls();
}

And when ran with emcc test.c -o test.html --preload-file test.c --emrun && emrun --browser firefox test.html it prints

pre-main prep time: 28 ms
####
test.c
proc
dev
home
tmp
..
.
Total: 7

####
test.txt
test.c
proc
dev
home
tmp
..
.
Total: 8

####
test.txt
test.c
proc
dev
home
tmp
..
.
Total: 8

that looks good.

assert(buffer instanceof ArrayBuffer || ArrayBuffer.isView(buffer));
var data = buffer.subarray(offset, offset + length);

if(length <= 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after if

var fdPtr = SYSCALLS.get();

if (fdPtr == 0) {
throw new FS.ErrnoError(ERRNO_CODES.EFAULT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be only 2 spaces

@kripken
Copy link
Member

kripken commented Jun 26, 2017

Aside from those tiny style comments, this looks good to me, nice work. @juj had some comments earlier, though, so let's give him a little more time to look at this before merging.

@atrosinenko
Copy link
Contributor Author

Fixed those two style issues, thanks.

@atrosinenko
Copy link
Contributor Author

Ping :) Meanwhile, can we safely assume that buffers passed to read and write are always byte-sized?

@juj
Copy link
Collaborator

juj commented Jul 18, 2017

Thanks for the update. This looks good to merge to me now.

@kripken kripken merged commit 8fce4a2 into emscripten-core:incoming Jul 19, 2017
@atrosinenko atrosinenko deleted the pipefs branch July 20, 2017 09:16
@atrosinenko
Copy link
Contributor Author

Thank you very much everyone who participated, we ultimately did it. :)

@juj
Copy link
Collaborator

juj commented Jul 21, 2017

Thanks @atrosinenko for connecting to the various conversations on the bug tracker on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants