-
Notifications
You must be signed in to change notification settings - Fork 414
optimisations #647
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimisations #647
Conversation
Turns out it takes a surprisingly long time to evaluate paths (os.path.abspath) and this shows up in fastparquet for reading from many files.
Because cwd can change during run
Make common things first and use os.scandir in ls
| else: | ||
| t = "other" | ||
| # scandir DirEntry | ||
| out = path.stat(follow_symlinks=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a breaking change in the API? in pytorch lightning tests we're seeing this stacktrace now:
kwargs = {}
def info(self, path, **kwargs):
if isinstance(path, str):
path = self._strip_protocol(path)
out = os.stat(path, follow_symlinks=False)
link = os.path.islink(path)
if os.path.isdir(path):
t = "directory"
elif os.path.isfile(path):
t = "file"
else:
t = "other"
else:
# scandir DirEntry
> out = path.stat(follow_symlinks=False)
E TypeError: stat() got an unexpected keyword argument 'follow_symlinks'
example: https://github.com/PyTorchLightning/pytorch-lightning/runs/2766424283?check_suite_focus=true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No the API did not change, this path is only meant to be called by internal functions. The test is whether the path is a str, and I suspect that the tmpdir fixture used for the test is not actually a str, but something str-like. str(tempdir) in the test would solve it. However, fsspec maybe should have coped with this in the more user-facing isdir/info functions (if my guess about the cause is correct).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, you're right. tmpdir is a py.path.local object. while we can update our tests to handle with this, does it make sense to update the instance check as well?
| *[self._pipe_file(k, v, **kwargs) for k, v in path.items()] | ||
| ) | ||
|
|
||
| async def _cat_file(self, path, start=None, end=None, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martindurant what's the purpose of **kwargs here? Should subclasses check that additional kwargs aren't passed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's so that subclasses can pass any extra arguments (like headers) without changing the signature. If the subclass doesn't have any such thing it can do, then kwargs can be ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without changing the signature.
What's the motivation for not changing the signature? If they subclass accepts additional kwargs, then they can add them. As long as it's a superset of the parent's signature and the function is valid with the defaults then things will work fine.
then kwargs can be ignored.
Functions probably shouldn't silently ignore kwargs, since this leads to difficult to debug problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess its an indication to downstream that this is a reasonable place to add extra args, and it enables fsspec to add arguments (like start/end) without immediately braking other implementations for the common case that they are left with default values such as None.
Uh oh!
There was an error while loading. Please reload this page.