BUG/ENH: consistent gzip compression arguments #35645
arw2019
left a comment
lgtm
might we need to update the docstring or do you think it's good as is?
Updating the docstring is a good idea, will do that! I assume that this will affect multiple …
You could maybe add the more explicit explanation to …
The PR adding arguments for bz2/gzip (#33398) mentioned that it affects … I could make sure that all three …
jreback
left a comment
Looks good. Slightly OT: we want to add typing for the compression arg (I think we have an issue for this), similar to `StorageOptions`, which we define in `pandas._typing.py`.
cc @gfyoung @WillAyd @TomAugspurger if you have comments.
I will look into that. I assume it is going to be:

```python
from typing import Optional, TypedDict  # TypedDict is in typing from Python 3.8


class CompressionArgs(TypedDict, total=False):
    method: str
    compresslevel: Optional[int]
    mtime: Optional[int]
    compression: int
    allowZip64: bool
    strict_timestamps: bool
```

Technically, there are a few more, but users should not pass them (`filename`, `fileobj`, `buffering` (deprecated since Python 3.0), `mode`).
Do you have opinions about that? `compression` does not only affect …
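A trimmed-down sketch of how such a `TypedDict` could be consumed: the `method` key picks the compressor and every remaining key is forwarded unchanged as a keyword argument. The helpers `split_compression` and `compress_bytes` are hypothetical illustrations, not pandas functions, and only gzip and bz2 are handled.

```python
import bz2
import gzip
import io
from typing import Any, Dict, Tuple, TypedDict  # TypedDict needs Python 3.8+


class CompressionArgs(TypedDict, total=False):
    """Trimmed, hypothetical version of the proposed compression options."""

    method: str
    compresslevel: int
    mtime: int


def split_compression(args: CompressionArgs) -> Tuple[str, Dict[str, Any]]:
    """Separate the compression method from the options forwarded to the compressor."""
    kwargs: Dict[str, Any] = dict(args)
    method = kwargs.pop("method")  # raises KeyError if no method was given
    return method, kwargs


def compress_bytes(data: bytes, args: CompressionArgs) -> bytes:
    """Compress ``data`` in memory, forwarding the extra options unchanged."""
    method, kwargs = split_compression(args)
    buffer = io.BytesIO()
    if method == "gzip":
        handle = gzip.GzipFile(fileobj=buffer, mode="wb", **kwargs)
    elif method == "bz2":
        # bz2.BZ2File has no ``mtime`` parameter, so passing it here raises TypeError.
        handle = bz2.BZ2File(buffer, mode="wb", **kwargs)
    else:
        raise ValueError(f"unsupported compression method: {method!r}")
    with handle:  # closing the compressor does not close ``buffer``
        handle.write(data)
    return buffer.getvalue()


method, extra = split_compression({"method": "gzip", "compresslevel": 1, "mtime": 0})
print(method, extra)  # gzip {'compresslevel': 1, 'mtime': 0}
```

Forwarding the remaining keys unchanged keeps the helper thin, but it also means an option that one compressor lacks (such as `mtime` for bz2) only fails when that compressor is constructed.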
WillAyd
left a comment
lgtm - nice PR
Hello @twoertwein! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-08-13 02:56:33 UTC
oh, I didn't know that
```python
            except (IOError, AttributeError):
                pass
        for file_handle in self.file_handles:
            file_handle.close()
```
probably unrelated to the recent CI issues, but we should definitely close those handles.
hmm, is there a `ResourceWarning`?
I haven't seen any when reading/writing json files
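A minimal sketch of the pattern in the diff above: the reader remembers every handle it opens itself and releases all of them in `close()`. `HypotheticalReader` is an illustrative stand-in, not the pandas JSON reader. Since `ResourceWarning` is ignored by default, running the tests with `python -X dev` or `-W error::ResourceWarning` is one way to surface handles that were never closed.

```python
import os
import tempfile
from typing import IO, List


class HypotheticalReader:
    """Illustrative reader that tracks every handle it opens itself."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.file_handles: List[IO[bytes]] = []

    def read(self) -> bytes:
        handle = open(self.path, "rb")
        self.file_handles.append(handle)  # remember it so close() can release it
        return handle.read()

    def close(self) -> None:
        # Same idea as the diff: close every handle this object opened.
        for file_handle in self.file_handles:
            file_handle.close()
        self.file_handles.clear()

    def __enter__(self) -> "HypotheticalReader":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")
    with open(path, "w") as fh:
        fh.write('{"a": [1, 2]}')
    with HypotheticalReader(path) as reader:
        content = reader.read()
        opened = list(reader.file_handles)
    print(content, all(h.closed for h in opened))  # b'{"a": [1, 2]}' True
```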
thanks @twoertwein very nice!
- `black pandas`
- `git diff upstream/master -u -- "*.py" | flake8 --diff`

`to_csv` lets the user set all keyword arguments for gzip. Depending on whether the user provides a filename or a file object, different keyword arguments can be set (`gzip.open` vs. `gzip.GzipFile`). This PR always uses `gzip.GzipFile`. The additional keyword arguments valid for `gzip.open` but not for `gzip.GzipFile` (`encoding`, `errors`, and `newline`) are still accessible:

pandas/pandas/io/common.py
Line 512 in aefae55
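The linked line is not reproduced here, so the following is an assumption about the mechanism: a binary `gzip.GzipFile` handle can be wrapped in `io.TextIOWrapper`, which accepts exactly `encoding`, `errors`, and `newline`. `open_gzip_text` is a hypothetical helper for illustration only.

```python
import gzip
import io
from typing import BinaryIO, Optional


def open_gzip_text(
    buffer: BinaryIO,
    mode: str,
    encoding: str = "utf-8",
    errors: str = "strict",
    newline: Optional[str] = None,
) -> io.TextIOWrapper:
    """Hypothetical helper: text-mode gzip I/O built on gzip.GzipFile.

    gzip.GzipFile only handles bytes, so the text-level arguments that
    gzip.open() would take are passed to io.TextIOWrapper instead.
    """
    binary = gzip.GzipFile(fileobj=buffer, mode=mode + "b")  # "w" -> "wb", "r" -> "rb"
    return io.TextIOWrapper(binary, encoding=encoding, errors=errors, newline=newline)


buffer = io.BytesIO()
# newline="" disables newline translation, as is usual for CSV output.
with open_gzip_text(buffer, "w", encoding="utf-8", newline="") as handle:
    handle.write("name,city\nJosé,Zürich\n")

buffer.seek(0)  # closing the wrapper closes the GzipFile but not ``buffer``
with open_gzip_text(buffer, "r", encoding="utf-8", newline="") as handle:
    text = handle.read()  # round-trips the original string
```

In CPython, `gzip.open` in text mode does essentially the same thing internally (it wraps a `GzipFile` in `io.TextIOWrapper`), so using `GzipFile` directly need not lose any of the text-mode options.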
Using `gzip.GzipFile` also allows us to set `mtime` to create reproducible gzip archives.
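A stdlib-only sketch of the reproducibility point: with `mtime=None` (the default), `gzip.GzipFile` stores the current time in the gzip header, so compressing the same data twice can yield different bytes, while a fixed `mtime` makes the output byte-for-byte identical. `gzip_bytes` is a hypothetical helper, not part of pandas.

```python
import gzip
import io
from typing import Optional


def gzip_bytes(data: bytes, mtime: Optional[int] = None) -> bytes:
    """Compress ``data`` in memory with gzip.GzipFile.

    mtime=None writes the current time into the header; a fixed value
    makes repeated runs produce identical archives.
    """
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb", mtime=mtime) as gz:
        gz.write(data)
    return buffer.getvalue()


payload = b"a,b\n1,2\n"
first = gzip_bytes(payload, mtime=0)
second = gzip_bytes(payload, mtime=0)
print(first == second)  # True: same data, same header timestamp
print(first[4:8])  # b'\x00\x00\x00\x00' (the 4-byte little-endian MTIME header field)
```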