ENH: Allow compression in NDFrame.to_csv to be a dict with optional arguments (#26023) #26024
Merged
Commits (41)

- 4e73dc4 ENH/BUG: Add arcname to to_csv for ZIP compressed csv filename (#26023) (drew-heenan)
- ab7620d DOC: Updated docs for arcname in NDFrame.to_csv (#26023) (drew-heenan)
- 2e782f9 conform to line length limit (drew-heenan)
- 83e8834 Fixed test_to_csv_zip_arcname for Windows paths (drew-heenan)
- d238878 Merge remote-tracking branch 'upstream/master' into issue-26023 (drew-heenan)
- b41be54 to_csv compression may now be dict with possible keys 'method' and 'a… (drew-heenan)
- 60ea58c test_to_csv_compression_dict uses compression_only fixture (drew-heenan)
- 8ba9082 delegate dict handling to _get_compression_method, type annotations (drew-heenan)
- 0a3a9fd fix import order, None type annotations (drew-heenan)
- a1cb3f7 compression args passed as kwargs, update relevant docs (drew-heenan)
- af2a96c style/doc improvements, change arcname to archive_name (drew-heenan)
- 5853a28 Merge branch 'master' into issue-26023 (drew-heenan)
- 789751f Merge branch 'master' into issue-26023 (drew-heenan)
- 5b09e6f add to_csv example, no method test, Optional types, tweaks; update wh… (drew-heenan)
- 68a2b4d remove Index import type ignore (drew-heenan)
- c856f50 Revert "remove Index import type ignore" (drew-heenan)
- 8df6c81 Merge remote-tracking branch 'upstream/master' into issue-26023 (drew-heenan)
- 40d0252 Merge branch 'master' into issue-26023 (drew-heenan)
- 18a735d Improve docs/examples (drew-heenan)
- 103c877 Merge branch 'master' into issue-26023 (drew-heenan)
- b6c34bc Merge remote-tracking branch 'upstream/master' into issue-26023 (WillAyd)
- 969d387 Added back missed Callable import in generic (WillAyd)
- abfbc0f Merge remote-tracking branch 'upstream/master' into issue-26023 (WillAyd)
- 04ae25d Address comments (WillAyd)
- 9c22652 Typing cleanup (WillAyd)
- 56a75c2 Cleaned up docstring (WillAyd)
- bbfea34 Merge remote-tracking branch 'upstream/master' into issue-26023 (WillAyd)
- 7717f16 Merge remote-tracking branch 'upstream/master' into issue-26023 (WillAyd)
- 779511e blackify (WillAyd)
- 780eb04 Merge remote-tracking branch 'upstream/master' into issue-26023 (WillAyd)
- 6c4e679 Added annotations where feasible (WillAyd)
- 1b567c9 Black and lint (WillAyd)
- 9324b63 Merge remote-tracking branch 'upstream/master' into issue-26023 (WillAyd)
- 7cf65ee isort fixup (WillAyd)
- 29374f3 Docstring fixup and more annotations (WillAyd)
- 6701aa4 Merge remote-tracking branch 'upstream/master' into issue-26023 (WillAyd)
- 0f5489d lint fixup (WillAyd)
- e04138e mypy fixup (WillAyd)
- 6f2bf00 whatsnew fixup (WillAyd)
- 865aa81 Annotation and doc fixups (WillAyd)
- 8d1deee mypy typeshed bug fix (WillAyd)

| Original file line number | Diff line number | Diff line change |
|---|---|---|
|  | @@ -9,7 +9,19 @@ | |
| import mmap | ||
| import os | ||
| import pathlib | ||
| from typing import IO, AnyStr, BinaryIO, Optional, TextIO, Type | ||
| from typing import ( | ||
| IO, | ||
| Any, | ||
| AnyStr, | ||
| BinaryIO, | ||
| Dict, | ||
| List, | ||
| Optional, | ||
| TextIO, | ||
| Tuple, | ||
| Type, | ||
| Union, | ||
| ) | ||
| from urllib.error import URLError # noqa | ||
| from urllib.parse import ( # noqa | ||
| urlencode, | ||
|  | @@ -255,6 +267,40 @@ def file_path_to_url(path: str) -> str: | |
| _compression_to_extension = {"gzip": ".gz", "bz2": ".bz2", "zip": ".zip", "xz": ".xz"} | ||
|  | ||
|  | ||
| def _get_compression_method( | ||
| compression: Optional[Union[str, Dict[str, str]]] | ||
| ) -> Tuple[Optional[str], Dict[str, str]]: | ||
| """ | ||
| Simplifies a compression argument to a compression method string and | ||
| a dict containing additional arguments. | ||
|  | ||
| Parameters | ||
| ---------- | ||
| compression : str or dict | ||
| If string, specifies the compression method. If dict, value at key | ||
| 'method' specifies compression method. | ||
|  | ||
| Returns | ||
| ------- | ||
| tuple of ({compression method}, Optional[str] | ||
| {compression arguments}, Dict[str, str]) | ||
|  | ||
| Raises | ||
| ------ | ||
| ValueError on dict missing 'method' key | ||
| """ | ||
| # Handle dict | ||
| if isinstance(compression, dict): | ||
| compression_args = compression.copy() | ||
| try: | ||
| compression = compression_args.pop("method") | ||
| except KeyError: | ||
| raise ValueError("If dict, compression must have key 'method'") | ||
| else: | ||
| compression_args = {} | ||
| return compression, compression_args | ||
|  | ||
|  | ||
| def _infer_compression( | ||
| filepath_or_buffer: FilePathOrBuffer, compression: Optional[str] | ||
| ) -> Optional[str]: | ||
|  | @@ -266,21 +312,20 @@ def _infer_compression( | |
|  | ||
| Parameters | ||
| ---------- | ||
| filepath_or_buffer : | ||
| a path (str) or buffer | ||
| filepath_or_buffer : str or file handle | ||
| File path or object. | ||
| compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None} | ||
| If 'infer' and `filepath_or_buffer` is path-like, then detect | ||
| compression from the following extensions: '.gz', '.bz2', '.zip', | ||
| or '.xz' (otherwise no compression). | ||
|  | ||
| Returns | ||
| ------- | ||
| string or None : | ||
| compression method | ||
| string or None | ||
|  | ||
| Raises | ||
| ------ | ||
| ValueError on invalid compression specified | ||
| ValueError on invalid compression specified. | ||
| """ | ||
|  | ||
| # No compression has been explicitly specified | ||
|  | @@ -312,32 +357,49 @@ def _infer_compression( | |
|  | ||
|  | ||
| def _get_handle( | ||
| path_or_buf, mode, encoding=None, compression=None, memory_map=False, is_text=True | ||
| path_or_buf, | ||
| mode: str, | ||
| encoding=None, | ||
| Review comment: Couldn't annotate this particular argument due to a minor bug in typeshed. It is fixed on master, so this is maybe something we can come back to soon (typeshed updates are pretty quick). | ||
| compression: Optional[Union[str, Dict[str, Any]]] = None, | ||
| memory_map: bool = False, | ||
| is_text: bool = True, | ||
| ): | ||
| """ | ||
| Get file handle for given path/buffer and mode. | ||
|  | ||
| Parameters | ||
| ---------- | ||
| path_or_buf : | ||
| a path (str) or buffer | ||
| path_or_buf : str or file handle | ||
| File path or object. | ||
| mode : str | ||
| mode to open path_or_buf with | ||
| Mode to open path_or_buf with. | ||
| encoding : str or None | ||
| compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default None | ||
| If 'infer' and `filepath_or_buffer` is path-like, then detect | ||
| compression from the following extensions: '.gz', '.bz2', '.zip', | ||
| or '.xz' (otherwise no compression). | ||
| Encoding to use. | ||
| compression : str or dict, default None | ||
| If string, specifies compression mode. If dict, value at key 'method' | ||
| specifies compression mode. Compression mode must be one of {'infer', | ||
| 'gzip', 'bz2', 'zip', 'xz', None}. If compression mode is 'infer' | ||
| and `filepath_or_buffer` is path-like, then detect compression from | ||
| the following extensions: '.gz', '.bz2', '.zip', or '.xz' (otherwise | ||
| no compression). If dict and compression mode is 'zip' or inferred as | ||
| 'zip', other entries passed as additional compression options. | ||
|  | ||
| .. versionchanged:: 1.0.0 | ||
|  | ||
| May now be a dict with key 'method' as compression mode | ||
| and other keys as compression options if compression | ||
| mode is 'zip'. | ||
|  | ||
| memory_map : boolean, default False | ||
| See parsers._parser_params for more information. | ||
| is_text : boolean, default True | ||
| whether file/buffer is in text format (csv, json, etc.), or in binary | ||
| mode (pickle, etc.) | ||
| mode (pickle, etc.). | ||
|  | ||
| Returns | ||
| ------- | ||
| f : file-like | ||
| A file-like object | ||
| A file-like object. | ||
| handles : list of file-like objects | ||
| A list of file-like object that were opened in this function. | ||
| """ | ||
|  | @@ -346,15 +408,16 @@ def _get_handle( | |
|  | ||
| need_text_wrapping = (BufferedIOBase, S3File) | ||
| except ImportError: | ||
| need_text_wrapping = BufferedIOBase | ||
| need_text_wrapping = BufferedIOBase # type: ignore | ||
|  | ||
| handles = list() | ||
| handles = list() # type: List[IO] | ||
| f = path_or_buf | ||
|  | ||
| # Convert pathlib.Path/py.path.local or string | ||
| path_or_buf = _stringify_path(path_or_buf) | ||
| is_path = isinstance(path_or_buf, str) | ||
|  | ||
| compression, compression_args = _get_compression_method(compression) | ||
| if is_path: | ||
| compression = _infer_compression(path_or_buf, compression) | ||
|  | ||
|  | @@ -376,7 +439,7 @@ def _get_handle( | |
|  | ||
| # ZIP Compression | ||
| elif compression == "zip": | ||
| zf = BytesZipFile(path_or_buf, mode) | ||
| zf = BytesZipFile(path_or_buf, mode, **compression_args) | ||
| # Ensure the container is closed as well. | ||
| handles.append(zf) | ||
| if zf.mode == "w": | ||
|  | @@ -429,9 +492,9 @@ def _get_handle( | |
|  | ||
| if memory_map and hasattr(f, "fileno"): | ||
| try: | ||
| g = MMapWrapper(f) | ||
| wrapped = MMapWrapper(f) | ||
| f.close() | ||
| f = g | ||
| f = wrapped | ||
| except Exception: | ||
| # we catch any errors that may have occurred | ||
| # because that is consistent with the lower-level | ||
|  | @@ -456,15 +519,19 @@ def __init__( | |
| self, | ||
| file: FilePathOrBuffer, | ||
| mode: str, | ||
| compression: int = zipfile.ZIP_DEFLATED, | ||
| archive_name: Optional[str] = None, | ||
| **kwargs | ||
| ): | ||
| if mode in ["wb", "rb"]: | ||
| mode = mode.replace("b", "") | ||
| super().__init__(file, mode, compression, **kwargs) | ||
| self.archive_name = archive_name | ||
| super().__init__(file, mode, zipfile.ZIP_DEFLATED, **kwargs) | ||
|  | ||
| def write(self, data): | ||
| super().writestr(self.filename, data) | ||
| archive_name = self.filename | ||
| if self.archive_name is not None: | ||
| archive_name = self.archive_name | ||
| super().writestr(archive_name, data) | ||
|  | ||
| @property | ||
| def closed(self): | ||
|  | ||
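
The dict handling and the `archive_name` pass-through in the diff above can be sketched in isolation with the standard library. This is a minimal sketch, not the pandas implementation: `split_compression` and `NamedZipFile` are hypothetical names standing in for `_get_compression_method` and `BytesZipFile`, and the CSV text and the `data.csv` name are made up for illustration.

```python
import io
import zipfile
from typing import Any, Dict, Optional, Tuple, Union


def split_compression(
    compression: Optional[Union[str, Dict[str, Any]]]
) -> Tuple[Optional[str], Dict[str, Any]]:
    """Split a compression argument into (method, extra arguments)."""
    if isinstance(compression, dict):
        # Work on a copy so the caller's dict is left untouched.
        args = dict(compression)
        try:
            method = args.pop("method")
        except KeyError:
            raise ValueError("If dict, compression must have key 'method'") from None
        return method, args
    # A plain string (or None) carries no extra arguments.
    return compression, {}


class NamedZipFile(zipfile.ZipFile):
    """Hypothetical stand-in: write() stores data under archive_name if given."""

    def __init__(self, file, mode: str, archive_name: Optional[str] = None, **kwargs):
        # ZipFile only understands "r"/"w"/"a"/"x", so drop the binary flag.
        if mode in ("wb", "rb"):
            mode = mode.replace("b", "")
        self.archive_name = archive_name
        super().__init__(file, mode, zipfile.ZIP_DEFLATED, **kwargs)

    def write(self, data):
        # Fall back to the zip file's own name when no archive_name was passed.
        name = self.archive_name if self.archive_name is not None else self.filename
        super().writestr(name, data)


# The extra dict entries are forwarded as keyword arguments, as in the diff.
method, args = split_compression({"method": "zip", "archive_name": "data.csv"})

buffer = io.BytesIO()
if method == "zip":
    with NamedZipFile(buffer, "wb", **args) as zf:
        zf.write("a,b\n1,2\n")

# Read the archive back to see which member name was used.
with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()
    content = zf.read("data.csv").decode()

print(names)
```

Per the docstring in the diff, the user-facing equivalent would be passing a dict such as `{"method": "zip", "archive_name": "data.csv"}` as `compression`; the remaining keys are only forwarded when the method is `'zip'`.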
      