Suggest character encoding is incorrect when encountering random null bytes #81856
          
Merged
  
    
(rust-highfive has picked a reviewer for you, use r? to override)

pickfire reviewed on Feb 7, 2021

nagisa reviewed on Feb 7, 2021
            
@bors r+

📌 Commit ed8c686 has been approved by
    
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request on Feb 27, 2021:
Suggest character encoding is incorrect when encountering random null bytes

This adds a note whenever null bytes are seen unexpectedly at the start of a token, since those tend to come from UTF-16 encoded files without a [BOM](https://en.wikipedia.org/wiki/Byte_order_mark). (If a UTF-16 BOM is present, the file won't be valid UTF-8; but if there is no BOM, the file can be both valid UTF-16 and valid, but garbled, UTF-8.) This approach was suggested in rust-lang#73979 (comment). Closes rust-lang#73979.
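To illustrate the encoding claim, here is a minimal Rust sketch (not code from this PR) that encodes a tiny source file as UTF-16LE with and without a BOM and checks each result with `std::str::from_utf8`. The helper `utf16le_bytes` is hypothetical and only produces little-endian output.

```rust
/// Hypothetical helper: encode `text` as UTF-16LE bytes, optionally
/// prefixed with a byte order mark (U+FEFF, stored as FF FE in LE order).
fn utf16le_bytes(text: &str, with_bom: bool) -> Vec<u8> {
    let mut out = Vec::new();
    if with_bom {
        out.extend_from_slice(&[0xFF, 0xFE]);
    }
    for unit in text.encode_utf16() {
        out.extend_from_slice(&unit.to_le_bytes());
    }
    out
}

fn main() {
    let with_bom = utf16le_bytes("fn main() {}", true);
    let without_bom = utf16le_bytes("fn main() {}", false);

    // The bytes 0xFF and 0xFE never occur in well-formed UTF-8,
    // so a BOM makes the whole file fail UTF-8 validation.
    assert!(std::str::from_utf8(&with_bom).is_err());

    // Without a BOM, each ASCII code unit becomes `byte, 0x00`, and U+0000
    // is valid UTF-8: the file decodes, but with a null after every character.
    let decoded = std::str::from_utf8(&without_bom).expect("valid UTF-8");
    assert_eq!(decoded, "f\0n\0 \0m\0a\0i\0n\0(\0)\0 \0{\0}\0");
    println!("{:?}", decoded);
}
```

This is why the BOM-less case reaches the lexer at all: UTF-8 validation succeeds, and the first sign of trouble is a null byte where a token should start.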
    
bors added a commit to rust-lang-ci/rust that referenced this pull request on Feb 28, 2021:
Rollup of 11 pull requests

Successful merges:

- rust-lang#81856 (Suggest character encoding is incorrect when encountering random null bytes)
- rust-lang#82395 (Add missing "see its documentation for more" stdio)
- rust-lang#82401 (Remove a redundant macro)
- rust-lang#82498 (Use log level to control partitioning debug output)
- rust-lang#82534 (Link crtbegin/crtend on musl to terminate .eh_frame)
- rust-lang#82537 (Update measureme dependency to the latest version)
- rust-lang#82561 (doc: cube root, not cubic root)
- rust-lang#82563 (Fix intra-doc handling of `Self` in enum)
- rust-lang#82584 (Add ARIA role to sidebar toggle in Rustdoc)
- rust-lang#82596 (clarify RW lock's priority gotcha)
- rust-lang#82607 (Add a getter for Frame.loc)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
  
Labels: S-waiting-on-bors (Status: Waiting on bors to run and complete tests. Bors will change the label on completion.)
This adds a note whenever null bytes are seen unexpectedly at the start of a token, since those tend to come from UTF-16 encoded files without a BOM. If a UTF-16 BOM is present, the file won't be valid UTF-8; but if there is no BOM, the file can be both valid UTF-16 and valid, but garbled, UTF-8, because every ASCII character encoded as UTF-16 is paired with a 0x00 byte. This approach was suggested in #73979 (comment).
Closes #73979.
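A toy sketch of the heuristic, assuming a simplified lexer: any character that cannot start a token produces an "unknown start of token" error, and a null byte additionally gets a note pointing at the likely encoding problem. `Diagnostic`, `unknown_token_start`, `lex_errors`, and the note wording are all hypothetical; rustc's actual lexer and diagnostics API are structured differently, and this scanner checks single characters rather than whole tokens.

```rust
/// Hypothetical diagnostic type for this sketch; rustc's real API differs.
#[derive(Debug)]
struct Diagnostic {
    message: String,
    notes: Vec<String>,
}

/// Build an "unknown start of token" error for `c`, attaching an encoding
/// note when the unexpected character is a null byte (U+0000).
fn unknown_token_start(c: char) -> Diagnostic {
    let mut diag = Diagnostic {
        message: format!("unknown start of token: {}", c.escape_debug()),
        notes: Vec::new(),
    };
    if c == '\0' {
        diag.notes.push(
            "null bytes often mean the file is UTF-16 encoded without a BOM; \
             source files must be UTF-8"
                .to_string(),
        );
    }
    diag
}

/// Toy scanner: every character that is not whitespace, an identifier
/// character, or one of a few punctuation marks is reported as an
/// unknown token start.
fn lex_errors(src: &str) -> Vec<Diagnostic> {
    const PUNCT: &str = "(){}[];,.:=+-*/<>!&|";
    src.chars()
        .filter(|&c| {
            !(c.is_whitespace() || c.is_alphanumeric() || c == '_' || PUNCT.contains(c))
        })
        .map(unknown_token_start)
        .collect()
}

fn main() {
    // Ordinary UTF-8 source produces no errors.
    assert!(lex_errors("let x = 1;").is_empty());

    // "fn" read from a BOM-less UTF-16LE file: each ASCII byte is followed by 0x00.
    let diags = lex_errors("f\0n\0");
    assert_eq!(diags.len(), 2);
    for d in &diags {
        println!("error: {}", d.message);
        for note in &d.notes {
            println!("  note: {}", note);
        }
    }
}
```

The note is attached only for U+0000, so other stray characters (for example `#` in this toy) still get the plain error; this mirrors the PR's idea of adding context only when null bytes show up at a token start.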