- 
                Notifications
    
You must be signed in to change notification settings  - Fork 9.1k
 
HADOOP-18304. Improve user-facing S3A committers documentation #4478
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-18304. Improve user-facing S3A committers documentation #4478
Conversation
| 
           Note that with the current patch, the table of contents has changed. FROM (trunk): # Committing work to S3 with the "S3A Committers"
### January 2021 Update
## Introduction: The Commit Problem
### Background : Hadoop's "Commit Protocol"
## Meet the S3A Committers
### The Staging Committer
## Conflict Resolution in the Staging Committers
### The Magic Committer
#### Which Committer to Use?
## Switching to an S3A Committer
## Using the Directory and Partitioned Staging Committers
## The "Partitioned" Staging Committer
### Notes
## Using the Magic committer
### FileSystem client setup
### Enabling the committer
## Common Committer Options
## Staging committer (Directory and Partitioned) options
### Common Committer Options
### Staging Committer Options
### Disabling magic committer path rewriting
## <a name="concurrent-jobs"></a> Concurrent Jobs writing to the same destination
## TroubleshootingTO (5e8cdf0): # Committing work to S3 with the "S3A Committers"
### January 2021 Update
## Introduction: The Commit Problem
### Background: Hadoop's "Commit Protocol"
## Meet the S3A Committers
### The Staging Committers
#### Conflict Resolution in the Staging Committers
### The Magic Committer
### Which Committer to Use?
## Switching to an S3A Committer
## Using the Staging Committers
### The "Partitioned" Staging Committer
### Notes on using Staging Committers
## Using the Magic committer
### FileSystem client setup
### Enabling the committer
## Committer Options Reference
### Common S3A Committer Options
### Staging committer (Directory and Partitioned) options
### Disabling magic committer path rewriting
## <a name="concurrent-jobs"></a> Concurrent Jobs writing to the same destination
## Troubleshooting | 
    
| 
           💔 -1 overall 
 
 This message was automatically generated.  | 
    
5e8cdf0    to
    ffdc6bd      
    Compare
  
    | 
           🎊 +1 overall 
 
 This message was automatically generated.  | 
    
| 
           @ahmarsuhail - would you be able to review this week?  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, just some minor formatting changes
        
          
                hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md
          
            Show resolved
            Hide resolved
        
              
          
                hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md
          
            Show resolved
            Hide resolved
        
              
          
                hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md
          
            Show resolved
            Hide resolved
        
              
          
                hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md
          
            Show resolved
            Hide resolved
        
      | files which do not contain relevant data. | ||
| 
               | 
          ||
| What the partitioned committer does is, where the tooling permits, allows callers | ||
| to add data to an existing partitioned layout*. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider rephrasing. "If tool permits, the partitioned committer allows callers to add data to an existing partitioned layout." also remove * before the full stop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for this one, i'm not sure if the * is referring to something. I will remove for now. Happy to add it back
| it would be overridden. Set `fs.s3a.committer.staging.unique-filenames` to `true` | ||
| If the file `log-20170228.avro` in the example above already existed, it would be overwritten. | ||
| 
               | 
          ||
| Set `fs.s3a.committer.staging.unique-filenames` to `true` | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation isn't right here. I think it makes sense to have this as part of point 3, consider reverting this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the indentation like this I believe allows you to put the sentence on a new line but still part of the previous point.
That being said, it is not obvious from the markdown and I cannot test the output HTML so I'll revert.
        
          
                hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md
          
            Show resolved
            Hide resolved
        
              
          
                hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md
          
            Show resolved
            Hide resolved
        
              
          
                hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md
          
            Show resolved
            Hide resolved
        
      | 
           @ahmarsuhail, I've updated based on your feedback. Thanks for reviewing it with such detail. I've put the changes for things like missing   | 
    
| 
           💔 -1 overall 
 
 This message was automatically generated.  | 
    
| 
           💔 -1 overall 
 
 This message was automatically generated.  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, looks good!
| 
           🎊 +1 overall 
 
 This message was automatically generated.  | 
    
| 
           Would you or a colleague be able to take a look at this PR in the next week, @mehakmeet?  | 
    
| 
           Yes @dannycjones, I'll check this by tomorrow morning IST.  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM bar few nits.
| ``` | ||
| 
               | 
          ||
| 
               | 
          ||
| 
               | 
          
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: blank lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will drop those!
| 
               | 
          ||
| Now that Amazon S3 is consistent, the magic committer is enabled by default. | ||
| Now that [Amazon S3 is consistent](https://aws.amazon.com/s3/consistency/), | ||
| the magic committer is enabled by default. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
File committer is still the default committer, right? So what does magic committer being "enabled by default" here mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe that's correct.
I believe what we're saying here is that S3A's "magic path rewriting" where it only stages the writes to __magic/ directories is now enabled by default.
I will update this to be clearer, something like:
| the magic committer is enabled by default. | |
| the magic directory path rewriting is enabled by default. | 
| This then is the problem which the S3A committers address: | ||
| 
               | 
          ||
| *How to safely and reliably commit work to Amazon S3 or compatible object store* | ||
| >*How to safely and reliably commit work to Amazon S3 or compatible object store.* | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are we quoting here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point, it is not quoting anyone (or maybe Steve). I will let it join the previous sentence as I think that makes more sense.
| 
           🎊 +1 overall 
 
 This message was automatically generated.  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
| 
           Thanks @dannycjones, Can you please open a backport to branch-3.3, I'll hit merge after green yetus there.  | 
    
…e#4478) Contributed by: Daniel Carl Jones
| 
           Thanks Mehakmeet! I've opened #5043 for backport to   | 
    
Contributed by: Daniel Carl Jones
Contributed by: Daniel Carl Jones
…e#4478) Contributed by: Daniel Carl Jones
Description of PR
As noted in the ticket, this PR attempts to improve the committer docs given a fresh pair of eyes from someone who has not worked with the committers before.
I've tried to ensure that the Table of Contents makes more sense too.
How was this patch tested?
Reading :)
I did try and build the HTML but I couldn't get it to pickup the new markdown.
For code changes:
LICENSE,LICENSE-binary,NOTICE-binaryfiles?