Skip to content

Conversation

@akpatnam25
Copy link

What changes were proposed in this pull request?

Add the ability to retry SASL requests. Will add it as a metric too soon to track SASL retries.

Why are the changes needed?

We are seeing increased SASL timeouts internally, and this issue would mitigate the issue. We already have this feature enabled for our 2.3 jobs, and we have seen failures significantly decrease.

Does this PR introduce any user-facing change? No

How was this patch tested?

Added unit tests, and tested on cluster to ensure the retries are being triggered correctly.

Closes #38959 from akpatnam25/SPARK-41415.

Authored-by: Aravind Patnam [email protected]
Signed-off-by: Mridul Muralidharan <mridulgmail.com>

### What changes were proposed in this pull request?

Add the ability to retry SASL requests. Will add it as a metric too soon to track SASL retries.

### Why are the changes needed?
We are seeing increased SASL timeouts internally, and this issue would mitigate the issue. We already have this feature enabled for our 2.3 jobs, and we have seen failures significantly decrease.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added unit tests, and tested on cluster to ensure the retries are being triggered correctly.

Closes apache#38959 from akpatnam25/SPARK-41415.

Authored-by: Aravind Patnam <[email protected]>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
@github-actions github-actions bot added the CORE label Jan 18, 2023
@akpatnam25
Copy link
Author

@dongjoon-hyun @mridulm

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making this, @akpatnam25 .

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-41415] SASL Request Retries Backport to 3.3 [SPARK-41415][3.3] SASL Request Retries Jan 18, 2023
@akpatnam25
Copy link
Author

will backport SPARK-42090 once this merged

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@dongjoon-hyun
Copy link
Member

Gentle ping once more, @mridulm ~

Copy link
Contributor

@mridulm mridulm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
Sorry for the delay !

@mridulm
Copy link
Contributor

mridulm commented Jan 21, 2023

Thanks for the ping @dongjoon-hyun :)
I will merge this to 3.3

@dongjoon-hyun
Copy link
Member

Thank you!

mridulm pushed a commit that referenced this pull request Jan 21, 2023
### What changes were proposed in this pull request?

Add the ability to retry SASL requests. Will add it as a metric too soon to track SASL retries.

### Why are the changes needed?
We are seeing increased SASL timeouts internally, and this issue would mitigate the issue. We already have this feature enabled for our 2.3 jobs, and we have seen failures significantly decrease.

### Does this PR introduce _any_ user-facing change? No

### How was this patch tested?
Added unit tests, and tested on cluster to ensure the retries are being triggered correctly.

Closes #38959 from akpatnam25/SPARK-41415.

Authored-by: Aravind Patnam <apatnamlinkedin.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>

Closes #39644 from akpatnam25/SPARK-41415-backport-3.3.

Authored-by: Aravind Patnam <[email protected]>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
@mridulm
Copy link
Contributor

mridulm commented Jan 21, 2023

Merged to branch-3.3
This does not get automatically closed @dongjoon-hyun ?

@dongjoon-hyun
Copy link
Member

Yes, only master branch is closed automatically.

@mridulm mridulm closed this Jan 21, 2023
@mridulm
Copy link
Contributor

mridulm commented Jan 21, 2023

Closing, as PR has been merged.
Thanks for backporting this @akpatnam25 !
And thanks for all the help @dongjoon-hyun :-) (you are literally responding in seconds !)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants