Member

@yaooqinn yaooqinn commented Aug 1, 2025

What changes were proposed in this pull request?

This PR implements and calls the new APIs in FileCommitProtocol instead of the deprecated ones.

Why are the changes needed?

FileCommitProtocol and related classes are complicated because they play a lot of tricks for tasks such as file naming and config setting/propagation. Removing these internal references to the deprecated APIs simplifies the call stack a bit. It also makes the deprecated APIs ignorable.

Does this PR introduce any user-facing change?

No, nothing changes for existing implementations or end-users.

How was this patch tested?

Pass existing CIs

Was this patch authored or co-authored using generative AI tooling?

no


  override def newTaskTempFileAbsPath(
-     taskContext: TaskAttemptContext, absoluteDir: String, ext: String): String = {
+     taskContext: TaskAttemptContext, absoluteDir: String, spec: FileNameSpec): String = {
Member

nit. A wrong indentation.

      taskAttemptContext,
      None,
-     f"-c$fileCounter%03d" + ext)
+     FileNameSpec("", f"-c$fileCounter%03d" + ext))
Member

This looks like a new independent change, too.

Member Author

Calling the newer newTaskTempFile with spec is the goal of this PR
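
The call-site change under discussion can be illustrated with a minimal, self-contained sketch. The `FileNameSpec` case class below is a simplified stand-in for Spark's class of the same name, and `CallSiteSketch`, `oldSuffix`, and `newSpec` are hypothetical names used only for illustration. The sketch shows that the suffix string itself is unchanged; it is only wrapped in a spec with an empty prefix.

```scala
// Simplified stand-in for Spark's FileNameSpec (assumption: only prefix and suffix matter here).
final case class FileNameSpec(prefix: String, suffix: String)

object CallSiteSketch {
  // Old call shape: the suffix is passed as a bare string.
  def oldSuffix(fileCounter: Int, ext: String): String =
    f"-c$fileCounter%03d" + ext

  // New call shape: the same suffix, wrapped in a spec with an empty prefix.
  def newSpec(fileCounter: Int, ext: String): FileNameSpec =
    FileNameSpec("", f"-c$fileCounter%03d" + ext)

  def main(args: Array[String]): Unit = {
    val spec = newSpec(7, ".parquet")
    // The counter is zero-padded to three digits by the f-interpolator.
    println(spec.suffix) // -c007.parquet
    println(spec.suffix == oldSuffix(7, ".parquet")) // true
  }
}
```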

Member

The PR title says otherwise:

> Remove inner references of deprecated APIs in FileCommitProtocol

If that is the goal of this PR, please remove the exception-throwing default implementations from the deprecated APIs.

> Calling the newer newTaskTempFile with spec is the goal of this PR

Member Author

@dongjoon-hyun, I got your point. I've changed the title to "Implement and call new APIs in FileCommitProtocol instead of deprecated". Does the change here look reasonable to you with this more positive framing? If not, I will split the overrides and the callers into two PRs.

Member

@dongjoon-hyun dongjoon-hyun Aug 4, 2025

Thank you. I trust your decision. Initially, I wanted to remove the default implementation (throw SparkException) from this PR. But this could be a migration step too, so I agree with your decision.

Member Author

Thank you @dongjoon-hyun

  @deprecated("use newTaskTempFile(..., spec: FileNameSpec) instead", "3.3.0")
- def newTaskTempFile(taskContext: TaskAttemptContext, dir: Option[String], ext: String): String
+ def newTaskTempFile(taskContext: TaskAttemptContext, dir: Option[String], ext: String): String = {
+   throw SparkException.mustOverrideOneMethodError("newTaskTempFile")
Member

@dongjoon-hyun dongjoon-hyun Aug 2, 2025

In other words, I don't think this is required to achieve your goal.

Contributor

I agree, it is not clear why these changes are being made.
I got confused with the intent as well - though now I realize what is being attempted.

  def newTaskTempFileAbsPath(
-     taskContext: TaskAttemptContext, absoluteDir: String, ext: String): String
+     taskContext: TaskAttemptContext, absoluteDir: String, ext: String): String = {
+   throw SparkException.mustOverrideOneMethodError("newTaskTempFileAbsPath")
Member

ditto.

@yaooqinn yaooqinn changed the title [SPARK-53063][CORE] Remove inner references of deprecated APIs in FileCommitProtocol [SPARK-53063][CORE] Implement and call new APIs in FileCommitProtocol instead of deprecated Aug 4, 2025
@yaooqinn yaooqinn changed the title [SPARK-53063][CORE] Implement and call new APIs in FileCommitProtocol instead of deprecated [SPARK-53063][CORE] Implement and call new APIs in FileCommitProtocol instead of the deprecated Aug 4, 2025
@yaooqinn yaooqinn closed this in c9b85a2 Aug 4, 2025
@yaooqinn yaooqinn deleted the SPARK-53063 branch August 4, 2025 05:56
Member Author

yaooqinn commented Aug 4, 2025

Merged to master, thank you @dongjoon-hyun again.

-     new SparkRuntimeException(
-       "INTERNAL_ERROR",
-       Map("message" -> msg))
+     SparkException.mustOverrideOneMethodError(msg)
Contributor

@mridulm mridulm Aug 7, 2025

Should this be SparkException.mustOverrideOneMethodError(methodName)? Or, better still, remove mustOverrideOneMethodError and delegate to SparkException.mustOverrideOneMethodError instead?

Contributor

mridulm commented Aug 7, 2025

It is unclear what the benefits of this PR are.
Note that these classes get extended by users for customizing behavior, and I am not completely certain if there is parity in functionality after this PR.

Member Author

yaooqinn commented Aug 7, 2025

Hi @mridulm,

If we implement a new custom FileCommitProtocol, we no longer need to override the deprecated newTaskTempFile. This makes the future removal of these APIs safer.

Contributor

mridulm commented Aug 7, 2025

Unfortunately, there is a nontrivial number of user/library implementations that extend these public classes (a GitHub search shows a lot; even excluding forks, etc., the count appears significant), and they depend on the way these classes are currently written (for better or for worse).
I have not looked in detail at whether the changes in this PR will cause a regression, but please do evaluate with that lens: if there is nontrivial value, we can definitely enforce deprecation, but otherwise we should be careful not to break compatibility.

Member Author

yaooqinn commented Aug 7, 2025

As long as the base interface can satisfy the current built-in implementations, there's no difference for third-party implementations.
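
The compatibility question raised in this thread can be explored with a small, self-contained model. This is a sketch under stated assumptions, not Spark's actual code: `SketchCommitProtocol`, `NewStyleProtocol`, `LegacyProtocol`, and the `part-00000` file name are hypothetical, `TaskAttemptContext` is omitted, and the fallback from the spec-based overload to the deprecated one for an empty prefix is an assumption that should be checked against the real FileCommitProtocol source.

```scala
// Simplified stand-in for Spark's FileNameSpec.
final case class FileNameSpec(prefix: String, suffix: String)

// Hypothetical, trimmed-down model of the base class.
abstract class SketchCommitProtocol {
  // Deprecated overload: modeled with a throwing default instead of being abstract.
  @deprecated("use newTaskTempFile(..., spec: FileNameSpec) instead", "3.3.0")
  def newTaskTempFile(dir: Option[String], ext: String): String =
    throw new UnsupportedOperationException("must override one newTaskTempFile overload")

  // Newer overload: ASSUMED to fall back to the deprecated one when the prefix is empty.
  def newTaskTempFile(dir: Option[String], spec: FileNameSpec): String =
    if (spec.prefix.isEmpty) newTaskTempFile(dir, spec.suffix)
    else throw new UnsupportedOperationException(s"prefix not supported: ${spec.prefix}")
}

// New-style implementation: overrides only the spec-based overload.
class NewStyleProtocol extends SketchCommitProtocol {
  override def newTaskTempFile(dir: Option[String], spec: FileNameSpec): String =
    dir.map(_ + "/").getOrElse("") + spec.prefix + "part-00000" + spec.suffix
}

// Legacy third-party implementation: overrides only the deprecated overload.
class LegacyProtocol extends SketchCommitProtocol {
  override def newTaskTempFile(dir: Option[String], ext: String): String =
    dir.map(_ + "/").getOrElse("") + "part-00000" + ext
}

object CompatSketch {
  def main(args: Array[String]): Unit = {
    // Built-in callers are modeled as always using the spec-based overload with an empty prefix.
    val spec = FileNameSpec("", "-c000.parquet")
    println(new NewStyleProtocol().newTaskTempFile(Some("tmp"), spec))
    println(new LegacyProtocol().newTaskTempFile(Some("tmp"), spec))
  }
}
```

In this model both implementations return the same path for an empty-prefix spec, which is the situation yaooqinn describes. A legacy implementation would still fail for a non-empty prefix, and a new-style one would throw if something called the deprecated overload directly, so whether real third-party subclasses are unaffected depends on how callers outside the built-in code paths invoke them.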
