[SPARK-39278][CORE] Fix backward compatibility of alternative configs of Hadoop Filesystems to access #36658

manuzhang · 2022-05-25T01:57:36Z

What changes were proposed in this pull request?

Fix the precedence of the alternative configs that specify which Hadoop filesystems to access.

Before this PR

spark.kerberos.access.hadoopFileSystems -> spark.yarn.access.namenodes -> spark.yarn.access.hadoopFileSystems

After this PR

spark.kerberos.access.hadoopFileSystems -> spark.yarn.access.hadoopFileSystems -> spark.yarn.access.namenodes
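The difference between the two orders can be shown with a small, self-contained model. This is a sketch under stated assumptions, not Spark's SparkConf code: `FsConfigPrecedence`, `resolve`, `AlternativesBefore` and `AlternativesAfter` are hypothetical names, and the only behavior assumed is that the primary key is checked first, then each alternative key in a fixed order, with the first key that is set winning. The `hdfs://` URIs are placeholders.

```scala
// Hypothetical model of an "alternative config" lookup; NOT Spark's SparkConf.
object FsConfigPrecedence {
  // Primary (current) key; checked before any alternative.
  val Primary = "spark.kerberos.access.hadoopFileSystems"

  // Alternative lookup order before this PR.
  val AlternativesBefore: Seq[String] =
    Seq("spark.yarn.access.namenodes", "spark.yarn.access.hadoopFileSystems")

  // Alternative lookup order after this PR.
  val AlternativesAfter: Seq[String] =
    Seq("spark.yarn.access.hadoopFileSystems", "spark.yarn.access.namenodes")

  /** Splits the value of the first key that is set into a list of filesystem URIs. */
  def resolve(conf: Map[String, String], alternatives: Seq[String]): List[String] =
    (Primary +: alternatives)
      .collectFirst { case key if conf.contains(key) => conf(key) }
      .map(_.split(",").map(_.trim).filter(_.nonEmpty).toList)
      .getOrElse(Nil)

  def main(args: Array[String]): Unit = {
    // Both deprecated keys are set, with different values.
    val conf = Map(
      "spark.yarn.access.namenodes" -> "hdfs://old-nn",
      "spark.yarn.access.hadoopFileSystems" -> "hdfs://nn1,hdfs://nn2")
    println(resolve(conf, AlternativesBefore)) // List(hdfs://old-nn)
    println(resolve(conf, AlternativesAfter))  // List(hdfs://nn1, hdfs://nn2)

    // The primary key overrides both deprecated keys in either order.
    val withPrimary = conf + (Primary -> "hdfs://nn3")
    println(resolve(withPrimary, AlternativesBefore)) // List(hdfs://nn3)
  }
}
```

The order only matters when both deprecated keys are set with different values; if just one of them is set, both orders resolve to the same list.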

Why are the changes needed?

Before #23698, the precedence of the configs specifying which Hadoop filesystems to access was

spark.yarn.access.hadoopFileSystems -> spark.yarn.access.namenodes

Afterwards, it's

spark.kerberos.access.hadoopFileSystems -> spark.yarn.access.namenodes -> spark.yarn.access.hadoopFileSystems

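When spark.yarn.access.hadoopFileSystems and spark.yarn.access.namenodes are both configured with different values, #23698 breaks backward compatibility: spark.yarn.access.namenodes now wins, so delegation tokens may be obtained for the wrong set of filesystems and the application can fail.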

Does this PR introduce any user-facing change?

Yes. It restores the backward-compatible precedence between spark.yarn.access.hadoopFileSystems and spark.yarn.access.namenodes.

How was this patch tested?

Updated UT.


manuzhang · 2022-05-25T04:49:51Z

cc @gaborgsomogyi @vanzin

gaborgsomogyi · 2022-05-25T07:14:35Z

Let's take a look at how the configs have evolved:

spark.yarn.access.namenodes was introduced.
spark.yarn.access.hadoopFileSystems was introduced, so spark.yarn.access.namenodes was deprecated.
spark.kerberos.access.hadoopFileSystems was introduced, so spark.yarn.access.hadoopFileSystems was deprecated.

So from my perspective the correct order is:

spark.yarn.access.namenodes must be overwritten by spark.yarn.access.hadoopFileSystems
spark.yarn.access.namenodes and spark.yarn.access.hadoopFileSystems must be overwritten by spark.kerberos.access.hadoopFileSystems

I understand that it was different previously, but I wouldn't change it, for the following reasons:

If we look at the deprecation history, the current behavior makes sense (newer configs must take precedence). Personally, I would consider [SPARK-26766][CORE] Remove the list of filesystems from HadoopDelegationTokenProvider.obtainDelegationTokens #23698 a fix.
We're fixing things on the master branch, but there these two configs are merely deprecated and spark.kerberos.access.hadoopFileSystems is the preferred way, so I suggest migrating to it.

manuzhang · 2022-06-15T05:29:35Z

newer configs must take precedence

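This is not what currently happens: the oldest config, spark.yarn.access.namenodes, takes precedence over the newer spark.yarn.access.hadoopFileSystems. Meanwhile, the deprecated configs have not been removed yet, and as long as they are still in use, they should not break users' applications.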

manuzhang · 2022-06-15T05:30:24Z

core/src/test/scala/org/apache/spark/SparkConfSuite.scala

    conf.set("spark.yarn.access.namenodes", "testNode")
    assert(conf.get(KERBEROS_FILESYSTEMS_TO_ACCESS) === Array("testNode"))

-    conf.set("spark.yarn.access.hadoopFileSystems", "testNode")


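I'd like to point out that this test was meaningless before this change: both configs are set to the same value ("testNode"), so it cannot tell which one takes precedence.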
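A test that can tell the two orders apart has to give the deprecated keys different values. The following is a hedged sketch of that idea, not the actual SparkConfSuite change: `PrecedenceTestSketch`, `firstSet`, `OrderBefore`, `OrderAfter` and the value "testFS" are hypothetical, and the lookup is modeled with a plain `Map` rather than SparkConf.

```scala
// Hypothetical sketch; models the lookup order with a Map, not SparkConf.
object PrecedenceTestSketch {
  val Kerberos = "spark.kerberos.access.hadoopFileSystems"
  val YarnFs = "spark.yarn.access.hadoopFileSystems"
  val YarnNn = "spark.yarn.access.namenodes"

  // Lookup order before and after this PR (primary key first in both).
  val OrderBefore: Seq[String] = Seq(Kerberos, YarnNn, YarnFs)
  val OrderAfter: Seq[String] = Seq(Kerberos, YarnFs, YarnNn)

  /** Value of the first key in `order` that is set, if any. */
  def firstSet(conf: Map[String, String], order: Seq[String]): Option[String] =
    order.collectFirst { case key if conf.contains(key) => conf(key) }

  def main(args: Array[String]): Unit = {
    // Same value under both deprecated keys: both orders agree,
    // so such a test cannot detect a precedence regression.
    val same = Map(YarnNn -> "testNode", YarnFs -> "testNode")
    assert(firstSet(same, OrderBefore) == firstSet(same, OrderAfter))

    // Different values: the two orders now disagree.
    val different = Map(YarnNn -> "testNode", YarnFs -> "testFS")
    assert(firstSet(different, OrderAfter) == Some("testFS"))
    assert(firstSet(different, OrderBefore) == Some("testNode"))

    // The primary key still overrides both deprecated keys.
    assert(firstSet(different + (Kerberos -> "testKerberos"), OrderAfter) == Some("testKerberos"))
    println("all precedence checks passed")
  }
}
```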

github-actions · 2022-09-24T00:27:28Z

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

Commit d21927d: [SPARK-39278][CORE] Fix backward compatibility of alternative configs of Hadoop Filesystems to access

github-actions bot added the CORE label May 25, 2022


github-actions bot added the Stale label Sep 24, 2022

github-actions bot closed this Sep 25, 2022
