Conversation

@vanzin
Contributor

@vanzin commented Apr 19, 2014

This is related to SPARK-1459 / PR #375. Without this fix,
FileLogger.createLogDir() may try to create the log dir on
HDFS, while createWriter() will try to open the log file on
the local file system, leading to interesting errors and
confusion.
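For illustration, a minimal sketch of the idea behind the fix (hypothetical paths; not the actual FileLogger code): qualify the log path against the default filesystem once, so directory creation and file writes resolve to the same filesystem.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
// FileSystem.get honors the configured default fs name
// (fs.default.name in older Hadoop, fs.defaultFS in newer).
val fs = FileSystem.get(conf)

// Qualify the raw path once; both calls below then target the same FS.
val logDir = new Path("/tmp/spark-events")
  .makeQualified(fs.getUri, fs.getWorkingDirectory)

fs.mkdirs(logDir)                              // e.g. on HDFS
val out = fs.create(new Path(logDir, "app-1")) // same FS, not the local one
out.close()
```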

@AmplabJenkins

Can one of the admins verify this patch?

Contributor

FileSystem has a function that gives you the default filesystem without having to look at the configs directly:

```java
public static URI getDefaultUri(Configuration conf)
```
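For example, a minimal sketch of using that API (assuming a standard Hadoop `Configuration`; not code from this patch):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

val hadoopConf = new Configuration()
// Resolves the configured default filesystem for you, with no
// direct reads of fs.default.name / fs.defaultFS.
val defaultFsUri = FileSystem.getDefaultUri(hadoopConf)
println(defaultFsUri) // e.g. hdfs://namenode:8020, or file:/// by default
```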

@pwendell
Contributor

Jenkins, test this please

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14319/

@pwendell
Contributor

The ordering of initialization in the DAGScheduler is very sensitive. Could this patch instead just move the Hadoop configuration up?
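To illustrate the sensitivity (a schematic sketch with hypothetical stand-in types, not the real DAGScheduler): Scala evaluates constructor fields top to bottom, so a field can only be moved up if everything it reads is declared above it.

```scala
// Hypothetical stand-ins, not Spark's real types.
class HadoopConf
class EventLogger(conf: HadoopConf)

class SchedulerSketch {
  // Declared first, so it is initialized before anything that uses it...
  private val hadoopConf = new HadoopConf
  // ...otherwise this field would observe a null hadoopConf.
  private val eventLogger = new EventLogger(hadoopConf)
}
```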

Contributor

nit: style should be

```scala
private[spark] class EventLoggingListener(
    appName: String,
    ...
  extends ...
```

@andrewor14
Contributor

Thanks for the changes @vanzin. I think the logic is much more straightforward now. This LGTM.

@pwendell
Contributor

Jenkins, retest this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14378/

@pwendell
Contributor

Jenkins, retest this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14398/

@pwendell
Contributor

Thanks - merged this.

@asfgit asfgit closed this in dd1b7a6 Apr 23, 2014
asfgit pushed a commit that referenced this pull request Apr 23, 2014
This is related to SPARK-1459 / PR #375. Without this fix,
FileLogger.createLogDir() may try to create the log dir on
HDFS, while createWriter() will try to open the log file on
the local file system, leading to interesting errors and
confusion.

Author: Marcelo Vanzin <[email protected]>

Closes #450 from vanzin/event-file-2 and squashes the following commits:

592cdb3 [Marcelo Vanzin] Honor default fs name when initializing event logger.
(cherry picked from commit dd1b7a6)

Signed-off-by: Patrick Wendell <[email protected]>
pwendell pushed a commit to pwendell/spark that referenced this pull request May 12, 2014
apache#450.

Only run ResubmitFailedStages event after a fetch fails

Previously, the ResubmitFailedStages event was called every
200 milliseconds, leading to a lot of unnecessary event processing
and clogged DAGScheduler logs.

Author: Kay Ousterhout <[email protected]>

== Merge branch commits ==

commit e603784b3a562980e6f1863845097effe2129d3b
Author: Kay Ousterhout <[email protected]>
Date:   Wed Feb 5 11:34:41 2014 -0800

    Re-add check for empty set of failed stages

commit d258f0ef50caff4bbb19fb95a6b82186db1935bf
Author: Kay Ousterhout <[email protected]>
Date:   Wed Jan 15 23:35:41 2014 -0800

    Only run ResubmitFailedStages event after a fetch fails

    Previously, the ResubmitFailedStages event was called every
    200 milliseconds, leading to a lot of unnecessary event processing
    and clogged DAGScheduler logs.
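A self-contained sketch of the scheduling change described above (all names hypothetical; not the actual DAGScheduler code): replace a recurring 200 ms resubmit timer with a one-shot resubmit scheduled only when the first fetch failure arrives.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.collection.mutable

object ResubmitSketch {
  private val timer = Executors.newSingleThreadScheduledExecutor()
  private val failedStages = mutable.Set[Int]()

  // Old behavior, for contrast: fire the event every 200 ms regardless
  // of whether anything has failed.
  // timer.scheduleAtFixedRate(() => post(), 200, 200, TimeUnit.MILLISECONDS)

  // New behavior: schedule one delayed resubmit on the first fetch
  // failure; subsequent failures piggyback on the pending event.
  def onFetchFailed(stageId: Int): Unit = synchronized {
    if (failedStages.isEmpty) {
      timer.schedule(new Runnable { def run(): Unit = post() },
        200, TimeUnit.MILLISECONDS)
    }
    failedStages += stageId
  }

  private def post(): Unit = synchronized {
    println(s"resubmitting failed stages: ${failedStages.mkString(", ")}")
    failedStages.clear()
  }
}
```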
@vanzin vanzin deleted the event-file-2 branch May 23, 2014 22:39
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
This is related to SPARK-1459 / PR apache#375. Without this fix,
FileLogger.createLogDir() may try to create the log dir on
HDFS, while createWriter() will try to open the log file on
the local file system, leading to interesting errors and
confusion.

Author: Marcelo Vanzin <[email protected]>

Closes apache#450 from vanzin/event-file-2 and squashes the following commits:

592cdb3 [Marcelo Vanzin] Honor default fs name when initializing event logger.
gzm55 pushed a commit to MediaV/spark that referenced this pull request Jul 17, 2014
apache#450.

Only run ResubmitFailedStages event after a fetch fails

Previously, the ResubmitFailedStages event was called every
200 milliseconds, leading to a lot of unnecessary event processing
and clogged DAGScheduler logs.

Author: Kay Ousterhout <[email protected]>

== Merge branch commits ==

commit e603784b3a562980e6f1863845097effe2129d3b
Author: Kay Ousterhout <[email protected]>
Date:   Wed Feb 5 11:34:41 2014 -0800

    Re-add check for empty set of failed stages

commit d258f0ef50caff4bbb19fb95a6b82186db1935bf
Author: Kay Ousterhout <[email protected]>
Date:   Wed Jan 15 23:35:41 2014 -0800

    Only run ResubmitFailedStages event after a fetch fails

    Previously, the ResubmitFailedStages event was called every
    200 milliseconds, leading to a lot of unnecessary event processing
    and clogged DAGScheduler logs.
(cherry picked from commit 0b448df)

Signed-off-by: Patrick Wendell <[email protected]>
andrewor14 pushed a commit to andrewor14/spark that referenced this pull request Jan 8, 2015
apache#450.

Only run ResubmitFailedStages event after a fetch fails

Previously, the ResubmitFailedStages event was called every
200 milliseconds, leading to a lot of unnecessary event processing
and clogged DAGScheduler logs.

Author: Kay Ousterhout <[email protected]>

== Merge branch commits ==

commit e603784b3a562980e6f1863845097effe2129d3b
Author: Kay Ousterhout <[email protected]>
Date:   Wed Feb 5 11:34:41 2014 -0800

    Re-add check for empty set of failed stages

commit d258f0ef50caff4bbb19fb95a6b82186db1935bf
Author: Kay Ousterhout <[email protected]>
Date:   Wed Jan 15 23:35:41 2014 -0800

    Only run ResubmitFailedStages event after a fetch fails

    Previously, the ResubmitFailedStages event was called every
    200 milliseconds, leading to a lot of unnecessary event processing
    and clogged DAGScheduler logs.
(cherry picked from commit 0b448df)

Signed-off-by: Patrick Wendell <[email protected]>
dongjoon-hyun added a commit that referenced this pull request Jul 17, 2024
…pattern

### What changes were proposed in this pull request?

This PR aims to redact `awsAccessKeyId` by including `accesskey` pattern.

- **Apache Spark 4.0.0-preview1**
There is no point in redacting `fs.s3a.access.key` because the same value is exposed via `fs.s3.awsAccessKeyId`, as shown below. We need to redact all of them.

```
$ AWS_ACCESS_KEY_ID=A AWS_SECRET_ACCESS_KEY=B bin/spark-shell
```

![Screenshot 2024-07-17 at 12 45 44](https://github.com/user-attachments/assets/e3040c5d-3eb9-4944-a6d6-5179b7647426)

### Why are the changes needed?

Since Apache Spark 1.1.0, `AWS_ACCESS_KEY_ID` has been propagated as shown below. However, Apache Spark does not redact all of these values consistently.
- #450

https://github.com/apache/spark/blob/5d16c3134c442a5546251fd7c42b1da9fdf3969e/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L481-L486

### Does this PR introduce _any_ user-facing change?

Users may see more redactions on configurations whose names contain `accesskey` case-insensitively. However, those configurations are highly likely to be related to credentials.

### How was this patch tested?

Pass the CIs with the newly added test cases.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47392 from dongjoon-hyun/SPARK-48930.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
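A hedged sketch of the substring-based, case-insensitive key redaction described above (the pattern and placeholder text are illustrative, not Spark's exact defaults):

```scala
// Redact values whose config key matches a secret-looking pattern,
// e.g. "accesskey", anywhere in the key and case-insensitively.
val redactionPattern = "(?i)secret|password|token|accesskey".r

def redact(conf: Seq[(String, String)]): Seq[(String, String)] =
  conf.map { case (key, value) =>
    if (redactionPattern.findFirstIn(key).isDefined) (key, "*********(redacted)")
    else (key, value)
  }

// fs.s3.awsAccessKeyId now matches via the "accesskey" alternative.
redact(Seq(
  "fs.s3.awsAccessKeyId" -> "AKIA...",
  "spark.app.name"       -> "demo"
)).foreach(println)
```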
dongjoon-hyun added a commit that referenced this pull request Jul 17, 2024
…pattern

### What changes were proposed in this pull request?

This PR aims to redact `awsAccessKeyId` by including `accesskey` pattern.

- **Apache Spark 4.0.0-preview1**
There is no point in redacting `fs.s3a.access.key` because the same value is exposed via `fs.s3.awsAccessKeyId`, as shown below. We need to redact all of them.

```
$ AWS_ACCESS_KEY_ID=A AWS_SECRET_ACCESS_KEY=B bin/spark-shell
```

![Screenshot 2024-07-17 at 12 45 44](https://github.com/user-attachments/assets/e3040c5d-3eb9-4944-a6d6-5179b7647426)

### Why are the changes needed?

Since Apache Spark 1.1.0, `AWS_ACCESS_KEY_ID` has been propagated as shown below. However, Apache Spark does not redact all of these values consistently.
- #450

https://github.com/apache/spark/blob/5d16c3134c442a5546251fd7c42b1da9fdf3969e/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L481-L486

### Does this PR introduce _any_ user-facing change?

Users may see more redactions on configurations whose names contain `accesskey` case-insensitively. However, those configurations are highly likely to be related to credentials.

### How was this patch tested?

Pass the CIs with the newly added test cases.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47392 from dongjoon-hyun/SPARK-48930.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 1e17c39)
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun added a commit that referenced this pull request Jul 17, 2024
…pattern

### What changes were proposed in this pull request?

This PR aims to redact `awsAccessKeyId` by including `accesskey` pattern.

- **Apache Spark 4.0.0-preview1**
There is no point in redacting `fs.s3a.access.key` because the same value is exposed via `fs.s3.awsAccessKeyId`, as shown below. We need to redact all of them.

```
$ AWS_ACCESS_KEY_ID=A AWS_SECRET_ACCESS_KEY=B bin/spark-shell
```

![Screenshot 2024-07-17 at 12 45 44](https://github.com/user-attachments/assets/e3040c5d-3eb9-4944-a6d6-5179b7647426)

### Why are the changes needed?

Since Apache Spark 1.1.0, `AWS_ACCESS_KEY_ID` has been propagated as shown below. However, Apache Spark does not redact all of these values consistently.
- #450

https://github.com/apache/spark/blob/5d16c3134c442a5546251fd7c42b1da9fdf3969e/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L481-L486

### Does this PR introduce _any_ user-facing change?

Users may see more redactions on configurations whose names contain `accesskey` case-insensitively. However, those configurations are highly likely to be related to credentials.

### How was this patch tested?

Pass the CIs with the newly added test cases.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47392 from dongjoon-hyun/SPARK-48930.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 1e17c39)
Signed-off-by: Dongjoon Hyun <[email protected]>
jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024
…pattern

### What changes were proposed in this pull request?

This PR aims to redact `awsAccessKeyId` by including `accesskey` pattern.

- **Apache Spark 4.0.0-preview1**
There is no point in redacting `fs.s3a.access.key` because the same value is exposed via `fs.s3.awsAccessKeyId`, as shown below. We need to redact all of them.

```
$ AWS_ACCESS_KEY_ID=A AWS_SECRET_ACCESS_KEY=B bin/spark-shell
```

![Screenshot 2024-07-17 at 12 45 44](https://github.com/user-attachments/assets/e3040c5d-3eb9-4944-a6d6-5179b7647426)

### Why are the changes needed?

Since Apache Spark 1.1.0, `AWS_ACCESS_KEY_ID` has been propagated as shown below. However, Apache Spark does not redact all of these values consistently.
- apache#450

https://github.com/apache/spark/blob/5d16c3134c442a5546251fd7c42b1da9fdf3969e/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L481-L486

### Does this PR introduce _any_ user-facing change?

Users may see more redactions on configurations whose names contain `accesskey` case-insensitively. However, those configurations are highly likely to be related to credentials.

### How was this patch tested?

Pass the CIs with the newly added test cases.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47392 from dongjoon-hyun/SPARK-48930.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…the database location as table location placeholder (apache#450)