[SPARK-19720][CORE][BACKPORT-2.1] Redact sensitive information from SparkSubmit console #18765
This change redacts sensitive information (based on the default password and secret regex) from the Spark Submit console logs. Such sensitive information is already being redacted from event logs, YARN logs, etc.

Testing was done manually to make sure that the console logs were not printing any sensitive information. Here's some output from the console:

```
Spark properties used, including those specified through --conf and those from the properties file /etc/spark2/conf/spark-defaults.conf:
(spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
(spark.authenticate,false)
(spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
```

```
System properties:
(spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
(spark.authenticate,false)
(spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
```

There is still a risk. No test asserts on the console log output, so any print statements added to the console later could leak sensitive information. I considered an integration test that guards against future leaks to be out of scope for this JIRA.

Running unit tests to make sure nothing else is broken by this change.

Using reference from Mark Grover <[email protected]>
Closes apache#17047 for Spark version 2.1.2.
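The commit message notes that no test asserts on console output. The following is a minimal sketch of such a guard. It assumes the printing goes through Scala's `Console`, so `Console.withOut` can capture it. `printProperties`, `captureOut` and `noSecretsLeaked` are hypothetical names, not Spark APIs, and `printProperties` only stands in for the real verbose spark-submit code path.

```scala
import java.io.{ByteArrayOutputStream, PrintStream}

// Sketch of a console-leak guard (out of scope for the original patch).
// printProperties is a HYPOTHETICAL stand-in for the verbose spark-submit printing path.
object ConsoleLeakCheck {
  private val Replacement = "*********(redacted)"
  private val SecretKey = "(?i)secret|password".r

  private def isSecret(key: String): Boolean = SecretKey.findFirstIn(key).isDefined

  // Hypothetical printer: redacts matching keys, then prints "(key,value)" lines.
  def printProperties(props: Seq[(String, String)]): Unit =
    props.map { case (k, v) => if (isSecret(k)) (k, Replacement) else (k, v) }.foreach(println)

  // Run `body` with Console output redirected into a buffer and return what was printed.
  def captureOut(body: => Unit): String = {
    val buffer = new ByteArrayOutputStream()
    Console.withOut(new PrintStream(buffer, true, "UTF-8"))(body)
    buffer.toString("UTF-8")
  }

  // True when no non-empty secret value appears anywhere in the captured output.
  def noSecretsLeaked(props: Seq[(String, String)]): Boolean = {
    val secrets = props.collect { case (k, v) if isSecret(k) && v.nonEmpty => v }
    val out = captureOut(printProperties(props))
    secrets.forall(s => !out.contains(s))
  }
}
```

A real guard would need to hook whatever stream spark-submit actually writes to. Checking for the raw secret values, rather than for the replacement text, keeps the test meaningful even if the redaction marker changes.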
Can one of the admins verify this patch?

I'm sorry... I was just suggesting it because it is a major issue, as described here: https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-19720. I'm using Airflow for job submission, and the password appears in the log whenever I enable verbose mode in spark-submit.

This sounds reasonable to backport to 2.1. First, please update your PR title with [BACKPORT-2.1].

@gatorsmile, please check whether it is better now.
```scala
private[util] val REDACTION_REPLACEMENT_TEXT = "*********(redacted)"
private[util] val SECRET_REDACTION_PATTERN = "(?i)secret|password".r
```
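The following is a minimal sketch of how the two constants above can be applied to the `(key, value)` pairs printed on the console. It assumes that only the property key is matched; the actual patch may differ in what it inspects. The `private[util]` modifiers are dropped so the sketch compiles on its own, and `RedactionSketch` is a hypothetical name.

```scala
import scala.util.matching.Regex

// Sketch: applies the two constants from the diff to (key, value) pairs.
// Assumption: only the key is matched; the real patch may inspect more than that.
object RedactionSketch {
  val REDACTION_REPLACEMENT_TEXT = "*********(redacted)"
  val SECRET_REDACTION_PATTERN: Regex = "(?i)secret|password".r

  // Replace the value of every pair whose key contains "secret" or "password" (any case).
  def redact(kvs: Seq[(String, String)]): Seq[(String, String)] =
    kvs.map { case (key, value) =>
      if (SECRET_REDACTION_PATTERN.findFirstIn(key).isDefined) (key, REDACTION_REPLACEMENT_TEXT)
      else (key, value)
    }

  def main(args: Array[String]): Unit = {
    val props = Seq(
      "spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD" -> "not-a-real-password",
      "spark.authenticate" -> "false")
    redact(props).foreach(println)
    // (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
    // (spark.authenticate,false)
  }
}
```

The leading `(?i)` makes both alternatives case-insensitive, which is why a key ending in `PASSWORD` matches. Because matching is done on keys, a secret stored under a key that mentions neither word would still be printed.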
This should be a configurable SQLConf.
But I'm following the UI pattern: https://github.com/apache/spark/blob/branch-2.1/core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala
Redaction with a configurable SQLConf would be a port of this feature: https://issues.apache.org/jira/browse/SPARK-18535, wouldn't it?
I think what's really happening here is that we are backporting some changes introduced in SPARK-18535 while backporting this JIRA (SPARK-19720). SPARK-18535 is a dependency of this, so if we want to backport this, we should really be backporting SPARK-18535 as well.
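For comparison with the hard-coded pattern, here is a sketch of a configurable variant in the spirit of SPARK-18535. It reads the regex from a plain `Map[String, String]` and falls back to the default pattern. The key name `spark.redaction.regex` is an assumption about what later Spark releases use, so verify it against your target version. `ConfigurableRedaction` is a hypothetical name.

```scala
import scala.util.matching.Regex

// Sketch only: a configurable redaction pattern with a hard-coded default.
object ConfigurableRedaction {
  val RedactionRegexKey = "spark.redaction.regex" // key name assumed; check your Spark version
  val DefaultPattern = "(?i)secret|password"
  val Replacement = "*********(redacted)"

  // Look up the pattern in the supplied settings, falling back to the default.
  def redactionPattern(conf: Map[String, String]): Regex =
    conf.getOrElse(RedactionRegexKey, DefaultPattern).r

  // Redact every pair whose key matches the configured pattern.
  def redact(conf: Map[String, String], kvs: Seq[(String, String)]): Seq[(String, String)] = {
    val pattern = redactionPattern(conf)
    kvs.map { case (k, v) => if (pattern.findFirstIn(k).isDefined) (k, Replacement) else (k, v) }
  }
}
```

With `Map("spark.redaction.regex" -> "(?i)token")`, a key such as `my.api.TOKEN` is redacted but `x.PASSWORD` is not, because a configured pattern replaces the default rather than extending it. Pulling in this configuration surface is what turns a bugfix backport into a feature backport.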
Hi @markgrover! My intention here was only to fix this security breach by making the spark-submit redaction pattern match the UI redaction pattern. I can change it, but then it would be a new-feature backport rather than a bugfix backport.
I opened another pull request with the full feature: #18802
I opened the PR, but I don't know why Jenkins fails with an access error... It sounds like a permission issue.
I got it working there. I tested it here, and both the UI and spark-submit are already working. I think you can close this pull request and focus on #18802.
You need to close it yourself. Thanks!

Closing this PR since #18802 is complete.
What changes were proposed in this pull request?
This is a backport PR for Spark 2.1. This change redacts sensitive information (based on the default password and secret regex)
from the Spark Submit console logs. Such sensitive information is already being
redacted from event logs, YARN logs, etc.
Using reference from Mark Grover [email protected], PR: #17047
Closes #17047 for Spark version 2.1.
How was this patch tested?
Testing was done manually to make sure that the console logs were not printing any
sensitive information.
Here's some output from the console:
Ran tests and everything is passing.