@dmvieira dmvieira commented Jul 29, 2017

What changes were proposed in this pull request?

This is a backport PR for Spark 2.1. This change redacts sensitive information (based on the default password and secret regex)
from the Spark Submit console logs. Such sensitive information is already being
redacted from event logs, YARN logs, etc.

Based on the PR by Mark Grover [email protected]: #17047

Closes #17047 for Spark version 2.1.

How was this patch tested?

Testing was done manually to make sure that the console logs were not printing any
sensitive information.

Here's some output from the console:

Spark properties used, including those specified through
 --conf and those from the properties file /etc/spark2/conf/spark-defaults.conf:
  (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
  (spark.authenticate,false)
  (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
System properties:
(spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
(spark.authenticate,false)
(spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))

Ran tests and everything is passing.
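
For readers who want to see the idea in isolation, the redaction described above can be sketched roughly as follows. This is a minimal illustration, not the code in this patch: the object name `RedactionSketch`, the `redact` helper, and the sample property values are assumptions made for the example. Only the replacement text and the regex are taken from the diff under review.

```scala
import scala.util.matching.Regex

// Hypothetical, self-contained sketch; not the actual Spark Utils code.
object RedactionSketch {
  // Both values are copied from the diff under review.
  val ReplacementText = "*********(redacted)"
  val SecretPattern: Regex = "(?i)secret|password".r

  // Replace the value of every pair whose key contains a match for the pattern.
  // Pairs whose keys do not match are returned unchanged.
  def redact(pattern: Regex, kvs: Seq[(String, String)]): Seq[(String, String)] =
    kvs.map { case (key, value) =>
      if (pattern.findFirstIn(key).isDefined) (key, ReplacementText) else (key, value)
    }

  def main(args: Array[String]): Unit = {
    // Made-up sample properties, for illustration only.
    val props = Seq(
      "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD" -> "not-a-real-password",
      "spark.authenticate" -> "false"
    )
    redact(SecretPattern, props).foreach(println)
  }
}
```

Passing the pattern in as a parameter, rather than hard-coding it, would make it straightforward to swap in a user-supplied regex if the pattern is ever made configurable.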

…sole

This change redacts sensitive information (based on the default password and secret regex)
from the Spark Submit console logs. Such sensitive information is already being
redacted from event logs, YARN logs, etc.

Testing was done manually to make sure that the console logs were not printing any
sensitive information.

Here's some output from the console:

```
Spark properties used, including those specified through
 --conf and those from the properties file /etc/spark2/conf/spark-defaults.conf:
  (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
  (spark.authenticate,false)
  (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
```

```
System properties:
(spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
(spark.authenticate,false)
(spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
```
There is a risk that, if new print statements are added to the console down the road, sensitive information may still be leaked, since there is no test that asserts on the console log output. I considered it out of scope for this JIRA to write an integration test to make sure new leaks don't happen in the future.

Running unit tests to make sure nothing else is broken by this change.

Using reference from Mark Grover <[email protected]>

Closes apache#17047 for Spark version 2.1.2.
@AmplabJenkins

Can one of the admins verify this patch?

@jiangxb1987
Contributor

Should we backport this to 2.1, since it's a major bugfix (as described in the JIRA)? @vanzin @srowen

@dmvieira
Author

dmvieira commented Jul 31, 2017

I'm sorry... I was just suggesting it because it is a major issue, as described here: https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-19720

I'm using Airflow for job submission, and the password appears in the log if I use verbose mode in spark-submit.

@gatorsmile
Member

This sounds reasonable to backport to 2.1.

First, please update your PR title with [BACKPORT-2.1].
Second, please clean up your PR description and state at the beginning that it is a backport PR.

@dmvieira dmvieira changed the title [SPARK-19720][CORE] Redact sensitive information from SparkSubmit con… [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitive information from SparkSubmit console Jul 31, 2017
@dmvieira
Author

@gatorsmile, please check whether it is better now.

}

private[util] val REDACTION_REPLACEMENT_TEXT = "*********(redacted)"
private[util] val SECRET_REDACTION_PATTERN = "(?i)secret|password".r
Member

This should be a configurable SQLConf.

Author

@dmvieira dmvieira Aug 1, 2017

Member

I think what's really happening here is that we are backporting some changes introduced in SPARK-18535 while backporting this JIRA (SPARK-19720). SPARK-18535 is a dependency of this, so if we want to backport this, we should really be backporting SPARK-18535 as well.

Author

Hi @markgrover! My intention here was only to fix this security breach by making the spark-submit redaction pattern match the UI redaction pattern. I can change it, but then it would be a new-feature backport rather than a bugfix backport.

Author

I opened another pull request with the full feature: #18802

Author

I opened the PR, but I don't know why Jenkins fails with an access error... It looks like a permission issue.

Author

I got it working there... I tested it here, and redaction in both the UI and spark-submit is already working. I think you can close this pull request and focus on #18802

@gatorsmile
Member

You need to close it yourself. Thanks!

@dmvieira
Author

dmvieira commented Aug 5, 2017

Closing this PR since #18802 is completed

@dmvieira dmvieira closed this Aug 5, 2017