Skip to content

Conversation

@dmvieira
Copy link

@dmvieira dmvieira commented Aug 1, 2017

What changes were proposed in this pull request?

Backporting SPARK-18535 and SPARK-19720 to spark 2.1

It's a backport PR that redacts senstive information by configuration to Spark UI and Spark Submit console logs.

Using reference from Mark Grover [email protected] PRs

How was this patch tested?

Same tests from PR applied

…and UI

## What changes were proposed in this pull request?

This patch adds a new property called `spark.secret.redactionPattern` that
allows users to specify a scala regex to decide which Spark configuration
properties and environment variables in driver and executor environments
contain sensitive information. When this regex matches the property or
environment variable name, its value is redacted from the environment UI and
various logs like YARN and event logs.

This change uses this property to redact information from event logs and YARN
logs. It also, updates the UI code to adhere to this property instead of
hardcoding the logic to decipher which properties are sensitive.

Here's an image of the UI post-redaction:
![image](https://cloud.githubusercontent.com/assets/1709451/20506215/4cc30654-b007-11e6-8aee-4cde253fba2f.png)

Here's the text in the YARN logs, post-redaction:
``HADOOP_CREDSTORE_PASSWORD -> *********(redacted)``

Here's the text in the event logs, post-redaction:
``...,"spark.executorEnv.HADOOP_CREDSTORE_PASSWORD":"*********(redacted)","spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD":"*********(redacted)",...``

## How was this patch tested?
1. Unit tests are added to ensure that redaction works.
2. A YARN job reading data off of S3 with confidential information
(hadoop credential provider password) being provided in the environment
variables of driver and executor. And, afterwards, logs were grepped to make
sure that no mention of secret password was present. It was also ensure that
the job was able to read the data off of S3 correctly, thereby ensuring that
the sensitive information was being trickled down to the right places to read
the data.
3. The event logs were checked to make sure no mention of secret password was
present.
4. UI environment tab was checked to make sure there was no secret information
being displayed.

Author: Mark Grover <[email protected]>

Closes apache#15971 from markgrover/master_redaction.
…sole

## What changes were proposed in this pull request?
This change redacts senstive information (based on `spark.redaction.regex` property)
from the Spark Submit console logs. Such sensitive information is already being
redacted from event logs and yarn logs, etc.

## How was this patch tested?
Testing was done manually to make sure that the console logs were not printing any
sensitive information.

Here's some output from the console:

```
Spark properties used, including those specified through
 --conf and those from the properties file /etc/spark2/conf/spark-defaults.conf:
  (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
  (spark.authenticate,false)
  (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
```

```
System properties:
(spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
(spark.authenticate,false)
(spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
```
There is a risk if new print statements were added to the console down the road, sensitive information may still get leaked, since there is no test that asserts on the console log output. I considered it out of the scope of this JIRA to write an integration test to make sure new leaks don't happen in the future.

Running unit tests to make sure nothing else is broken by this change.

Author: Mark Grover <[email protected]>

Closes apache#17047 from markgrover/master_redaction.
@vanzin
Copy link
Contributor

vanzin commented Aug 1, 2017

ok to test

@SparkQA
Copy link

SparkQA commented Aug 1, 2017

Test build #80137 has started for PR 18802 at commit 7b419b4.

@SparkQA
Copy link

SparkQA commented Aug 2, 2017

Test build #80171 has finished for PR 18802 at commit 0096ad9.

  • This patch fails executing the dev/run-tests script.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 2, 2017

Test build #80173 has started for PR 18802 at commit 0ca71a8.

@SparkQA
Copy link

SparkQA commented Aug 2, 2017

Test build #80174 has finished for PR 18802 at commit 4424b05.

  • This patch fails executing the dev/run-tests script.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 2, 2017

Test build #80175 has finished for PR 18802 at commit e92e24a.

  • This patch fails executing the dev/run-tests script.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 2, 2017

Test build #80176 has finished for PR 18802 at commit 49941f7.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 2, 2017

Test build #80177 has started for PR 18802 at commit ad23355.

@SparkQA
Copy link

SparkQA commented Aug 3, 2017

Test build #80207 has finished for PR 18802 at commit 81dc26b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

cc @markgrover @vanzin Could you please take a look at this?

dev/run-tests.py Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you added these changes to fix the tests, but they're unrelated to the patches you're backporting.

They should, at the very least, be a separate PR, if they're really needed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove it, but tests will fail at Jenkins

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, as I suggested, open a separate PR with the fix for the tests.

@dmvieira
Copy link
Author

dmvieira commented Aug 7, 2017

I removed test fixes and add another PR: #18873 from this branch

@SparkQA
Copy link

SparkQA commented Aug 7, 2017

Test build #80357 has finished for PR 18802 at commit 7b419b4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented Aug 7, 2017

Merging to 2.1.

asfgit pushed a commit that referenced this pull request Aug 7, 2017
…mation

## What changes were proposed in this pull request?

Backporting SPARK-18535 and SPARK-19720 to spark 2.1

It's a backport PR that redacts senstive information by configuration to Spark UI and Spark Submit console logs.

Using reference from Mark Grover markapache.org PRs

## How was this patch tested?

Same tests from PR applied

Author: Mark Grover <[email protected]>

Closes #18802 from dmvieira/feature-redact.
@vanzin
Copy link
Contributor

vanzin commented Aug 7, 2017

@dmvieira please close the PR since github doesn't do it automatically.

@dmvieira
Copy link
Author

dmvieira commented Aug 8, 2017

Thank you @vanzin

@dmvieira dmvieira closed this Aug 8, 2017
@markgrover
Copy link
Member

Thanks @dmvieira and @vanzin

@dmvieira dmvieira deleted the feature-redact branch August 16, 2017 17:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants