[SPARK-18535][SPARK-19720][CORE][BACKPORT-2.1] Redact sensitive information #18802
…and UI

## What changes were proposed in this pull request?

This patch adds a new property called `spark.secret.redactionPattern` that allows users to specify a Scala regex to decide which Spark configuration properties and environment variables in the driver and executor environments contain sensitive information. When this regex matches a property or environment variable name, its value is redacted from the environment UI and from various logs, such as YARN and event logs.

This change uses this property to redact information from event logs and YARN logs. It also updates the UI code to adhere to this property instead of hardcoding the logic that decides which properties are sensitive.

Here's an image of the UI post-redaction: [image]

Here's the text in the YARN logs, post-redaction:

``HADOOP_CREDSTORE_PASSWORD -> *********(redacted)``

Here's the text in the event logs, post-redaction:

``...,"spark.executorEnv.HADOOP_CREDSTORE_PASSWORD":"*********(redacted)","spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD":"*********(redacted)",...``

## How was this patch tested?

1. Unit tests were added to ensure that redaction works.
2. A YARN job read data from S3 with confidential information (the Hadoop credential provider password) provided in the environment variables of the driver and executor. Afterwards, the logs were grepped to make sure that the secret password did not appear anywhere. It was also verified that the job could read the data from S3 correctly, confirming that the sensitive information still reached the places that need it to read the data.
3. The event logs were checked to make sure the secret password did not appear.
4. The UI environment tab was checked to make sure no secret information was displayed.

Author: Mark Grover <[email protected]>

Closes apache#15971 from markgrover/master_redaction.
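The name-based redaction described in this commit message can be illustrated with a minimal sketch. This is not Spark's implementation (which is written in Scala): the function name `redact`, the sample pattern, and the sample values are assumptions made for illustration. Only the replacement text `*********(redacted)` is taken from the log excerpts above. Python's `re` syntax also differs from Scala/Java regex syntax in some details.

```python
import re

# Replacement text as it appears in the redacted log excerpts.
REDACTION_REPLACEMENT_TEXT = "*********(redacted)"


def redact(pattern, kvs):
    """Return (key, value) pairs, masking the value when the KEY matches.

    Only the property or environment variable name is tested against the
    regex; the value itself is never inspected.
    """
    regex = re.compile(pattern)
    return [
        (key, REDACTION_REPLACEMENT_TEXT if regex.search(key) else value)
        for key, value in kvs
    ]


# Hypothetical pattern and values, chosen only for this example.
props = [
    ("spark.executorEnv.HADOOP_CREDSTORE_PASSWORD", "not-a-real-password"),
    ("spark.authenticate", "false"),
]
redacted = redact(r"(?i)secret|password", props)
for key, value in redacted:
    print(f"{key} -> {value}")
```

Because matching is done on names only, a secret stored under a key that the pattern does not match would pass through unredacted, which is why the pattern is user-configurable.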
…sole

## What changes were proposed in this pull request?

This change redacts sensitive information (based on the `spark.redaction.regex` property) from the Spark Submit console logs. Such sensitive information is already being redacted from event logs, YARN logs, etc.

## How was this patch tested?

Testing was done manually to make sure that the console logs were not printing any sensitive information.

Here's some output from the console:

```
Spark properties used, including those specified through --conf and those from the properties file /etc/spark2/conf/spark-defaults.conf:
(spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
(spark.authenticate,false)
(spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
```

```
System properties:
(spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
(spark.authenticate,false)
(spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
```

There is a risk that if new print statements are added to the console down the road, sensitive information may still leak, since no test asserts on the console log output. I considered it out of scope for this JIRA to write an integration test to make sure new leaks don't happen in the future.

Unit tests are run to make sure nothing else is broken by this change.

Author: Mark Grover <[email protected]>

Closes apache#17047 from markgrover/master_redaction.
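The commit message notes that no test asserts on the console output. As a rough sketch of what such a check might look like, the snippet below captures printed output and asserts that a known secret never appears in it. Everything here is hypothetical: `print_properties` stands in for the real console printing code, and the pattern and property values are made up for the example.

```python
import contextlib
import io
import re

REDACTED = "*********(redacted)"


def print_properties(props, pattern):
    """Print (key,value) pairs, masking values whose key matches the pattern."""
    regex = re.compile(pattern)
    for key, value in sorted(props.items()):
        shown = REDACTED if regex.search(key) else value
        print(f"({key},{shown})")


def captured_console_output(props, pattern):
    """Run print_properties and return everything it wrote to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        print_properties(props, pattern)
    return buffer.getvalue()


# Hypothetical secret and properties used only for this check.
secret = "not-a-real-password"
props = {
    "spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD": secret,
    "spark.authenticate": "false",
}
output = captured_console_output(props, r"(?i)secret|password")

# The secret must never reach the console; non-sensitive values must remain.
assert secret not in output
assert "(spark.authenticate,false)" in output
```

A check of this shape only guards the code paths it exercises, so a newly added print statement elsewhere could still leak a value.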
ok to test
Test build #80137 has started for PR 18802.
Test build #80171 has finished for PR 18802.
Force-pushed: 0096ad9 to 0ca71a8
Test build #80173 has started for PR 18802.
Force-pushed: 0ca71a8 to 4424b05
Test build #80174 has finished for PR 18802.
Force-pushed: 4424b05 to e92e24a
Test build #80175 has finished for PR 18802.
Force-pushed: e92e24a to 49941f7
Test build #80176 has finished for PR 18802.
Force-pushed: 49941f7 to ad23355
Test build #80177 has started for PR 18802.
Force-pushed: ad23355 to 81dc26b
Test build #80207 has finished for PR 18802.
cc @markgrover @vanzin Could you please take a look at this?
dev/run-tests.py
Outdated
I see you added these changes to fix the tests, but they're unrelated to the patches you're backporting.
They should, at the very least, be a separate PR, if they're really needed.
I can remove it, but the tests will fail on Jenkins.
Then, as I suggested, open a separate PR with the fix for the tests.
Force-pushed: 81dc26b to 7b419b4
I removed the test fixes and opened another PR, #18873, from this branch.
Test build #80357 has finished for PR 18802.
Merging to 2.1.
…mation

## What changes were proposed in this pull request?

Backporting SPARK-18535 and SPARK-19720 to Spark 2.1.

This is a backport PR that redacts sensitive information, as selected by configuration, from the Spark UI and the Spark Submit console logs.

It uses Mark Grover's PRs as the reference.

## How was this patch tested?

The same tests from the original PRs were applied.

Author: Mark Grover <[email protected]>

Closes #18802 from dmvieira/feature-redact.
@dmvieira please close the PR since GitHub doesn't do it automatically.
Thank you @vanzin |
## What changes were proposed in this pull request?

Backporting SPARK-18535 and SPARK-19720 to Spark 2.1.

This is a backport PR that redacts sensitive information, as selected by configuration, from the Spark UI and the Spark Submit console logs.

It uses Mark Grover's PRs as the reference.

## How was this patch tested?

The same tests from the original PRs were applied.