[SPARK-27748][SS] Kafka consumer/producer password/token redaction. #24627
Conversation
```diff
   | useTicketCache=true
   | serviceName="${clusterConf.kerberosServiceName}";
-  """.stripMargin.replace("\n", "")
+  """.stripMargin.replace("\n", "").trim
```
This is an unrelated fix: there were additional spaces at the end of the generated string, and this change removes them. Worth mentioning that the code works either way, but I thought the trailing spaces were just ugly.
```diff
   | password="$password";
-  """.stripMargin.replace("\n", "")
-logDebug(s"Scram JAAS params: ${params.replaceAll("password=\".*\"", "password=\"[hidden]\"")}")
+  """.stripMargin.replace("\n", "").trim
```
This is an unrelated fix: there were additional spaces at the end of the generated string, and this change removes them. Worth mentioning that the code works either way, but I thought the trailing spaces were just ugly.
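The trailing-space behaviour both comments describe can be reproduced in isolation. The following is a minimal sketch (not the actual Spark code; the login module name and fields are stand-ins): because the closing `"""` line carries no margin character, `stripMargin` keeps its indentation, and `replace("\n", "")` then turns that indentation into trailing spaces, which `.trim` removes.

```scala
// Sketch of the trailing-space problem fixed in this PR (hypothetical
// values; only the stripMargin/replace/trim mechanics match the PR).
object TrimSketch {
  def main(args: Array[String]): Unit = {
    val password = "secret"
    val params =
      s"""
      |ScramLoginModule required
      | password="$password";
      """.stripMargin.replace("\n", "")
    // The closing quotes' indentation survives stripMargin (no '|' on
    // that line), so the joined string ends in spaces.
    assert(params.endsWith(" "))
    // .trim produces the clean, semicolon-terminated JAAS string.
    assert(params.trim.endsWith(";"))
    println(s"[${params.trim}]")
  }
}
```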
Test build #105457 has finished for PR 24627 at commit

Retest this please.

Test build #105517 has finished for PR 24627 at commit

Retest this please.

Test build #105518 has finished for PR 24627 at commit
```scala
val redactionPattern = SparkEnv.get.conf.get(SECRET_REDACTION_PATTERN)
params.map { case (key, value) =>
  if (key.equalsIgnoreCase(SaslConfigs.SASL_JAAS_CONFIG)) {
    (key, redactJaasParam(value.asInstanceOf[String]))
```
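The excerpt above can be fleshed out into a self-contained sketch. This is a hypothetical simplification, not the real KafkaRedactionUtil: the SparkEnv/SECRET_REDACTION_PATTERN lookup is replaced by a plain regex parameter, the `[redacted]` placeholder and the example pattern are assumptions, the `"sasl.jaas.config"` literal stands in for `SaslConfigs.SASL_JAAS_CONFIG`, and `redactJaasParam` mirrors the `password="..."` replacement visible elsewhere in this PR.

```scala
// Simplified sketch of the redaction dispatch shown above (not Spark's
// actual KafkaRedactionUtil; names and placeholders are illustrative).
object RedactionSketch {
  // Targeted JAAS redaction: hide only the password part of the value.
  def redactJaasParam(param: String): String =
    param.replaceAll("password=\".*\"", "password=\"[hidden]\"")

  // Key-based redaction for everything else: if the key matches the
  // redaction pattern, replace the whole value.
  def redactParams(
      params: Map[String, String],
      redactionPattern: scala.util.matching.Regex): Map[String, String] =
    params.map { case (key, value) =>
      if (key.equalsIgnoreCase("sasl.jaas.config")) {
        (key, redactJaasParam(value))
      } else if (redactionPattern.findFirstIn(key).isDefined) {
        (key, "[redacted]")
      } else {
        (key, value)
      }
    }

  def main(args: Array[String]): Unit = {
    val pattern = "(?i)secret|password|token".r
    val params = Map(
      "bootstrap.servers" -> "broker:9092",
      "ssl.truststore.password" -> "changeit",
      "sasl.jaas.config" ->
        """ScramLoginModule required username="admin" password="admin-secret";"""
    )
    redactParams(params, pattern).foreach { case (k, v) => println(s"$k=$v") }
  }
}
```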
Hi, @gaborgsomogyi .
Is this the only reason why we cannot use Utils.redact directly?
@dongjoon-hyun
Not sure what you mean by "only reason", but the short answer is yes.
In a little more detail: SaslConfigs.SASL_JAAS_CONFIG has a different format than any other property. A normal property looks like the following:
Key=ssl.truststore.password, Value=secret
SaslConfigs.SASL_JAAS_CONFIG, however, has the following syntax:
Key=sasl.jaas.config, Value=org.apache.kafka.common.security.scram.ScramLoginModule required tokenauth=true serviceName="kafka" username="admin" password="admin-secret";
Utils.redact would make a malformed and unreadable string out of it.
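The difference can be illustrated with a small sketch (hypothetical helper names; `genericRedact` stands in for a Utils.redact-style whole-value replacement, using a placeholder similar to Spark's `*********(redacted)` text): whole-value redaction destroys the readable structure of the JAAS entry, while the targeted regex hides only the password and keeps the rest intact.

```scala
// Contrast sketch, not Spark code: whole-value redaction vs. the
// JAAS-specific password-only redaction used in this PR.
object RedactionContrast {
  val jaas: String =
    "org.apache.kafka.common.security.scram.ScramLoginModule required " +
      """tokenauth=true serviceName="kafka" username="admin" password="admin-secret";"""

  // Generic approach: the entire value is replaced, losing all structure.
  def genericRedact(value: String): String = "*********(redacted)"

  // Targeted approach: only the password="..." fragment is hidden.
  def jaasRedact(value: String): String =
    value.replaceAll("password=\".*\"", "password=\"[hidden]\"")

  def main(args: Array[String]): Unit = {
    println(genericRedact(jaas)) // unreadable: everything is gone
    println(jaasRedact(jaas))    // serviceName/username still visible
  }
}
```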
I know this patch just applies redaction to all the places which print configuration as of now, but I feel the Kafka module is too verbose on configuration, especially KafkaConfigUpdater. Assuming debug log level is turned on, you would mostly want to track which consumers are being acquired, evicted, or closed; the logging in KafkaConfigUpdater doesn't help with that.

Let's revisit the purpose of the logging. I guess we are logging to find out when a consumer/producer is created and when it's evicted/closed (either tracking each instance, or counting per key). Though we use the whole configuration as the cache key, we (as humans) cannot check every key-value pair to find the same instance: some pairs act as a differentiator, and we just assume the other pairs are not different. So IMHO we don't even need to print out all the configurations. Especially, if we really want to track each instance, an explicit ID (as #19096 introduces) would work better and be much more helpful for debugging.
Without printing out all the configurations, I wonder how one could find out why a consumer/producer creation fails?
I agree that introducing an ID would be enough, but only once the consumer/producer is already created; in that case it's not necessary to log all configurations. This is the next PR in my queue, but I didn't want to mix up the two changes.
Yeah, this can be discussed.

Maybe I was not clear; to be more precise, I don't think we should print out configurations all the time. Does the Kafka module support configuration changes at runtime? If not, a data source needs to print out these configurations only once, instead of on every creation/termination of an instance.

Same understanding. So this means redaction is needed.
I don't think so, but I'll sit down with the Kafka guys to get a deeper understanding, and based on that we can move forward. That said, my intention is to reduce the log size drastically and introduce ID tracking, but I suggest doing that in an additional PR to keep the focus.
I meant Kafka module is

From the Spark side we're not changing the configuration, otherwise caching wouldn't make sense.
As this patch fixes the issue the current code has, IMHO we can go with this patch and apply the new improvement ideas in other issues/PRs. I can help with applying the new ideas as well.

@dongjoon-hyun do you have any other comments?

not much movement, maybe @vanzin ?
vanzin left a comment
These logs do seem a bit verbose, but since they're already there...
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaProducer.scala
.../kafka-0-10-token-provider/src/main/scala/org/apache/spark/kafka010/KafkaRedactionUtil.scala
Test build #105974 has finished for PR 24627 at commit
Test build #106101 has finished for PR 24627 at commit

merging to master.
What changes were proposed in this pull request?
Kafka parameters are logged in several places and the following parameters have to be redacted:
- ssl.truststore.password
- ssl.keystore.password
- ssl.key.password

This PR contains:
- Spark parameter redaction (spark.redaction.regex)
- sasl.jaas.config redaction (delegation token)

How was this patch tested?
Existing + additional unit tests.