[SPARK-41989][PYTHON] Avoid breaking logging config from pyspark.pandas #39516
25aab03 to
242ff01
Compare
I don't think there's any test for this, and the linter passed. Therefore, merging it now. Merged to master.
I merged it to branch-3.3 and branch-3.2 too.
See https://issues.apache.org/jira/browse/SPARK-41989 for in depth explanation Short summary: `pyspark/pandas/__init__.py` uses, at import time, `logging.warning()` which might silently call `logging.basicConfig()`. So by importing `pyspark.pandas` (directly or indirectly) a user might unknowingly break their own logging setup (e.g. when based on `logging.basicConfig()` or related). `logging.getLogger(...).warning()` does not trigger this behavior. User-defined logging setups will be more predictable. Manual testing so far. I'm not sure it's worthwhile to cover this with a unit test Closes #39516 from soxofaan/SPARK-41989-pyspark-pandas-logging-setup. Authored-by: Stefaan Lippens <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]> (cherry picked from commit 04836ba) Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request? See https://issues.apache.org/jira/browse/SPARK-41989 for in depth explanation Short summary: `pyspark/pandas/__init__.py` uses, at import time, `logging.warning()` which might silently call `logging.basicConfig()`. So by importing `pyspark.pandas` (directly or indirectly) a user might unknowingly break their own logging setup (e.g. when based on `logging.basicConfig()` or related). `logging.getLogger(...).warning()` does not trigger this behavior. ### Does this PR introduce _any_ user-facing change? User-defined logging setups will be more predictable. ### How was this patch tested? Manual testing so far. I'm not sure it's worthwhile to cover this with a unit test Closes apache#39516 from soxofaan/SPARK-41989-pyspark-pandas-logging-setup. Authored-by: Stefaan Lippens <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
dongjoon-hyun
left a comment
Hi, @soxofaan and @HyukjinKwon.
This seems to have caused a production outage because some applications already rely on this `import logging`. After this patch, PySpark jobs fail.
FYI, cc @viirya, @huaxingao, @kazuyukitanimura, @sunchao
Hmmm ... the change decouples If it matters, we can try with the original fix proposed (https://github.com/apache/spark/compare/25aab030e562153dbf7d11e8d2dadd8fd10ecc50..242ff01536c6e8307224256faa5e3d197e780da9). For a bit of context:
So, it would be helpful to know how this patch affected some PySpark jobs so we can properly fix them in the master branch and other branches too.
Do you really mean that removing Or do you mean that the removal of the
Thank you for the replies, @HyukjinKwon and @soxofaan. Please forget about my previous comments. After digging more into the PySpark 3.2 apps, it turns out that it was a false alarm. Initially, there was a concern about the side-effect of the removed
Thanks for confirming!
### What changes were proposed in this pull request?

See https://issues.apache.org/jira/browse/SPARK-41989 for in depth explanation.

Short summary: `pyspark/pandas/__init__.py` uses, at import time, `logging.warning()`, which might silently call `logging.basicConfig()`. So by importing `pyspark.pandas` (directly or indirectly) a user might unknowingly break their own logging setup (e.g. when based on `logging.basicConfig()` or related). `logging.getLogger(...).warning()` does not trigger this behavior.

### Does this PR introduce _any_ user-facing change?

User-defined logging setups will be more predictable.

### How was this patch tested?

Manual testing so far. I'm not sure it's worthwhile to cover this with a unit test.
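
For readers who want to reproduce the behavior from the short summary without installing Spark, here is a minimal sketch using only the standard library. It is not the code from this PR: the helper names `reset_root` and `user_setup_takes_effect` and the logger name `demo.library` are made up for illustration, and the two lambdas merely stand in for what a library might do at import time.

```python
import logging


def reset_root():
    """Return the root logger with no handlers and the default WARNING level."""
    root = logging.getLogger()
    for handler in list(root.handlers):
        root.removeHandler(handler)
    root.setLevel(logging.WARNING)
    return root


def user_setup_takes_effect(import_time_side_effect):
    """Run a stand-in for library import code, then the user's own basicConfig().

    Returns True if the user's basicConfig(level=DEBUG) actually configured
    the root logger, False if it was silently ignored.
    """
    root = reset_root()
    import_time_side_effect()
    # basicConfig() does nothing when the root logger already has a handler
    # (unless force=True is passed).
    logging.basicConfig(level=logging.DEBUG)
    return root.level == logging.DEBUG


# Module-level logging.warning(): implicitly calls basicConfig() when the root
# logger has no handlers, so the user's later call is ignored.
with_module_function = user_setup_takes_effect(
    lambda: logging.warning("warning via logging.warning()")
)

# Logger method on a named logger: the record is still emitted, but the root
# logger is left unconfigured, so the user's later call works as intended.
with_named_logger = user_setup_takes_effect(
    lambda: logging.getLogger("demo.library").warning("warning via a named logger")
)

print(with_module_function, with_named_logger)
```

In this sketch the first call reports that the user's setup was ignored and the second reports that it took effect, which matches the difference between `logging.warning()` and `logging.getLogger(...).warning()` described in the summary.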