-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-9116] [SQL] [PYSPARK] support Python only UDT in __main__ #7453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #1094 has finished for PR 7453 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these interfaces designed in UserDefinedType? If yes, can you add override here? it caches a lot of potential problems when we do refactoring down the line.
|
Test build #37588 has finished for PR 7453 at commit
|
|
Test build #37602 has finished for PR 7453 at commit
|
python/pyspark/mllib/linalg.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This conflicts with / overlaps with the changes in #7476, so I guess this patch is blocking until that other one gets reviewed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
|
Test build #37651 has finished for PR 7453 at commit
|
python/pyspark/sql/types.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just use the default __eq__?
|
Test build #37678 has finished for PR 7453 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like PyLint is complaining about a possibly undefined loop variable v at this line. If this isn't a legitimate error, then we can just add a comment to bypass that warning here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JoshRosen I think it's annoying to let PyLint report Warning as Error, should we only fail on real errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point; let's see if we can update the configuration to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried with putting -E to pylint, it seems even worse, lots of false-negtive errors.
Conflicts: python/pyspark/mllib/linalg.py
|
Test build #37817 has finished for PR 7453 at commit
|
|
Test build #37859 has finished for PR 7453 at commit
|
Conflicts: python/pyspark/sql/tests.py
|
Test build #38279 has finished for PR 7453 at commit
|
|
@mengxr Could you help to review this? |
|
Ping @mengxr, I think we need to get this in for 1.5.0 since it contains a bugfix for another patch. |
|
I'll make a pass tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing doc
|
Made one pass. My main comment is to keep the unit test for UDTs that work in both Python and Scala/Java. |
|
@mengxr Had kept the unit tests for Python and Scala UDT, please take another round review. |
|
LGTM |
|
Test build #38914 has finished for PR 7453 at commit
|
|
Merged into master. |
Also we could create a Python UDT without having a Scala one, it's important for Python users.
cc @mengxr @JoshRosen