[SPARK-11923][ML] Python API for ml.feature.ChiSqSelector #10186
|
Test build #47306 has finished for PR 10186 at commit
|
|
Test build #47307 has finished for PR 10186 at commit
|
python/pyspark/ml/feature.py
Outdated
remove #
|
Looks good except minor issues. |
python/pyspark/ml/feature.py
Outdated
So I think we probably don't need the "#"s in the pydoc
|
Thanks for the comments, @holdenk and @yanboliang. Strangely, I cannot see the comments from @yanboliang on this page; it must be a GitHub issue. I don't know whether we can make it in time for 1.6. If not, I'll change the tag to 1.7 later. |
|
Test build #47462 has finished for PR 10186 at commit
|
This model is loadable and saveable in Java. I don't see us doing this elsewhere in ml/ yet (although we do it in mllib/), but do we maybe want to use the JavaLoader & JavaSaveable base classes?
Model persistence is important in PySpark, but there is no need to add it in this PR. @yanboliang has a JIRA for adding pipeline persistence in PySpark: https://issues.apache.org/jira/browse/SPARK-11939
Cool :)
Could you please add the selectedFeatures method?
Sure
|
LGTM |
|
LGTM cc @jkbradley |
|
Change the version to 2.2.0 |
|
Test build #49141 has finished for PR 10186 at commit
|
python/pyspark/ml/feature.py
Outdated
nit: indent 1 more space (this line + next line)
|
Thanks for the PR! I only had a couple more comments. |
|
test it please |
|
Test build #49243 has finished for PR 10186 at commit
|
python/pyspark/ml/feature.py
Outdated
Ping @jkbradley
I use a javaSelectedFeatures because I found that self._call_java("selectedFeatures") returns an array('i', [3]), which is strange since the result should be [3]. I suspect something goes wrong when the Scala Array is serialized with SerDe.dumps(javaObject) on the Scala side and then deserialized on the Python side.
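For readers unfamiliar with the `array('i', [3])` value mentioned above: it is an instance of Python's standard-library `array.array`, not a `list`. The sketch below does not call Spark; it only reproduces that value by hand to show why it looks odd and how it can be normalized. The helper name `to_plain_list` is hypothetical and is not part of PySpark.

```python
from array import array


def to_plain_list(values):
    """Hypothetical helper: copy any iterable of ints into a plain list."""
    return list(values)


# Stand-in for the value reported in the comment; built by hand, not from Spark.
returned = array('i', [3])

# array.array holds the same element, but it never compares equal to a list.
print(returned == [3])                 # False
print(to_plain_list(returned) == [3])  # True
```

Wrapping the result in `list(...)` is one way to give callers a plain Python list regardless of how the value was deserialized.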
|
@jkbradley I also found an inconsistent return value, so I filed a JIRA here: https://issues.apache.org/jira/browse/SPARK-12780 |
|
And what's more, the |
This isn't needed for Java, so I'd make it a private API. But hopefully we can remove this altogether once [https://github.com//pull/10724] gets merged.
|
@yinxusen Thanks! Let's get your other PR in first, and then update this PR. |
|
@jkbradley Yes, sure. |
python/pyspark/ml/feature.py
Outdated
@jkbradley I learned from @holdenk's PR #10085 that we can convert the JavaArray directly with list in Python, so there is no need to call _call_java().
|
Test build #49453 has finished for PR 10186 at commit
|
|
See comment on [https://github.com//pull/10724]. We'll return to this PR after [https://github.com//pull/10772] gets merged. |
|
test it please |
|
Test build #50112 has finished for PR 10186 at commit
|
|
@jkbradley Another PR related to #10772. |
|
LGTM |
https://issues.apache.org/jira/browse/SPARK-11923