@jkbradley
Member

This is to fix a long-time annoyance: Whenever we add a new algorithm to pyspark.ml, we have to add it to the __all__ list at the top. Since we keep it alphabetized, it often creates a lot more changes than needed. It is also easy to add the Estimator and forget the Model. I'm going to switch it to have one algorithm per line.

This also alphabetizes a few out-of-place classes in pyspark.ml.feature. No changes have been made to the moved classes.

CC: @thunterdb

@SparkQA

SparkQA commented Jan 26, 2016

Test build #50121 has finished for PR 10927 at commit bb0f1ef.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ChiSqSelector(JavaEstimator, HasFeaturesCol, HasOutputCol, HasLabelCol):
    • class ChiSqSelectorModel(JavaModel):
    • class PCA(JavaEstimator, HasInputCol, HasOutputCol):
    • class PCAModel(JavaModel):
    • class RFormula(JavaEstimator, HasFeaturesCol, HasLabelCol):
    • class RFormulaModel(JavaModel):

@SparkQA

SparkQA commented Jan 26, 2016

Test build #50123 has finished for PR 10927 at commit 639a562.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Contributor

mengxr commented Feb 2, 2016

LGTM but it has merge conflicts with master now.

@jkbradley jkbradley force-pushed the ml-python-all-list branch from 639a562 to 12b15fb Compare March 2, 2016 00:32
@SparkQA

SparkQA commented Mar 2, 2016

Test build #52270 has finished for PR 10927 at commit 12b15fb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Contributor

mengxr commented Mar 2, 2016

Merged into master. Thanks!

@asfgit asfgit closed this in 9495c40 Mar 2, 2016
@jkbradley jkbradley deleted the ml-python-all-list branch March 8, 2016 18:52
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
Author: Joseph K. Bradley <[email protected]>

Closes apache#10927 from jkbradley/ml-python-all-list.