
Conversation

@dbtsai
Member

@dbtsai dbtsai commented Apr 22, 2016

What changes were proposed in this pull request?

Once SPARK-14487 and SPARK-14549 are merged, we will migrate the new ML pipeline based APIs to use the new vector and matrix types.

How was this patch tested?

Unit tests

@dbtsai
Member Author

dbtsai commented Apr 22, 2016

Waiting for #12259 to be merged.

@SparkQA

SparkQA commented Apr 22, 2016

Test build #56754 has finished for PR 12627 at commit 3944f56.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@mengxr
Contributor

mengxr commented Apr 29, 2016

@dbtsai #12259 was merged. Could you update this PR?

@dbtsai
Member Author

dbtsai commented Apr 29, 2016

@mengxr working on this now. Thanks.

@dbtsai dbtsai force-pushed the SPARK-14615-NewML branch from 3944f56 to 93a1c20 Compare May 3, 2016 07:21
@SparkQA

SparkQA commented May 3, 2016

Test build #57609 has finished for PR 12627 at commit 93a1c20.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 4, 2016

Test build #57692 has finished for PR 12627 at commit 8346987.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 5, 2016

Test build #57828 has finished for PR 12627 at commit 09a3dd8.

  • This patch fails MiMa tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@mengxr
Contributor

mengxr commented May 5, 2016

@dbtsai Would it help to use implicit conversions?

@dbtsai
Member Author

dbtsai commented May 5, 2016

@mengxr That can work, but we would need to import the implicits everywhere. I can give it a shot.

@mengxr
Contributor

mengxr commented May 5, 2016

@dbtsai Please just try it with one algorithm and see which one is cleaner.

@SparkQA

SparkQA commented May 5, 2016

Test build #57922 has finished for PR 12627 at commit e4265ab.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 6, 2016

Test build #57930 has finished for PR 12627 at commit 1602f6f.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

  SchemaUtils.checkColumnType(dataset.schema, $(featuresCol), new VectorUDT)
- val data = dataset.select(col($(featuresCol))).rdd.map { case Row(point: Vector) => point }
+ val data = dataset.select(col($(featuresCol))).rdd.map { case Row(point: Vector) =>
+   OldVectors.fromML(point)
Member Author


@mengxr Implicit conversions don't work for cases like these; we still need to convert them manually. But I agree that some of the code can be simplified with implicits, which I will push in the next commit.

@SparkQA

SparkQA commented May 7, 2016

Test build #58040 has finished for PR 12627 at commit 82c7750.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 10, 2016

Test build #58181 has finished for PR 12627 at commit c16d1ea.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 10, 2016

Test build #58187 has finished for PR 12627 at commit 126e6f2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 10, 2016

Test build #58221 has finished for PR 12627 at commit 6faec8a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 10, 2016

Test build #58222 has finished for PR 12627 at commit 283b04a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Contributor

mengxr commented May 17, 2016

I'm making a pass.

@SparkQA

SparkQA commented May 17, 2016

Test build #58692 has finished for PR 12627 at commit 9d25eba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

  import org.apache.spark.examples.mllib.AbstractParams
- import org.apache.spark.mllib.linalg.Vector
+ import org.apache.spark.ml.linalg.Vector
+ import org.apache.spark.mllib.linalg.VectorImplicits._
Contributor


VectorImplicits shouldn't appear in example code. I created https://issues.apache.org/jira/browse/SPARK-15363 to track it.

mengxr added 2 commits May 17, 2016 10:39
@SparkQA

SparkQA commented May 17, 2016

Test build #58708 has finished for PR 12627 at commit 953eea7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class LabeledPoint(@Since("2.0.0") label: Double, @Since("2.0.0") features: Vector)

@asfgit asfgit closed this in e2efe05 May 17, 2016
@mengxr
Contributor

mengxr commented May 17, 2016

LGTM. Merged into master and branch-2.0. This should complete the major MLlib API changes in 2.0. Thanks!

In retrospect, I think we underestimated the amount of work required and didn't allocate enough time to make the changes before the feature freeze deadline. We should discuss the design and scope the work earlier next time.

asfgit pushed a commit that referenced this pull request May 17, 2016
… based algorithms

## What changes were proposed in this pull request?

Once SPARK-14487 and SPARK-14549 are merged, we will migrate the new ML pipeline based APIs to use the new vector and matrix types.

## How was this patch tested?

Unit tests

Author: DB Tsai <[email protected]>
Author: Liang-Chi Hsieh <[email protected]>
Author: Xiangrui Meng <[email protected]>

Closes #12627 from dbtsai/SPARK-14615-NewML.

(cherry picked from commit e2efe05)
Signed-off-by: Xiangrui Meng <[email protected]>
@dbtsai
Member Author

dbtsai commented May 18, 2016

Thank you to everyone who was involved in this work. I agree that the amount of work was underestimated, and some of it was genuinely hard to estimate because issues popped up during the implementation. However, we should work on major changes like this at the beginning of a release cycle to ensure that we have enough time to address unexpected issues. Thanks again!

@dbtsai dbtsai deleted the SPARK-14615-NewML branch May 19, 2016 18:07
@HyukjinKwon
Member

Hi @dbtsai, I just happened to run some Python tests for ML and noticed that some examples related to this PR are failing:

examples/src/main/python/ml/aft_survival_regression.py
examples/src/main/python/ml/chisq_selector_example.py
examples/src/main/python/ml/dct_example.py
examples/src/main/python/ml/elementwise_product_example.py
examples/src/main/python/ml/estimator_transformer_param_example.py
examples/src/main/python/ml/pca_example.py
examples/src/main/python/ml/polynomial_expansion_example.py
examples/src/main/python/ml/simple_params_example.py
examples/src/main/python/ml/vector_assembler_example.py
examples/src/main/python/ml/vector_slicer_example.py

I see that some Scala and Java examples were fixed here, so I made a rough PR for the Python examples. However, I am a bit hesitant to submit it because I am not familiar with this part (though I could do it based on your PR), and I suspect you already know that there are Python examples to fix.

May I ask whether they were simply missed by mistake?

@viirya
Member

viirya commented May 29, 2016

@HyukjinKwon Thanks for reporting this! I think we missed the Python examples in this change. If you can submit your PR, that would be good. If not, or if you are hesitant about it, I can submit a PR to fix it.

@HyukjinKwon
Member

@viirya Ah, thank you so much. Since I already have it locally, I will create a follow-up!

asfgit pushed a commit that referenced this pull request Jun 11, 2016
…tor and Matrix APIs in the ML pipeline based algorithms

## What changes were proposed in this pull request?

This PR fixes Python examples to use the new ML Vector and Matrix APIs in the ML pipeline based algorithms.

I first ran the shell command `grep -r "from pyspark.mllib" .` and then executed every example it listed.
Some of the examples in `ml` produced the error message below:

```
pyspark.sql.utils.IllegalArgumentException: u'requirement failed: Input type must be VectorUDT but got org.apache.spark.mllib.linalg.VectorUDT@f71b0bce.'
```

So, I fixed them to use the new ones, in the same way that some Python tests were fixed in #12627.
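
As a rough illustration of the kind of change described above (a sketch, not the actual patch in #13393): for an ML pipeline example, the fix amounts to importing the linear algebra classes from `pyspark.ml.linalg` instead of `pyspark.mllib.linalg`. The helper name `migrate_linalg_imports` is hypothetical, and the pattern assumes the simple `from ... import ...` form; any other import style is left untouched.

```python
import re

# Matches "from pyspark.mllib.linalg import ..." at the start of a line,
# keeping any leading indentation. Assumption: the examples use this form.
_OLD_IMPORT = re.compile(r"^([ \t]*)from pyspark\.mllib\.linalg import ", re.MULTILINE)


def migrate_linalg_imports(source: str) -> str:
    """Point linalg imports at pyspark.ml.linalg (hypothetical helper)."""
    return _OLD_IMPORT.sub(r"\1from pyspark.ml.linalg import ", source)


if __name__ == "__main__":
    before = (
        "from pyspark.mllib.linalg import Vectors\n"
        "from pyspark.ml.feature import PCA\n"
    )
    print(migrate_linalg_imports(before))
```

Only examples built on the `ml` pipeline APIs should be rewritten this way; examples that use the RDD-based `mllib` algorithms still expect the `pyspark.mllib.linalg` types.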

## How was this patch tested?

Manually tested for all the examples listed by `grep -r "from pyspark.mllib" .`.
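
The `grep` step above can also be sketched in Python, which may be handy where `grep` is unavailable. This is an approximation under stated assumptions: the function name `find_mllib_imports` is hypothetical, and unlike `grep -r` it only looks at `.py` files.

```python
import os


def find_mllib_imports(root: str, needle: str = "from pyspark.mllib"):
    """Yield (path, line_number, line) for each matching line under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Assumption: only Python sources are of interest here.
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        yield path, number, line.rstrip("\n")


if __name__ == "__main__":
    # Print matches in a grep-like "path:line:text" format.
    for path, number, line in find_mllib_imports("."):
        print(f"{path}:{number}:{line}")
```

Each reported file is a candidate to run by hand, as was done for this patch; a match alone does not mean the example is broken.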

Author: hyukjinkwon <[email protected]>

Closes #13393 from HyukjinKwon/SPARK-14615.

(cherry picked from commit 99f3c82)
Signed-off-by: Joseph K. Bradley <[email protected]>
asfgit pushed a commit that referenced this pull request Jun 22, 2016
[SPARK-14615](https://issues.apache.org/jira/browse/SPARK-14615) and #12627 changed `spark.ml` pipelines to use the new `ml.linalg` classes for `Vector`/`Matrix`. Some `Since` annotations for public methods/vals have not been updated accordingly to be `2.0.0`. This PR updates them.

## How was this patch tested?

Existing unit tests.

Author: Nick Pentreath <[email protected]>

Closes #13840 from MLnick/SPARK-16127-ml-linalg-since.
asfgit pushed a commit that referenced this pull request Jun 22, 2016
[SPARK-14615](https://issues.apache.org/jira/browse/SPARK-14615) and #12627 changed `spark.ml` pipelines to use the new `ml.linalg` classes for `Vector`/`Matrix`. Some `Since` annotations for public methods/vals have not been updated accordingly to be `2.0.0`. This PR updates them.

## How was this patch tested?

Existing unit tests.

Author: Nick Pentreath <[email protected]>

Closes #13840 from MLnick/SPARK-16127-ml-linalg-since.

(cherry picked from commit 18faa58)
Signed-off-by: Xiangrui Meng <[email protected]>