@MechCoder
Contributor

It is useful to have functionality that can parse a string into a LabeledPoint, for example while loading files.

@MechCoder MechCoder changed the title [SPARK-8291] Add parse functionality to LabeledPoint in PySpark [SPARK-8291] [MLlib] [PySpark] Add parse functionality to LabeledPoint in PySpark Jun 10, 2015
@MechCoder
Contributor Author

cc: @srowen @brkyvz Can you please have a look at this?

It will be useful in this example. https://github.com/apache/spark/pull/6744/files#diff-c433cb6c11f49d555c3741b41ffb7eecR790

@SparkQA

SparkQA commented Jun 10, 2015

Test build #34600 has finished for PR 6746 at commit 6fd40a8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MechCoder
Contributor Author

@davies Could you have a look at this also? :P

@davies
Contributor

davies commented Jun 18, 2015

A string could be in many different formats, so I'm not sure which format we should support. It would be better to leave the parsing to the user.

cc @mengxr

@MechCoder
Contributor Author

Hmm. The format supported in this PR is consistent with the one supported in Scala.

@davies
Contributor

davies commented Jun 18, 2015

I see. To offer functionality similar to Scala's, we should follow the approach in #685 and provide MLUtils.loadLabeledPoints() ...

@davies
Contributor

davies commented Jun 18, 2015

Also, the format used here should match LabeledPoint.__str__(), so that an RDD written with saveAsTextFile can be loaded back.
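
To make the round-trip requirement concrete, here is a minimal plain-Python sketch. It assumes the dense text form `(label,[v1,v2,...])` and does not cover sparse vectors; `format_point` and `parse_point` are hypothetical helpers written for illustration, not part of PySpark or of this PR.

```python
def format_point(label, features):
    """Render a point as '(label,[v1,v2,...])' (assumed dense text form)."""
    values = ",".join(str(float(v)) for v in features)
    return "(%s,[%s])" % (float(label), values)


def parse_point(line):
    """Parse '(label,[v1,v2,...])' back into a (label, features) tuple."""
    text = line.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("expected '(label,[...])', got %r" % line)
    # Split off the label at the first comma; the rest is the feature list.
    label_part, sep, rest = text[1:-1].partition(",")
    rest = rest.strip()
    if not sep or not (rest.startswith("[") and rest.endswith("]")):
        raise ValueError("expected a bracketed feature list in %r" % line)
    body = rest[1:-1].strip()
    features = [float(v) for v in body.split(",")] if body else []
    return float(label_part), features


# Round trip: the parsed values match what was formatted.
line = format_point(2, [0.1, 1.2, 3.4])
label, features = parse_point(line)
```

The point of the sketch is only that the writer and the parser agree on one text format; if the two drift apart, text written by saveAsTextFile can no longer be read back.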

@MechCoder
Contributor Author

I have added tests to verify it.

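# Assumes an existing SparkContext `sc`; LabeledPoint.parse is the method proposed in this PR.
from pyspark.mllib.regression import LabeledPoint
lb = LabeledPoint(2, [0.1, 1.2, 3.4])
rdd = sc.parallelize([lb, lb, lb])
# Write each point using its string form, then read the text back and parse it.
rdd.saveAsTextFile("tmp")
sc.textFile("tmp").map(LabeledPoint.parse)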

@MechCoder
Contributor Author

MLUtils.loadLabeledPoints is just a wrapper around the Scala code, right? Were you implying that this should also be a wrapper around the parse method in Scala? Is that required for something as simple as this?

@MechCoder
Contributor Author

@mengxr Should I close this then?

@MechCoder MechCoder closed this Jul 1, 2015
@MechCoder MechCoder deleted the parse_labeled_point branch July 1, 2015 13:59
3 participants