[SPARK-12042] Python API for mllib.stat.test.StreamingTest #11374


Closed

yinxusen wants to merge 7 commits into apache:master from yinxusen:SPARK-12042

Contributor

yinxusen commented Feb 25, 2016

What changes were proposed in this pull request?

The patch adds a Python API for mllib.stat.test.StreamingTest, tracked in JIRA at https://issues.apache.org/jira/browse/SPARK-12042.

Note that, unlike the other test results in Python, StreamingTestResult is defined as a plain Python class: it does not extend TestResult and does not hold a _java_obj.
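
For readers skimming the thread, here is a minimal sketch of what such a plain result class could look like. It assumes the result carries the same five fields as the Scala `StreamingTestResult` (`pValue`, `degreesOfFreedom`, `statistic`, `method`, `nullHypothesis`). The class name and constructor are illustrative and are not the code in this patch.

```python
class StreamingTestResultSketch(object):
    """Illustrative stand-in: a plain value holder that wraps no Java object."""

    def __init__(self, pValue, degreesOfFreedom, statistic, method, nullHypothesis):
        # Values are copied onto the Python object, so nothing here
        # needs a live reference to the JVM.
        self.pValue = pValue
        self.degreesOfFreedom = degreesOfFreedom
        self.statistic = statistic
        self.method = method
        self.nullHypothesis = nullHypothesis

    def __repr__(self):
        return ("StreamingTestResultSketch(pValue=%r, degreesOfFreedom=%r, "
                "statistic=%r, method=%r, nullHypothesis=%r)"
                % (self.pValue, self.degreesOfFreedom, self.statistic,
                   self.method, self.nullHypothesis))


# Made-up values, only to show construction and attribute access.
example = StreamingTestResultSketch(0.03, 18.0, 2.3, "welch", "both groups have the same mean")
```

One trade-off of this shape is that the fields have to be populated on the Python side (for example through a pickler), rather than being fetched lazily from a wrapped Java object.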

How was this patch tested?

The patch is tested with Python unit tests.

yinxusen added 4 commits

February 24, 2016 22:42


          A draft and runnable version

6867a89


          treat StreamingTestResult as an independent class

079a873


          add test for streamingtest

f70d7aa


          refine test

770703b

Contributor

mengxr commented Feb 25, 2016

cc: @feynmanliang

SparkQA commented Feb 25, 2016

Test build #51989 has finished for PR 11374 at commit 770703b.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

yinxusen added 2 commits

March 5, 2016 14:46


          Merge branch 'master' into SPARK-12042

ff9932b


          remove since from class header

e4e8d5e

Contributor Author

yinxusen commented Mar 5, 2016

test it please

SparkQA commented Mar 6, 2016

Test build #52523 has finished for PR 11374 at commit e4e8d5e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

Contributor Author

yinxusen commented Mar 6, 2016

cc @feynmanliang

SparkQA commented May 14, 2016

Test build #58598 has finished for PR 11374 at commit e4e8d5e.

This patch fails R style tests.
This patch does not merge cleanly.
This patch adds no public classes.

Member

zsxwing commented Oct 24, 2016

Any updates to this PR?


          merge with master

615fbbb

SparkQA commented Oct 28, 2016

Test build #67696 has finished for PR 11374 at commit 615fbbb.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

Contributor Author

yinxusen commented Oct 28, 2016

Ping @mengxr @feynmanliang @yanboliang

Contributor

feynmanliang commented Oct 28, 2016

I'll review this tonight

Contributor

feynmanliang commented Oct 29, 2016

Apologies for the delay, I am traveling but I'll get this done this weekend.

feynmanliang suggested changes


Contributor

feynmanliang left a comment

Did a first pass.

It's been a while since I've looked at PySpark, so I may be a bit rusty on some things.

examples/src/main/python/mllib/streaming_test_example.py

    
              """

              Create a DStream that contains several RDDs to show the StreamingTest of PySpark.

              """

Contributor

feynmanliang Oct 29, 2016

Seems like other examples include `from __future__ import print_function` here

examples/src/main/python/mllib/streaming_test_example.py

    
                  sc = SparkContext(appName="PythonStreamingTestExample")

                  ssc = StreamingContext(sc, 1)

                  checkpoint_path = tempfile.mkdtemp()

Contributor

feynmanliang Oct 29, 2016

Is this necessary?

examples/src/main/python/mllib/streaming_test_example.py

    
              from pyspark.mllib.stat.test import BinarySample, StreamingTest

              if __name__ == "__main__":

Contributor

feynmanliang Oct 29, 2016

nit: don't include newline here

examples/src/main/python/mllib/streaming_test_example.py

    
              from pyspark import SparkContext

              from pyspark.streaming import StreamingContext

              from pyspark.mllib.stat.test import BinarySample, StreamingTest

Contributor

feynmanliang Oct 29, 2016

$example on$ and $example off$ appear to be used in other examples, though I'm not sure why myself
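
For context, in the Spark examples these markers are comments that delimit the parts of a file that the documentation build pulls into the programming guides. The sketch below shows how they might be placed in this example, assuming the usual `# $example on$` / `# $example off$` comment form in Python. `extract_marked_lines` is a hypothetical helper that only mimics the extraction and is not Spark's actual tooling. The embedded source is a string and is never executed.

```python
# Example file contents, held as a string purely for illustration.
EXAMPLE_SOURCE = '''\
from __future__ import print_function

# $example on$
from pyspark.mllib.stat.test import BinarySample, StreamingTest
# $example off$

if __name__ == "__main__":
    # $example on$
    model = StreamingTest()
    # $example off$
    print("done")
'''


def extract_marked_lines(source):
    """Return the lines that sit between '$example on$' and '$example off$' markers."""
    selected = []
    inside = False
    for line in source.splitlines():
        stripped = line.strip()
        if stripped == "# $example on$":
            inside = True
        elif stripped == "# $example off$":
            inside = False
        elif inside:
            selected.append(line)
    return selected


if __name__ == "__main__":
    for line in extract_marked_lines(EXAMPLE_SOURCE):
        print(line)
```

Setup and teardown code that should not appear in the guide stays outside the markers.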

examples/src/main/python/mllib/streaming_test_example.py

    
                  ssc.checkpoint(checkpoint_path)

                  # Create the queue through which RDDs can be pushed to a QueueInputDStream.

                  rdd_queue = []

Contributor

feynmanliang Oct 29, 2016

use camelCase to be consistent with other examples

python/pyspark/mllib/tests.py

    
                      """

                      checkpoint_path = tempfile.mkdtemp()

                      self.ssc.checkpoint(checkpoint_path)

Contributor

feynmanliang Oct 29, 2016

Is this necessary?

python/pyspark/mllib/tests.py

    
                      input_stream = self.ssc.queueStream(rdd_queue)

                      model = StreamingTest()

                      model.setPeacePeriod(1)

Contributor

feynmanliang Oct 29, 2016

Can we break this into another test just for model params like

spark/python/pyspark/mllib/tests.py

Line 1165 in 39e2bad

def test_model_params(self):

?
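
A rough sketch of the suggested split follows, with the parameter checks in their own test method so they need no DStream or StreamingContext. `StubStreamingTest` is a hypothetical stand-in that only stores values. The setter names follow the Scala `StreamingTest` (`setPeacePeriod`, `setWindowSize`, `setTestMethod`), and the private attribute names are assumptions rather than the ones in this patch.

```python
import unittest


class StubStreamingTest(object):
    """Hypothetical stand-in for the proposed Python StreamingTest; stores params only."""

    def __init__(self):
        self._peacePeriod = None
        self._windowSize = None
        self._testMethod = None

    def setPeacePeriod(self, peacePeriod):
        self._peacePeriod = peacePeriod

    def setWindowSize(self, windowSize):
        self._windowSize = windowSize

    def setTestMethod(self, testMethod):
        self._testMethod = testMethod


class StreamingTestParamsSketch(unittest.TestCase):
    def test_model_params(self):
        # Only checks that the setters record what they are given.
        model = StubStreamingTest()
        model.setPeacePeriod(1)
        model.setWindowSize(2)
        model.setTestMethod("welch")
        self.assertEqual(model._peacePeriod, 1)
        self.assertEqual(model._windowSize, 2)
        self.assertEqual(model._testMethod, "welch")


if __name__ == "__main__":
    unittest.main()
```

Keeping this separate means a parameter regression fails fast, without waiting on a streaming job to start and stop.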

mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala

    
                  }

                }

                private[python] class StreamingTestResultPickler extends BasePickler[StreamingTestResult] {

Contributor

feynmanliang Oct 29, 2016

Do we need to test these in PythonMLLibAPISuite?

python/pyspark/mllib/stat/test.py

    
                      streamingTest.setTestMethod(self._testMethod)

                      javaDStream = sc._jvm.SerDe.pythonToJava(data._jdstream, True)

                      testResult = streamingTest.registerStream(javaDStream)

Contributor

feynmanliang Oct 29, 2016

Why do we need pythonToJava and javaToPython? They're not used for streaming k-means:

spark/python/pyspark/mllib/clustering.py

Line 773 in 39e2bad

updatedModel = callMLlibFunc(

mllib/src/main/scala/org/apache/spark/mllib/stat/test/TestResult.scala

    
               */

              @Since("1.6.0")

              - private[stat] class StreamingTestResult @Since("1.6.0") (

              + class StreamingTestResult @Since("1.6.0") (

Contributor

feynmanliang Oct 29, 2016

Does this need to be public? Java API doesn't seem to require it

Member

HyukjinKwon commented Feb 9, 2017

Hi @yinxusen, are you able to proceed with this? If not, it might be better to close it for now.

HyukjinKwon mentioned this pull request

[BUILD] Close stale PRs #16937

Closed

asfgit closed this in

ed338f7

