[WIP][SPARK-14408][CORE] Changed RDD.treeAggregate to use fold instead of reduce #12217
Conversation
Test build #55151 has finished for PR 12217 at commit
LGTM pending tests

Jenkins retest this please
Test build #55204 has finished for PR 12217 at commit

Maybe there is something going on here... investigating
    }
    partiallyAggregated.reduce(cleanCombOp)
    //partiallyAggregated.reduce(cleanCombOp)
    // This fails:
Does anyone see why fold would fail, whereas reduce succeeds?
Test build #55281 has finished for PR 12217 at commit
Apparently it was because zeroValue was being used in multiple places without making a copy. Is this worth committing? I feel like a better solution is to add docs + a unit test to RDD.reduce saying that the combine operation can modify and return the first element. If people agree, I'll do that instead.
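The diagnosis above — one mutable zeroValue object reused in several places instead of being copied — can be illustrated with a small plain-Python sketch. No Spark is involved; `Acc` and `fold_partitions` are made-up names that only mimic fold-per-partition semantics:

```python
import copy

class Acc:
    """A mutable accumulator, like an aggregator whose seqOp mutates `this`."""
    def __init__(self):
        self.total = 0
    def add(self, x):          # seqOp: mutates and returns its first argument
        self.total += x
        return self

def fold_partitions(partitions, zero, shared=False):
    # Fold each partition. `shared=True` mimics handing the very same
    # zeroValue object to every partition instead of a copy per partition.
    results = []
    for part in partitions:
        acc = zero if shared else copy.deepcopy(zero)
        for x in part:
            acc = acc.add(x)
        results.append(acc)
    return [a.total for a in results]

parts = [[1, 2], [3, 4]]
print(fold_partitions(parts, Acc()))               # [3, 7]  -- per-partition copies
print(fold_partitions(parts, Acc(), shared=True))  # [10, 10] -- one shared object, corrupted
```

With a shared zero, every "partition" accumulates into the same object, so all partial results collapse into one corrupted value — the failure mode being described.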
Test build #55280 has finished for PR 12217 at commit

Test build #55296 has finished for PR 12217 at commit
@jkbradley I see what you mean, but, thinking about it, can this work? The reduce function is applied directly to RDD elements, so modifying one of the arguments and returning it means you're mutating the elements of the RDD in memory, which may have undefined consequences. For fold, it's fine because the left argument is always the zero-value object. Right? Or am I not thinking about this correctly? It might happen to be fine to use reduce in cases where the RDD values are not used again.
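The distinction above can be shown with plain Python's `functools.reduce`, which has the same left-fold shape (this is only an analogy, not Spark code — the list stands in for cached RDD elements):

```python
import copy
import functools

def comb_op(a, b):
    # A combOp that mutates and returns its first argument.
    a[0] += b[0]
    return a

# reduce: the first argument passed to comb_op IS a dataset element,
# so the mutation rewrites the "cached" data in place.
cached = [[1], [2], [3]]
functools.reduce(comb_op, cached)
print(cached[0])                   # [6] -- the first element was rewritten

# fold-style: seeding with a copy of the zero value means only that
# copy is ever mutated; the dataset elements stay untouched.
cached2 = [[1], [2], [3]]
zero = [0]
functools.reduce(comb_op, cached2, copy.deepcopy(zero))
print(cached2)                     # [[1], [2], [3]] -- elements untouched
```

This is exactly why mutating-and-returning the first argument is safe for fold's zero value but hazardous for reduce over elements that may be reused.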
I agree, but am not quite sure how these things work because of the serialization across tasks. I'll ping others who might know more than I do.

Test build #2771 has finished for PR 12217 at commit
    }
    partiallyAggregated.reduce(cleanCombOp)
    val copiedZeroValue = Utils.clone(zeroValue, sc.env.closureSerializer.newInstance())
    partiallyAggregated.fold(copiedZeroValue)(cleanCombOp)
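The cloning idea in the diff above — serialize the zeroValue and deserialize it back to get an independent deep copy before folding — can be sketched in plain Python, with `pickle` standing in for Spark's closure serializer (an assumption for illustration only):

```python
import pickle

def clone(value):
    # Round-trip through the serializer to produce a deep, independent copy,
    # in the spirit of Utils.clone(zeroValue, closureSerializer.newInstance()).
    return pickle.loads(pickle.dumps(value))

zero = {"count": 0, "sum": 0.0}
copied = clone(zero)
copied["count"] += 5               # mutate only the copy
print(zero["count"], copied["count"])  # 0 5 -- the original zero is untouched
```

Any later mutation of the copied value leaves the original zero pristine, which is the whole point of cloning it before handing it to fold.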
It's this line which makes AFTSurvivalRegression fail. Not sure why...
@jkbradley Is it because the code uses zeroValue, possibly modifying it, before you copy it? What about copying it before line 1085?
I tried making one copy for each use of zeroValue at the beginning of the method, but that didn't fix the AFT test failures.
Hm, OK, if you've got a copy for each of the 3 usages, that really can't be it. Unless the clone isn't implemented as a deep clone for the object in question somehow. Could it be due to a different order of applying the combOp in this case? That's the only other thing I can think of if this change alone is the issue.
The ordering of the combOp really shouldn't matter for AFT. I feel like it must be some esoteric closure issue.
I'll leave this open a bit in hopes the RDD experts can take a look....

Test build #2784 has finished for PR 12217 at commit
I remember I took a look at this (last time, while going through stale PRs) and I had no idea either ... @jkbradley I was just wondering if we should leave this closed rather than open?
@NathanHowell, do you maybe have any idea on this (sorry, probably the wrong person to cc, but no one else comes to mind ...)?
Hi @jkbradley and @srowen, could we retest this just to see the error messages? It looks like the last test results are not accessible (to me).
Nothing looks obviously broken; their combiner looks fine. Rerunning the tests would help.
Test build #3773 has finished for PR 12217 at commit
Let me give a shot at providing a minimal reproduction after rebasing it with master.
Uh.... wait. It actually passes the tests after updating this with the current master ... and I even fixed this before in e355460. I double checked that it fails the tests before and passes the tests after this commit. I think it was a bug about …
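The root cause described later (the fold's zeroValue being an aggregator with `totalCnt` of 0 that keeps returning `this` when merging) can be reconstructed as a hypothetical sketch. This is a guess at the shape of the bug in plain Python, not the actual AFTAggregator code:

```python
class Agg:
    """Hypothetical aggregator; `count`/`total` only mimic totalCnt-style state."""
    def __init__(self, count=0, total=0):
        self.count, self.total = count, total

    def merge_buggy(self, other):
        if self.count == 0:      # wrong guard: an empty *left* side drops `other`
            return self          # the zeroValue (count == 0) keeps returning itself
        self.count += other.count
        self.total += other.total
        return self

    def merge_fixed(self, other):
        if other.count != 0:     # only skip merging an empty *right* side
            self.count += other.count
            self.total += other.total
        return self

parts = [Agg(2, 10), Agg(3, 20)]

buggy = Agg()                    # fold seeds with the empty zeroValue
for p in parts:
    buggy = buggy.merge_buggy(p)
print(buggy.count, buggy.total)  # 0 0 -- every partial result was discarded

fixed = Agg()
for p in parts:
    fixed = fixed.merge_fixed(p)
print(fixed.count, fixed.total)  # 5 30
```

Under reduce the left argument is always a real, non-empty partial result, so the bad guard never fires; fold exposes it because the empty zeroValue becomes the left argument on the first merge.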
    *
    * @param depth suggested depth of the tree (default: 2)
    * @see [[org.apache.spark.rdd.RDD#aggregate]]
    * @see [[org.apache.spark.rdd.RDD#aggregate]] These two methods have identical semantics.
Just to help ... I believe the actual Javadoc errors look like this:
[error] /home/jenkins/workspace/NewSparkPullRequestBuilder/core/target/java/org/apache/spark/rdd/RDD.java:660: error: unexpected content
[error] * @see {@link org.apache.spark.rdd.RDD#aggregate} These two methods have identical semantics.
[error] ^
…reduce

## What changes were proposed in this pull request?

Previously, `RDD.treeAggregate` used `reduceByKey` and `reduce` in its implementation, neither of which technically allows the `seq`/`combOps` to modify and return their first arguments.

This PR uses `foldByKey` and `fold` instead and notes that `aggregate` and `treeAggregate` are semantically identical in the Scala doc.

Note that this had some test failures for unknown reasons. This was actually fixed in e355460. The root cause was that the `zeroValue` now becomes an `AFTAggregator` and it compares `totalCnt` (where the value is actually 0). It starts merging one by one and keeps returning `this`, where `totalCnt` is 0. So this does not look like a bug in the current change; it is now fixed in the commit, so this should pass the tests.

## How was this patch tested?

Test case added in `RDDSuite`.

Closes #12217

Author: Joseph K. Bradley <[email protected]>
Author: hyukjinkwon <[email protected]>

Closes #18198 from HyukjinKwon/SPARK-14408.
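The overall shape of the change — fold each partition with a copy of zeroValue, then combine the partial results level by level, again seeding each combine pass with a zero copy — can be sketched in plain Python. This is only a structural analogy to treeAggregate, not Spark code; `tree_aggregate` and its parameters are illustrative names:

```python
import copy

def tree_aggregate(partitions, zero, seq_op, comb_op):
    # Stage 1: fold each partition with its own copy of the zero value.
    partials = []
    for part in partitions:
        acc = copy.deepcopy(zero)
        for x in part:
            acc = seq_op(acc, x)
        partials.append(acc)
    # Stage 2: shrink the partial results tree-style, pairwise, each pass
    # folding into a fresh zero copy rather than reducing over partials.
    while len(partials) > 1:
        merged = []
        for i in range(0, len(partials), 2):
            acc = copy.deepcopy(zero)
            for p in partials[i:i + 2]:
                acc = comb_op(acc, p)
            merged.append(acc)
        partials = merged
    return partials[0] if partials else copy.deepcopy(zero)

total = tree_aggregate([[1, 2], [3, 4], [5]], 0,
                       lambda a, x: a + x, lambda a, b: a + b)
print(total)  # 15
```

Because every accumulator starts as a copy of the zero value, a seqOp/combOp that mutates and returns its first argument never touches the input elements — the property the fold-based implementation provides and the reduce-based one did not.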
How was this patch tested?
Existing unit tests