Conversation

@jkbradley
Member

What changes were proposed in this pull request?

Previously, RDD.treeAggregate used reduceByKey and reduce in its implementation, neither of which technically allows the seq/combOps to modify and return their first arguments.

This PR uses foldByKey and fold instead and notes that aggregate and treeAggregate are semantically identical in the Scala doc.

How was this patch tested?

Existing unit tests
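For context, this is the kind of usage the change is meant to make safe: aggregate/treeAggregate allow seqOp and combOp to mutate and return their first argument. A minimal, self-contained sketch (illustrative only, not part of the patch):

import org.apache.spark.{SparkConf, SparkContext}

object TreeAggregateMutatingOps {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("treeAggregate-sketch"))
    try {
      val data = sc.parallelize(1 to 1000, 8)
      // seqOp and combOp mutate and return their first argument (an Array buffer),
      // which the aggregate/treeAggregate contract permits.
      val agg = data.treeAggregate(Array(0L, 0L))(
        (buf, x) => { buf(0) += x; buf(1) += 1; buf },
        (a, b) => { a(0) += b(0); a(1) += b(1); a },
        depth = 2)
      println(s"sum=${agg(0)} count=${agg(1)}")  // expected: sum=500500 count=1000
    } finally {
      sc.stop()
    }
  }
}

With the reduce-based implementation, the final combine step applied combOp directly to partially aggregated values, which does not technically permit this kind of mutation; fold always passes a copy of the zero value as the first argument.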

@jkbradley
Member Author

CC: @srowen @mengxr

@SparkQA

SparkQA commented Apr 6, 2016

Test build #55151 has finished for PR 12217 at commit 93e3cb3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Apr 7, 2016

LGTM pending tests

@srowen
Member

srowen commented Apr 7, 2016

Jenkins retest this please

@SparkQA

SparkQA commented Apr 7, 2016

Test build #55204 has finished for PR 12217 at commit 93e3cb3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member Author

Maybe there is something going on here...investigating

@jkbradley jkbradley changed the title [SPARK-14408][CORE] Changed RDD.treeAggregate to use fold instead of reduce [WIP][SPARK-14408][CORE] Changed RDD.treeAggregate to use fold instead of reduce Apr 8, 2016
  }
- partiallyAggregated.reduce(cleanCombOp)
+ //partiallyAggregated.reduce(cleanCombOp)
+ // This fails:
Member Author

Does anyone see why fold would fail, whereas reduce succeeds?

@SparkQA

SparkQA commented Apr 8, 2016

Test build #55281 has finished for PR 12217 at commit 16e79e3.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member Author

Apparently it was because of zeroValue being used in multiple places without making a copy.

Is this worth committing? I feel like a better solution is to add docs + a unit test to RDD.reduce saying that the combine operation can modify and return the first element. If people agree, I'll do that instead.
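To spell out the zeroValue issue, here is an illustrative plain-Scala sketch (not Spark internals): a combOp that mutates and returns its first argument turns a shared zero value into an accumulator, so any later use of that "zero" starts from a stale total.

object ZeroValueReuse extends App {
  val zero = Array(0L)
  def comb(a: Array[Long], b: Array[Long]): Array[Long] = { a(0) += b(0); a }
  val partials = Seq(Array(1L), Array(2L), Array(3L))

  println(partials.foldLeft(zero)(comb)(0))  // 6
  println(partials.foldLeft(zero)(comb)(0))  // 12: `zero` was mutated by the first fold
  // Cloning the zero value before each use (e.g. via the closure serializer,
  // as the updated treeAggregate does) avoids this.
}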

@SparkQA

SparkQA commented Apr 8, 2016

Test build #55280 has finished for PR 12217 at commit 0d05b96.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 8, 2016

Test build #55296 has finished for PR 12217 at commit 02d107a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Apr 8, 2016

@jkbradley I see what you mean, but, thinking about it, can this work? The reduce function is applied directly to RDD elements, so modifying one of the arguments and returning it means you're mutating the elements of the RDD in memory, which may have undefined consequences. For fold, it's fine because the left argument is always the zero-value object. Right? Or am I not thinking about it correctly? It might happen to be fine to use reduce in some cases where the RDD values are not used again.
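Concretely, the hazard can be demonstrated like this (a runnable sketch under the assumption of local mode with the default in-memory cache; names and values are illustrative, not from the PR):

import org.apache.spark.{SparkConf, SparkContext}

object MutatingReduceHazard {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("mutating-reduce"))
    try {
      // Mutable elements, cached so that tasks see the same objects on each pass.
      val rdd = sc.parallelize((1 to 4).map(i => Array(i.toLong)), 2).cache()
      rdd.count()  // materialize the cache
      val combOp = (a: Array[Long], b: Array[Long]) => { a(0) += b(0); a }
      val first  = rdd.reduce(combOp)(0)  // 10, but some cached arrays were mutated in place
      val second = rdd.reduce(combOp)(0)  // typically no longer 10: the elements changed
      println(s"reduce #1 = $first, reduce #2 = $second")
      // fold never mutates RDD elements: the left argument is always a copy of the zero value.
      val folded = sc.parallelize((1 to 4).map(i => Array(i.toLong)), 2).fold(Array(0L))(combOp)(0)
      println(s"fold = $folded")  // 10
    } finally {
      sc.stop()
    }
  }
}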

@jkbradley
Member Author

I agree, but I'm not quite sure how these things work because of the serialization across tasks. I'll ping others who might know more than I do.

@SparkQA

SparkQA commented Apr 9, 2016

Test build #2771 has finished for PR 12217 at commit 02d107a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

  }
- partiallyAggregated.reduce(cleanCombOp)
+ val copiedZeroValue = Utils.clone(zeroValue, sc.env.closureSerializer.newInstance())
+ partiallyAggregated.fold(copiedZeroValue)(cleanCombOp)
Member Author

It's this line which makes AFTSurvivalRegression fail. Not sure why...

Member

@jkbradley Is it because the code uses zeroValue, possibly modifying it, before you copy it? What about copying it before line 1085?

Member Author

I tried making one copy for each use of zeroValue at the beginning of the method, but that didn't fix the AFT test failures.

Member

Hm, OK, if you've got a copy for each of the 3 usages, that really can't be it. Unless the clone isn't implemented as a deep clone for the object in question somehow. Could it be due to a different order of applying the combOp in this case? That's the only other thing I can think of if this change alone is the issue.

Member Author

The ordering of the combOp really shouldn't matter for AFT. I feel like it must be some esoteric closure issue.

@jkbradley
Member Author

I'll leave this open a bit in hopes the RDD experts can take a look....

@SparkQA

SparkQA commented Apr 13, 2016

Test build #2784 has finished for PR 12217 at commit 02d107a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

I remember taking a look at this a while back (while going through stale PRs) and having no idea either ... @jkbradley I was just wondering whether it would be better to close this rather than leave it open?

@HyukjinKwon
Member

@NathanHowell, do you maybe have any idea on this (sorry, probably the wrong person to cc, but no one else comes to mind ...)?

@HyukjinKwon
Member

Hi @jkbradley and @srowen, could we retest this just to see the error messages? It looks like the last test results are not accessible (to me).

@NathanHowell

NathanHowell commented Jun 2, 2017 via email

@SparkQA

SparkQA commented Jun 2, 2017

Test build #3773 has finished for PR 12217 at commit 02d107a.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

HyukjinKwon commented Jun 3, 2017

The tests in AFTSurvivalRegressionSuite were failing; treeAggregate was being called for the AFT cost.

 - aft survival regression: default params
0.0 equaled 0.0
ScalaTestFailureLocation: org.apache.spark.ml.regression.AFTSurvivalRegressionSuite$$anonfun$3 at (AFTSurvivalRegressionSuite.scala:89)
org.scalatest.exceptions.TestFailedException: 0.0 equaled 0.0
	at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
	at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
	at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
	at org.apache.spark.ml.regression.AFTSurvivalRegressionSuite$$anonfun$3.apply$mcV$sp(AFTSurvivalRegressionSuite.scala:89)
	at org.apache.spark.ml.regression.AFTSurvivalRegressionSuite$$anonfun$3.apply(AFTSurvivalRegressionSuite.scala:66)
	at org.apache.spark.ml.regression.AFTSurvivalRegressionSuite$$anonfun$3.apply(AFTSurvivalRegressionSuite.scala:66)
	at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
	at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
	at org.scalatest.Transformer.apply(Transformer.scala:22)
	at org.scalatest.Transformer.apply(Transformer.scala:20)
	at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
	at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:56)
	at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
	at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
	at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
	at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
	at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
	at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
	at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
	at org.scalatest.Suite$class.run(Suite.scala:1424)
	at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
	at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
	at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
	at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:28)
	at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
	at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
	at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:28)
	at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557)
	at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044)
	at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043)
	at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722)
	at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043)
	at org.scalatest.tools.Runner$.run(Runner.scala:883)
	at org.scalatest.tools.Runner.run(Runner.scala)
	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:138)
	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
- aft survival regression with univariate
0.0 or 1.759 is extremely close to zero, so the relative tolerance is meaningless.
ScalaTestFailureLocation: org.apache.spark.mllib.util.TestingUtils$ at (TestingUtils.scala:41)
org.scalatest.exceptions.TestFailedException: 0.0 or 1.759 is extremely close to zero, so the relative tolerance is meaningless.
	at org.apache.spark.mllib.util.TestingUtils$.org$apache$spark$mllib$util$TestingUtils$$RelativeErrorComparison(TestingUtils.scala:41)
	at org.apache.spark.mllib.util.TestingUtils$DoubleWithAlmostEquals$$anonfun$relTol$1.apply(TestingUtils.scala:106)
	at org.apache.spark.mllib.util.TestingUtils$DoubleWithAlmostEquals$$anonfun$relTol$1.apply(TestingUtils.scala:106)
	at org.apache.spark.mllib.util.TestingUtils$DoubleWithAlmostEquals.$tilde$eq$eq(TestingUtils.scala:78)
	at org.apache.spark.ml.regression.AFTSurvivalRegressionSuite$$anonfun$4.apply$mcV$sp(AFTSurvivalRegressionSuite.scala:162)
	at org.apache.spark.ml.regression.AFTSurvivalRegressionSuite$$anonfun$4.apply(AFTSurvivalRegressionSuite.scala:127)
	at org.apache.spark.ml.regression.AFTSurvivalRegressionSuite$$anonfun$4.apply(AFTSurvivalRegressionSuite.scala:127)
	at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
	at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
	at org.scalatest.Transformer.apply(Transformer.scala:22)
	at org.scalatest.Transformer.apply(Transformer.scala:20)
	at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
	at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:56)
	at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
	at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
	at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
	at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
	at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
	at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
	at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
	at org.scalatest.Suite$class.run(Suite.scala:1424)
	at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
	at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
	at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
	at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:28)
	at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
	at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
	at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:28)
	at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557)
	at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044)
	at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043)
	at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722)
	at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043)
	at org.scalatest.tools.Runner$.run(Runner.scala:883)
	at org.scalatest.tools.Runner.run(Runner.scala)
	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:138)
	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
- aft survival regression with multivariate
0.0 or 1.9206 is extremely close to zero, so the relative tolerance is meaningless.
ScalaTestFailureLocation: org.apache.spark.mllib.util.TestingUtils$ at (TestingUtils.scala:41)
org.scalatest.exceptions.TestFailedException: 0.0 or 1.9206 is extremely close to zero, so the relative tolerance is meaningless.
	at org.apache.spark.mllib.util.TestingUtils$.org$apache$spark$mllib$util$TestingUtils$$RelativeErrorComparison(TestingUtils.scala:41)
	at org.apache.spark.mllib.util.TestingUtils$DoubleWithAlmostEquals$$anonfun$relTol$1.apply(TestingUtils.scala:106)
	at org.apache.spark.mllib.util.TestingUtils$DoubleWithAlmostEquals$$anonfun$relTol$1.apply(TestingUtils.scala:106)
	at org.apache.spark.mllib.util.TestingUtils$DoubleWithAlmostEquals.$tilde$eq$eq(TestingUtils.scala:78)
	at org.apache.spark.ml.regression.AFTSurvivalRegressionSuite$$anonfun$5.apply$mcV$sp(AFTSurvivalRegressionSuite.scala:233)
	at org.apache.spark.ml.regression.AFTSurvivalRegressionSuite$$anonfun$5.apply(AFTSurvivalRegressionSuite.scala:196)
	at org.apache.spark.ml.regression.AFTSurvivalRegressionSuite$$anonfun$5.apply(AFTSurvivalRegressionSuite.scala:196)
	at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
	at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
	at org.scalatest.Transformer.apply(Transformer.scala:22)
	at org.scalatest.Transformer.apply(Transformer.scala:20)
	at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
	at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:56)
	at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
	at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
	at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
	at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
	at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
	at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
	at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
	at org.scalatest.Suite$class.run(Suite.scala:1424)
	at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
	at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
	at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
	at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:28)
	at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
	at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
	at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:28)
	at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557)
	at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044)
	at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043)
	at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722)
	at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043)
	at org.scalatest.tools.Runner$.run(Runner.scala:883)
	at org.scalatest.tools.Runner.run(Runner.scala)
	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:138)
	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
- aft survival regression w/o intercept
0.0 or 0.896 is extremely close to zero, so the relative tolerance is meaningless.
ScalaTestFailureLocation: org.apache.spark.mllib.util.TestingUtils$ at (TestingUtils.scala:41)
org.scalatest.exceptions.TestFailedException: 0.0 or 0.896 is extremely close to zero, so the relative tolerance is meaningless.
	at org.apache.spark.mllib.util.TestingUtils$.org$apache$spark$mllib$util$TestingUtils$$RelativeErrorComparison(TestingUtils.scala:41)
	at org.apache.spark.mllib.util.TestingUtils$DoubleWithAlmostEquals$$anonfun$relTol$1.apply(TestingUtils.scala:106)
	at org.apache.spark.mllib.util.TestingUtils$DoubleWithAlmostEquals$$anonfun$relTol$1.apply(TestingUtils.scala:106)
	at org.apache.spark.mllib.util.TestingUtils$DoubleWithAlmostEquals.$tilde$eq(TestingUtils.scala:66)
	at org.apache.spark.mllib.util.TestingUtils$VectorWithAlmostEquals$$anonfun$relTol$2$$anonfun$apply$2.apply(TestingUtils.scala:167)
	at org.apache.spark.mllib.util.TestingUtils$VectorWithAlmostEquals$$anonfun$relTol$2$$anonfun$apply$2.apply(TestingUtils.scala:167)
	at scala.collection.IndexedSeqOptimized$class.prefixLengthImpl(IndexedSeqOptimized.scala:38)
	at scala.collection.IndexedSeqOptimized$class.forall(IndexedSeqOptimized.scala:43)
	at scala.collection.mutable.ArrayOps$ofRef.forall(ArrayOps.scala:186)
	at org.apache.spark.mllib.util.TestingUtils$VectorWithAlmostEquals$$anonfun$relTol$2.apply(TestingUtils.scala:167)
	at org.apache.spark.mllib.util.TestingUtils$VectorWithAlmostEquals$$anonfun$relTol$2.apply(TestingUtils.scala:166)
	at org.apache.spark.mllib.util.TestingUtils$VectorWithAlmostEquals.$tilde$eq$eq(TestingUtils.scala:134)
	at org.apache.spark.ml.regression.AFTSurvivalRegressionSuite$$anonfun$6.apply$mcV$sp(AFTSurvivalRegressionSuite.scala:304)
	at org.apache.spark.ml.regression.AFTSurvivalRegressionSuite$$anonfun$6.apply(AFTSurvivalRegressionSuite.scala:266)
	at org.apache.spark.ml.regression.AFTSurvivalRegressionSuite$$anonfun$6.apply(AFTSurvivalRegressionSuite.scala:266)
	at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
	at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
	at org.scalatest.Transformer.apply(Transformer.scala:22)
	at org.scalatest.Transformer.apply(Transformer.scala:20)
	at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
	at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:56)
	at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
	at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
	at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
	at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
	at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
	at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
	at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
	at org.scalatest.Suite$class.run(Suite.scala:1424)
	at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
	at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
	at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
	at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:28)
	at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
	at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
	at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:28)
	at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557)
	at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044)
	at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043)
	at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722)
	at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043)
	at org.scalatest.tools.Runner$.run(Runner.scala:883)
	at org.scalatest.tools.Runner.run(Runner.scala)
	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:138)
	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)

Let me take a shot at providing a minimal reproduction after rebasing this onto master.

@HyukjinKwon
Member

Uh... wait. It actually passes the tests after updating this against the current master ... and it turns out I had even fixed this before, in e355460. I double-checked that the tests fail before that commit and pass after it.

I think it was a bug in AFTAggregator, not this change. @jkbradley, could we just fix the Javadoc 8 error and rebase? I think it should pass the Jenkins build now. If you are busy, I can pick this up.

  *
  * @param depth suggested depth of the tree (default: 2)
- * @see [[org.apache.spark.rdd.RDD#aggregate]]
+ * @see [[org.apache.spark.rdd.RDD#aggregate]] These two methods have identical semantics.
Member

@HyukjinKwon HyukjinKwon Jun 3, 2017

Just to help ... I believe the actual Javadoc error looks like this:

[error] /home/jenkins/workspace/NewSparkPullRequestBuilder/core/target/java/org/apache/spark/rdd/RDD.java:660: error: unexpected content
[error]    * @see {@link org.apache.spark.rdd.RDD#aggregate} These two methods have identical semantics.
[error]      ^
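For what it's worth, one way to avoid that unidoc error (an assumption about a possible fix, not necessarily what was merged) would be to keep the @see tag bare and state the equivalence in the method description:

/**
 * Aggregates the elements of this RDD in a multi-level tree pattern.
 * This method is semantically identical to [[org.apache.spark.rdd.RDD#aggregate]].
 *
 * @param depth suggested depth of the tree (default: 2)
 * @see [[org.apache.spark.rdd.RDD#aggregate]]
 */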

@srowen srowen mentioned this pull request Jun 7, 2017
@asfgit asfgit closed this in b771fed Jun 8, 2017
asfgit pushed a commit that referenced this pull request Jun 9, 2017
…reduce

## What changes were proposed in this pull request?

Previously, `RDD.treeAggregate` used `reduceByKey` and `reduce` in its implementation, neither of which technically allows the `seq`/`combOps` to modify and return their first arguments.

This PR uses `foldByKey` and `fold` instead and notes that `aggregate` and `treeAggregate` are semantically identical in the Scala doc.

Note that this previously had some test failures for unknown reasons; those were actually fixed in e355460.

The root cause was that the zeroValue now becomes an AFTAggregator, and the merge checks totalCnt (which is actually 0 for the zero value). It merges the partial aggregators one by one and keeps returning this with totalCnt still 0. So this does not look like a bug in the current change.

This is now fixed in that commit, so this should pass the tests.

## How was this patch tested?

Test case added in `RDDSuite`.

Closes #12217

Author: Joseph K. Bradley <[email protected]>
Author: hyukjinkwon <[email protected]>

Closes #18198 from HyukjinKwon/SPARK-14408.
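Illustratively, the failure mode described in the commit message above boils down to a few lines of plain Scala (the PartialAgg class below is hypothetical and stands in for AFTAggregator; this is not the real code): a merge that only updates state when this already has data works under reduce, but when a fold is seeded with an empty zero-count aggregator, every merge returns the still-empty this.

// Hypothetical illustration of the described failure mode (not the real AFTAggregator).
final class PartialAgg(var count: Long = 0L, var sum: Double = 0.0) {
  def merge(other: PartialAgg): this.type = {
    if (count != 0L) {  // buggy guard: skips the merge when `this` is the empty zero value
      count += other.count
      sum += other.sum
    }
    this
  }
}

object AggregatorZeroBug extends App {
  val partials = Seq(new PartialAgg(2, 3.0), new PartialAgg(5, 7.0))
  // reduce-style combination never sees an empty left argument, so it works:
  val reduced = partials.reduceLeft((a, b) => a.merge(b))
  // fold-style combination seeded with an empty zero value loses everything:
  val folded = partials.foldLeft(new PartialAgg())((a, b) => a.merge(b))
  println(s"reduce: count=${reduced.count}  fold: count=${folded.count}")  // 7 vs 0
}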