@JihongMA
Contributor

Adding STDDEV support for DataFrame, using a one-pass online/parallel algorithm to compute the variance. Please review the code change.
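For readers unfamiliar with the approach, here is a minimal sketch of one common one-pass formulation: Welford's online update combined with the pairwise merge of Chan et al. The description above does not say which formulation the patch uses, so this is an illustration under that assumption, not the code from this PR. The names `Moments`, `update`, `merge`, `stddevPop`, and `stddevSamp` are hypothetical, and returning NaN for undefined cases is a choice made for the sketch only.

```scala
// Hypothetical illustration; not the code from this patch.
// Running state: count, mean, and m2 = sum of squared deviations from the mean.
final case class Moments(n: Long, mean: Double, m2: Double) {

  // Online step: fold one value into the state (Welford's update).
  def update(x: Double): Moments = {
    val n1 = n + 1
    val delta = x - mean
    val mean1 = mean + delta / n1
    Moments(n1, mean1, m2 + delta * (x - mean1))
  }

  // Parallel step: combine two partial states built on disjoint partitions.
  def merge(that: Moments): Moments =
    if (n == 0) that
    else if (that.n == 0) this
    else {
      val total = n + that.n
      val delta = that.mean - mean
      Moments(
        total,
        mean + delta * that.n / total,
        m2 + that.m2 + delta * delta * n * that.n / total)
    }

  // Population standard deviation (divide by n); NaN for empty input in this sketch.
  def stddevPop: Double = if (n == 0) Double.NaN else math.sqrt(m2 / n)

  // Sample standard deviation (divide by n - 1); NaN when n < 2 in this sketch.
  def stddevSamp: Double = if (n < 2) Double.NaN else math.sqrt(m2 / (n - 1))
}

object Moments {
  val empty: Moments = Moments(0L, 0.0, 0.0)

  // Single pass over the input, one update per value.
  def of(xs: Seq[Double]): Moments = xs.foldLeft(empty)((m, x) => m.update(x))
}
```

Each partition would build its own `Moments` with `update`, and the partial states would then be combined with `merge`, so the data is read only once. For example, `Moments.of(Seq(2.0, 4.0, 4.0)).merge(Moments.of(Seq(4.0, 5.0, 5.0, 7.0, 9.0)))` should give a population standard deviation of approximately 2.0, the same as a single pass over all eight values.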

JihongMA and others added 29 commits May 5, 2015 21:17
This reverts commit c40701a.
This reverts commit 3e7d889.
This reverts commit 9c84695.

Conflicts:

	docs/running-on-yarn.md
This reverts commit a399aa6.

Conflicts:

	docs/running-on-yarn.md
Contributor

We have changed how these plug in. You'll need to change the FunctionRegistry now.

@JihongMA
Contributor Author

Please don't test it yet; I need to make changes to accommodate the API change introduced by another JIRA.

@SparkQA

SparkQA commented Jul 24, 2015

Test build #38399 has finished for PR 6297 at commit 87fd2dc.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class InternalRow extends Serializable
    • case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression]
    • case class ComputePartialStd(child: Expression) extends AggregateExpression
    • case class CombinePartialStd(child: Expression) extends AggregateExpression
    • case class ComputePartialStdFunction (
    • case class CombinePartialStdFunction(
    • case class StddevFunction(
    • class GenericRow(protected[sql] val values: Array[Any]) extends Row
    • class GenericInternalRow(protected[sql] val values: Array[Any]) extends InternalRow
    • class GenericInternalRowWithSchema(values: Array[Any], val schema: StructType)
    • class GenericMutableRow(val values: Array[Any]) extends MutableRow

@yhuai
Contributor

yhuai commented Jul 29, 2015

@JihongMA Will you have time to implement the function on top of the new API? It would be good to merge it before the 1.5 deadline for new features (end of this month).

@SparkQA

SparkQA commented Aug 28, 2015

Test build #41730 has finished for PR 6297 at commit 25425ac.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Stddev(child: Expression) extends StddevAgg(child)
    • case class StddevPop(child: Expression) extends StddevAgg(child)
    • case class StddevSamp(child: Expression) extends StddevAgg(child)
    • abstract class StddevAgg(child: Expression) extends AlgebraicAggregate
    • abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1
    • case class Stddev(child: Expression) extends StddevAgg1(child)
    • case class StddevPop(child: Expression) extends StddevAgg1(child)
    • case class StddevSamp(child: Expression) extends StddevAgg1(child)
    • case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1
    • case class ComputePartialStdFunction (
    • case class MergePartialStd(child: Expression, isSample: Boolean) extends UnaryExpression with AggregateExpression1
    • case class MergePartialStdFunction(
    • case class StddevFunction(

@SparkQA

SparkQA commented Aug 28, 2015

Test build #41732 has finished for PR 6297 at commit f4c725c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Stddev(child: Expression) extends StddevAgg(child)
    • case class StddevPop(child: Expression) extends StddevAgg(child)
    • case class StddevSamp(child: Expression) extends StddevAgg(child)
    • abstract class StddevAgg(child: Expression) extends AlgebraicAggregate
    • abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1
    • case class Stddev(child: Expression) extends StddevAgg1(child)
    • case class StddevPop(child: Expression) extends StddevAgg1(child)
    • case class StddevSamp(child: Expression) extends StddevAgg1(child)
    • case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1
    • case class ComputePartialStdFunction (
    • case class MergePartialStd(
    • case class MergePartialStdFunction(
    • case class StddevFunction(

@SparkQA

SparkQA commented Aug 28, 2015

Test build #41748 has finished for PR 6297 at commit 0902ceb.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Stddev(child: Expression) extends StddevAgg(child)
    • case class StddevPop(child: Expression) extends StddevAgg(child)
    • case class StddevSamp(child: Expression) extends StddevAgg(child)
    • abstract class StddevAgg(child: Expression) extends AlgebraicAggregate
    • abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1
    • case class Stddev(child: Expression) extends StddevAgg1(child)
    • case class StddevPop(child: Expression) extends StddevAgg1(child)
    • case class StddevSamp(child: Expression) extends StddevAgg1(child)
    • case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1
    • case class ComputePartialStdFunction (
    • case class MergePartialStd(
    • case class MergePartialStdFunction(
    • case class StddevFunction(

@SparkQA

SparkQA commented Sep 4, 2015

Test build #42006 has finished for PR 6297 at commit a81d0fc.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Stddev(child: Expression) extends StddevAgg(child)
    • case class StddevPop(child: Expression) extends StddevAgg(child)
    • case class StddevSamp(child: Expression) extends StddevAgg(child)
    • abstract class StddevAgg(child: Expression) extends AlgebraicAggregate
    • abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1
    • case class Stddev(child: Expression) extends StddevAgg1(child)
    • case class StddevPop(child: Expression) extends StddevAgg1(child)
    • case class StddevSamp(child: Expression) extends StddevAgg1(child)
    • case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1
    • case class ComputePartialStdFunction (
    • case class MergePartialStd(
    • case class MergePartialStdFunction(
    • case class StddevFunction(

@JihongMA
Contributor Author

JihongMA commented Sep 4, 2015

The R style check failure is caused by the commit for SPARK-8951.

@SparkQA

SparkQA commented Sep 6, 2015

Test build #42062 has finished for PR 6297 at commit 6035648.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Stddev(child: Expression) extends StddevAgg(child)
    • case class StddevPop(child: Expression) extends StddevAgg(child)
    • case class StddevSamp(child: Expression) extends StddevAgg(child)
    • abstract class StddevAgg(child: Expression) extends AlgebraicAggregate
    • abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1
    • case class Stddev(child: Expression) extends StddevAgg1(child)
    • case class StddevPop(child: Expression) extends StddevAgg1(child)
    • case class StddevSamp(child: Expression) extends StddevAgg1(child)
    • case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1
    • case class ComputePartialStdFunction (
    • case class MergePartialStd(
    • case class MergePartialStdFunction(
    • case class StddevFunction(

Contributor

I think it should always return Double, because Sqrt() only works with Double; other databases also just return Double/float.

@SparkQA

SparkQA commented Sep 12, 2015

Test build #42366 has finished for PR 6297 at commit 6351fc8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Stddev(child: Expression) extends StddevAgg(child)
    • case class StddevPop(child: Expression) extends StddevAgg(child)
    • case class StddevSamp(child: Expression) extends StddevAgg(child)
    • abstract class StddevAgg(child: Expression) extends AlgebraicAggregate
    • abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1
    • case class Stddev(child: Expression) extends StddevAgg1(child)
    • case class StddevPop(child: Expression) extends StddevAgg1(child)
    • case class StddevSamp(child: Expression) extends StddevAgg1(child)
    • case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1
    • case class ComputePartialStdFunction (
    • case class MergePartialStd(
    • case class MergePartialStdFunction(
    • case class StddevFunction(

@davies
Contributor

davies commented Sep 12, 2015

LGTM, merging this into master, thanks!
