
Conversation

@clockfly
Contributor

What changes were proposed in this pull request?

This PR allows users to define an AggregateFunction which can store arbitrary Java objects
in the aggregation buffer and use those objects to perform aggregation. Before this PR, users were only allowed to store a limited set of object types in the aggregation buffer. Please see example usage in
class org.apache.spark.sql.AggregateWithObjectAggregateBufferSuite.MaxWithObjectAggregateBuffer
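
For readers without the test suite handy, here is a minimal, Spark-free sketch of the idea; all names here (ObjectBuffer, MaxWithObjectBuffer, serializeInPlace) are illustrative stand-ins, not the actual API added by this PR. The buffer holds an arbitrary Java object while a group's rows are processed, and is only turned into a serializable form once the group is finished:

```scala
import java.nio.ByteBuffer

// Stand-in for one slot of the aggregation buffer (the real API works on a MutableRow).
final class ObjectBuffer(var value: AnyRef)

// A max aggregate that keeps a java.lang.Long object in the buffer.
class MaxWithObjectBuffer {
  def initialize(buf: ObjectBuffer): Unit =
    buf.value = java.lang.Long.valueOf(Long.MinValue)

  def update(buf: ObjectBuffer, input: Long): Unit = {
    val current = buf.value.asInstanceOf[java.lang.Long].longValue
    if (input > current) buf.value = java.lang.Long.valueOf(input)
  }

  // Called once per group, after all rows have been seen: replaces the Java
  // object with a serializable binary form, in place.
  def serializeInPlace(buf: ObjectBuffer): Unit = {
    val max = buf.value.asInstanceOf[java.lang.Long].longValue
    buf.value = ByteBuffer.allocate(8).putLong(max).array()
  }
}
```

The real change expresses the same idea through ImperativeAggregate and a MutableRow-backed aggregation buffer, as the review comments below show.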

How was this patch tested?

Unit tests.

Revert: object aggregation buffer
@SparkQA

SparkQA commented Aug 19, 2016

Test build #64103 has finished for PR 14723 at commit 520a17f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait WithObjectAggregateBuffer

* 1. Spark framework moves on to next group, until all groups have been processed.
*/
trait WithObjectAggregateBuffer {
this: ImperativeAggregate =>
Contributor


Seems we do not really need this line.

Contributor


I guess having this line will make this trait hard to use from Java.

Contributor


Oh, it seems this trait will still be a Java interface. But in general, I think we do not really need to have this line.
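
To make the trade-off discussed in this thread concrete, a minimal sketch of the two shapes of the trait (the names are illustrative, not the code in this PR):

```scala
// Stand-in for the real ImperativeAggregate base class (illustrative only).
abstract class ImperativeAggregateLike

// Option 1: self-type annotation, as in this PR. In Scala, only subclasses of
// the base class may mix it in; the concern above is how this reads from Java.
trait WithObjectAggregateBufferSelfTyped { this: ImperativeAggregateLike => }

// Option 2: a plain marker trait. It compiles to an ordinary Java interface
// with no extra constraint, so a Java class can simply `implements` it.
trait WithObjectAggregateBufferPlain
```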

@yhuai
Contributor

yhuai commented Aug 21, 2016

Can you create a JIRA?

* object A with current inputRow. After updating, object A is stored back to mutableAggBuffer.
* 1. After processing all rows of current group, the framework will call method
* `serializeObjectAggregationBufferInPlace(aggregationBuffer: MutableRow)` to serialize object A
* to a serializable format in place.
Contributor


to a Spark SQL internal format (mostly BinaryType) in place
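
As a rough illustration of what serializing "in place" could look like, here is a hedged sketch with illustrative names and a plain array standing in for the MutableRow-backed buffer; it assumes the stored object is java.io.Serializable and simply replaces it with raw bytes, which is what a BinaryType column carries:

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object ObjectBufferSerialization {
  // Replace the Java object in the given buffer slot with its byte-array form.
  def serializeObjectInPlace(buffer: Array[AnyRef], ordinal: Int): Unit = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    try oos.writeObject(buffer(ordinal)) // requires a java.io.Serializable object
    finally oos.close()
    buffer(ordinal) = bos.toByteArray    // now raw bytes the framework can ship
  }
}
```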

@clockfly force-pushed the object_aggregation_buffer_part1 branch from 9b16f89 to 9ae648c on August 22, 2016 at 06:18
@SparkQA

SparkQA commented Aug 22, 2016

Test build #64179 has finished for PR 14723 at commit 9b16f89.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 22, 2016

Test build #64180 has finished for PR 14723 at commit 9ae648c.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}

/**
* This traits allows an AggregateFunction to store **arbitrary** Java objects in internal
Contributor


Nit: traits => trait

@clockfly
Contributor Author

@liancheng @cloud-fan
@yhuai @hvanhovell @gatorsmile
This PR is superseded by #14753, please review the new PR instead.

The motivation behind the change is that the aggregation function is also used by WindowExec, which may continuously interleave update and eval. We have to override eval of ImperativeAggregate so that eval can accept an aggregation buffer which contains a generic Java object.

For example:

agg.update(buffer, row1)
agg.eval(buffer)
agg.update(buffer, row2)
agg.eval(buffer)
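
A minimal sketch of that interleaving, with illustrative names in place of the real ImperativeAggregate API: eval reads the object buffer without serializing it away, so the same buffer stays usable for the next update.

```scala
// Stand-in buffer slot holding an arbitrary Java object.
final class Buf(var value: AnyRef)

object RunningMax {
  def initialize(buf: Buf): Unit =
    buf.value = java.lang.Long.valueOf(Long.MinValue)

  def update(buf: Buf, input: Long): Unit = {
    if (input > buf.value.asInstanceOf[java.lang.Long].longValue) {
      buf.value = java.lang.Long.valueOf(input)
    }
  }

  // eval reads the live object directly and leaves the buffer untouched,
  // so a later update can keep accumulating into the same object.
  def eval(buf: Buf): Long =
    buf.value.asInstanceOf[java.lang.Long].longValue

  def main(args: Array[String]): Unit = {
    val buf = new Buf(null)
    initialize(buf)
    update(buf, 3L); println(eval(buf)) // 3
    update(buf, 7L); println(eval(buf)) // 7 -- the buffer survived the first eval
  }
}
```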

@clockfly closed this Aug 22, 2016
