-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SQL][WIP][Test] Supports object-based aggregation function which can store arbitrary objects in aggregation buffer. #14723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Revert: object aggregation buffer
|
Test build #64103 has finished for PR 14723 at commit
|
| * 1. Spark framework moves on to next group, until all groups have been processed. | ||
| */ | ||
| trait WithObjectAggregateBuffer { | ||
| this: ImperativeAggregate => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Semes we do not really need this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess having this line will make this trait hard to be used in Java.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, seems this trait will be still an java interface. But, I think in general, we do not really need to have this line.
|
Can you create a jira? |
| * object A with current inputRow. After updating, object A is stored back to mutableAggBuffer. | ||
| * 1. After processing all rows of current group, the framework will call method | ||
| * `serializeObjectAggregationBufferInPlace(aggregationBuffer: MutableRow)` to serialize object A | ||
| * to a serializable format in place. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to a Spark SQL internal format(mostly BinaryType) in place
9b16f89 to
9ae648c
Compare
|
Test build #64179 has finished for PR 14723 at commit
|
|
Test build #64180 has finished for PR 14723 at commit
|
| } | ||
|
|
||
| /** | ||
| * This traits allows an AggregateFunction to store **arbitrary** Java objects in internal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: traits => trait
|
@liancheng @cloud-fan The motivation behind the change is that the aggregation function is also used by WindowExec, which may do continous For example: |
What changes were proposed in this pull request
This PR allows user to define an AggregateFunction which can store arbitrary Java objects
in aggregation buffer, and use the Java object to do aggregation. Before this PR, user are only allowed to store a limited set of object type in aggregation buffer. Please see example usage at
class
org.apache.spark.sql.AggregateWithObjectAggregateBufferSuite.MaxWithObjectAggregateBufferHow was this patch tested?
Unit tests.