[SPARK-34819][SQL] MapType supports comparable semantics #32552
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
Folks, what is the state of this PR? Do we expect to make progress on this?
What changes were proposed in this pull request?
This PR proposes to support comparable semantics for map types.
NOTE: This PR is the rework of #31967(@WangGuangxin)/#15970(@hvanhovell).
The approach of the PR is similar to NormalizeFloatingNumbers and it has the same restriction; in the plan optimizing phase, a new rule named NormalizeMaps inserts an expression SortMapKeys to make sure two maps having the same key-value pairs but with different key ordering are equal (e.g., Map('a' -> 1, 'b' -> 2) should be equal to Map('b' -> 2, 'a' -> 1)). As for aggregates, this rule is applied in the physical planning phase because not all the grouping exprs are extracted during the logical phase (this is the same restriction as NormalizeFloatingNumbers).

The major differences from NormalizeFloatingNumbers are as follows:
- It applies to binary comparisons (EqualTo, GreaterThan, ...) and In/InSet in a plan (NormalizeFloatingNumbers is applied only to the EqualTo comparison in a join plan, an equi-join).
- It does not call normalize recursively; it just adds a SortMapKeys expr on each top-level expr (e.g., top-level grouping expr and left/right side expr of binary comparisons).
- It applies to SortOrders in sort-related plans.

For sorting map entries, I reused the array ordering logic (see MapType.compare and CodegenContext.genComp) because keys and values in map entries follow the array format; it first checks if the key arrays in two maps are the same, and then checks if the value arrays are the same.

NOTE: Adding duplicate SortMapKeys exprs in a binary comparison tree is a known issue; for example, in a query that compares a MapType column, a, more than once, a is sorted twice. I don't have a smart idea to avoid it in this PR for now. Common subexpression elimination in filter plans could probably solve it, but Spark does not have that optimization now. (For more details, see the previous @viirya PR: #30565.)
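The normalize-then-compare idea described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the PR's Scala code: a map is modeled as a list of (key, value) pairs, the names MapEntries, sort_map_keys, and compare_maps are hypothetical, and null handling is left out.

```python
from typing import Any, List, Tuple

# Hypothetical stand-in for a map value: a list of (key, value) pairs whose
# order reflects how the map was built, so two equal maps may differ in layout.
MapEntries = List[Tuple[Any, Any]]


def sort_map_keys(entries: MapEntries) -> MapEntries:
    """Return the entries ordered by key (the role SortMapKeys plays in the PR)."""
    return sorted(entries, key=lambda kv: kv[0])


def compare_maps(left: MapEntries, right: MapEntries) -> int:
    """Compare two maps: key arrays first, then value arrays.

    Returns a negative number, zero, or a positive number.
    """
    l, r = sort_map_keys(left), sort_map_keys(right)

    # Step 1: compare the key arrays (Python lists compare lexicographically).
    l_keys, r_keys = [k for k, _ in l], [k for k, _ in r]
    if l_keys != r_keys:
        return -1 if l_keys < r_keys else 1

    # Step 2: keys match, so compare the value arrays the same way.
    l_vals, r_vals = [v for _, v in l], [v for _, v in r]
    if l_vals != r_vals:
        return -1 if l_vals < r_vals else 1
    return 0
```

With this sketch, compare_maps([('a', 1), ('b', 2)], [('b', 2), ('a', 1)]) returns 0, because both sides are reordered by key before the array-style comparison runs.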
Why are the changes needed?
To improve map usability.
Does this PR introduce any user-facing change?
Yes, a user can use map-typed data in GROUP BY, ORDER BY, and PARTITION BY in WINDOW clauses.
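To make the user-facing effect concrete, here is a small plain-Python emulation of grouping and ordering by a map-typed column. It does not call Spark; the rows, the helper names (normalized, group_sum, order_by_map), and the (map, amount) row layout are assumptions made for illustration only.

```python
from collections import defaultdict

# Each row is (map column, amount); maps are lists of (key, value) pairs.
rows = [
    ([("a", 1), ("b", 2)], 10),
    ([("b", 2), ("a", 1)], 20),  # same map as above, different key order
    ([("a", 1)], 5),
]


def normalized(entries):
    """Sort entries by key and freeze them so they can be used as a dict key."""
    return tuple(sorted(entries, key=lambda kv: kv[0]))


def group_sum(rows):
    """Emulate GROUP BY on the map column with SUM(amount)."""
    totals = defaultdict(int)
    for entries, amount in rows:
        totals[normalized(entries)] += amount
    return dict(totals)


def order_by_map(rows):
    """Emulate ORDER BY on the map column: key arrays first, then value arrays."""
    def sort_key(row):
        entries = normalized(row[0])
        return ([k for k, _ in entries], [v for _, v in entries])
    return sorted(rows, key=sort_key)
```

In this emulation the first two rows fall into the same group, because their maps are identical once the keys are sorted.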
How was this patch tested?
Add unit tests.