tdas (Contributor) commented Sep 15, 2017

What changes were proposed in this pull request?

Consider two stacked Project operators, as follows:

Project [a_with_metadata#27 AS b#26]
+- Project [a#0 AS a_with_metadata#27]
   +- LocalRelation <empty>, [a#0, b#1]

The child Project has an output column that carries metadata, and the parent Project has an alias that implicitly forwards that metadata, so the metadata is visible to operators higher in the plan. After the CollapseProject optimizer rule is applied, the metadata is no longer preserved:

Project [a#0 AS b#26]
+- LocalRelation <empty>, [a#0, b#1]

This is incorrect: downstream operators that rely on metadata to identify particular fields (e.g. the watermark in Structured Streaming) will no longer find it. This PR fixes the rule by preserving the metadata of top-level aliases.
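To make the failure mode concrete, here is a minimal, self-contained sketch of the idea. It is a toy model, not Spark's Catalyst classes: `Attr`, `Alias`, `collapseNaive`, `collapsePreserving`, and the `"watermark" -> "10s"` metadata entry are all hypothetical names chosen for illustration. It only assumes what the description above states: an alias without explicit metadata forwards its child's metadata, and collapsing two projects substitutes the lower alias's child into the upper alias.

```scala
object CollapseSketch {
  // Toy stand-in for column metadata; Spark uses its own Metadata type.
  type Metadata = Map[String, String]

  sealed trait Expr { def metadata: Metadata }

  // A column reference that may carry metadata.
  final case class Attr(name: String, metadata: Metadata = Map.empty) extends Expr

  // An alias forwards its child's metadata unless explicit metadata is set.
  final case class Alias(child: Expr, name: String,
                         explicitMetadata: Option[Metadata] = None) extends Expr {
    def metadata: Metadata = explicitMetadata.getOrElse(child.metadata)
    // The attribute this alias exposes to the operator above it.
    def toAttr: Attr = Attr(name, metadata)
  }

  // Replace each upper alias that references a lower alias by name,
  // using `rebuild` to decide how the merged alias is constructed.
  private def substitute(upper: Seq[Alias], lower: Seq[Alias])
                        (rebuild: (Alias, Alias) => Alias): Seq[Alias] = {
    val lowerByName = lower.map(l => l.name -> l).toMap
    upper.map { u =>
      u.child match {
        case Attr(n, _) if lowerByName.contains(n) => rebuild(u, lowerByName(n))
        case _ => u
      }
    }
  }

  // Naive collapse: the merged alias derives metadata from the new child,
  // so whatever the upper alias used to forward is lost.
  def collapseNaive(upper: Seq[Alias], lower: Seq[Alias]): Seq[Alias] =
    substitute(upper, lower)((u, l) => Alias(l.child, u.name))

  // Preserving collapse: pin the upper alias's pre-collapse metadata explicitly.
  def collapsePreserving(upper: Seq[Alias], lower: Seq[Alias]): Seq[Alias] =
    substitute(upper, lower)((u, l) => Alias(l.child, u.name, Some(u.metadata)))

  // Hypothetical metadata entry standing in for a watermark marker.
  val wm: Metadata = Map("watermark" -> "10s")
  // Lower project: a AS a_with_metadata (metadata attached here).
  val lower: Seq[Alias] = Seq(Alias(Attr("a"), "a_with_metadata", Some(wm)))
  // Upper project: a_with_metadata AS b (metadata forwarded implicitly).
  val upper: Seq[Alias] = Seq(Alias(lower.head.toAttr, "b"))

  def main(args: Array[String]): Unit = {
    println(upper.head.metadata)                            // before collapsing
    println(collapseNaive(upper, lower).head.metadata)      // naive collapse
    println(collapsePreserving(upper, lower).head.metadata) // preserving collapse
  }
}
```

In this model both variants produce the same collapsed shape (`a AS b`); they differ only in whether the alias `b` still reports the metadata it exposed before the two projects were merged.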

How was this patch tested?

New unit test

@marmbrus (Contributor)

LGTM

SparkQA commented Sep 15, 2017

Test build #81801 has finished for PR 19240 at commit b3e41a7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit closed this in 8866174 on Sep 15, 2017