Skip to content

Conversation

@WweiL
Copy link
Contributor

@WweiL WweiL commented Sep 9, 2024

Continue the discussion from #47425 to this PR because I can't push to Yuchen's account

What changes were proposed in this pull request?

The builtin ProtoBuf connector first supports recursive schema reference. It is approached by letting users specify an option “recursive.fields.max.depth”, and at the start of the execution, unroll the recursive field by this level. It converts a problem of dynamic schema for each row to a fixed schema which is supported by Spark. Avro can just adopt a similar method. This PR defines an option "recursiveFieldMaxDepth" to both Avro data source and from_avro function. With this option, Spark can support Avro recursive schema up to certain depth.

Why are the changes needed?

Recursive reference denotes the case that the type of a field can be defined before in the parent nodes. A simple example is:

{
  "type": "record",
  "name": "LongList",
  "fields" : [
    {"name": "value", "type": "long"},
    {"name": "next", "type": ["null", "LongList"]}
  ]
}

This is written in Avro Schema DSL and represents a linked list data structure. Spark currently will throw an error on this schema. Many users used schema like this, so we should support it.

Does this PR introduce any user-facing change?

Yes. Previously, it will throw error on recursive schemas like above. With this change, it will still throw the same error by default but when users specify the option to a number greater than 0, the schema will be unrolled to that depth.

How was this patch tested?

Added new unit tests and integration tests to AvroSuite and AvroFunctionSuite.

Was this patch authored or co-authored using generative AI tooling?

No.

Co-authored-by: Wei Liu [email protected]

@gengliangwang
Copy link
Member

@WweiL Thanks for picking this up.
Could you add one more test case for union? #47425 (comment)
I will merge this one after that

@hkulyc
Copy link

hkulyc commented Sep 17, 2024

@WweiL Thanks a lot for making the change!

@WweiL WweiL requested a review from gengliangwang September 17, 2024 20:30
@gengliangwang
Copy link
Member

Thanks, merging to master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants