GH-891: Add ExtensionTypeWriterFactory to TransferPair #892
Force-pushed from 67334a6 to 7eba2c1.
Hello, @lidavidm! Could you take a look at this PR? Also, I don't have permission to change the label.
@jhrotko I will take a look at this one as soon as the CI is green (it should be good very soon).
vector/src/main/java/org/apache/arrow/vector/util/TransferPairWithExtendedType.java
    this(new LargeListVector(field, allocator, callBack));
  }

  public TransferImpl(LargeListVector to, ExtensionTypeWriterFactory writerFactory) {
(Later?) Since TransferImpl is a private class, it should be okay to clean up some of the constructors and replace them with factory methods, so that the fields can actually be private final.
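A minimal sketch of that suggestion, under stated assumptions: `TargetVector`, `WriterFactory`, and `TransferSketch` are hypothetical stand-ins invented for illustration, not the Arrow classes, and the real `TransferImpl` has more state than this. The point is only that a single private constructor plus static factory methods lets every field be `private final`.

```java
import java.util.Objects;

// Hypothetical stand-ins for illustration; these are NOT Arrow classes.
final class TargetVector {
  final String name;

  TargetVector(String name) {
    this.name = name;
  }
}

final class WriterFactory {
  final String id;

  WriterFactory(String id) {
    this.id = id;
  }
}

final class TransferSketch {
  private final TargetVector to;
  // null when the transfer does not involve extension types
  private final WriterFactory writerFactory;

  // Single private constructor: each field is assigned exactly once.
  private TransferSketch(TargetVector to, WriterFactory writerFactory) {
    this.to = Objects.requireNonNull(to, "to");
    this.writerFactory = writerFactory;
  }

  // Factory method for the plain case, replacing one constructor overload.
  static TransferSketch forTarget(TargetVector to) {
    return new TransferSketch(to, null);
  }

  // Factory method for the extension-type case.
  static TransferSketch forTargetWithFactory(TargetVector to, WriterFactory writerFactory) {
    return new TransferSketch(to, Objects.requireNonNull(writerFactory, "writerFactory"));
  }

  TargetVector getTo() {
    return to;
  }

  boolean hasWriterFactory() {
    return writerFactory != null;
  }
}
```

Named factory methods also make the intent of each construction path visible at the call site, which overloaded constructors do not.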
This solution would break some cases in Dremio. With this commit's design, we would need to check whether a factory is necessary before calling copyFrom, because otherwise we would hit an UnsupportedException for BaseValueVectors. Ideally this extension factory shouldn't exist at this level. Dremio has Timestamp, Uuid, and Variant as new extension types; their implementations should "live" in Arrow instead, and with the current design we need to pass the DremioExtensionFactory when doing this transfer. I am also wondering whether this factory should be the responsibility of the extension type itself rather than of all types. I am still thinking about a better solution, but I am also a bit reluctant to change the current design/API.
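To make the concern concrete, here is a toy model of the call-site problem described above. Everything in it (`ToyFactory`, `ToyBaseVector`, `ToyListVector`, `ToyCaller`) is a hypothetical name made up for this sketch, and the exception type is assumed to be `UnsupportedOperationException`; it does not reproduce the actual Arrow or Dremio code.

```java
// Toy model of the concern; these are NOT Arrow or Dremio classes.
interface ToyFactory {}

class ToyBaseVector {
  int copies = 0;

  void copyFrom(ToyBaseVector from) {
    copies++;
  }

  // Plain vectors have no use for a factory, so this overload is rejected.
  void copyFrom(ToyBaseVector from, ToyFactory factory) {
    throw new UnsupportedOperationException("factory overload not supported");
  }
}

class ToyListVector extends ToyBaseVector {
  // Only the complex vector accepts the factory overload.
  @Override
  void copyFrom(ToyBaseVector from, ToyFactory factory) {
    copies++;
  }
}

final class ToyCaller {
  // Every call site has to know which overload is safe for the target type.
  static void transfer(ToyBaseVector from, ToyBaseVector to, ToyFactory factory) {
    if (to instanceof ToyListVector) {
      to.copyFrom(from, factory);
    } else {
      to.copyFrom(from);
    }
  }
}
```

The branch inside `transfer` is the check the comment objects to: the caller must decide per target type whether passing a factory is allowed.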
Object value = reader.readObject();
if (value != null) {
  writer.addExtensionTypeWriterFactory(extensionTypeWriterFactory);
  ExtensionTypeWriterFactory writerFactory = reader.getExtensionTypeWriterFactory();
This is the key change: now the responsibility comes from the reader instead of passing down the factory.
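The shape of that change can be sketched with a self-contained toy model. The interfaces below (`SketchReader`, `SketchWriter`, `SketchWriterFactory`, `SketchCopier`, `UuidSketchReader`) are assumptions invented for illustration and only imitate the idea; they are not the Arrow Java API, and the real `ComplexCopier` is more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these types imitate the shape of the discussion
// and are NOT the Arrow Java API.
interface SketchWriterFactory {
  String describe();
}

interface SketchReader {
  Object readObject();

  // The reader knows which factory is needed to write its values.
  SketchWriterFactory getWriterFactory();
}

final class SketchWriter {
  final List<String> log = new ArrayList<>();

  void write(Object value, SketchWriterFactory factory) {
    log.add(factory.describe() + ":" + value);
  }
}

final class SketchCopier {
  // No factory parameter here: the copier asks the reader for it.
  static void copy(SketchReader reader, SketchWriter writer) {
    Object value = reader.readObject();
    if (value != null) {
      SketchWriterFactory factory = reader.getWriterFactory();
      writer.write(value, factory);
    }
  }
}

final class UuidSketchReader implements SketchReader {
  private final Object value;

  UuidSketchReader(Object value) {
    this.value = value;
  }

  @Override
  public Object readObject() {
    return value;
  }

  @Override
  public SketchWriterFactory getWriterFactory() {
    return () -> "uuid-writer";
  }
}
```

Because the factory travels with the reader, callers of `copy` no longer need to know whether the data involves an extension type.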
What's Changed
Previously, extension type support was added to the `ValueVector` interface through `copyFrom()` and `copyFromSafe()` methods that accepted an `ExtensionTypeWriterFactory` parameter. This approach had several issues:
- It added methods to the `ValueVector` interface that were only relevant for complex types (List, LargeList, Struct, etc.)

Solution
This PR reverts the extension type support from the vector copy operations by:
- Removing the `copyFrom()` and `copyFromSafe()` methods with an `ExtensionTypeWriterFactory` parameter from the `ValueVector` interface
- Updating `ComplexCopier` to remove the extension type writer factory parameter
- Adding `getExtensionTypeWriterFactory` to `ExtensionReader`, which is now responsible for providing the writer factory for complex type operations

Key Changes
Core API Changes
- Removed from the `ValueVector` interface:
  - `copyFrom(int, int, ValueVector, ExtensionTypeWriterFactory)`
  - `copyFromSafe(int, int, ValueVector, ExtensionTypeWriterFactory)`
- Added `getExtensionTypeWriterFactory` to `ExtensionReader`

Benefits
- … `ValueVector` interface

Backward Compatibility
This is a breaking change that removes public API methods from `ValueVector`. Users who were calling `copyFrom()` or `copyFromSafe()` with an `ExtensionTypeWriterFactory` parameter will need to update their code.

Migration:
The extension type factory parameter has been removed. Extension types are now copied through the standard reader/writer mechanism in `ComplexCopier`, which automatically handles extension types when the writer has the appropriate factory registered.

Closes #891.