
@jhrotko jhrotko commented Oct 22, 2025

What's Changed

Previously, extension type support was added to the ValueVector interface through copyFrom() and copyFromSafe() methods that accepted an ExtensionTypeWriterFactory parameter. This approach had several issues:

  1. API Pollution: Added methods to the base ValueVector interface that were only relevant for complex types (List, LargeList, Struct, etc.)
  2. Inconsistent Usage: Most vector implementations didn't need or use the extension type writer factory
  3. Tight Coupling: Tied extension type handling to the copy operation instead of keeping the two concerns separate

Solution

This PR reverts the extension type support from the vector copy operations by:

  1. Removing the copyFrom() and copyFromSafe() methods with ExtensionTypeWriterFactory parameter from the ValueVector interface
  2. Simplifying ComplexCopier to remove the extension type writer factory parameter
  3. Adding getExtensionTypeWriterFactory to ExtensionReader, which is now responsible for providing the writer factory for complex type operations
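
To illustrate the shift in responsibility described in the steps above, here is a minimal, self-contained sketch. It does not use the real Arrow classes: ModelReader, ModelWriter, ModelWriterFactory, UuidLikeReader, and ReaderSuppliedFactoryDemo are hypothetical stand-ins, and only the idea (the copier asks the reader for the factory rather than receiving it as a parameter) is taken from this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins; these are NOT the real Arrow interfaces.
interface ModelWriter {
    void write(Object value);
}

interface ModelWriterFactory {
    ModelWriter createWriter(List<Object> target);
}

interface ModelReader {
    Object readObject();

    // Mirrors the idea behind ExtensionReader.getExtensionTypeWriterFactory():
    // the reader knows which factory matches the values it reads.
    ModelWriterFactory getWriterFactory();
}

// A reader for a UUID-like extension value (assumed for illustration only).
final class UuidLikeReader implements ModelReader {
    private final String value;

    UuidLikeReader(String value) {
        this.value = value;
    }

    @Override
    public Object readObject() {
        return value;
    }

    @Override
    public ModelWriterFactory getWriterFactory() {
        // The factory builds a writer that tags each value it appends.
        return target -> v -> target.add("uuid:" + v);
    }
}

final class ReaderSuppliedFactoryDemo {
    // The copier takes no factory parameter; it obtains one from the reader.
    static void copy(ModelReader reader, List<Object> out) {
        Object value = reader.readObject();
        if (value != null) {
            ModelWriter writer = reader.getWriterFactory().createWriter(out);
            writer.write(value);
        }
    }

    public static void main(String[] args) {
        List<Object> out = new ArrayList<>();
        copy(new UuidLikeReader("1234"), out);
        System.out.println(out); // prints [uuid:1234]
    }
}
```

In this model the caller of copy() never needs to know whether an extension type is involved, which is the property the PR aims for.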

Key Changes

Core API Changes

  • Removed from ValueVector interface:
    • copyFrom(int, int, ValueVector, ExtensionTypeWriterFactory)
    • copyFromSafe(int, int, ValueVector, ExtensionTypeWriterFactory)
  • Added getExtensionTypeWriterFactory to ExtensionReader

Benefits

  1. Cleaner API: Removes specialized extension type methods from the base ValueVector interface
  2. Simpler Design: Extension types are handled through the standard reader/writer mechanism
  3. Reduced Complexity: Fewer methods and interfaces to maintain
  4. Better Separation of Concerns: Extension type handling remains in the reader/writer layer where it belongs

Backward Compatibility

This is a breaking change that removes public API methods from ValueVector. Users who were calling copyFrom() or copyFromSafe() with an ExtensionTypeWriterFactory parameter will need to update their code.

Migration:

The extension type factory parameter has been removed. Extension types are now copied through the standard reader/writer mechanism in ComplexCopier, which automatically handles extension types when the writer has the appropriate factory registered.

// Old approach (removed)
outVector.copyFromSafe(0, 0, inVector, new UuidWriterFactory());

// New approach - use standard copy
outVector.copyFromSafe(0, 0, inVector);
// Extension types are handled automatically through the reader/writer mechanism

Closes #891.

@jhrotko jhrotko force-pushed the GH-891 branch 2 times, most recently from 67334a6 to 7eba2c1 Compare October 22, 2025 21:09
@jhrotko jhrotko marked this pull request as ready for review October 22, 2025 21:13

jhrotko commented Oct 23, 2025

Hello, @lidavidm! Could you take a look at this PR? Also, I don't have permission to change the label.

@lidavidm lidavidm added the enhancement label Oct 23, 2025
@github-actions github-actions bot added this to the 18.4.0 milestone Oct 23, 2025
@jbonofre
Member

@jhrotko I will take a look at this one as soon as the CI is green (it should be good very soon).

this(new LargeListVector(field, allocator, callBack));
}

public TransferImpl(LargeListVector to, ExtensionTypeWriterFactory writerFactory) {
Contributor

(Later?) Since TransferImpl is a private class, it should be okay to clean up some of the constructors and replace them with factory methods so that the fields can actually be private final.
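
As a rough illustration of that suggestion, the sketch below shows the general pattern of one private constructor plus static factory methods, which lets every field be private final. TransferSketch and its String fields are hypothetical placeholders, not the actual TransferImpl or its vector and factory types.

```java
import java.util.Objects;

// Hypothetical sketch; not the actual TransferImpl inside LargeListVector.
final class TransferSketch {
    private final String target;        // stand-in for the destination vector
    private final String writerFactory; // stand-in for an optional factory; may be null

    // Single private constructor: each field is assigned exactly once.
    private TransferSketch(String target, String writerFactory) {
        this.target = Objects.requireNonNull(target, "target");
        this.writerFactory = writerFactory;
    }

    // Factory method for the common case with no extension writer factory.
    static TransferSketch to(String target) {
        return new TransferSketch(target, null);
    }

    // Factory method for the case where a writer factory is supplied.
    static TransferSketch withFactory(String target, String writerFactory) {
        return new TransferSketch(target, Objects.requireNonNull(writerFactory, "writerFactory"));
    }

    String target() {
        return target;
    }

    boolean hasWriterFactory() {
        return writerFactory != null;
    }
}
```

Named factory methods also make the intent of each construction path explicit at the call site, for example TransferSketch.to("dest") versus TransferSketch.withFactory("dest", "uuid").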

Author

This solution would break some cases in Dremio. With the design in this commit, we would need to check whether a factory is necessary before calling copyFrom, because we would otherwise hit an UnsupportedException for BaseValueVectors. Ideally, this extension factory shouldn't exist at this level. In Dremio we have Timestamp, Uuid, and Variant as new extension types; their implementations should "live" in Arrow instead, and with the current design we need to pass the DremioExtensionFactory when doing this transfer. I am also wondering whether this factory should be the responsibility of the extension type itself rather than of all types. I am still thinking about a better solution, but I am also a bit reluctant to change the current design/API.

@jhrotko jhrotko requested a review from laurentgo October 30, 2025 09:39
Object value = reader.readObject();
if (value != null) {
writer.addExtensionTypeWriterFactory(extensionTypeWriterFactory);
ExtensionTypeWriterFactory writerFactory = reader.getExtensionTypeWriterFactory();
Author

@jhrotko jhrotko Oct 30, 2025

This is the key change: now the responsibility comes from the reader instead of passing down the factory.

Labels

breaking-change, enhancement

Development

Successfully merging this pull request may close these issues.

Add ExtensionTypeWriterFactory to TransferPair