Support extended partition cols for listing table. #18482

@animodak7

Is your feature request related to a problem or challenge?

When scanning partitioned files, there are scenarios where runtime-generated values (not persisted in the files) need to be attached to each RecordBatch.

Currently, when a partition contains multiple files, loadNextBatch has no context about which file the rows it returns came from.
This makes it impossible to append per-file runtime data to the resulting RecordBatch.
We would like a way to extend the file schema and stream with additional columns, similar to how table_partition_cols are added from the directory structure.

Example

Partition directory: /data1/
Files:
  /data1/file1
  /data1/file2
  /data1/file3

File schema: { row_id: Int32, b: Int32 }

Runtime metadata:
  file1 -> cumulative_total_rows = 5
  file2 -> cumulative_total_rows = 7
  file3 -> cumulative_total_rows = 17

Derived schema:
  { row_id: Int32, b: Int32, cumulative_total_rows }

Example expression:
  row_id + cumulative_total_rows
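
To make the example concrete, here is a minimal sketch of what the expression should evaluate to once the per-file value is visible next to the file's own columns. It uses only the standard library and does not touch any DataFusion API. `Row`, `shifted_row_ids`, and `example_metadata` are hypothetical names, the use of full paths as keys is an assumption, and `i64` is an assumed type for `cumulative_total_rows` (the issue does not state one).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one row of the file schema { row_id: Int32, b: Int32 }.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub row_id: i32,
    pub b: i32,
}

/// Runtime metadata from the example: file path -> cumulative_total_rows.
/// The i64 value type is an assumption made for this sketch.
pub fn example_metadata() -> HashMap<&'static str, i64> {
    HashMap::from([
        ("/data1/file1", 5),
        ("/data1/file2", 7),
        ("/data1/file3", 17),
    ])
}

/// Evaluate `row_id + cumulative_total_rows` for every row read from `file`.
/// Returns None when no runtime value is registered for that file.
pub fn shifted_row_ids(
    file: &str,
    rows: &[Row],
    per_file: &HashMap<&str, i64>,
) -> Option<Vec<i64>> {
    // The per-file constant is looked up once and applied to every row,
    // which is what a constant column attached to the batch would provide.
    let offset = *per_file.get(file)?;
    Some(rows.iter().map(|r| i64::from(r.row_id) + offset).collect())
}
```

The point of the sketch is that the lookup needs the file identity, which is exactly the context the scan does not expose today.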

Describe the solution you'd like

Extend ListingTable and ListingOptions to support user-provided extended columns (extended_cols), which are appended to each file's stream and schema, analogous to table_partition_cols.

  • Add extended_cols to ListingOptions, defined as:

      extended_cols: HashMap<String, HashMap<String, ScalarValue>>

    where:
      - outer key = column name
      - inner key = file name
      - value = runtime constant for that file

  • These values should be made available in the scan output (similar to partition columns), allowing expressions to reference them.
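
As a rough illustration of the proposed map shape, the sketch below builds an `extended_cols` value and resolves the constants for a single file. It is not the DataFusion API: the `ScalarValue` enum here is a two-variant stand-in for the real type, and `ExtendedCols`, `example_extended_cols`, and `values_for_file` are hypothetical names. Whether the inner key is a bare file name or a full object path is left open in the proposal; bare names are assumed here.

```rust
use std::collections::HashMap;

/// Stand-in for DataFusion's `ScalarValue`, reduced to two variants
/// so that the sketch compiles with the standard library alone.
#[derive(Debug, Clone, PartialEq)]
pub enum ScalarValue {
    Int64(Option<i64>),
    Utf8(Option<String>),
}

/// Shape proposed in the issue: column name -> (file name -> constant).
pub type ExtendedCols = HashMap<String, HashMap<String, ScalarValue>>;

/// Build the map for the `cumulative_total_rows` example.
pub fn example_extended_cols() -> ExtendedCols {
    let per_file = HashMap::from([
        ("file1".to_string(), ScalarValue::Int64(Some(5))),
        ("file2".to_string(), ScalarValue::Int64(Some(7))),
        ("file3".to_string(), ScalarValue::Int64(Some(17))),
    ]);
    HashMap::from([("cumulative_total_rows".to_string(), per_file)])
}

/// Resolve the constants for one file, in the requested column order.
/// A column that is unknown, or has no entry for this file, yields None.
pub fn values_for_file(
    cols: &ExtendedCols,
    column_order: &[&str],
    file: &str,
) -> Vec<Option<ScalarValue>> {
    column_order
        .iter()
        .map(|name| cols.get(*name).and_then(|per_file| per_file.get(file)).cloned())
        .collect()
}
```

One design question this surfaces is what a missing entry should mean: the sketch returns `None`, which a real implementation could map to a NULL value or treat as an error.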

Describe alternatives you've considered

  • Expose ObjectMeta to PhysicalExprAdapter so that it can append file metadata (e.g., file name) to the stream.
  • A MemTable holding file_name → extended_col mappings could then be joined against the scan output to enrich the data.
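
The join-based alternative can be sketched in plain Rust as a hash join keyed on the file name. This only models the data flow and assumes the scan has already attached a `file_name` value to every row; `TaggedRow` and `join_on_file_name` are hypothetical names, and no MemTable or DataFusion join API is used.

```rust
use std::collections::HashMap;

/// Hypothetical scan output row after a `file_name` column has been appended.
#[derive(Debug, Clone, PartialEq)]
pub struct TaggedRow {
    pub file_name: String,
    pub row_id: i32,
}

/// Inner hash join on file name: build a map from the small lookup side,
/// then probe it with each scanned row. Rows without a match are dropped.
pub fn join_on_file_name(rows: &[TaggedRow], lookup: &[(String, i64)]) -> Vec<(i32, i64)> {
    let build: HashMap<&str, i64> = lookup.iter().map(|(f, v)| (f.as_str(), *v)).collect();
    rows.iter()
        .filter_map(|r| build.get(r.file_name.as_str()).map(|v| (r.row_id, *v)))
        .collect()
}
```

Compared with extended_cols, this keeps the scan unchanged apart from exposing the file name, at the cost of a join for every query that needs the per-file values.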

Any alternative mechanism that makes per-file runtime context accessible during scan would work.

@alamb @timsaucer Any thoughts?

Labels: enhancement (New feature or request)
