EeshanBembi

Summary

This PR fixes a panic in UnionExec when constructed with empty inputs, replacing the crash with proper error handling and descriptive error messages.

Fixes: #17052

Problem

When UnionExec::new(vec![]) was called with an empty input vector, it would panic with:

thread '...' panicked at datafusion/physical-plan/src/union.rs:542:24:
index out of bounds: the len is 0 but the index is 0

This occurred because union_schema() directly accessed inputs[0] without checking if the array was empty.
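The failure mode is ordinary indexing into an empty collection. As a minimal standalone sketch (the `first_schema_name` helper and the use of plain strings in place of schema types are assumptions made for illustration, not DataFusion code), a checked lookup might look like this:

```rust
/// Hypothetical stand-in for a schema lookup: returns the first input's
/// name, or an error when there are no inputs.
/// Writing `inputs[0]` here would panic on an empty slice;
/// `first()` returns `None` instead, which is mapped to an error.
fn first_schema_name(inputs: &[String]) -> Result<&str, String> {
    inputs
        .first()
        .map(|s| s.as_str())
        .ok_or_else(|| "Cannot create union schema from empty inputs".to_string())
}
```

With an empty slice the function returns `Err`; with one or more inputs it returns the first entry.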

Solution

Core Changes

  1. Made UnionExec::new() return Result<Self>:

    • Added validation: returns error if inputs.is_empty()
    • Provides clear error message: "UnionExec requires at least one input"
  2. Made union_schema() return Result<SchemaRef>:

    • Added empty input validation before accessing inputs[0]
    • Returns descriptive error: "Cannot create union schema from empty inputs"
  3. Updated all call sites (7 files):

    • physical_planner.rs - Core DataFusion integration
    • repartition/mod.rs - Internal dependencies
    • 4 test files - Updated to handle Result return type
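The shape of the first two changes can be sketched in isolation. Everything below is a simplified assumption: `Plan`, `Scan`, and `UnionNode` are hypothetical stand-ins, a `Vec<String>` plays the role of the schema, and `String` plays the role of the error type. It is not the code in union.rs.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an execution plan (not DataFusion's trait).
trait Plan {
    fn schema(&self) -> Vec<String>;
}

/// Hypothetical leaf plan that simply carries its column names.
struct Scan(Vec<String>);

impl Plan for Scan {
    fn schema(&self) -> Vec<String> {
        self.0.clone()
    }
}

/// Hypothetical union node holding its inputs and the derived schema.
struct UnionNode {
    inputs: Vec<Arc<dyn Plan>>,
    schema: Vec<String>,
}

/// Derive the schema from the first input, rejecting empty input
/// instead of indexing with `inputs[0]`.
fn union_schema(inputs: &[Arc<dyn Plan>]) -> Result<Vec<String>, String> {
    let first = inputs
        .first()
        .ok_or_else(|| "Cannot create union schema from empty inputs".to_string())?;
    Ok(first.schema())
}

impl UnionNode {
    /// Fallible constructor: validates the inputs before building the node.
    fn new(inputs: Vec<Arc<dyn Plan>>) -> Result<Self, String> {
        if inputs.is_empty() {
            return Err("UnionExec requires at least one input".to_string());
        }
        let schema = union_schema(&inputs)?;
        Ok(Self { inputs, schema })
    }
}
```

In this sketch both checks are kept, as the PR describes, so `union_schema` stays safe even when called directly.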

Error Handling

  • Before: Index out of bounds panic (unhelpful)
  • After: Clear error messages that guide users

// Before: panics with an index-out-of-bounds error
let union = UnionExec::new(vec![]);

// After: returns an error that the caller can handle
match UnionExec::new(vec![]) {
    Ok(union) => { /* use union */ }
    Err(e) => println!("Error: {e}"), // "UnionExec requires at least one input"
}
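For call sites that already return a `Result`, the update is typically just a trailing `?`. A rough sketch, assuming a hypothetical `make_union` function in place of the new constructor and a hypothetical `plan_union` caller:

```rust
/// Hypothetical fallible constructor standing in for the new `UnionExec::new`.
fn make_union(inputs: Vec<u32>) -> Result<Vec<u32>, String> {
    if inputs.is_empty() {
        return Err("UnionExec requires at least one input".to_string());
    }
    Ok(inputs)
}

/// Hypothetical call site: the caller already returns `Result`, so the
/// error is propagated with `?` rather than handled locally.
fn plan_union(inputs: Vec<u32>) -> Result<usize, String> {
    let union = make_union(inputs)?; // previously: `let union = make_union(inputs);`
    Ok(union.len())
}
```

Callers that do not return a `Result` (test helpers, for example) would instead need an explicit `unwrap()` or `expect()`.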

Testing

Added 4 comprehensive tests:

  1. test_union_empty_inputs() - Verifies empty input validation
  2. test_union_schema_empty_inputs() - Tests schema creation with empty inputs
  3. test_union_single_input() - Ensures single input still works
  4. test_union_multiple_inputs_still_works() - Verifies existing functionality unchanged

Test Results:

  • ✅ All new tests pass
  • ✅ All existing union tests pass (8/8)
  • ✅ All physical planner integration tests pass

Backward Compatibility

  • Behavior is unchanged for valid inputs (≥1 input)
  • Error handling is added only for empty inputs, which previously panicked
  • API change: UnionExec::new() now returns Result<Self> instead of Self, so existing callers must be updated

This is a breaking change, but it is justified because:

  1. The previous behavior (panic) was incorrect
  2. Empty inputs are invalid by design (no logical meaning)
  3. Consistent with logical Union which requires ≥2 inputs
  4. Better error handling improves user experience

Files Changed

  • datafusion/physical-plan/src/union.rs - Core fix + tests (main changes)
  • datafusion/core/src/physical_planner.rs - Handle Result return
  • datafusion/physical-plan/src/repartition/mod.rs - Update internal calls
  • 4 test files - Update test utilities and test cases

The fix provides robust error handling while maintaining all existing functionality for valid use cases.

This commit fixes a panic in UnionExec when constructed with empty inputs.
Previously, UnionExec::new(vec![]) would cause an index out of bounds panic
at union.rs:542 when trying to access inputs[0].

Changes:
- Made UnionExec::new() return Result<Self> with proper validation
- Made union_schema() return Result<SchemaRef> with empty input checks
- Added descriptive error messages for empty input cases
- Updated all call sites to handle the new Result return type
- Added comprehensive tests for edge cases

Error messages:
- "UnionExec requires at least one input"
- "Cannot create union schema from empty inputs"

The fix preserves existing behavior for valid inputs while preventing
crashes and providing clear error messages for invalid usage.

Fixes apache#17052
github-actions bot added the core (Core DataFusion crate) and physical-plan (Changes to the physical-plan crate) labels on Sep 5, 2025
Contributor

@alamb left a comment

Thanks @EeshanBembi

fn test_union_empty_inputs() {
    // Test that UnionExec::new fails with empty inputs
    let result = UnionExec::new(vec![]);
    assert!(result.is_err());
Contributor

I think the assert!(result.is_err()) check is redundant, since unwrap_err will panic if the result is not an Err.
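To illustrate the point about `unwrap_err`, here is a small standalone sketch. `make_union` and `empty_inputs_error` are hypothetical names used only for this example; they are not part of the PR.

```rust
/// Hypothetical fallible constructor used only for this illustration.
fn make_union(inputs: Vec<u32>) -> Result<Vec<u32>, String> {
    if inputs.is_empty() {
        return Err("UnionExec requires at least one input".to_string());
    }
    Ok(inputs)
}

/// Returns the error produced for empty inputs.
/// `Result::unwrap_err` panics when the value is `Ok`, so it already
/// fails the test in that case; a preceding `assert!(result.is_err())`
/// checks the same condition a second time.
fn empty_inputs_error() -> String {
    make_union(vec![]).unwrap_err()
}
```

A test can then assert directly on the message returned by `unwrap_err`.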

@@ -101,19 +101,23 @@ pub struct UnionExec {

 impl UnionExec {
     /// Create a new UnionExec
-    pub fn new(inputs: Vec<Arc<dyn ExecutionPlan>>) -> Self {
-        let schema = union_schema(&inputs);
+    pub fn new(inputs: Vec<Arc<dyn ExecutionPlan>>) -> Result<Self> {
Contributor

This is technically an API change. To make it easier on others, maybe we can add a new function called try_new that has the error checking, and deprecate the existing new function per https://datafusion.apache.org/contributor-guide/api-health.html#deprecation-guidelines
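One way the suggested pattern could look, as a hedged sketch: `UnionNode` is a hypothetical type, and the choice to have the deprecated `new` panic with a clear message on empty input is an assumption, since the comment does not say how the old constructor should behave.

```rust
/// Hypothetical node type used only to illustrate the deprecation pattern.
#[derive(Debug)]
pub struct UnionNode {
    pub inputs: Vec<u32>,
}

impl UnionNode {
    /// New fallible constructor: validates the inputs and returns an error
    /// instead of panicking.
    pub fn try_new(inputs: Vec<u32>) -> Result<Self, String> {
        if inputs.is_empty() {
            return Err("UnionExec requires at least one input".to_string());
        }
        Ok(Self { inputs })
    }

    /// Old infallible constructor, kept so existing callers still compile.
    /// Assumption: it delegates to `try_new` and panics with a clear
    /// message on empty input.
    #[deprecated(note = "use `UnionNode::try_new` instead")]
    pub fn new(inputs: Vec<u32>) -> Self {
        Self::try_new(inputs).expect("UnionNode requires at least one input")
    }
}
```

Existing callers of `new` would keep compiling and see a deprecation warning, while new code can switch to `try_new` and handle the error.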
