
regression: inlist deserialization error #17225

@haohuaijin


Describe the bug

Deserializing a physical plan fails when the query has an IN list combined with another filter. For example:

Error: Internal("PhysicalExpr Column references column 'p_size' at index 1 (zero-based) but input schema only has 1 columns: [\"p_size\"]")
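The message says that a column reference's zero-based index (1) does not fit the width of the input schema it is checked against (1 column). Below is a minimal std-only sketch of that kind of bounds check. It uses a hypothetical `ColumnRef` type and `validate_column` function, not DataFusion's actual `Column` expression or its validation code, and the extra name check is an illustrative assumption. It shows why `p_size` at index 1 is valid against a two-column input but rejected against the one-column `["p_size"]` input named in the error.

```rust
/// Hypothetical stand-in for a physical column reference: a column name plus
/// its zero-based position in the input schema the expression is bound to.
#[derive(Debug, Clone, PartialEq)]
struct ColumnRef {
    name: String,
    index: usize,
}

/// Sketch of a bounds/name check for a column reference against an input
/// schema, modelled here as a plain list of field names. The error text
/// mimics the shape of the message in the report.
fn validate_column(col: &ColumnRef, input_fields: &[&str]) -> Result<(), String> {
    match input_fields.get(col.index) {
        // Index past the end of the schema: the case hit in this issue.
        None => Err(format!(
            "column '{}' at index {} (zero-based) but input schema only has {} columns: {:?}",
            col.name,
            col.index,
            input_fields.len(),
            input_fields
        )),
        // Index in range, but it points at a differently named field.
        Some(found) if *found != col.name => Err(format!(
            "column '{}' at index {} but the field at that index is '{}'",
            col.name, col.index, found
        )),
        Some(_) => Ok(()),
    }
}

fn main() {
    let p_size = ColumnRef { name: "p_size".to_string(), index: 1 };

    // Valid against a two-column input such as [p_partkey, p_size].
    assert!(validate_column(&p_size, &["p_partkey", "p_size"]).is_ok());

    // Rejected against the one-column input from the error message.
    match validate_column(&p_size, &["p_size"]) {
        Ok(()) => println!("ok"),
        Err(e) => println!("error: {e}"),
    }
}
```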

The query is:

SELECT p_size FROM part WHERE p_size IN (14, 6, 5, 31) and p_partkey > 1000
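In this query the filter needs both `p_partkey` and `p_size`, but only `p_size` is returned. Assuming the filter's input is `[p_partkey, p_size]` and the output after the final projection is `[p_size]`, the same column sits at index 1 in the first schema and at index 0 in the second. One plausible reading of the error, which is an assumption and not a confirmed diagnosis, is that the IN list expression keeps index 1 but is decoded against the projected one-column schema. The std-only sketch below uses a hypothetical `remap_index` helper (not a DataFusion API) to show how a column's index changes between the two schemas when it is re-resolved by name.

```rust
/// Hypothetical helper: re-resolve a column's zero-based index by name when an
/// expression is evaluated against a different (e.g. projected) input schema.
/// Returns None if the column is not present in the target schema.
fn remap_index(name: &str, target_fields: &[&str]) -> Option<usize> {
    target_fields.iter().position(|f| *f == name)
}

fn main() {
    // Assumed input of the filter: both columns referenced by the predicate.
    let filter_input = ["p_partkey", "p_size"];
    // Assumed output of the final projection: only the selected column.
    let projected = ["p_size"];

    let before = remap_index("p_size", &filter_input);
    let after = remap_index("p_size", &projected);
    println!("p_size: index {:?} in filter input, {:?} after projection", before, after);

    // p_partkey is dropped by the projection, so it cannot be re-resolved.
    println!("p_partkey after projection: {:?}", remap_index("p_partkey", &projected));
}
```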

To Reproduce

I added a reproducer in PR https://github.com/apache/datafusion/pull/17224/files.

Another reproducer is available at https://github.com/haohuaijin/inlist-reproduce.
The code is as follows:

use std::sync::Arc;

use arrow::datatypes::{DataType, Field, Schema, SchemaRef};
use datafusion::{
    datasource::{
        file_format::parquet::ParquetFormat,
        listing::{ListingOptions, ListingTableUrl},
    },
    prelude::SessionContext,
};
use datafusion_proto::{
    physical_plan::{AsExecutionPlan, DefaultPhysicalExtensionCodec},
    protobuf::PhysicalPlanNode,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let ctx = SessionContext::new();
    let listing_options = ListingOptions::new(Arc::new(ParquetFormat::default()));
    // Expects a Parquet file named `data.parquet` in the working directory
    // whose columns match `get_schema()`.
    let table_path = ListingTableUrl::parse("data.parquet")?;

    ctx.register_listing_table(
        "default",
        &table_path,
        listing_options.clone(),
        Some(get_schema()),
        None,
    )
    .await?;

    // An IN list on `message` plus a second filter on `timestamp`;
    // only `message` is selected.
    let plan = ctx
        .sql("select message from default where message in ('a', 'b', 'c', 'd') and timestamp >= 1")
        .await?
        .create_physical_plan()
        .await?;

    // Serialize the physical plan to protobuf...
    let node: PhysicalPlanNode =
        PhysicalPlanNode::try_from_physical_plan(plan, &DefaultPhysicalExtensionCodec {})?;

    // ...and deserialize it again. This round trip is where the
    // "PhysicalExpr Column references column ..." internal error is returned.
    let plan =
        node.try_into_physical_plan(&ctx, &ctx.runtime_env(), &DefaultPhysicalExtensionCodec {})?;

    println!("{:?}", plan);

    Ok(())
}

fn get_schema() -> SchemaRef {
    SchemaRef::new(Schema::new(vec![
        Field::new("timestamp", DataType::Int64, false),
        Field::new("message", DataType::Utf8, true),
    ]))
}

Expected behavior

Deserialization should succeed.

Additional context

This works fine in DataFusion v47 and v48.
It looks related to #16665 and #16744.


Labels: bug (Something isn't working), regression (Something that used to work no longer does)
