Closed
Description
Describe the bug
The schema inference logic in parquet does not infer the correct nullability for nested types.
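For example:
use std::sync::Arc;
use parquet::arrow::parquet_to_arrow_schema;
use parquet::schema::parser::parse_message_type;
use parquet::schema::types::SchemaDescriptor;

let message_type = "
message test_schema {
    OPTIONAL INT32 leaf1;
    REPEATED GROUP outerGroup {
        OPTIONAL INT32 leaf2;
        REPEATED GROUP innerGroup {
            OPTIONAL INT32 leaf3;
        }
    }
}
";
// Parse the textual schema and convert it to an Arrow schema.
let parquet_group_type = parse_message_type(message_type).unwrap();
let parquet_schema = SchemaDescriptor::new(Arc::new(parquet_group_type));
let converted_arrow_schema =
    parquet_to_arrow_schema(&parquet_schema, None).unwrap();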
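This infers innerGroup and outerGroup as nullable lists with nullable elements, when they are neither. A REPEATED group that is not wrapped in a LIST-annotated group represents a required list of required elements.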
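The mapping I would expect can be sketched as below. This is only an illustration of the rule for fields that are not wrapped in a LIST- or MAP-annotated group; `Repetition`, `Nullability` and `expected_nullability` are hypothetical names made up for this sketch and are not part of the parquet or arrow crates.

```rust
/// Parquet field repetition, as written in the message type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Repetition {
    Required,
    Optional,
    Repeated,
}

/// Hypothetical summary of the nullability an Arrow field should get.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Nullability {
    /// Whether the field itself (the list, for REPEATED) may be null.
    field: bool,
    /// Whether list elements may be null; `None` when the field is not a list.
    element: Option<bool>,
}

/// Expected nullability for a field that is NOT inside a LIST/MAP-annotated
/// group (assumption of this sketch: annotated lists are handled elsewhere).
fn expected_nullability(repetition: Repetition) -> Nullability {
    match repetition {
        Repetition::Required => Nullability { field: false, element: None },
        Repetition::Optional => Nullability { field: true, element: None },
        // A bare REPEATED field is a required list of required elements.
        Repetition::Repeated => Nullability { field: false, element: Some(false) },
    }
}
```

Under this sketch, outerGroup and innerGroup both map to non-nullable lists with non-nullable elements, while leaf1, leaf2 and leaf3 remain nullable because they are OPTIONAL.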
To Reproduce
See test
Expected behavior
The nullability should be inferred correctly: innerGroup and outerGroup should both be non-nullable lists with non-nullable elements.
Additional context
This has likely been hidden by the lack of support for repeated fields (#1680).