-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Description
Is your feature request related to a problem or challenge?
Related to #12709
Currently, in BinaryTypeCoercer, we have the following logic for the StringConcat operator:
StringConcat => {
string_concat_coercion(self.lhs, self.rhs).map(Signature::uniform).ok_or_else(|| {
plan_datafusion_err!(
"Cannot infer common string type for string concat operation {} {} {}", self.lhs, self.op, self.rhs
)
})
}The internal string_concat_coercion function requires at least one side to be a string-like type (Utf8, LargeUtf8, Utf8View) and the other side to be castable to string:
/// Coercion rules for string concat.
/// This is a union of string coercion rules and specified rules:
/// 1. At least one side of lhs and rhs should be string type (Utf8 / LargeUtf8)
/// 2. Data type of the other side should be able to cast to string type
fn string_concat_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> {
use arrow::datatypes::DataType::*;
string_coercion(lhs_type, rhs_type).or(match (lhs_type, rhs_type) {
(Utf8View, from_type) | (from_type, Utf8View) => {
string_concat_internal_coercion(from_type, &Utf8View)
}
(Utf8, from_type) | (from_type, Utf8) => {
string_concat_internal_coercion(from_type, &Utf8)
}
(LargeUtf8, from_type) | (from_type, LargeUtf8) => {
string_concat_internal_coercion(from_type, &LargeUtf8)
}
(Dictionary(_, lhs_value_type), Dictionary(_, rhs_value_type)) => {
string_coercion(lhs_value_type, rhs_value_type).or(None)
}
_ => None,
})
}However, this implementation fails if both operands are non-string types that can still be casted to string, for example:
select 1||2;
+----------------------+
| Int64(1) || Int64(2) |
|----------------------|
| 12 |
+----------------------+This use case could be safely supported by leveraging existing TypeCoercionRewriter, which is already responsible for inserting casts during logical plan rewriting:
Expr::BinaryExpr(BinaryExpr { left, op, right }) => {
let (left, right) =
self.coerce_binary_op(*left, self.schema, op, *right, self.schema)?;
Ok(Transformed::yes(Expr::BinaryExpr(BinaryExpr::new(
Box::new(left),
op,
Box::new(right),
))))
}Describe the solution you'd like
Proposal
We should update string_concat_coercion to also accept types where both sides are castable to string, not just one side being a string already.
This would allow expressions like 1 || 2 to succeed by coercing both operands to strings prior to concatenation.
This change aligns with the behavior in SQL engines like PostgreSQL and makes coercion rules for || more intuitive and consistent with general type coercion strategy.
fn string_concat_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> {
use arrow::datatypes::DataType::*;
string_coercion(lhs_type, rhs_type).or(match (lhs_type, rhs_type) {
(Utf8View, from_type) | (from_type, Utf8View) => {
string_concat_internal_coercion(from_type, &Utf8View)
}
(Utf8, from_type) | (from_type, Utf8) => {
string_concat_internal_coercion(from_type, &Utf8)
}
(LargeUtf8, from_type) | (from_type, LargeUtf8) => {
string_concat_internal_coercion(from_type, &LargeUtf8)
}
(Dictionary(_, lhs_value_type), Dictionary(_, rhs_value_type)) => {
string_coercion(lhs_value_type, rhs_value_type).or(None)
}
_ =>
if can_cast_types(lhs_type, &Utf8) && can_cast_types(rhs_type, &Utf8) {
Some(Utf8)
} else if can_cast_types(lhs_type, &LargeUtf8) && can_cast_types(rhs_type, &LargeUtf8) {
Some(LargeUtf8)
} else {
None
}
})
}Describe alternatives you've considered
No response
Additional context
No response