Description
Is your feature request related to a problem or challenge?
Related to the discussion on #11192 with @Xuanwo
RisingWave has a library for automatically creating vectorized implementations of functions (e.g. that operate on arrow arrays) from scalar implementations
The library is here: https://github.com/risingwavelabs/arrow-udf
A blog post describing it is here: https://risingwave.com/blog/simplifying-sql-function-implementation-with-rust-procedural-macro/
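To make the idea concrete, here is a minimal conceptual sketch of lifting a scalar function into a vectorized one. It deliberately uses plain slices of `Option` values rather than arrow arrays, and it does not use arrow-udf's actual API: `lift_binary` and `str_eq` are hypothetical names chosen for illustration, and the real library generates this kind of loop via a procedural macro.

```rust
/// Hypothetical helper: lift a scalar binary function so it runs over two
/// nullable columns. A null (`None`) in either input produces a null output.
fn lift_binary<A: Copy, B: Copy, R>(
    left: &[Option<A>],
    right: &[Option<B>],
    op: impl Fn(A, B) -> R,
) -> Vec<Option<R>> {
    assert_eq!(left.len(), right.len(), "columns must have equal length");
    left.iter()
        .zip(right)
        .map(|(l, r)| match (l, r) {
            // Both sides are valid: call the scalar implementation.
            (Some(l), Some(r)) => Some(op(*l, *r)),
            // Either side is null: propagate the null.
            _ => None,
        })
        .collect()
}

/// The scalar implementation is the only part a function author writes.
fn str_eq(a: &str, b: &str) -> bool {
    a == b
}
```

Calling `lift_binary(&left, &right, str_eq)` on two columns of `Option<&str>` returns one `Option<bool>` per row. The point is that downcasting, iteration, and null handling live in one generic place instead of being repeated per operator.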
DataFusion uses macros to do something similar in binary.rs, but they are pretty hard to read and understand in my opinion:
datafusion/datafusion/physical-expr/src/expressions/binary.rs
Lines 118 to 130 in 7a23ea9
```rust
macro_rules! compute_utf8_op {
    ($LEFT:expr, $RIGHT:expr, $OP:ident, $DT:ident) => {{
        let ll = $LEFT
            .as_any()
            .downcast_ref::<$DT>()
            .expect("compute_op failed to downcast left side array");
        let rr = $RIGHT
            .as_any()
            .downcast_ref::<$DT>()
            .expect("compute_op failed to downcast right side array");
        Ok(Arc::new(paste::expr! {[<$OP _utf8>]}(&ll, &rr)?))
    }};
}
```
One main benefit I can see to switching to https://github.com/risingwavelabs/arrow-udf is that we could then extend arrow-udf to support Dictionary, StringView, and maybe other types, so that a single scalar implementation generates fast kernels for multiple array layouts.
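As a rough illustration of why layout-aware generation matters, the sketch below evaluates the same scalar function over a plain column and over a dictionary-encoded one. This is an assumption-laden toy: `DictColumn`, `eval_plain`, and `eval_dict` are hypothetical names, and neither arrow's `DictionaryArray` nor arrow-udf is used here.

```rust
/// Hypothetical dictionary-encoded string column: each row stores an index
/// (key) into the list of distinct `values`.
struct DictColumn {
    keys: Vec<usize>,
    values: Vec<String>,
}

/// Plain layout: the scalar function runs once per row.
fn eval_plain<R>(rows: &[String], op: impl Fn(&str) -> R) -> Vec<R> {
    rows.iter().map(|s| op(s.as_str())).collect()
}

/// Dictionary layout: the scalar function runs once per distinct value,
/// then results are gathered by key. Panics if a key is out of range.
fn eval_dict<R: Clone>(col: &DictColumn, op: impl Fn(&str) -> R) -> Vec<R> {
    let per_value: Vec<R> = col.values.iter().map(|s| op(s.as_str())).collect();
    col.keys.iter().map(|&k| per_value[k].clone()).collect()
}
```

Both functions accept the same scalar closure, for example `|s| s.len()`. With a generator in place, the author would write only that closure and each layout's loop would be produced automatically.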
Describe the solution you'd like
I think it would be great if someone could evaluate the feasibility of using the macros in https://github.com/risingwavelabs/arrow-udf to implement DataFusion's operations (and maybe eventually functions, etc.).
Describe alternatives you've considered
I suggest a POC that picks one or two functions (maybe string equality or regexp_match) and tries to use arrow-udf's function macro instead.
Here is an example of how to use it: https://docs.rs/arrow-udf/0.3.0/arrow_udf/
Additional context
No response