Skip to content

Prototype implementing DataFusion functions / operators using arrow-udf liibrary #11413

@alamb

Description

@alamb

Is your feature request related to a problem or challenge?

Related to the discussion on #11192 with @Xuanwo

RisingWave has a library for automatically creating vectorized implementations of functions (e.g. that operate on arrow arrays) from scalar implementations

The library is here: https://github.com/risingwavelabs/arrow-udf

A blog post describing it is here: https://risingwave.com/blog/simplifying-sql-function-implementation-with-rust-procedural-macro/

DataFusion uses macros to do something similar in binary.rs but they are pretty hard to read / understand in my opinon:

macro_rules! compute_utf8_op {
($LEFT:expr, $RIGHT:expr, $OP:ident, $DT:ident) => {{
let ll = $LEFT
.as_any()
.downcast_ref::<$DT>()
.expect("compute_op failed to downcast left side array");
let rr = $RIGHT
.as_any()
.downcast_ref::<$DT>()
.expect("compute_op failed to downcast right side array");
Ok(Arc::new(paste::expr! {[<$OP _utf8>]}(&ll, &rr)?))
}};
}

One main benefit I can see to switching to https://github.com/risingwavelabs/arrow-udf is that we could then extend arrow-udf to support Dictionary and StringView and maybe other types to generate fast kernels for multiple different array layouts.

Describe the solution you'd like

I think it would be great if someone could evaluate the feasibility of using the macros in https://github.com/risingwavelabs/arrow-udf to implement Datafusion's operations (and maybe eventually functions etc)

Describe alternatives you've considered

I suggest a POC that picks one or two functions (maybe string equality or regexp_match or something) and tries to use arrow-udfs function macro instead.

Here is an example of how to use it: https://docs.rs/arrow-udf/0.3.0/arrow_udf/

Additional context

No response

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions