
Implement trait-based API for defining ScalarUDFs #8568

@alamb

Description

Is your feature request related to a problem or challenge?

The current way a user implements a ScalarUDF is awkward:

They must wade through several Arc<dyn ...> typedefs to figure out how to provide the type signature and the implementation:

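```rust
// ScalarUDF::new
pub fn new(
    name: &str,
    signature: &Signature,
    // ReturnTypeFunction: computes the output type from the argument types
    return_type: &Arc<dyn Fn(&[DataType]) -> Result<Arc<DataType>, DataFusionError> + Send + Sync>,
    // ScalarFunctionImplementation: evaluates the function on its arguments
    fun: &Arc<dyn Fn(&[ColumnarValue]) -> Result<ColumnarValue, DataFusionError> + Send + Sync>,
) -> ScalarUDF
```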

The create_udf function is somewhat easier to use, but it still requires wrapping anonymous functions (closures) in Arcs.
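To make the ergonomic problem concrete, here is a std-only sketch that mirrors the shape of the current closure-based API. `DataType`, `ColumnarValue`, `ClosureUdf`, and `make_add_one` are simplified, hypothetical stand-ins rather than DataFusion's real types (the real `ColumnarValue` wraps Arrow arrays and scalars, and errors are `DataFusionError`); only the `Arc<dyn Fn ...>` shape is taken from the signature above.

```rust
use std::sync::Arc;

// Simplified stand-ins for DataFusion's types (assumption: not the real definitions).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
enum ColumnarValue {
    Int64(Vec<i64>),
}

type Result<T> = std::result::Result<T, String>;

// Same shape as the Arc<dyn Fn ...> parameters of ScalarUDF::new.
type ReturnTypeFunction = Arc<dyn Fn(&[DataType]) -> Result<Arc<DataType>> + Send + Sync>;
type ScalarFunctionImplementation =
    Arc<dyn Fn(&[ColumnarValue]) -> Result<ColumnarValue> + Send + Sync>;

// Hypothetical closure-based UDF holder, loosely analogous to what create_udf builds.
struct ClosureUdf {
    name: String,
    return_type: ReturnTypeFunction,
    fun: ScalarFunctionImplementation,
}

fn make_add_one() -> ClosureUdf {
    // Each piece of behavior is a separate Arc-wrapped closure.
    let return_type: ReturnTypeFunction =
        Arc::new(|_arg_types: &[DataType]| -> Result<Arc<DataType>> {
            Ok(Arc::new(DataType::Int64))
        });
    let fun: ScalarFunctionImplementation =
        Arc::new(|args: &[ColumnarValue]| -> Result<ColumnarValue> {
            match args {
                [ColumnarValue::Int64(values)] => {
                    Ok(ColumnarValue::Int64(values.iter().map(|v| v + 1).collect()))
                }
                _ => Err("add_one expects exactly one Int64 argument".to_string()),
            }
        });
    ClosureUdf { name: "add_one".to_string(), return_type, fun }
}

fn main() {
    let udf = make_add_one();
    // Callers invoke the fields as closures: (udf.fun)(...)
    let rt = (udf.return_type)(&[DataType::Int64]).unwrap();
    let out = (udf.fun)(&[ColumnarValue::Int64(vec![1, 2, 3])]).unwrap();
    println!("{} -> {:?}: {:?}", udf.name, rt, out);
}
```

The behavior of a single function is split across independently constructed closures, and nothing ties the return-type closure to the implementation closure except the struct that happens to hold both.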

Describe the solution you'd like

I am not sure why this API is implemented the way it is. As a user, I would expect to implement a trait and pass the implementation around as a trait object, like:

```rust
struct MyUDF {
    // ... fields, e.g. configuration
}

impl FunctionDefinition for MyUDF {
    fn name(&self) -> &str {
        "my_udf"
    }
    fn return_type(&self) -> &DataType {
        todo!()
    }
    // ...
}
```
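A minimal std-only sketch of how the proposed trait could work end to end. The trait name `FunctionDefinition` and the `name` / `return_type` methods come from the proposal above; the `invoke` method, the `Send + Sync` supertraits (mirroring the `+ Send + Sync` bounds on the existing closures), the `Registry` type, and the simplified `DataType` / `ColumnarValue` enums are hypothetical additions for illustration, not DataFusion's API.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-ins for DataFusion's types (assumption: not the real definitions).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
enum ColumnarValue {
    Int64(Vec<i64>),
}

type Result<T> = std::result::Result<T, String>;

// Hypothetical trait following the shape proposed above; `invoke` is an assumed addition.
trait FunctionDefinition: Send + Sync {
    fn name(&self) -> &str;
    fn return_type(&self) -> &DataType;
    fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue>;
}

// A user-defined function is an ordinary struct; configuration lives in its fields.
struct AddOne {
    return_type: DataType,
}

impl FunctionDefinition for AddOne {
    fn name(&self) -> &str {
        "add_one"
    }
    fn return_type(&self) -> &DataType {
        &self.return_type
    }
    fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> {
        match args {
            [ColumnarValue::Int64(values)] => {
                Ok(ColumnarValue::Int64(values.iter().map(|v| v + 1).collect()))
            }
            _ => Err(format!("{} expects exactly one Int64 argument", self.name())),
        }
    }
}

// Hypothetical registry: functions are stored as trait objects keyed by name.
#[derive(Default)]
struct Registry {
    udfs: HashMap<String, Arc<dyn FunctionDefinition>>,
}

impl Registry {
    fn register(&mut self, udf: Arc<dyn FunctionDefinition>) {
        self.udfs.insert(udf.name().to_string(), udf);
    }
    fn get(&self, name: &str) -> Option<&Arc<dyn FunctionDefinition>> {
        self.udfs.get(name)
    }
}

fn main() {
    let mut registry = Registry::default();
    // Arc<AddOne> coerces to Arc<dyn FunctionDefinition> at the call site.
    registry.register(Arc::new(AddOne { return_type: DataType::Int64 }));
    let udf = registry.get("add_one").expect("registered");
    let out = udf.invoke(&[ColumnarValue::Int64(vec![1, 2, 3])]).unwrap();
    println!("{} returns {:?}: {:?}", udf.name(), udf.return_type(), out);
}
```

With this shape, the name, type information, and evaluation logic of one function live on one type, and heterogeneous functions can be stored behind `Arc<dyn FunctionDefinition>` without the caller assembling separate closures.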

Describe alternatives you've considered

No response

Additional context

I want to make ScalarUDFs easy to define for two reasons:

  1. They are a key API for DataFusion.
  2. As we work on making all scalar functions ScalarUDFs ([EPIC] Unify Function Interface (remove BuiltInScalarFunction) #8045), the easier it is to write UDFs, the easier it will be to complete that task.


Labels: enhancement (New feature or request)
