Skip to content

Exponential planning time when window function is partitioned by multiple columns #17401

@findepi

Description

@findepi

Describe the bug

Planning time, or to be precise: optimization time, is very long for queries that have a window function partitioned by multiple columns.

To Reproduce

WITH source AS (
    SELECT
        1 AS n,
        '' AS a1, '' AS a2, '' AS a3, '' AS a4, '' AS a5, '' AS a6, '' AS a7, '' AS a8,
        '' AS a9, '' AS a10, '' AS a11, '' AS a12
)
SELECT
    sum(n) OVER (PARTITION BY
        a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12
    )
FROM source;

In 48.0.1 this takes ~543 seconds.

On current main this takes ~292 seconds
It looks O(4ⁿ)1 where n is number of columns used in PARTITION BY

Expected behavior

In DataFusion 45.0.0 this query runs in <100ms.

Additional context

No response

Footnotes

  1. based on timings when adding or removing a column. Definitely exponential, not sure the base.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingoptimizerOptimizer rules

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions