Skip to content

Conversation

@eejbyfeldt
Copy link
Contributor

@eejbyfeldt eejbyfeldt commented Sep 1, 2024

Which issue does this PR close?

Closes #5436.

Rationale for this change

Improves performance by avoiding copying some of the columns during execution of FilterExec.

Comparision tpch_mem10

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      main ┃ filter-projection ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 2177.83ms │         2127.02ms │     no change │
│ QQuery 2     │  482.20ms │          471.44ms │     no change │
│ QQuery 3     │ 1118.91ms │         1093.83ms │     no change │
│ QQuery 4     │  799.31ms │          628.54ms │ +1.27x faster │
│ QQuery 5     │ 1841.60ms │         1870.31ms │     no change │
│ QQuery 6     │  118.43ms │          117.54ms │     no change │
│ QQuery 7     │ 3323.14ms │         3308.58ms │     no change │
│ QQuery 8     │ 1451.20ms │         1447.37ms │     no change │
│ QQuery 9     │ 2983.42ms │         2981.28ms │     no change │
│ QQuery 10    │ 1434.72ms │         1409.87ms │     no change │
│ QQuery 11    │  950.67ms │          956.36ms │     no change │
│ QQuery 12    │  339.26ms │          322.72ms │     no change │
│ QQuery 13    │  947.67ms │          761.68ms │ +1.24x faster │
│ QQuery 14    │  163.97ms │          171.94ms │     no change │
│ QQuery 15    │  384.72ms │          402.21ms │     no change │
│ QQuery 16    │  371.63ms │          373.32ms │     no change │
│ QQuery 17    │ 3024.34ms │         3098.19ms │     no change │
│ QQuery 18    │ 7221.31ms │         7250.94ms │     no change │
│ QQuery 19    │  266.89ms │          262.75ms │     no change │
│ QQuery 20    │  815.55ms │          824.50ms │     no change │
│ QQuery 21    │ 4398.76ms │         4243.96ms │     no change │
│ QQuery 22    │  182.82ms │          180.98ms │     no change │
└──────────────┴───────────┴───────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)                │ 34798.35ms │
│ Total Time (filter-projection)   │ 34305.29ms │
│ Average Time (main)              │  1581.74ms │
│ Average Time (filter-projection) │  1559.33ms │
│ Queries Faster                   │          2 │
│ Queries Slower                   │          0 │
│ Queries with No Change           │         20 │
└──────────────────────────────────┴────────────┘

What changes are included in this PR?

Adds a projection to FilterExec similarly to #9236 for HashJoinExec. It also resuses a lot of the logic from that PR to implement the pushdown.

Are these changes tested?

Existing tests.

Are there any user-facing changes?

Yes, projections can now be pushed into the FilterExec.

@github-actions github-actions bot added physical-expr Changes to the physical-expr crates core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Sep 1, 2024
@eejbyfeldt eejbyfeldt force-pushed the implement-filter-exec-projection branch 2 times, most recently from cc744be to 9456c00 Compare September 2, 2024 07:35
@eejbyfeldt eejbyfeldt force-pushed the implement-filter-exec-projection branch from b0aacd6 to 80b2ab3 Compare September 2, 2024 08:18
@github-actions github-actions bot added the proto Related to proto crate label Sep 2, 2024
@eejbyfeldt eejbyfeldt marked this pull request as ready for review September 2, 2024 12:45
@eejbyfeldt eejbyfeldt requested a review from Dandandan September 2, 2024 12:45
Copy link
Contributor

@Dandandan Dandandan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very nice 🚀

Copy link
Contributor

@berkaysynnada berkaysynnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you @eejbyfeldt. It seems like the with_projection() API could end up being a method of the ExecutionPlan in the long run, wdyt? (It would improve the performance of all operators which internally refer to some columns but its downstream operators do not actually require that column)

Ok(Some(Arc::new(UnionExec::new(new_children))))
}

trait EmbededProjection: ExecutionPlan + Sized {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EmbededProjection -> EmbeddedProjection

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 0c05fb9

.map(|map| Self { map })
}

pub fn from_indices(indices: &[usize], schema: &SchemaRef) -> Result<Self> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A brief docstring would be helpful

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 0c05fb9

}
}

pub fn project(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we write a docstring here also?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 0c05fb9

@eejbyfeldt eejbyfeldt force-pushed the implement-filter-exec-projection branch from 80a879a to 0c05fb9 Compare September 6, 2024 12:20
@eejbyfeldt
Copy link
Contributor Author

It seems like the with_projection() API could end up being a method of the ExecutionPlan in the long run, wdyt? (It would improve the performance of all operators which internally refer to some columns but its downstream operators do not actually require that column)

Probably worth considering in the future. The argument against making it part of ExecutionPlan is that there are some operators like CoalescePartitionsExec where it will not make sense.

Addressed the comments.

08)--------------AggregateExec: mode=Partial, gby=[v1@0 as v1, v2@1 as v2], aggr=[max(having_test.v1)]
09)----------------MemoryExec: partitions=1, partition_sizes=[1]
01)CoalesceBatchesExec: target_batch_size=8192
02)--FilterExec: max(having_test.v1)@2 = 3, projection=[v1@0, v2@1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very cool

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really nice improvement -- thank you @eejbyfeldt .

Thank you @Dandandan and @berkaysynnada for the reviews

I went throught he plan changes carefully and they are 👌 👨‍🍳

22)------------------------------------------HashJoinExec: mode=Partitioned, join_type=Inner, on=[(p_partkey@0, ps_partkey@0)], projection=[p_partkey@0, p_mfgr@1, ps_suppkey@3, ps_supplycost@4]
23)--------------------------------------------CoalesceBatchesExec: target_batch_size=8192
24)----------------------------------------------RepartitionExec: partitioning=Hash([p_partkey@0], 4), input_partitions=4
25)------------------------------------------------ProjectionExec: expr=[p_partkey@0 as p_partkey, p_mfgr@1 as p_mfgr]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these plans are doubly good -- they save two copies -- one in FilterExec and one in CoalesceBatchesExec

@alamb
Copy link
Contributor

alamb commented Sep 6, 2024

🚀

@alamb alamb merged commit 4659096 into apache:main Sep 6, 2024
@alamb
Copy link
Contributor

alamb commented Sep 6, 2024

FWIW I ran clickbench and also saw improvement on Q30 and Q31

--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ implement-filter-exec-projection ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.66ms │                           0.67ms │     no change │
│ QQuery 1     │    70.03ms │                          73.91ms │  1.06x slower │
│ QQuery 2     │   126.45ms │                         124.52ms │     no change │
│ QQuery 3     │   129.91ms │                         137.13ms │  1.06x slower │
│ QQuery 4     │   960.79ms │                         939.71ms │     no change │
│ QQuery 5     │  1036.28ms │                        1022.72ms │     no change │
│ QQuery 6     │    66.12ms │                          64.91ms │     no change │
│ QQuery 7     │    73.34ms │                          73.79ms │     no change │
│ QQuery 8     │  1344.30ms │                        1334.81ms │     no change │
│ QQuery 9     │  1306.69ms │                        1309.81ms │     no change │
│ QQuery 10    │   446.44ms │                         442.18ms │     no change │
│ QQuery 11    │   478.30ms │                         481.05ms │     no change │
│ QQuery 12    │  1113.12ms │                        1104.71ms │     no change │
│ QQuery 13    │  2101.60ms │                        2051.31ms │     no change │
│ QQuery 14    │  1506.83ms │                        1503.34ms │     no change │
│ QQuery 15    │  1064.88ms │                        1060.89ms │     no change │
│ QQuery 16    │  2759.74ms │                        2710.97ms │     no change │
│ QQuery 17    │  2701.32ms │                        2669.99ms │     no change │
│ QQuery 18    │  5503.13ms │                        5371.89ms │     no change │
│ QQuery 19    │   117.10ms │                         119.67ms │     no change │
│ QQuery 20    │  1607.88ms │                        1640.09ms │     no change │
│ QQuery 21    │  1952.27ms │                        1996.70ms │     no change │
│ QQuery 22    │  4382.22ms │                        4405.65ms │     no change │
│ QQuery 23    │ 10759.26ms │                       10851.10ms │     no change │
│ QQuery 24    │   714.73ms │                         718.81ms │     no change │
│ QQuery 25    │   627.59ms │                         634.67ms │     no change │
│ QQuery 26    │   784.53ms │                         784.22ms │     no change │
│ QQuery 27    │  2465.46ms │                        2514.45ms │     no change │
│ QQuery 28    │ 15630.39ms │                       15474.96ms │     no change │
│ QQuery 29    │   572.17ms │                         551.60ms │     no change │
│ QQuery 30    │  1229.86ms │                        1167.28ms │ +1.05x faster │
│ QQuery 31    │  1277.91ms │                        1212.40ms │ +1.05x faster │
│ QQuery 32    │  4446.44ms │                        4522.52ms │     no change │
│ QQuery 33    │  5049.17ms │                        5044.77ms │     no change │
│ QQuery 34    │  4978.22ms │                        4890.22ms │     no change │
│ QQuery 35    │  1759.29ms │                        1734.22ms │     no change │
│ QQuery 36    │   319.80ms │                         312.65ms │     no change │
│ QQuery 37    │   212.58ms │                         204.92ms │     no change │
│ QQuery 38    │   189.61ms │                         189.14ms │     no change │
│ QQuery 39    │   966.16ms │                         982.35ms │     no change │
│ QQuery 40    │    89.98ms │                          86.07ms │     no change │
│ QQuery 41    │    79.43ms │                          80.69ms │     no change │
│ QQuery 42    │    97.07ms │                          95.90ms │     no change │
└──────────────┴────────────┴──────────────────────────────────┴───────────────┘

Which correspond to several queries where the filter is on a string column ("SearchPhrase" that is not used after the filter

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate physical-expr Changes to the physical-expr crates proto Related to proto crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add projection to FilterExec to avoid unecessary output creation

4 participants