-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The Repartition optimizer recurses into plans and looks for operators with an output partition count less than the target partition count, and on finding such a partition adds a RepartitionExec node.
Unfortunately this does not take into account how this might impact the concurrency of the plan as a whole.
For example, in IOx it is fairly common to have a UnionExec with a list of many IOxReadFilterNode as children. Each of these IOxReadFilterNode is a single partition, and therefore the Repartition optimizer repartitions the output of each of these IOxReadFilterNode.
This leads to a partition explosion, with the UnionExec going from potentially having x partitions, to n * x where n is the target parallelism. In some of our plans this is leading to thousands of partitions, which is not ideal 😁
A similar quirk can occur when there is an operator further up the chain that imposes some partitioning constraint, e.g. a sort, but isn't the direct parent. I'm fairly certain that I've seen plans generated that RepartitionExec into a LimitExec into a CoalescePartitionsExec.
Describe the solution you'd like
I don't really know how to solve this, but I wanted to flag up that the current greedy approach can be pretty sub-optimal, and that perhaps a more holistic optimizer that can take into account the plan as a whole might be beneficial