Is your feature request related to a problem or challenge?
Reported by @crepererum
The state is initialized once for all partitions. However, this initialization may take a short while (1 ms or more on a very busy system). Multiple threads are quite likely to call execute at the same time: the plan has just fanned out to the number of "target partitions", which is likely set to the number of CPU cores, and all of those partitions now try to start executing the plan at once. While one thread initializes the state, the others contend for the same lock.
Describe the solution you'd like
Minimize the lock contention around the one-time state initialization.
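One possible direction, sketched below under stated assumptions, is to keep the one-time initialization but make every later access lock-free by using `std::sync::OnceLock`. This is not the project's actual code: `Operator`, `SharedState`, `init_count`, and `run_all_partitions` are hypothetical names chosen for illustration, and the real state is certainly more involved. Threads that arrive while initialization is in progress still wait for it to finish, so this only removes contention after the state exists.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};
use std::thread;

/// Hypothetical stand-in for the state shared by all partitions.
struct SharedState {
    channels: Vec<usize>,
}

/// Hypothetical operator whose shared state is created lazily, exactly once.
struct Operator {
    state: OnceLock<SharedState>,
    /// Counts how often the initializer ran (for illustration only).
    init_count: AtomicUsize,
    num_partitions: usize,
}

impl Operator {
    fn new(num_partitions: usize) -> Self {
        Self {
            state: OnceLock::new(),
            init_count: AtomicUsize::new(0),
            num_partitions,
        }
    }

    /// Called concurrently, once per partition.
    fn execute(&self, partition: usize) -> usize {
        // The first caller runs the closure; callers that arrive during
        // initialization wait for it. Once the state is set, later calls
        // read it without taking a lock.
        let state = self.state.get_or_init(|| {
            self.init_count.fetch_add(1, Ordering::SeqCst);
            SharedState {
                channels: (0..self.num_partitions).collect(),
            }
        });
        state.channels[partition]
    }
}

/// Spawns one thread per partition and returns the per-partition results
/// together with the number of times the state was initialized.
fn run_all_partitions(num_partitions: usize) -> (Vec<usize>, usize) {
    let op = Arc::new(Operator::new(num_partitions));
    let handles: Vec<_> = (0..num_partitions)
        .map(|p| {
            let op = Arc::clone(&op);
            thread::spawn(move || op.execute(p))
        })
        .collect();
    let results: Vec<usize> = handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .collect();
    (results, op.init_count.load(Ordering::SeqCst))
}
```

Calling `run_all_partitions(8)` starts eight threads that all call `execute` at roughly the same time, mirroring the fan-out described above, while the initializer closure runs only once. Whether this fits the real operator depends on how the state is mutated after initialization, which this sketch does not model.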
Describe alternatives you've considered
No response
Additional context
No response