Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Feb 4, 2022

Which issue does this PR close?

Along with #1732, Fixes #423 (the last part).

re #424

Rationale for this change

Repartitioning the input to an operator that relies on its input to be sorted is incorrect as the repartitioning will intermix the partitions and effectively "unsort" the input stream

We found this in IOx here https://github.com/influxdata/influxdb_iox/pull/3633#issuecomment-1030126757

Here is a picture showing the problem:

    ┌─────────────────────────────────┐
    │                                 │
    │     SortPreservingMergeExec     │
    │                                 │
    └─────────────────────────────────┘
              ▲      ▲       ▲            Input may not
              │      │       │             be sorted!
      ┌───────┘      │       └───────┐
      │              │               │
      │              │               │
┌───────────┐  ┌───────────┐   ┌───────────┐
│           │  │           │   │           │
│ batch A1  │  │ batch A3  │   │ batch B3  │
│           │  │           │   │           │
├───────────┤  ├───────────┤   ├───────────┤
│           │  │           │   │           │
│ batch B2  │  │ batch B1  │   │ batch A2  │
│           │  │           │   │           │
└───────────┘  └───────────┘   └───────────┘
      ▲              ▲               ▲
      │              │               │
      └─────────┐    │    ┌──────────┘
                │    │    │                  Outputs
                │    │    │                batches are
    ┌─────────────────────────────────┐   repartitioned
    │       RepartitionExec(3)        │    and may not
    │           RoundRobin            │   remain sorted
    │                                 │
    └─────────────────────────────────┘
                ▲         ▲
                │         │                Inputs are
          ┌─────┘         └─────┐            sorted
          │                     │
          │                     │
          │                     │
    ┌───────────┐         ┌───────────┐
    │           │         │           │
    │ batch A1  │         │ batch B1  │
    │           │         │           │
    ├───────────┤         ├───────────┤
    │           │         │           │
    │ batch A2  │         │ batch B2  │
    │           │         │           │
    ├───────────┤         ├───────────┤
    │           │         │           │
    │ batch A3  │         │ batch B3  │
    │           │         │           │
    └───────────┘         └───────────┘


     Sorted Input          Sorted Input
           A                     B

The streams need to remain the way they were

┌─────────────────────────────────┐
│                                 │
│     SortPreservingMergeExec     │
│                                 │
└─────────────────────────────────┘
            ▲         ▲
            │         │         Inputs are
      ┌─────┘         └─────┐   sorted, as
      │                     │    required
      │                     │
      │                     │
┌───────────┐         ┌───────────┐
│           │         │           │
│ batch A1  │         │ batch B1  │
│           │         │           │
├───────────┤         ├───────────┤
│           │         │           │
│ batch A2  │         │ batch B2  │
│           │         │           │
├───────────┤         ├───────────┤
│           │         │           │
│ batch A3  │         │ batch B3  │
│           │         │           │
└───────────┘         └───────────┘


 Sorted Input          Sorted Input
       A                     B

What changes are included in this PR?

  1. use flag introduced in Prevent repartitioning of certain operator's direct children (#1731) #1732 to not repartition the input to SortPreservingMerge operator
  2. Test to ensure it is not repartitioned

Are there any user-facing changes?

Probably not (yet) as SortPreservingMerge isn't used as part of SQL planning

cc @tustvold @Dandandan

fn should_repartition_children(&self) -> bool {
// if the children are repartitioned they may no longer remain
// sorted
false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the fix, the rest is tests

Copy link
Contributor

@tustvold tustvold Feb 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this comment elsewhere, but going to make it here for visibility - this will only prevent repartitioning of direct children.

So for example if you had

Sort Preserving Merge <- Projection Exec <- Sorted Scan

I think Repartition could convert this to

Sort Preserving Merge <- Projection Exec <- Repartition <- Sorted Scan

Which would break sorting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will write some more tests and fix this

@alamb alamb marked this pull request as draft February 4, 2022 19:45
@alamb alamb force-pushed the alamb/less_repartitioing branch 2 times, most recently from 463c048 to 2858e34 Compare February 7, 2022 20:50
@alamb
Copy link
Contributor Author

alamb commented Feb 7, 2022

My PR got a bit more ambitious, so I will open a new one with a better description

@alamb alamb closed this Feb 7, 2022
@alamb
Copy link
Contributor Author

alamb commented Feb 8, 2022

Follow on PR is #1776

@alamb alamb deleted the alamb/less_repartitioing branch April 12, 2022 20:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Repartition Optimisation Pass Breaks Sorting (wrong answer with limit)

2 participants