Skip to content

Conversation

@antiguru
Copy link
Member

@antiguru antiguru commented Jan 20, 2023

Convert the persist_source decode_and_mfp operator to a plain Timely operator without using async. This avoids maintaining Timely state across yields and seems simpler to reason about in general.

This also takes the intended changes from #17156 to compact the output in-place using the ConsolidateBuffer. This way, we don't have to worry about large allocations or spending significant time in consolidating updates.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and therefore is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration, there is a
    companion cloud PR to account for those changes that is tagged with
    the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

@antiguru antiguru force-pushed the persist_source_no_async branch 2 times, most recently from 3f4ef85 to 6445833 Compare January 27, 2023 16:07
@antiguru antiguru marked this pull request as ready for review January 27, 2023 16:07
@antiguru antiguru requested review from a team, danhhz and vmarcos January 27, 2023 16:07
@danhhz
Copy link
Contributor

danhhz commented Jan 27, 2023

I'm supportive of the overall direction, but the original behavior of yield_fn was a suggestion from @frankmcsherry, so I'd love to get signoff from him on the change in behavior before looking at the rest of this too closely

@danhhz danhhz requested a review from frankmcsherry January 27, 2023 16:12
@antiguru
Copy link
Member Author

I can bring back the yield_fn if needed, there's no fundamental reason that we can't have it in this framework. The reason I removed it is because we never yield on time, only on number of records.

Copy link
Contributor

@vmarcos vmarcos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will defer to storage folks regarding the details of the operator code; at least upon a comparison of the old and new versions, it seems to me that the logic has been kept unharmed.

Regarding the changes to ConsolidateBuffer, if we now introduce the function give_at, would it make sense to have a variation of new that does not require the output port to be specified, with an explicit Option being passed in? I am a bit worried of misuses where one gives a port in a multi-output operator, but calls give_at by mistake. With some extra checking, the implementation could potentially catch these problems?

This requires a change to the ConsolidateBuffer, which previously only
worked with `CapabilityRef` instead of actual capabilities. The Ref type
can be narrowed down to specific outputs, but owned capabilities are set
to a specific output.

Signed-off-by: Moritz Hoffmann <[email protected]>
Signed-off-by: Moritz Hoffmann <[email protected]>
@antiguru antiguru force-pushed the persist_source_no_async branch from 6445833 to d5eb6f1 Compare January 30, 2023 19:10
Signed-off-by: Moritz Hoffmann <[email protected]>
@antiguru
Copy link
Member Author

I updated the PR to use the yield_fn which was there previously. The PR is now best viewed with whitespace hidden!

Copy link
Contributor

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

persist_source change LGTM! I didn't look at the buffer.rs change. sorry this has been such a saga!

@antiguru antiguru requested review from teskje and vmarcos January 31, 2023 18:14
Copy link
Contributor

@teskje teskje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@vmarcos vmarcos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took another look at give_at; it was actually fine, so this LGTM!

@antiguru antiguru merged commit 1fdaa6d into MaterializeInc:main Feb 1, 2023
@antiguru antiguru deleted the persist_source_no_async branch February 1, 2023 11:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants