Skip to content

Conversation

@dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Jul 25, 2020

What changes were proposed in this pull request?

This PR aims to speed up MapStatus deserialization by 5~18% with the latest RoaringBitmap 0.9.0 and new APIs. Note that we focus on deserialization time because serialization occurs once while deserialization occurs many times.

Why are the changes needed?

The current version is too old. We had better upgrade it to get the performance improvement and bug fixes.
Although MapStatusesSerDeserBenchmark is synthetic, the benchmark result is updated with this patch.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the Jenkins or GitHub Action.

Deserialization 530 535 9 0.4 2651.1 0.3X
-------------------------------------------------------------------------------------------------------------------------
Serialization 175 183 12 1.1 874.1 1.0X
Deserialization 458 462 6 0.4 2288.6 0.4X
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

14% reduced.

Deserialization 495 588 79 0.4 2476.7 0.3X
--------------------------------------------------------------------------------------------------------------------------
Serialization 160 171 8 1.2 801.1 1.0X
Deserialization 453 484 38 0.4 2263.4 0.4X
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

18% reduced.

Deserialization 946 977 33 0.2 4730.2 1.8X
---------------------------------------------------------------------------------------------------------------------------
Serialization 1641 1819 252 0.1 8204.1 1.0X
Deserialization 844 882 37 0.2 4219.7 1.9X
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10% reduction.

Deserialization 929 941 19 0.2 4645.5 1.5X
----------------------------------------------------------------------------------------------------------------------------
Serialization 1360 1412 73 0.1 6799.3 1.0X
Deserialization 850 859 13 0.2 4249.9 1.6X
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

9% reduction.

Deserialization 943 970 32 0.2 4715.8 1.8X
---------------------------------------------------------------------------------------------------------------------------
Serialization 1740 1903 231 0.1 8700.0 1.0X
Deserialization 872 888 24 0.2 4360.9 2.0X
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8% reduction.

Deserialization 940 970 37 0.2 4699.1 1.5X
----------------------------------------------------------------------------------------------------------------------------
Serialization 1461 1469 11 0.1 7306.1 1.0X
Deserialization 871 889 22 0.2 4353.9 1.7X
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8% reduction.

@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Jul 25, 2020

cc FYI, @HyukjinKwon since GitHub Action is not triggered here while the very next PR gets GitHub Action.

@SparkQA
Copy link

SparkQA commented Jul 25, 2020

Test build #126538 has finished for PR 29233 at commit f784f2c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good if it's a simple update and perf win.

@dongjoon-hyun
Copy link
Member Author

Thank you, @srowen ! Merged to master.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-ROAR branch July 25, 2020 15:19
@dongjoon-hyun
Copy link
Member Author

Oops.. I missed the dependency update and Jenkins passed without a dependency test. :(
I'll make a hotfix commit.

@HyukjinKwon
Copy link
Member

LGTM except ^.

holdenk pushed a commit to holdenk/spark that referenced this pull request Oct 27, 2020
…ingBitmap 0.9.0

### What changes were proposed in this pull request?

This PR aims to speed up `MapStatus` deserialization by 5~18% with the latest RoaringBitmap `0.9.0` and new APIs. Note that we focus on `deserialization` time because `serialization` occurs once while `deserialization` occurs many times.

### Why are the changes needed?

The current version is too old. We had better upgrade it to get the performance improvement and bug fixes.
Although `MapStatusesSerDeserBenchmark` is synthetic, the benchmark result is updated with this patch.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the Jenkins or GitHub Action.

Closes apache#29233 from dongjoon-hyun/SPARK-ROAR.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit f642234)
Signed-off-by: Dongjoon Hyun <[email protected]>
otterc pushed a commit to linkedin/spark that referenced this pull request Mar 22, 2023
…ingBitmap 0.9.0

This PR aims to speed up `MapStatus` deserialization by 5~18% with the latest RoaringBitmap `0.9.0` and new APIs. Note that we focus on `deserialization` time because `serialization` occurs once while `deserialization` occurs many times.

The current version is too old. We had better upgrade it to get the performance improvement and bug fixes.
Although `MapStatusesSerDeserBenchmark` is synthetic, the benchmark result is updated with this patch.

No.

Pass the Jenkins or GitHub Action.

Closes apache#29233 from dongjoon-hyun/SPARK-ROAR.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

Ref: LIHADOOP-56788

RB=2401504
BUG=LIHADOOP-56788
G=spark-reviewers
R=vsowrira,minyang,chsingh,yezhou,mmuralid
A=chsingh
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants