@JoshRosen
Contributor

This patch attempts to fix an issue where Spark SQL's UnsafeRowSerializer was incompatible with the tungsten-sort ShuffleManager.

@JoshRosen
Contributor Author

Not 100% sure if this is sufficient for a proper fix.

@SparkQA

SparkQA commented Sep 22, 2015

Test build #42865 has finished for PR 8873 at commit e6fdd46.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class WriterThread(

@JoshRosen
Contributor Author

@davies, I was puzzled about why UnsafeRowSerializer didn't fail with similar exceptions when re-ordering serialized rows using the PartitionedSerializedPairBuffer. It turns out that the ExternalSorter's serialized shuffle sort path ends up using a DiskBlockObjectWriter in such a way that the serializer instance's close() method is called when closing the writer. The UnsafeShuffleWriter, on the other hand, uses a different write path which does not do this, leading to the error that you saw.
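To illustrate the difference between the two write paths described above, here is a minimal, self-contained sketch in plain Java. It is not Spark's code: `BufferingRowStream`, `writeThenClose`, and `writeThenFlushOnly` are hypothetical names, and the no-op `flush()` is an assumption made purely for illustration. It only shows how a stream that buffers internally can look correct on a path that calls `close()` and lose data on a path that never does.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class CloseVsFlushSketch {

    /** Hypothetical stand-in for a serialization stream that buffers internally. */
    static final class BufferingRowStream implements AutoCloseable {
        private final DataOutputStream out;

        BufferingRowStream(ByteArrayOutputStream sink) {
            // Rows are held in a 4 KB buffer before they reach the sink.
            this.out = new DataOutputStream(new BufferedOutputStream(sink, 4096));
        }

        void writeRow(byte[] row) throws IOException {
            out.writeInt(row.length); // 4-byte length prefix
            out.write(row);           // row payload
        }

        /** Assumption for illustration only: flush() does not drain the buffer. */
        void flush() { }

        @Override
        public void close() throws IOException {
            out.close(); // closing drains the buffer into the sink
        }
    }

    /** Models a write path that closes the stream when the writer is closed. */
    static int writeThenClose(byte[] row) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (BufferingRowStream stream = new BufferingRowStream(sink)) {
            stream.writeRow(row);
        }
        return sink.size();
    }

    /** Models a write path that reads the sink after each record without closing. */
    static int writeThenFlushOnly(byte[] row) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferingRowStream stream = new BufferingRowStream(sink);
        stream.writeRow(row);
        stream.flush();
        return sink.size();
    }

    public static void main(String[] args) throws IOException {
        byte[] row = new byte[16];
        System.out.println("close path bytes:      " + writeThenClose(row));     // 20
        System.out.println("flush-only path bytes: " + writeThenFlushOnly(row)); // 0
    }
}
```

In this toy model the first path sees all 20 bytes (4-byte length plus 16-byte row) because `close()` drains the buffer, while the second path sees none. Whether this matches the exact failure in the real serializer is not confirmed by this thread; the sketch only demonstrates why a `close()` call on one path could mask a problem that surfaces on the other.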

@davies
Contributor

davies commented Sep 23, 2015

LGTM

@SparkQA

SparkQA commented Sep 23, 2015

Test build #42867 has finished for PR 8873 at commit 5419ca4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Contributor

Merging to master and 1.5

asfgit pushed a commit that referenced this pull request Sep 23, 2015
…uffleManager

This patch attempts to fix an issue where Spark SQL's UnsafeRowSerializer was incompatible with the `tungsten-sort` ShuffleManager.

Author: Josh Rosen <[email protected]>

Closes #8873 from JoshRosen/SPARK-10403.

(cherry picked from commit a182080)
Signed-off-by: Michael Armbrust <[email protected]>
@asfgit asfgit closed this in a182080 Sep 23, 2015
@JoshRosen JoshRosen deleted the SPARK-10403 branch October 7, 2015 22:58
ashangit pushed a commit to ashangit/spark that referenced this pull request Oct 19, 2016
…uffleManager

This patch attempts to fix an issue where Spark SQL's UnsafeRowSerializer was incompatible with the `tungsten-sort` ShuffleManager.

Author: Josh Rosen <[email protected]>

Closes apache#8873 from JoshRosen/SPARK-10403.

(cherry picked from commit a182080)
Signed-off-by: Michael Armbrust <[email protected]>
(cherry picked from commit 64cc62c)