[SPARK-8850] [SQL] Enable Unsafe mode by default #7564
|
Found an interesting problem where the "exception on memory leak" configuration can lead to task failure exceptions being masked. Going to push a fix to this branch, which I may end up spinning off into a separate PR depending on how things go. |
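The masking effect described above can be reproduced in a few lines. The following is a minimal sketch in Python rather than Spark's own code: `MemoryLeakError`, `run_task_masking`, and `run_task_preserving` are hypothetical names chosen for illustration, and the "preserving" variant is only one plausible way to avoid the masking, not necessarily the fix that was pushed to this branch.

```python
class MemoryLeakError(Exception):
    """Hypothetical stand-in for the error raised when a task leaks memory."""


def run_task_masking(body, leaked_bytes):
    """Leak check lives in `finally`, so it can replace an error from body()."""
    try:
        return body()
    finally:
        if leaked_bytes > 0:
            # Raising here discards whatever exception body() was propagating.
            raise MemoryLeakError(f"leaked {leaked_bytes} bytes")


def run_task_preserving(body, leaked_bytes):
    """Leak check runs only when the task body itself succeeded."""
    try:
        result = body()
    except Exception:
        # The task already failed: surface the original error, skip the leak check.
        raise
    if leaked_bytes > 0:
        raise MemoryLeakError(f"leaked {leaked_bytes} bytes")
    return result


def failing_body():
    raise ValueError("task failed")


def outcome(runner):
    """Return the name of the exception type that escapes the runner."""
    try:
        runner(failing_body, leaked_bytes=1024)
    except Exception as exc:
        return type(exc).__name__
    return "ok"


print(outcome(run_task_masking))     # the leak error hides the task failure
print(outcome(run_task_preserving))  # the original task failure is visible
```

With the first runner, the caller only ever sees the leak error, which is the kind of masking reported in this comment.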
|
Test build #37925 has finished for PR 7564 at commit
|
|
Looks like there’s a problem with using ScalaUDFs in the unsafe GeneratedAggregate path. It looks like there’s a case where we end up using a BoundReference expression to extract fields before passing them to the UDF. This ends up calling the generic Row.apply(), which returns data of the wrong type when called on an UnsafeRow (specifically, the failing test tries to get a Long column but ends up getting a byte array back). One way to fix this would be to implement code generation for ScalaUDF so that the field access expressions are code generated. Edit: this is covered by https://issues.apache.org/jira/browse/SPARK-9162. |
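To make the type mismatch concrete, here is a toy model in Python, not Spark's actual UnsafeRow. The class `ToyUnsafeRow`, its fixed 8-bytes-per-field layout, and both accessor names are assumptions made purely for illustration. It shows how an untyped accessor over a binary row can hand back raw bytes where the caller expected a long, while a typed accessor decodes the same bytes correctly.

```python
import struct


class ToyUnsafeRow:
    """Toy binary row: every field is stored as 8 little-endian bytes (assumed layout)."""

    def __init__(self, longs):
        self._buf = b"".join(struct.pack("<q", v) for v in longs)

    def get(self, ordinal):
        """Generic accessor: has no type information, so it returns the raw bytes."""
        start = ordinal * 8
        return self._buf[start:start + 8]

    def get_long(self, ordinal):
        """Typed accessor: decodes the field as a signed 64-bit integer."""
        return struct.unpack_from("<q", self._buf, ordinal * 8)[0]


row = ToyUnsafeRow([42, -7])
generic_value = row.get(0)     # bytes, not an int
typed_value = row.get_long(0)  # the long that the UDF actually expects
print(type(generic_value).__name__, typed_value)
```

In this model, a caller that goes through the generic accessor and then passes the result to a function expecting a long fails in the same way the test described above did.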
|
Test build #37928 timed out for PR 7564 at commit |
|
Test build #37931 timed out for PR 7564 at commit |
|
In this latest round of tests, it looks like HiveThriftBinaryServerSuite failed a test: It looks like there may also be a planner bug that occurs when This is probably easy to fix by adding configuration validation which does not allow Finally, it looks like a number of the HiveCompatibilitySuite tests are failing due to the minimum buffer size issue, which I'll try to fix today. |
e872840 to e71ddbc
|
Test build #37973 timed out for PR 7564 at commit |
|
I lowered some buffers to 4MB so that the majority of HiveCompatibilitySuite runs and have found a few other test failures; see updated ticket description for more details. |
|
Test build #37987 timed out for PR 7564 at commit |
|
Test build #37990 timed out for PR 7564 at commit |
5464206 to 4fcae4a
|
Test build #38482 timed out for PR 7564 at commit |
|
Test build #38496 has finished for PR 7564 at commit
|
88aa54a to c38bcdd
|
Test build #38514 timed out for PR 7564 at commit |
|
Test build #38519 timed out for PR 7564 at commit |
c38bcdd to 393a3eb
This may be unnecessary now that we support generic getters in UnsafeRows.
|
Test build #38563 timed out for PR 7564 at commit |
|
Test build #38638 has finished for PR 7564 at commit
|
|
Lots of changes are temporarily merged in here while I test on top of Reynold's struct type patch. |
a2ccefa to 8946cb9
|
Test build #38667 has finished for PR 7564 at commit
|
|
Test build #38677 has finished for PR 7564 at commit
|
b4ab1c0 to 445b4db
|
Test build #38757 has finished for PR 7564 at commit
|
|
jenkins, test this please |
|
Test build #38761 has finished for PR 7564 at commit
|
4f92106 to 5d0b2d3
|
Test build #38896 has finished for PR 7564 at commit
|
|
Nooo! This failed a PySpark test because I need to configure a lower default page size in those tests. |
|
Test build #38910 has finished for PR 7564 at commit
|
|
Jenkins, retest this please. |
|
(Re-testing because a flaky MLlib test may have caused a spurious failure and I want to have the Python tests run...) |
|
Test build #38940 has finished for PR 7564 at commit
|
|
Test build #38945 has finished for PR 7564 at commit
|
|
Test build #1231 has finished for PR 7564 at commit
|
|
Test build #38965 has finished for PR 7564 at commit
|
|
Test build #38992 has finished for PR 7564 at commit
|
|
Jenkins, retest this please. (Just preemptively in case the first build hits a flaky streaming test) |
|
Test build #39056 has finished for PR 7564 at commit
|
|
Jenkins retest this please |
|
I've merged this! |
|
Test build #1236 has finished for PR 7564 at commit
|
|
Test build #39073 has finished for PR 7564 at commit
|
This pull request enables Unsafe mode by default in Spark SQL. In order to do this, we had to fix a number of small issues:
List of fixed blockers:
- https://issues.apache.org/jira/browse/SPARK-9162, to implement code generation for ScalaUDF. This was necessary for UDFSuite to pass; for now, I've just ignored this test in order to try to find other problems while we wait for a fix. This is no longer necessary as of [SPARK-9368][SQL] Support get(ordinal, dataType) generic getter in UnsafeRow. #7682.
- Tests in AggregationQuerySuite are failing due to NaN-handling issues in UnsafeRow, which were fixed in [SPARK-9421] Fix null-handling bugs in UnsafeRow.getDouble, getFloat(), and get(ordinal, dataType) #7736.
- org.apache.spark.sql.ColumnExpressionSuite.rand needs to be updated so that the planner check also matches TungstenProject.
- join_1to1 (fixed by [SPARK-9364] Fix array out of bounds and use-after-free bugs in UnsafeExternalSorter #7680)
- join_nulls (fixed by [SPARK-9364] Fix array out of bounds and use-after-free bugs in UnsafeExternalSorter #7680)
- lateral_view
- partcols1. This might be a deadlock in script transformation or a bug in error-handling code? The hang was fixed by [SPARK-9393] [SQL] Fix several error-handling bugs in ScriptTransform operator #7710.
- partcols1: will be fixed by [SPARK-9419] ShuffleMemoryManager and MemoryStore should track memory on a per-task, not per-thread, basis #7734.
- partcols1 hang: it appears that a number of later tests have issues as well.