-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-9876][SQL][FOLLOWUP] Enable string and binary tests for Parquet predicate pushdown and replace deprecated fromByteArray. #13389
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #59580 has finished for PR 13389 at commit
|
|
Hi @rdblue @liancheng , Could you please take a look? |
| (n: String, v: Any) => FilterApi.eq(floatColumn(n), v.asInstanceOf[java.lang.Float]) | ||
| case DoubleType => | ||
| (n: String, v: Any) => FilterApi.eq(doubleColumn(n), v.asInstanceOf[java.lang.Double]) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Lots of unnecessary white-space changes. I don't generally like to commit these, but it's fine if this is more in line with the project's published style guidelines.
|
+1 overall, good catch on those tests. |
|
@rdblue Thank you for the review! I just noticed |
|
Test build #60124 has finished for PR 13389 at commit
|
|
@rdblue @liancheng Could you please take another look? |
| (row: SpecializedGetters, ordinal: Int) => | ||
| recordConsumer.addBinary(Binary.fromByteArray(row.getUTF8String(ordinal).getBytes)) | ||
| recordConsumer.addBinary( | ||
| Binary.fromReusedByteArray(row.getUTF8String(ordinal).getBytes)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're converting from a String, you can use Binary.fromString, which sets the reuse flag correctly. It looks like this should be using the "constant" variant, which signals to Parquet that the underlying bytes won't be changed by the program after they have been passed to Parquet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your review! (Actually it is UTF8String. So, it has to be converted into String to use Binary.fromString).. though.. I am a bit worried that it might possibly be reused in the future (although I think it is not reused for now).
This can write corrupt statistics if this is reused.. Is my understanding correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UTF8String itself is immutable, but the underlying buffer it points to can be mutable. I'd vote for using Binary.fromReusedByteArray here.
|
Looks fine other than one comment. |
|
LGTM, merging to master. Thanks! |
### What changes were proposed in this pull request? This pr aims to upgrade netty from 4.1.92 to 4.1.93. ### Why are the changes needed? 1.v4.1.92 VS v4.1.93 netty/netty@netty-4.1.92.Final...netty-4.1.93.Final 2.The new version brings some bug fix, eg: - Reset byte buffer in loop for AbstractDiskHttpData.setContent ([#13320](netty/netty#13320)) - OpenSSL MAX_CERTIFICATE_LIST_BYTES option supported ([#13365](netty/netty#13365)) - Adapt to DirectByteBuffer constructor in Java 21 ([#13366](netty/netty#13366)) - HTTP/2 encoder: allow HEADER_TABLE_SIZE greater than Integer.MAX_VALUE ([#13368](netty/netty#13368)) - Upgrade to latest netty-tcnative to fix memory leak ([#13375](netty/netty#13375)) - H2/H2C server stream channels deactivated while write still in progress ([#13388](netty/netty#13388)) - Channel#bytesBefore(un)writable off by 1 ([#13389](netty/netty#13389)) - HTTP/2 should forward shutdown user events to active streams ([#13394](netty/netty#13394)) - Respect the number of bytes read per datagram when using recvmmsg ([#13399](netty/netty#13399)) 3.The release notes as follows: - https://netty.io/news/2023/05/25/4-1-93-Final.html 4.Why not upgrade to `4-1-94-Final` version? Because the return value of the 'threadCache()' (from `PoolThreadCache` to `PoolArenasCache`) method of the netty Inner class used in the 'arrow memory netty' version '12.0.1' has changed and belongs to break change, let's wait for the upgrade of the 'arrow memory netty' before upgrading to the '4-1-94-Final' version. The reference is as follows: https://github.com/apache/arrow/blob/6af660f48472b8b45a5e01b7136b9b040b185eb1/java/memory/memory-netty/src/main/java/io/netty/buffer/PooledByteBufAllocatorL.java#L164 https://github.com/netty/netty/blob/da1a448d5bc4f36cc1744db93fcaf64e198db2bd/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java#L732-L736 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GA. Closes #41681 from panbingkun/upgrade_netty. Authored-by: panbingkun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
It seems Parquet has been upgraded to 1.8.1 by #13280. So, this PR enables string and binary predicate push down which was disabled due to SPARK-11153 and PARQUET-251 and cleans up some comments unremoved (I think by mistake).
This PR also replace the API,
fromByteArray()deprecated in PARQUET-251.How was this patch tested?
Unit tests in
ParquetFilters