Skip to content

Conversation

@HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented May 29, 2016

What changes were proposed in this pull request?

It seems Parquet has been upgraded to 1.8.1 by #13280. So, this PR enables string and binary predicate push down which was disabled due to SPARK-11153 and PARQUET-251 and cleans up some comments unremoved (I think by mistake).

This PR also replace the API, fromByteArray() deprecated in PARQUET-251.

How was this patch tested?

Unit tests in ParquetFilters

@SparkQA
Copy link

SparkQA commented May 29, 2016

Test build #59580 has finished for PR 13389 at commit 04f24e3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

Hi @rdblue @liancheng , Could you please take a look?

(n: String, v: Any) => FilterApi.eq(floatColumn(n), v.asInstanceOf[java.lang.Float])
case DoubleType =>
(n: String, v: Any) => FilterApi.eq(doubleColumn(n), v.asInstanceOf[java.lang.Double])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Lots of unnecessary white-space changes. I don't generally like to commit these, but it's fine if this is more in line with the project's published style guidelines.

@rdblue
Copy link
Contributor

rdblue commented May 30, 2016

+1 overall, good catch on those tests.

@HyukjinKwon HyukjinKwon changed the title [SPARK-9876][SQL][FOLLOWUP] Enable string and binary tests for Parquet predicate pushdown [SPARK-9876][SQL][FOLLOWUP] Enable string and binary tests for Parquet predicate pushdown and replace deprecated fromByteArray. Jun 7, 2016
@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented Jun 7, 2016

@rdblue Thank you for the review! I just noticed fromByteArray was deprecated so I replaced them to the new ones. Do you mind if I ask a quick look for this again? I saw this was reviewed by you in parquet-mr.

@SparkQA
Copy link

SparkQA commented Jun 7, 2016

Test build #60124 has finished for PR 13389 at commit 5611978.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented Jun 30, 2016

@rdblue @liancheng Could you please take another look?

(row: SpecializedGetters, ordinal: Int) =>
recordConsumer.addBinary(Binary.fromByteArray(row.getUTF8String(ordinal).getBytes))
recordConsumer.addBinary(
Binary.fromReusedByteArray(row.getUTF8String(ordinal).getBytes))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're converting from a String, you can use Binary.fromString, which sets the reuse flag correctly. It looks like this should be using the "constant" variant, which signals to Parquet that the underlying bytes won't be changed by the program after they have been passed to Parquet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your review! (Actually it is UTF8String. So, it has to be converted into String to use Binary.fromString).. though.. I am a bit worried that it might possibly be reused in the future (although I think it is not reused for now).

This can write corrupt statistics if this is reused.. Is my understanding correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UTF8String itself is immutable, but the underlying buffer it points to can be mutable. I'd vote for using Binary.fromReusedByteArray here.

@rdblue
Copy link
Contributor

rdblue commented Jun 30, 2016

Looks fine other than one comment.

@liancheng
Copy link
Contributor

LGTM, merging to master. Thanks!

@asfgit asfgit closed this in 07d9c53 Jul 5, 2016
@HyukjinKwon HyukjinKwon deleted the parquet-1.8-followup branch January 2, 2018 03:42
dongjoon-hyun pushed a commit that referenced this pull request Jun 28, 2023
### What changes were proposed in this pull request?
This pr aims to upgrade netty from 4.1.92 to 4.1.93.

### Why are the changes needed?
1.v4.1.92 VS v4.1.93
netty/netty@netty-4.1.92.Final...netty-4.1.93.Final

2.The new version brings some bug fix, eg:
- Reset byte buffer in loop for AbstractDiskHttpData.setContent ([#13320](netty/netty#13320))
- OpenSSL MAX_CERTIFICATE_LIST_BYTES option supported ([#13365](netty/netty#13365))
- Adapt to DirectByteBuffer constructor in Java 21 ([#13366](netty/netty#13366))
- HTTP/2 encoder: allow HEADER_TABLE_SIZE greater than Integer.MAX_VALUE ([#13368](netty/netty#13368))
- Upgrade to latest netty-tcnative to fix memory leak ([#13375](netty/netty#13375))
- H2/H2C server stream channels deactivated while write still in progress ([#13388](netty/netty#13388))
- Channel#bytesBefore(un)writable off by 1 ([#13389](netty/netty#13389))
- HTTP/2 should forward shutdown user events to active streams ([#13394](netty/netty#13394))
- Respect the number of bytes read per datagram when using recvmmsg ([#13399](netty/netty#13399))

3.The release notes as follows:
- https://netty.io/news/2023/05/25/4-1-93-Final.html

4.Why not upgrade to `4-1-94-Final` version?
Because the return value of the 'threadCache()' (from `PoolThreadCache` to `PoolArenasCache`) method of the netty Inner class used in the 'arrow memory netty' version '12.0.1' has changed and belongs to break change, let's wait for the upgrade of the 'arrow memory netty' before upgrading to the '4-1-94-Final' version.

The reference is as follows:
https://github.com/apache/arrow/blob/6af660f48472b8b45a5e01b7136b9b040b185eb1/java/memory/memory-netty/src/main/java/io/netty/buffer/PooledByteBufAllocatorL.java#L164
https://github.com/netty/netty/blob/da1a448d5bc4f36cc1744db93fcaf64e198db2bd/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java#L732-L736

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GA.

Closes #41681 from panbingkun/upgrade_netty.

Authored-by: panbingkun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants