Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Mar 20, 2020

What changes were proposed in this pull request?

In the PR, I propose to refactor reading of timestamps of the TIMESTAMP_MILLIS logical type from Parquet files in VectorizedColumnReader, and move checking of the rebaseDateTime flag out of the internal loop.

Why are the changes needed?

To avoid any additional overhead of the checking the SQL config spark.sql.legacy.parquet.rebaseDateTime.enabled introduced by the PR #27915.

Does this PR introduce any user-facing change?

No

How was this patch tested?

By running the test suite ParquetIOSuite.

@MaxGekk
Copy link
Member Author

MaxGekk commented Mar 20, 2020

ping @HyukjinKwon

@SparkQA
Copy link

SparkQA commented Mar 20, 2020

Test build #120115 has finished for PR 27973 at commit 5012060.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

long micros = DateTimeUtils.millisToMicros(dataColumn.readLong());
if (rebaseDateTime) {
micros = DateTimeUtils.rebaseJulianToGregorianMicros(micros);
if (rebaseDateTime) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am kind of neutral but a bit of dups vs a bit of perf :-). I guess it's fine.

HyukjinKwon pushed a commit that referenced this pull request Mar 23, 2020
…ag out of the loop in `VectorizedColumnReader`

In the PR, I propose to refactor reading of timestamps of the `TIMESTAMP_MILLIS` logical type from Parquet files in `VectorizedColumnReader`, and move checking of the `rebaseDateTime` flag out of the internal loop.

To avoid any additional overhead of the checking the SQL config `spark.sql.legacy.parquet.rebaseDateTime.enabled` introduced by the PR #27915.

No

By running the test suite `ParquetIOSuite`.

Closes #27973 from MaxGekk/rebase-parquet-datetime-followup.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit aa3a742)
Signed-off-by: HyukjinKwon <[email protected]>
@HyukjinKwon
Copy link
Member

Merged to master and branch-3.0.

sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…ag out of the loop in `VectorizedColumnReader`

### What changes were proposed in this pull request?
In the PR, I propose to refactor reading of timestamps of the `TIMESTAMP_MILLIS` logical type from Parquet files in `VectorizedColumnReader`, and move checking of the `rebaseDateTime` flag out of the internal loop.

### Why are the changes needed?
To avoid any additional overhead of the checking the SQL config `spark.sql.legacy.parquet.rebaseDateTime.enabled` introduced by the PR apache#27915.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
By running the test suite `ParquetIOSuite`.

Closes apache#27973 from MaxGekk/rebase-parquet-datetime-followup.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
@MaxGekk MaxGekk deleted the rebase-parquet-datetime-followup branch June 5, 2020 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants