
Conversation

@viirya
Member

@viirya viirya commented May 28, 2016

What changes were proposed in this pull request?

The base class SpecificParquetRecordReaderBase, used for the vectorized Parquet reader, tries to get pushed-down filters from the given configuration. These pushed-down filters are used for RowGroups-level filtering. However, we never actually put the filters into that configuration, so the filters are not pushed down to perform RowGroups-level filtering. This patch fixes this by setting up the filters to push down in the configuration handed to the reader.
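Roughly, the fix amounts to serializing the pushed-down predicate into the Hadoop configuration that the reader receives. A minimal sketch of that idea (not the exact patch), assuming Parquet's FilterApi/ParquetInputFormat and a hypothetical integer column _1:

import org.apache.hadoop.conf.Configuration
import org.apache.parquet.filter2.predicate.FilterApi
import org.apache.parquet.hadoop.ParquetInputFormat

// Hypothetical predicate corresponding to a query filter like "_1 < 100".
val hadoopConf = new Configuration()
val predicate = FilterApi.lt(FilterApi.intColumn("_1"), java.lang.Integer.valueOf(100))

// Store the predicate in the configuration; the reader side (SpecificParquetRecordReaderBase)
// retrieves it via ParquetInputFormat.getFilter and can use it to skip whole row groups.
ParquetInputFormat.setFilterPredicate(hadoopConf, predicate)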

How was this patch tested?

Existing tests should pass.

@SparkQA

SparkQA commented May 28, 2016

Test build #59549 has finished for PR 13371 at commit 5687a3b.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented May 28, 2016

retest this please.

@SparkQA

SparkQA commented May 28, 2016

Test build #59550 has finished for PR 13371 at commit 5687a3b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented May 30, 2016

cc @nongli @liancheng

@viirya
Member Author

viirya commented May 30, 2016

also cc @yhuai

new TaskAttemptContextImpl(broadcastedHadoopConf.value.value, attemptId)

// Try to push down filters when filter push-down is enabled.
// Notice: This push-down is RowGroups level, not individual records.
Contributor

Can you provide a link to the doc saying it is row group level?

Contributor

(it is not obvious to know this is just for row group level)

Contributor

Also, does parquet support row group level predicate evaluation?

Member Author

We use org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups in SpecificParquetRecordReaderBase to do filtering.

The implementation of RowGroupFilter is at https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/RowGroupFilter.java.

From this, it looks like it does row group level filtering.
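For illustration, here is a minimal sketch of what that row-group filtering looks like, using a hypothetical file path and predicate (this is not code from the patch):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.filter2.compat.{FilterCompat, RowGroupFilter}
import org.apache.parquet.filter2.predicate.FilterApi
import org.apache.parquet.format.converter.ParquetMetadataConverter
import org.apache.parquet.hadoop.ParquetFileReader

// Hypothetical path and predicate, for illustration only.
val hadoopConf = new Configuration()
val filePath = new Path("/tmp/example.parquet")
val predicate = FilterApi.lt(FilterApi.intColumn("_1"), java.lang.Integer.valueOf(100))

// Read the footer and ask RowGroupFilter which row groups (blocks) can be kept,
// based on the min/max statistics stored per row group.
val footer = ParquetFileReader.readFooter(hadoopConf, filePath, ParquetMetadataConverter.NO_FILTER)
val keptBlocks = RowGroupFilter.filterRowGroups(
  FilterCompat.get(predicate), footer.getBlocks, footer.getFileMetaData.getSchema)
println(s"kept ${keptBlocks.size} of ${footer.getBlocks.size} row groups")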

Member Author

Besides, we use the metadata in the merged schema to figure out whether a field is optional (i.e. not present in all Parquet files) before deciding to push down a filter on it, but this info is currently ignored in FileSourceStrategy. Without the fix in this change, the pushed-down row-group level filtering would fail due to fields that don't exist in a given Parquet file.

@yhuai
Contributor

yhuai commented Jun 1, 2016

Can you provide a test case that shows the problem? Also, can you provide benchmark results of the performance benefit?

@viirya
Member Author

viirya commented Jun 1, 2016

@yhuai As you can see, this is not a fix for a bug/problem, so I think it might be hard to provide a test case for it. I will try to do the benchmark.

@viirya
Member Author

viirya commented Jun 1, 2016

BTW, I can't see any reason not to add a row-group level filter for parquet.

@yhuai
Contributor

yhuai commented Jun 1, 2016

It is a good idea to add it if Parquet supports it (I have an impression that Parquet does not support it, but maybe I am wrong). I think having benchmark results is a good practice, so we can make sure it doesn't hit any obvious issue.

@viirya
Member Author

viirya commented Jun 2, 2016

@yhuai I've run a simple benchmark as follows:

test("Benchmark for Parquet") {
  val N = 1 << 20

  val benchmark = new Benchmark("Parquet reader", N)
  benchmark.addCase("reading Parquet file", 1) { iter =>
    withParquetTable((0 until N).map(i => (101, i)), "t") {
      sql("SELECT _1 FROM t where t._1 < 100").collect()
    }
  }
  benchmark.run()
}

Before this patch:

Parquet reader:                          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
reading Parquet file                        34225 / 34225          0.0       32639.5       1.0X

After this patch:

Parquet reader:                          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
reading Parquet file                        31350 / 31350          0.0       29897.6       1.0X

@viirya
Member Author

viirya commented Jun 3, 2016

ping @yhuai I've addressed the comments. Please take a look again. Thanks!

@viirya
Member Author

viirya commented Jun 6, 2016

ping @yhuai again

@viirya
Member Author

viirya commented Jun 8, 2016

cc @rxin Can you also take a look at this? It has been sitting here for a while too. Thanks!

@viirya
Member Author

viirya commented Jun 8, 2016

cc @cloud-fan too.

…-push-down-filter

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala
@viirya
Member Author

viirya commented Jun 9, 2016

ping @yhuai @rxin @cloud-fan

@SparkQA

SparkQA commented Jun 9, 2016

Test build #60246 has finished for PR 13371 at commit 077f7f8.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Jun 9, 2016

Is this a bug fix or performance fix? Sorry I don't really understand after reading your description.

@viirya
Member Author

viirya commented Jun 9, 2016

It is not really a bug fix because without this filtering push-down, the thing still works. This should be a performance fix. I should modify the description.

@viirya
Member Author

viirya commented Jun 9, 2016

retest this please.

@viirya
Member Author

viirya commented Jun 9, 2016

The description is updated.

@SparkQA

SparkQA commented Jun 10, 2016

Test build #60256 has finished for PR 13371 at commit 077f7f8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Contributor

yhuai commented Jun 10, 2016

@viirya I took a look at Parquet's code. It seems Parquet only evaluates row group level filters when generating splits (https://github.com/apache/parquet-mr/blob/apache-parquet-1.7.0/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputFormat.java#L673). With FileSourceStrategy in Spark, I am not sure we actually evaluate the filters and skip unneeded row groups as expected. Can you take a look? Also, it would be great if you could add a test to make sure that we can actually skip unneeded row groups. This test could be created as follows (a rough sketch appears after the list).

  1. We first write a Parquet file containing multiple row groups. Also, let's say that there is a column c and those row groups have disjoint ranges of c's values.
  2. We write a query with a filter on c and make sure that this query only needs a subset of the row groups.
  3. We verify that we only create splits for the needed row groups.
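A rough sketch of such a test, assuming small row groups can be forced via parquet.block.size and that the column is named c (hypothetical names and sizes, not the actual test that was added):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("row group skipping").getOrCreate()
import spark.implicits._

// Force small row groups so one file contains several of them; since the data is
// written in increasing order of c, each row group covers a disjoint range of c.
spark.sparkContext.hadoopConfiguration.setInt("parquet.block.size", 64 * 1024)

val path = "/tmp/row_group_filter_test"
spark.range(0, 1000000).toDF("c").coalesce(1).write.mode("overwrite").parquet(path)

// A filter that only the first row group(s) can satisfy.
val matched = spark.read.parquet(path).filter($"c" < 1000).count()
assert(matched == 1000)
// Whether unneeded row groups were actually skipped could then be checked by
// inspecting the file footer with RowGroupFilter, as in the earlier sketch.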

@viirya
Member Author

viirya commented Jun 10, 2016

@yhuai Parquet also does this filtering at ParquetRecordReader (https://github.com/apache/parquet-mr/blob/apache-parquet-1.7.0/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetRecordReader.java#L160) and ParquetReader(https://github.com/apache/parquet-mr/blob/apache-parquet-1.7.0/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetReader.java#L147).

In Spark, we also do this in SpecificParquetRecordReaderBase.

I've manually tested it as I mentioned above. But it would be good to have a formal test case for it as you said. I will try to add it later, maybe when I come back to work a few days later...

@viirya
Member Author

viirya commented Jun 10, 2016

@yhuai Your step 3 may not work. We filter the row groups for each Parquet file to read in VectorizedParquetRecordReader. I don't think we do anything regarding creating splits?

@liancheng
Contributor

@yhuai We used to support row group level filter push-down before refactoring HadoopFsRelation into FileFormat, but lost it (by accident I guess) after the refactoring. So now we only have row group level filtering when the vectorized reader is not used, see here.

And yes, both ParquetInputFormat and ParquetRecordReader do row group level filtering.

This LGTM. Thanks for fixing it! Merging to master and 2.0.

@asfgit asfgit closed this in bba5d79 Jun 11, 2016
asfgit pushed a commit that referenced this pull request Jun 11, 2016
…quet reader

## What changes were proposed in this pull request?

The base class `SpecificParquetRecordReaderBase`, used for the vectorized Parquet reader, tries to get pushed-down filters from the given configuration. These pushed-down filters are used for RowGroups-level filtering. However, we never actually put the filters into that configuration, so the filters are not pushed down to perform RowGroups-level filtering. This patch fixes this by setting up the filters to push down in the configuration handed to the reader.

## How was this patch tested?
Existing tests should pass.

Author: Liang-Chi Hsieh <[email protected]>

Closes #13371 from viirya/vectorized-reader-push-down-filter.

(cherry picked from commit bba5d79)
Signed-off-by: Cheng Lian <[email protected]>
@rxin
Contributor

rxin commented Jun 11, 2016

I just talked to @liancheng offline. I don't think we should've merged this until we have verified there is no performance regression, and we definitely shouldn't have merged this in 2.0.

@liancheng can you revert this from both master and branch-2.0?

@viirya can you run some parquet scan benchmark and make sure this does not result in perf regression?

@rxin
Contributor

rxin commented Jun 11, 2016

To be more clear, please write a proper benchmark that reads data when filter push-down is not useful, to check whether this regresses performance for the non-push-down case. Also make sure the benchmark does not include the time it takes to write the Parquet data.

@rxin
Contributor

rxin commented Jun 11, 2016

And once we have more data, it might make sense to merge this in 2.0!

@viirya
Member Author

viirya commented Jun 11, 2016

@rxin One thing that needs to be explained is that, because we have just one configuration to control filter push-down, it affects both the row-based filter push-down and this row-group filter push-down.

The benchmark I posted above runs against this patch (so with both push-downs) and against the master branch (only row-based, without this patch) individually. It does include the time to write the Parquet data; I will change that. I want to confirm whether this kind of benchmark is enough?

@liancheng
Contributor

Reverted from master and branch-2.0.

@viirya For the benchmark, there are two things:

  1. The benchmark also counts Parquet file writing, so the real numbers should be much better than the posted ones.
  2. We should also benchmark cases where no filters are pushed down, to verify that this patch doesn't affect the normal code path.

@viirya
Member Author

viirya commented Jun 11, 2016

@liancheng Got it.

@viirya
Member Author

viirya commented Jun 14, 2016

@liancheng

I reran the benchmark, excluding the time spent writing the Parquet file:

test("Benchmark for Parquet") {
  val N = 1 << 50
    withParquetTable((0 until N).map(i => (101, i)), "t") {
      val benchmark = new Benchmark("Parquet reader", N)
      benchmark.addCase("reading Parquet file", 10) { iter =>
        sql("SELECT _1 FROM t where t._1 < 100").collect()
      }
      benchmark.run()
  }
}

withParquetTable by default runs the tests with both the vectorized and non-vectorized readers. I only let it run the vectorized reader.

After this patch:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_25-b17 on Linux 3.13.0-57-generic
Westmere E56xx/L56xx/X56xx (Nehalem-C)
Parquet reader:                          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
reading Parquet file                            76 /   88          3.4         291.0       1.0X

Before this patch:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_25-b17 on Linux 3.13.0-57-generic
Westmere E56xx/L56xx/X56xx (Nehalem-C)
Parquet reader:                          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
reading Parquet file                            81 /   91          3.2         310.2       1.0X

Next, I ran the benchmark for the non-pushdown case using the same benchmark code but with the pushdown configuration disabled.
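A minimal sketch of how the pushdown setting might be disabled for that run, assuming the benchmark lives in a suite that mixes in SQLTestUtils/ParquetTest (so withSQLConf and sql are available) and that the table t from the snippet above exists:

import org.apache.spark.sql.internal.SQLConf

withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "false") {
  // Same scan as above, but with filter push-down disabled, so neither the row-based
  // nor the row-group level filter is handed to Parquet.
  sql("SELECT _1 FROM t where t._1 < 100").collect()
}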

After this patch:

Parquet reader:                          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
reading Parquet file                            80 /   95          3.3         306.5       1.0X

Before this patch:

Parquet reader:                          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
reading Parquet file                            80 /  103          3.3         306.7       1.0X

For the non-pushdown case, judging from the results, this patch doesn't affect the normal code path.

@yhuai
Contributor

yhuai commented Jun 14, 2016

Can you add results showing that there are skipped row groups with this change (and before this patch all row groups are loaded)?

For those results, let's also put them in the description of the new PR.

@viirya
Member Author

viirya commented Jun 16, 2016

@yhuai ok. Do you mean I need to create a new PR for this?

@yhuai
Contributor

yhuai commented Jun 16, 2016

Yea. Since this one was closed by asfgit, I am not sure you can reopen it.


@liancheng
Contributor

liancheng commented Jun 16, 2016

@viirya One problem in your new benchmark code is that 1 << 50 is actually very small since it's an Int:

scala> 1 << 50
res0: Int = 262144

Anyway, 1 << 50, which would be about 10^15 rows if interpreted as intended, might be too large a value for such a microbenchmark :)

So the generated Parquet file probably only contains a single row group. I guess that's why the numbers are so close to each other no matter whether row group filter push-down is enabled or not.

@viirya
Member Author

viirya commented Jun 17, 2016

@liancheng Thanks! I didn't notice that. I will rerun the benchmark. I've re-submitted this PR at #13701.

@viirya viirya deleted the vectorized-reader-push-down-filter branch December 27, 2023 18:33