
Conversation

@aarondav (Contributor) commented Apr 8, 2014

This was added to the check for the assembly jar, but was forgotten for the datanucleus jars.

SPARK-1445: compute-classpath should not print error if lib_managed not found

Redirecting stderr to /dev/null can't be that much slower than an if [ -z ]...
@AmplabJenkins: Merged build triggered.

@AmplabJenkins: Merged build started.

@AmplabJenkins: Merged build finished. All automated tests passed.

@AmplabJenkins: All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13898/

asfgit closed this in e25b593 Apr 8, 2014
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
SPARK-1445: compute-classpath should not print error if lib_managed not found

This was added to the check for the assembly jar, but was forgotten for the datanucleus jars.

Author: Aaron Davidson <[email protected]>

Closes apache#361 from aarondav/cc and squashes the following commits:

8facc16 [Aaron Davidson] SPARK-1445: compute-classpath should not print error if lib_managed not found
tangzhankun pushed a commit to tangzhankun/spark that referenced this pull request Jul 21, 2017
This commit tries to solve issue apache#359 by allowing the `spark.executor.cores` configuration key to take fractional values, e.g., 0.5 or 1.5. The value is used to specify the cpu request when creating the executor pods, which is allowed to be fractional by Kubernetes. When the value is passed to the executor process through the environment variable `SPARK_EXECUTOR_CORES`, the value is rounded up to the closest integer as required by the `CoarseGrainedExecutorBackend`.

Signed-off-by: Yinan Li <[email protected]>
erikerlandson pushed a commit to erikerlandson/spark that referenced this pull request Jul 28, 2017
This commit tries to solve issue apache#359 by allowing the `spark.executor.cores` configuration key to take fractional values, e.g., 0.5 or 1.5. The value is used to specify the cpu request when creating the executor pods, which is allowed to be fractional by Kubernetes. When the value is passed to the executor process through the environment variable `SPARK_EXECUTOR_CORES`, the value is rounded up to the closest integer as required by the `CoarseGrainedExecutorBackend`.

Signed-off-by: Yinan Li <[email protected]>
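
A minimal Scala sketch of the rounding described in the commit message above (the object and method names are invented for illustration, not Spark's actual code): the fractional `spark.executor.cores` value is passed through unchanged as the pod's CPU request, while the value handed to the executor backend is rounded up to a whole number of cores.

```scala
// Illustrative sketch only, not Spark's implementation: keep the fractional
// value for the pod's CPU request, round up for SPARK_EXECUTOR_CORES.
object FractionalCoresSketch {
  // Kubernetes accepts fractional CPU requests, so pass the value through unchanged.
  def cpuRequest(executorCores: Double): Double = executorCores

  // CoarseGrainedExecutorBackend expects an integer core count, so round up.
  def backendCores(executorCores: Double): Int = math.ceil(executorCores).toInt

  def main(args: Array[String]): Unit = {
    Seq(0.5, 1.0, 1.5).foreach { cores =>
      println(s"spark.executor.cores=$cores -> cpu request=${cpuRequest(cores)}, " +
        s"SPARK_EXECUTOR_CORES=${backendCores(cores)}")
    }
  }
}
```
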
mccheah added a commit to mccheah/spark that referenced this pull request Oct 3, 2018
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
This change refactors the job that runs the osb-checker tests against
huaweicloud, because osb-checker has undergone some refactoring.

Closes: theopenlab/openlab#90
arjunshroff pushed a commit to arjunshroff/spark that referenced this pull request Nov 24, 2020
RolatZhang pushed a commit to RolatZhang/spark that referenced this pull request Mar 18, 2022
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…t overflow (apache#361)

This is a cherry-pick from apache#44006 to Spark 3.5.

### What changes were proposed in this pull request?
This change adds a check for overflows when creating Parquet row group filters on an INT32 (byte/short/int) parquet type to avoid incorrectly skipping row groups if the predicate value doesn't fit in an INT. This can happen if the read schema is specified as LONG, e.g. via `.schema("col LONG")`.
While the Parquet readers don't support reading INT32 into a LONG, the overflow can lead to row groups being incorrectly skipped, bypassing the reader altogether and producing incorrect results instead of failing.
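
A minimal sketch of the kind of check described above, written independently of Spark's internal filter-pushdown APIs (the object and helper names are invented for illustration): a row group filter on an INT32 column is only worth building when the LONG predicate value actually fits in an Int.

```scala
// Illustrative sketch, not Spark's actual ParquetFilters code: only create an
// INT32 row group filter when the Long predicate value fits in an Int.
object Int32FilterGuardSketch {
  def fitsInInt32(value: Long): Boolean =
    value >= Int.MinValue && value <= Int.MaxValue

  def main(args: Array[String]): Unit = {
    println(fitsInInt32(42L))            // true  -> safe to push down as an INT32 filter
    println(fitsInInt32(Long.MaxValue))  // false -> skip the filter, keep all row groups
  }
}
```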

### Why are the changes needed?
Reading a parquet file containing INT32 values with a read schema specified as LONG can produce incorrect results today:
```
Seq(0).toDF("a").write.parquet(path)
spark.read.schema("a LONG").parquet(path).where(s"a < ${Long.MaxValue}").collect()
```
will return an empty result; the sketch after this list illustrates the overflow behind it. The correct result is either:
- Failing the query if the parquet reader doesn't support upcasting integers to longs (all parquet readers in Spark today)
- Returning the result `[0]` if the parquet reader supports that upcast (no readers in Spark as of now, but I'm looking into adding this capability).
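
As an illustration of the arithmetic behind that empty result (plain Scala, independent of Spark, with values chosen only for demonstration): narrowing the LONG bound to INT32 overflows to a negative number, so the pushed-down bound excludes every row group.

```scala
// Illustrative arithmetic only: why "a < Long.MaxValue" can skip a row group
// containing 0 once the bound is narrowed to the file's INT32 type.
object OverflowDemoSketch {
  def main(args: Array[String]): Unit = {
    val bound = Long.MaxValue
    val narrowed = bound.toInt          // overflows: 9223372036854775807 -> -1
    println(s"narrowed bound: $narrowed")

    // Row group statistics for a file containing the value 0 have min = 0.
    // "min < narrowed bound" is false, so the row group would be skipped
    // and the query would (incorrectly) return no rows.
    val rowGroupMin = 0
    println(s"row group kept: ${rowGroupMin < narrowed}")
  }
}
```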

### Does this PR introduce _any_ user-facing change?
The following:
```
Seq(0).toDF("a").write.parquet(path)
spark.read.schema("a LONG").parquet(path).where(s"a < ${Long.MaxValue}").collect()
```
produces an (incorrect) empty result before this change. After this change, the read will fail, raising an error about the unsupported conversion from INT to LONG in the parquet reader.

### How was this patch tested?
- Added tests to `ParquetFilterSuite` to ensure that no row group filter is created when the predicate value overflows or when the value type isn't compatible with the parquet type
- Added test to `ParquetQuerySuite` covering the correctness issue described above.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#44154 from johanl-db/SPARK-46092-row-group-skipping-overflow-3.5.

Authored-by: Johan Lasperas <[email protected]>

Signed-off-by: Dongjoon Hyun <[email protected]>
Co-authored-by: Johan Lasperas <[email protected]>