Skip to content

Conversation

@mridulm
Copy link
Contributor

@mridulm mridulm commented Apr 23, 2014

Unfortunately, this is not exhaustive - particularly hive tests still fail due to path issues.

@mridulm mridulm changed the title Windows fixes SPARK-1586 Windows build fixes Apr 23, 2014
@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14370/

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14372/

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14374/

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14375/

@mridulm
Copy link
Contributor Author

mridulm commented Apr 23, 2014

I am not sure why this has failed - since it works locally on both linux and windows for me.
Not sure why StringEscapeUtils.escapeJava is failing for this .. let me try to investigate

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unix2dos change.
Plus, earlier version was not working (how it sets variable, etc).

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14377/

@mridulm
Copy link
Contributor Author

mridulm commented Apr 23, 2014

CC @mateiz @rxin a lot of these changes have to do with incorrect string -> byte conversions.
Plus the classpath computation fix.

Please note that this is not sufficient to get spark building on windows - due to more hive related failures. Unfortunately, I gave up trying to patch hive related issues.

@mateiz
Copy link
Contributor

mateiz commented Apr 23, 2014

Cool, thanks a lot for taking a look at this! What are the Hive related failures, is Hive failing during build, or during execution?

@mateiz
Copy link
Contributor

mateiz commented Apr 23, 2014

CCing @marmbrus for SQL-related changes

@marmbrus
Copy link
Contributor

These changes seem pretty reasonable to me. I'm also curious what is still wrong with Hive?

Minor question: Do we want to make the string encoding configurable, perhaps with "utf-8" as the default? or is utf-8 always the right answer?

@mridulm
Copy link
Contributor Author

mridulm commented Apr 23, 2014

The failures are while executing testcases from hive project in windows.
Mostly paths getting mangled and so on - I do have cygwin, so that is not the issue (which is required for some of core test suites too btw).

@marmbrus
Copy link
Contributor

Hmm, I see. That could be somewhat difficult to fix as Hive hard codes a lot of these things into their test cases. I would guess they also can't run on windows.

It seems like even without resolving those test failures this PR is still useful.

@mateiz
Copy link
Contributor

mateiz commented Apr 23, 2014

Okay, I'd be fine with the Hive tests not working on Windows. I guess we should test Spark SQL itself on it manually though.

@marmbrus
Copy link
Contributor

One question: The spark sql core tests are passing though right? sbt/sbt sql/test

@mridulm
Copy link
Contributor Author

mridulm commented Apr 24, 2014

Quite a lot of them still fail - all failures now are from sql/hive - and all of them are path related issues iirc.
I am not sure if they were due to the paths hardcoded within the *.q files, etc or there are some other issues with how paths are resolved (like the path change in TestHive from the pr).

@mateiz
Copy link
Contributor

mateiz commented Apr 25, 2014

Hey Mridul, I've merged this in as is for now. We can create another JIRA for making the Hive tests work there, but I'd say it's low priority (presumably they don't work when you develop Hive itself either).

@asfgit asfgit closed this in 968c018 Apr 25, 2014
asfgit pushed a commit that referenced this pull request Apr 25, 2014
Unfortunately, this is not exhaustive - particularly hive tests still fail due to path issues.

Author: Mridul Muralidharan <[email protected]>

This patch had conflicts when merged, resolved by
Committer: Matei Zaharia <[email protected]>

Closes #505 from mridulm/windows_fixes and squashes the following commits:

ef12283 [Mridul Muralidharan] Move to org.apache.commons.lang3 for StringEscapeUtils. Earlier version was buggy appparently
cdae406 [Mridul Muralidharan] Remove leaked changes from > 2G fix branch
3267f4b [Mridul Muralidharan] Fix build failures
35b277a [Mridul Muralidharan] Fix Scalastyle failures
bc69d14 [Mridul Muralidharan] Change from hardcoded path separator
10c4d78 [Mridul Muralidharan] Use explicit encoding while using getBytes
1337abd [Mridul Muralidharan] fix classpath while running in windows

(cherry picked from commit 968c018)
Signed-off-by: Matei Zaharia <[email protected]>
@mridulm
Copy link
Contributor Author

mridulm commented Apr 25, 2014

sounds good, thanks !
i was testing windows to ensure there are no encoding/etc issues with the 2G patch actually - and this is spin off of that :-)

pwendell added a commit to pwendell/spark that referenced this pull request May 12, 2014
Deprecate mapPartitionsWithSplit in PySpark (SPARK-1026)

This commit deprecates `mapPartitionsWithSplit` in PySpark (see [SPARK-1026](https://spark-project.atlassian.net/browse/SPARK-1026) and removes the remaining references to it from the docs.
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
Unfortunately, this is not exhaustive - particularly hive tests still fail due to path issues.

Author: Mridul Muralidharan <[email protected]>

This patch had conflicts when merged, resolved by
Committer: Matei Zaharia <[email protected]>

Closes apache#505 from mridulm/windows_fixes and squashes the following commits:

ef12283 [Mridul Muralidharan] Move to org.apache.commons.lang3 for StringEscapeUtils. Earlier version was buggy appparently
cdae406 [Mridul Muralidharan] Remove leaked changes from > 2G fix branch
3267f4b [Mridul Muralidharan] Fix build failures
35b277a [Mridul Muralidharan] Fix Scalastyle failures
bc69d14 [Mridul Muralidharan] Change from hardcoded path separator
10c4d78 [Mridul Muralidharan] Use explicit encoding while using getBytes
1337abd [Mridul Muralidharan] fix classpath while running in windows
andrewor14 pushed a commit to andrewor14/spark that referenced this pull request Jan 8, 2015
Deprecate mapPartitionsWithSplit in PySpark (SPARK-1026)

This commit deprecates `mapPartitionsWithSplit` in PySpark (see [SPARK-1026](https://spark-project.atlassian.net/browse/SPARK-1026) and removes the remaining references to it from the docs.
(cherry picked from commit 05be704)

Signed-off-by: Patrick Wendell <[email protected]>
yifeih pushed a commit to yifeih/spark that referenced this pull request Mar 5, 2019
…apache#504)" (apache#505)

Reverts palantir#504

This pr is not necessary. After more debugging it looks like it's not possible to replicate `yyyy-MM-dd HH:mm:ss.S` SDF format in java.time.
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
Currently periodic logs are going to pr-logs directory in bucket vs periodic-4/16
RolatZhang pushed a commit to RolatZhang/spark that referenced this pull request Aug 15, 2022
…LIMIT (apache#505)

* [SPARK-39139][SQL] DS V2 supports push down DS V2 UDF

### What changes were proposed in this pull request?
Currently, Spark DS V2 push-down framework supports push down SQL to data sources.
But the DS V2 push-down framework only support push down the built-in functions to data sources.
Each database have a lot very useful functions which not supported by Spark.
If we can push down these functions into data source, it will reduce disk I/O and network I/O and improve the performance when query databases.

### Why are the changes needed?

1. Spark can leverage the functions supported by databases
2. Improve the query performance.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
New tests.

Closes apache#36593 from beliefer/SPARK-39139.

Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

* [SPARK-39453][SQL][TESTS][FOLLOWUP] Let `RAND` in filter is more meaningful

### What changes were proposed in this pull request?
apache#36830 makes DS V2 supports push down misc non-aggregate functions(non ANSI).
But he `Rand` in test case looks no meaningful.

### Why are the changes needed?
Let `Rand` in filter is more meaningful.

### Does this PR introduce _any_ user-facing change?
'No'.
Just update test case.

### How was this patch tested?
Just update test case.

Closes apache#37033 from beliefer/SPARK-39453_followup.

Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

* [SPARK-37527][SQL][FOLLOWUP] Cannot compile COVAR_POP, COVAR_SAMP and CORR in `H2Dialect` if them with `DISTINCT`

### What changes were proposed in this pull request?
apache#35145 compile COVAR_POP, COVAR_SAMP and CORR in H2Dialect.
Because H2 does't support COVAR_POP, COVAR_SAMP and CORR works with DISTINCT.
So apache#35145 introduces a bug that compile COVAR_POP, COVAR_SAMP and CORR if these aggregate functions with DISTINCT.

### Why are the changes needed?
Fix bug that compile COVAR_POP, COVAR_SAMP and CORR if these aggregate functions with DISTINCT.

### Does this PR introduce _any_ user-facing change?
'Yes'.
Bug will be fix.

### How was this patch tested?
New test cases.

Closes apache#37090 from beliefer/SPARK-37527_followup2.

Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

* [SPARK-39627][SQL] DS V2 pushdown should unify the compile API

### What changes were proposed in this pull request?
Currently, `JdbcDialect` have two API `compileAggregate` and `compileExpression`, we can unify them.

### Why are the changes needed?
Improve ease of use.

### Does this PR introduce _any_ user-facing change?
'No'.
The two API `compileAggregate` call `compileExpression` not changed.

### How was this patch tested?
N/A

Closes apache#37047 from beliefer/SPARK-39627.

Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

* [SPARK-39384][SQL] Compile built-in linear regression aggregate functions for JDBC dialect

### What changes were proposed in this pull request?
Recently, Spark DS V2 pushdown framework translate a lot of standard linear regression aggregate functions.
Currently, only H2Dialect compile these standard linear regression aggregate functions. This PR compile these standard linear regression aggregate functions for other build-in JDBC dialect.

### Why are the changes needed?
Make build-in JDBC dialect support compile linear regression aggregate push-down.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
New test cases.

Closes apache#37188 from beliefer/SPARK-39384.

Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Sean Owen <[email protected]>

* [SPARK-39148][SQL] DS V2 aggregate push down can work with OFFSET or LIMIT

### What changes were proposed in this pull request?

This PR refactors the v2 agg pushdown code. The main change is, now we don't build the `Scan` immediately when pushing agg. We did it so before because we want to know the data schema with agg pushed, then we can add cast when rewriting the query plan after pushdown. But the problem is, we build `Scan` too early and can't push down any more operators, while it's common to see LIMIT/OFFSET after agg.

The idea of the refactor is, we don't need to know the data schema with agg pushed. We just give an expectation (the data type should be the same of Spark agg functions), use it to define the output of `ScanBuilderHolder`, and then rewrite the query plan. Later on, when we build the `Scan` and replace `ScanBuilderHolder` with `DataSourceV2ScanRelation`, we check the actual data schema and add a `Project` to do type cast if necessary.

### Why are the changes needed?

support pushing down LIMIT/OFFSET after agg.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

updated tests

Closes apache#37195 from cloud-fan/agg.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

Co-authored-by: Jiaan Geng <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
RolatZhang pushed a commit to RolatZhang/spark that referenced this pull request Aug 15, 2022
…LIMIT (apache#505)

* [SPARK-39139][SQL] DS V2 supports push down DS V2 UDF

### What changes were proposed in this pull request?
Currently, Spark DS V2 push-down framework supports push down SQL to data sources.
But the DS V2 push-down framework only support push down the built-in functions to data sources.
Each database have a lot very useful functions which not supported by Spark.
If we can push down these functions into data source, it will reduce disk I/O and network I/O and improve the performance when query databases.

### Why are the changes needed?

1. Spark can leverage the functions supported by databases
2. Improve the query performance.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
New tests.

Closes apache#36593 from beliefer/SPARK-39139.

Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

* [SPARK-39453][SQL][TESTS][FOLLOWUP] Let `RAND` in filter is more meaningful

### What changes were proposed in this pull request?
apache#36830 makes DS V2 supports push down misc non-aggregate functions(non ANSI).
But he `Rand` in test case looks no meaningful.

### Why are the changes needed?
Let `Rand` in filter is more meaningful.

### Does this PR introduce _any_ user-facing change?
'No'.
Just update test case.

### How was this patch tested?
Just update test case.

Closes apache#37033 from beliefer/SPARK-39453_followup.

Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

* [SPARK-37527][SQL][FOLLOWUP] Cannot compile COVAR_POP, COVAR_SAMP and CORR in `H2Dialect` if them with `DISTINCT`

### What changes were proposed in this pull request?
apache#35145 compile COVAR_POP, COVAR_SAMP and CORR in H2Dialect.
Because H2 does't support COVAR_POP, COVAR_SAMP and CORR works with DISTINCT.
So apache#35145 introduces a bug that compile COVAR_POP, COVAR_SAMP and CORR if these aggregate functions with DISTINCT.

### Why are the changes needed?
Fix bug that compile COVAR_POP, COVAR_SAMP and CORR if these aggregate functions with DISTINCT.

### Does this PR introduce _any_ user-facing change?
'Yes'.
Bug will be fix.

### How was this patch tested?
New test cases.

Closes apache#37090 from beliefer/SPARK-37527_followup2.

Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

* [SPARK-39627][SQL] DS V2 pushdown should unify the compile API

### What changes were proposed in this pull request?
Currently, `JdbcDialect` have two API `compileAggregate` and `compileExpression`, we can unify them.

### Why are the changes needed?
Improve ease of use.

### Does this PR introduce _any_ user-facing change?
'No'.
The two API `compileAggregate` call `compileExpression` not changed.

### How was this patch tested?
N/A

Closes apache#37047 from beliefer/SPARK-39627.

Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

* [SPARK-39384][SQL] Compile built-in linear regression aggregate functions for JDBC dialect

### What changes were proposed in this pull request?
Recently, Spark DS V2 pushdown framework translate a lot of standard linear regression aggregate functions.
Currently, only H2Dialect compile these standard linear regression aggregate functions. This PR compile these standard linear regression aggregate functions for other build-in JDBC dialect.

### Why are the changes needed?
Make build-in JDBC dialect support compile linear regression aggregate push-down.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
New test cases.

Closes apache#37188 from beliefer/SPARK-39384.

Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Sean Owen <[email protected]>

* [SPARK-39148][SQL] DS V2 aggregate push down can work with OFFSET or LIMIT

### What changes were proposed in this pull request?

This PR refactors the v2 agg pushdown code. The main change is, now we don't build the `Scan` immediately when pushing agg. We did it so before because we want to know the data schema with agg pushed, then we can add cast when rewriting the query plan after pushdown. But the problem is, we build `Scan` too early and can't push down any more operators, while it's common to see LIMIT/OFFSET after agg.

The idea of the refactor is, we don't need to know the data schema with agg pushed. We just give an expectation (the data type should be the same of Spark agg functions), use it to define the output of `ScanBuilderHolder`, and then rewrite the query plan. Later on, when we build the `Scan` and replace `ScanBuilderHolder` with `DataSourceV2ScanRelation`, we check the actual data schema and add a `Project` to do type cast if necessary.

### Why are the changes needed?

support pushing down LIMIT/OFFSET after agg.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

updated tests

Closes apache#37195 from cloud-fan/agg.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

Co-authored-by: Jiaan Geng <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
* Create log4j2.properties

* Update log4j2.properties

* Update log4j2.properties

* Update log4j2.properties
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants