
pull bot commented Sep 11, 2022

See Commits and Changes for more details.



…dencies to use stable version

### What changes were proposed in this pull request?
This PR updates the `scalatest` and `scalatestplus` related dependencies to use stable versions, as follows (summarized in the sbt sketch after the list):

- Upgrade `scala-xml` from `1.2.0` to `2.1.0` to support scalatest 3.2.13

- Update `org.scalatest:scalatest` from 3.3.0-SNAP3 to 3.2.13

- Update `org.scalatestplus:scalacheck-1-15:3.3.0-SNAP3` to `org.scalatestplus:scalacheck-1-16:3.2.13.0` and upgrade `scalacheck` from 1.15.4 to 1.16.0

- Update `org.scalatestplus:selenium-3-141:3.3.0.0-SNAP3` to `org.scalatestplus:selenium-3-141:3.2.10.0` and leave a TODO for SPARK-40397.
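
For illustration only, the bumps above expressed as sbt coordinates (Spark actually manages these versions in its Maven `pom.xml`; the `libraryDependencies` block below is a sketch, not the real build change):

```
// Sketch only: the upgraded test dependencies from the list above,
// written as sbt coordinates; Spark declares them in pom.xml instead.
libraryDependencies ++= Seq(
  "org.scala-lang.modules" %% "scala-xml"       % "2.1.0",
  "org.scalatest"          %% "scalatest"       % "3.2.13"   % Test,
  "org.scalacheck"         %% "scalacheck"      % "1.16.0"   % Test,
  "org.scalatestplus"      %% "scalacheck-1-16" % "3.2.13.0" % Test,
  "org.scalatestplus"      %% "selenium-3-141"  % "3.2.10.0" % Test // TODO: SPARK-40397
)
```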

### Why are the changes needed?
Switch these dependencies to stable versions.

The relevant release notes are as follows:
- scala-xml:
   - https://github.com/scala/scala-xml/releases/tag/v1.3.0
   - https://github.com/scala/scala-xml/releases/tag/v2.0.0
   - https://github.com/scala/scala-xml/releases/tag/v2.0.1
   - https://github.com/scala/scala-xml/releases/tag/v2.1.0
- scalatest:
   - https://github.com/scalatest/scalatest/releases/tag/release-3.2.10
   - https://github.com/scalatest/scalatest/releases/tag/release-3.2.11
   - https://github.com/scalatest/scalatest/releases/tag/release-3.2.12
   - https://github.com/scalatest/scalatest/releases/tag/release-3.2.13
- org.scalatestplus:scalacheck:
   - https://github.com/scalatest/scalatestplus-scalacheck/releases/tag/release-3.2.10.0-for-scalacheck-1.15
   - https://github.com/scalatest/scalatestplus-scalacheck/releases/tag/release-3.2.11.0-for-scalacheck-1.15
   - https://github.com/scalatest/scalatestplus-scalacheck/releases/tag/release-3.2.12.0-for-scalacheck-1.16
   - https://github.com/scalatest/scalatestplus-scalacheck/releases/tag/release-3.2.13.0-for-scalacheck-1.16
- org.scalatestplus:mockito:
   - https://github.com/scalatest/scalatestplus-mockito/releases/tag/release-3.2.13.0-for-mockito-4.6
- org.scalatestplus:selenium:
   - https://github.com/scalatest/scalatestplus-selenium/releases/tag/release-3.2.10.0-for-selenium-3.141

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- Pass GitHub Actions
- Manually checked the problems mentioned in https://github.com/apache/spark/pull/35128/files

```
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt clean "sql/testOnly *PlanStability*Suite"
[info] Run completed in 52 seconds, 520 milliseconds.
[info] Total number of tests run: 334
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 334, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

Closes #37842 from LuciferYang/SPARK-40396.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
pull bot added the ⤵️ pull label Sep 11, 2022
github-actions bot added the BUILD label Sep 11, 2022
pull bot merged commit 78d492c into wangyum:master Sep 11, 2022
pull bot pushed a commit that referenced this pull request Feb 24, 2024
…n properly

### What changes were proposed in this pull request?
Make `ResolveRelations` handle plan id properly

### Why are the changes needed?
Bug fix for Spark Connect; it doesn't affect classic Spark SQL.

Before this PR:
```
from pyspark.sql import functions as sf

spark.range(10).withColumn("value_1", sf.lit(1)).write.saveAsTable("test_table_1")
spark.range(10).withColumnRenamed("id", "index").withColumn("value_2", sf.lit(2)).write.saveAsTable("test_table_2")

df1 = spark.read.table("test_table_1")
df2 = spark.read.table("test_table_2")
df3 = spark.read.table("test_table_1")

join1 = df1.join(df2, on=df1.id==df2.index).select(df2.index, df2.value_2)
join2 = df3.join(join1, how="left", on=join1.index==df3.id)

join2.schema
```

fails with:
```
AnalysisException: [CANNOT_RESOLVE_DATAFRAME_COLUMN] Cannot resolve dataframe column "id". It's probably because of illegal references like `df1.select(df2.col("a"))`. SQLSTATE: 42704
```
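
For contrast, the query above is legal; a genuinely illegal reference of the kind the error message describes would look like this hypothetical Scala snippet:

```
// Hypothetical example of the illegal pattern the error message hints at:
// selecting a column that belongs to a different DataFrame's plan.
val df1 = spark.read.table("test_table_1")
val df2 = spark.read.table("test_table_2")
df1.select(df2.col("index"))  // "index" exists only in df2's plan -> analysis error
```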

That is because the existing plan caching in `ResolveRelations` doesn't work with Spark Connect:

```
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations ===
 '[#12]Join LeftOuter, '`==`('index, 'id)                     '[#12]Join LeftOuter, '`==`('index, 'id)
!:- '[#9]UnresolvedRelation [test_table_1], [], false         :- '[#9]SubqueryAlias spark_catalog.default.test_table_1
!+- '[#11]Project ['index, 'value_2]                          :  +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
!   +- '[#10]Join Inner, '`==`('id, 'index)                   +- '[#11]Project ['index, 'value_2]
!      :- '[#7]UnresolvedRelation [test_table_1], [], false      +- '[#10]Join Inner, '`==`('id, 'index)
!      +- '[#8]UnresolvedRelation [test_table_2], [], false         :- '[#9]SubqueryAlias spark_catalog.default.test_table_1
!                                                                   :  +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
!                                                                   +- '[#8]SubqueryAlias spark_catalog.default.test_table_2
!                                                                      +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_2`, [], false

Can not resolve 'id with plan 7
```

`[#7]UnresolvedRelation [test_table_1], [], false` was wrongly resolved to the cached entry:
```
:- '[#9]SubqueryAlias spark_catalog.default.test_table_1
   +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
```
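
The fix makes the rule honor the plan id of the relation being resolved. A minimal sketch of the idea, with hypothetical helper names (the actual change lives in `Analyzer.ResolveRelations`, not in this snippet):

```
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Sketch of the idea only, not the actual Analyzer code: a relation
// served from the per-query cache must carry the plan id of the
// UnresolvedRelation being resolved, not the one captured at caching time.
def resolveFromCache(u: UnresolvedRelation, cached: LogicalPlan): LogicalPlan = {
  val fresh = cached.clone()                      // never mutate the cached entry
  u.getTagValue(LogicalPlan.PLAN_ID_TAG) match {  // plan id assigned by Spark Connect
    case Some(id) => fresh.setTagValue(LogicalPlan.PLAN_ID_TAG, id)
    case None     => fresh.unsetTagValue(LogicalPlan.PLAN_ID_TAG)
  }
  fresh
}
```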

### Does this PR introduce _any_ user-facing change?
Yes, it is a bug fix.

### How was this patch tested?
Added a unit test.

### Was this patch authored or co-authored using generative AI tooling?
ci

Closes apache#45214 from zhengruifeng/connect_fix_read_join.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
wangyum pushed a commit that referenced this pull request Jun 26, 2024
…plan properly

### What changes were proposed in this pull request?
Make `ResolveRelations` handle plan id properly

Cherry-pick of bugfix apache#45214 to 3.5; the full description and reproduction are in the original fix above.

### Why are the changes needed?
Bug fix for Spark Connect; it doesn't affect classic Spark SQL.

### Does this PR introduce _any_ user-facing change?
Yes, it is a bug fix.

### How was this patch tested?
Added a unit test.

### Was this patch authored or co-authored using generative AI tooling?
ci

Closes apache#46291 from zhengruifeng/connect_fix_read_join_35.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
pull bot pushed a commit that referenced this pull request Sep 13, 2024
…r `postgreSQL/float4.sql` and `postgreSQL/int8.sql`

### What changes were proposed in this pull request?
This PR regenerates the Java 21 golden files for `postgreSQL/float4.sql` and `postgreSQL/int8.sql` to fix the Java 21 daily test.

### Why are the changes needed?
Fix the Java 21 daily test:
- https://github.com/apache/spark/actions/runs/10823897095/job/30030200710

```
[info] - postgreSQL/float4.sql *** FAILED *** (1 second, 100 milliseconds)
[info]   postgreSQL/float4.sql
[info]   Expected "...arameters" : {
[info]       "[ansiConfig" : "\"spark.sql.ansi.enabled\"",
[info]       "]expression" : "'N A ...", but got "...arameters" : {
[info]       "[]expression" : "'N A ..." Result did not match for query #11
[info]   SELECT float('N A N') (SQLQueryTestSuite.scala:663)
...
[info] - postgreSQL/int8.sql *** FAILED *** (2 seconds, 474 milliseconds)
[info]   postgreSQL/int8.sql
[info]   Expected "...arameters" : {
[info]       "[ansiConfig" : "\"spark.sql.ansi.enabled\"",
[info]       "]sourceType" : "\"BIG...", but got "...arameters" : {
[info]       "[]sourceType" : "\"BIG..." Result did not match for query #66
[info]   SELECT CAST(q1 AS int) FROM int8_tbl WHERE q2 <> 456 (SQLQueryTestSuite.scala:663)
...
[info] *** 2 TESTS FAILED ***
[error] Failed: Total 3559, Failed 2, Errors 0, Passed 3557, Ignored 4
[error] Failed tests:
[error] 	org.apache.spark.sql.SQLQueryTestSuite
[error] (sql / Test / test) sbt.TestsFailedException: Tests unsuccessful
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- Manually checked: `build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"` with Java 21; all tests passed.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#48089 from LuciferYang/SPARK-49578-FOLLOWUP.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
wangyum pushed a commit that referenced this pull request Nov 20, 2024
…plan properly

### What changes were proposed in this pull request?
Make `ResolveRelations` handle plan id properly

Cherry-pick of bugfix apache#45214 to 3.4; the full description and reproduction are in the original fix above.

### Why are the changes needed?
Bug fix for Spark Connect; it doesn't affect classic Spark SQL.

### Does this PR introduce _any_ user-facing change?
Yes, it is a bug fix.

### How was this patch tested?
Added a unit test.

### Was this patch authored or co-authored using generative AI tooling?
ci

Closes apache#46290 from zhengruifeng/connect_fix_read_join_34.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
pull bot pushed a commit that referenced this pull request Sep 10, 2025
…e` building

### What changes were proposed in this pull request?

This PR aims to add `libwebp-dev` to recover `spark-rm/Dockerfile` building.

### Why are the changes needed?

The `Apache Spark` release Docker image build has been broken for the last 7 days due to a SparkR package compilation failure.
- https://github.com/apache/spark/actions/workflows/release.yml
    - https://github.com/apache/spark/actions/runs/17425825244

```
#11 559.4 No package 'libwebpmux' found
...
#11 559.4 -------------------------- [ERROR MESSAGE] ---------------------------
#11 559.4 <stdin>:1:10: fatal error: ft2build.h: No such file or directory
#11 559.4 compilation terminated.
#11 559.4 --------------------------------------------------------------------
#11 559.4 ERROR: configuration failed for package 'ragg'
```

### Does this PR introduce _any_ user-facing change?

No, this is a fix for the Apache Spark release tool.

### How was this patch tested?

Manual build:

```
$ cd dev/create-release/spark-rm
$ docker build .
```

**BEFORE**

```
...
Dockerfile:83
--------------------
  82 |     # See more in SPARK-39959, roxygen2 < 7.2.1
  83 | >>> RUN Rscript -e "install.packages(c('devtools', 'knitr', 'markdown',  \
  84 | >>>     'rmarkdown', 'testthat', 'devtools', 'e1071', 'survival', 'arrow',  \
  85 | >>>     'ggplot2', 'mvtnorm', 'statmod', 'xml2'), repos='https://cloud.r-project.org/')" && \
  86 | >>>     Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='https://cloud.r-project.org')" && \
  87 | >>>     Rscript -e "devtools::install_version('lintr', version='2.0.1', repos='https://cloud.r-project.org')" && \
  88 | >>>     Rscript -e "devtools::install_version('pkgdown', version='2.0.1', repos='https://cloud.r-project.org')" && \
  89 | >>>     Rscript -e "devtools::install_version('preferably', version='0.4', repos='https://cloud.r-project.org')"
  90 |
--------------------
ERROR: failed to build: failed to solve:
```

**AFTER**
```
...
 => [ 6/22] RUN add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/'                                                             3.8s
 => [ 7/22] RUN Rscript -e "install.packages(c('devtools', 'knitr', 'markdown',      'rmarkdown', 'testthat', 'devtools', 'e1071', 'survival', 'arrow',       892.2s
 => [ 8/22] RUN add-apt-repository ppa:pypy/ppa                                                                                                                15.3s
...
```

After merging this PR, we can validate via the daily release dry-run CI.

- https://github.com/apache/spark/actions/workflows/release.yml

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#52290 from dongjoon-hyun/SPARK-53539.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
pull bot pushed a commit that referenced this pull request Nov 16, 2025
…rsion at the end

### What changes were proposed in this pull request?

This PR aims to fix the `spark-rm` Dockerfile to install the pinned `pkgdown` version at the end.

### Why are the changes needed?

Although `pkgdown` is supposed to be `2.0.1`, it gets upgraded by a subsequent package installation, as shown below. We should install `pkgdown` at the end to make sure the pinned version sticks.
https://github.com/apache/spark/blob/0311f44e33e5cf8ba60ccc330de3df4f688f5847/dev/create-release/spark-rm/Dockerfile#L89

- https://github.com/apache/spark/actions/workflows/release.yml
  - https://github.com/apache/spark/actions/runs/19386198324/job/55473421715

```
#11 1007.3 Downloading package from url: https://cloud.r-project.org/src/contrib/Archive/preferably/preferably_0.4.tar.gz
#11 1008.9 pkgdown (2.0.1 -> 2.2.0) [CRAN]
#11 1008.9 Installing 1 packages: pkgdown
#11 1008.9 Installing package into '/usr/local/lib/R/site-library'
#11 1008.9 (as 'lib' is unspecified)
#11 1009.4 trying URL 'https://cloud.r-project.org/src/contrib/pkgdown_2.2.0.tar.gz'
#11 1009.7 Content type 'application/x-gzip' length 1280630 bytes (1.2 MB)
#11 1009.7 ==================================================
#11 1009.7 downloaded 1.2 MB
#11 1009.7
#11 1010.2 * installing *source* package 'pkgdown' ...
#11 1010.2 ** package 'pkgdown' successfully unpacked and MD5 sums checked
#11 1010.2 ** using staged installation
#11 1010.3 ** R
#11 1010.3 ** inst
#11 1010.3 ** byte-compile and prepare package for lazy loading
#11 1013.1 ** help
#11 1013.2 *** installing help indices
#11 1013.2 *** copying figures
#11 1013.2 ** building package indices
#11 1013.5 ** installing vignettes
#11 1013.5 ** testing if installed package can be loaded from temporary location
#11 1013.8 ** testing if installed package can be loaded from final location
#11 1014.1 ** testing if installed package keeps a record of temporary installation path
#11 1014.1 * DONE (pkgdown)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review.

```
$ dev/create-release/do-release-docker.sh -d /tmp/spark-4.1.0 -n -s docs

$ docker run -it --rm --entrypoint /bin/bash spark-rm
spark-rm@923a388425fa:/opt/spark-rm/output$ Rscript -e 'installed.packages()' | grep pkgdown | head -n1
pkgdown      "pkgdown"      "/usr/local/lib/R/site-library" "2.0.1"
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#53083 from dongjoon-hyun/SPARK-54371.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>