[SPARK-42371][CONNECT] Add scripts to start and stop Spark Connect server #39928
Conversation
+1, LGTM
Maybe … Currently all scripts under …
Sure, SGTM.
Merged to master and branch-3.4. Actually, CI doesn't verify any of the changes here (except the linter job, which passed now).
[SPARK-42371][CONNECT] Add scripts to start and stop Spark Connect server

Closes #39928 from HyukjinKwon/exec-script.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 1126031)
Signed-off-by: Hyukjin Kwon <[email protected]>
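For readers curious how such scripts are typically structured, here is a minimal sketch in the style of Spark's other sbin daemon scripts, assuming the usual `spark-daemon.sh` launcher and the `org.apache.spark.sql.connect.service.SparkConnectServer` entry point; the scripts actually added by this PR may differ in detail:

```bash
#!/usr/bin/env bash
# start-connect-server.sh (sketch): launch the Connect server as a daemon.
if [ -z "${SPARK_HOME}" ]; then
  SPARK_HOME="$(cd "$(dirname "$0")"/..; pwd)"
fi
CLASS="org.apache.spark.sql.connect.service.SparkConnectServer"
# Pass user arguments such as --jars or --packages through to spark-submit.
exec "${SPARK_HOME}/sbin/spark-daemon.sh" submit "$CLASS" 1 --name "Spark Connect server" "$@"
```

```bash
#!/usr/bin/env bash
# stop-connect-server.sh (sketch): stop the daemon started above.
if [ -z "${SPARK_HOME}" ]; then
  SPARK_HOME="$(cd "$(dirname "$0")"/..; pwd)"
fi
"${SPARK_HOME}/sbin/spark-daemon.sh" stop org.apache.spark.sql.connect.service.SparkConnectServer 1
```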
It feels a bit odd to me to have a script in sbin that requires the user to manually specify the location of the connect jars. What is the plan for distribution? Are the connect jars going to be in the main distribution's jars/ directory, so that this won't be needed at that point?
Yes, the eventual plan is to move the whole set of connect jars to the main distribution's jars/ around the next release (Apache Spark 3.5.0). For now, the Spark Connect project is separated out as an external project; it is located in `connector/connect`.

For a bit more context, there are a couple of plans, such as replacing Py4J with Spark Connect (so we can block arbitrary JVM access from the Python side for security purposes), and I personally am thinking about replacing the Thrift server in the far future (and not using Hive's Thrift server). There is also some more context in this blog post that I like: https://www.databricks.com/blog/2022/07/07/introducing-spark-connect-the-power-of-apache-spark-everywhere.html

I plan to send another email to explain the whole context on the dev mailing list right after the Spark 3.4 release.
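To make that concrete, here is a usage sketch assembled from commands quoted elsewhere on this page; the flag-free invocation is the planned behavior described above, not something available before the jars move:

```bash
# Today: the Connect server jar must be named explicitly, e.g. for a release:
sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0

# Planned (around Apache Spark 3.5.0): with the connect jars in the main
# distribution's jars/ directory, no --packages or --jars flag is needed.
sbin/start-connect-server.sh
```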
### What changes were proposed in this pull request?

This PR proposes to move the connect server to a builtin module. From:

```
connector/connect/server
connector/connect/common
```

To:

```
connect/server
connect/common
```

### Why are the changes needed?

So that end users do not have to specify `--packages` when they start the Spark Connect server. The Spark Connect client remains a separate module. This was also pointed out in #39928 (comment).

### Does this PR introduce _any_ user-facing change?

Yes, users don't have to specify `--packages` anymore.

### How was this patch tested?

CI in this PR should verify the changes. Also manually tested several basic commands such as:

- Maven build
- SBT build
- Running basic Scala client commands

```bash
cd connector/connect
bin/spark-connect
bin/spark-connect-scala-client
```

- Running basic PySpark client commands

```bash
bin/pyspark --remote local
```

- Connecting to the server launched by `./sbin/start-connect-server.sh`

```bash
./sbin/start-connect-server.sh
bin/pyspark --remote "sc://localhost"
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47157 from HyukjinKwon/move-connect-server-builtin.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
What changes were proposed in this pull request?

This PR proposes to add scripts to start and stop the Spark Connect server.

Why are the changes needed?

Currently, there is no proper way to start and stop the Spark Connect server. It has to be started with, for example, a Spark shell:

```bash
# For development
./bin/spark-shell \
  --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar` \
  --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
```

```bash
# For released Spark versions
./bin/spark-shell \
  --packages org.apache.spark:spark-connect_2.12:3.4.0 \
  --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
```

which is awkward.

Does this PR introduce any user-facing change?

Yes, it adds new scripts to start and stop the Spark Connect server.

How was this patch tested?

Manually tested:

```bash
# For released Spark versions:
# `sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0`
sbin/start-connect-server.sh --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar`
```

```bash
bin/pyspark --remote sc://localhost:15002
...
```

```bash
sbin/stop-connect-server.sh
ps -fe | grep Spark
```
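One hypothetical extra step, not part of the PR, that can make the manual test above more deterministic: confirm the server is actually listening on the default Connect port (15002, the port used in the `--remote` example) before attaching a client.

```bash
# Hypothetical readiness check, assuming the default Connect port 15002;
# `nc -z` probes the port without sending data.
nc -z localhost 15002 && echo "Spark Connect server is up"
```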