
Conversation


@HyukjinKwon HyukjinKwon commented Feb 7, 2023

What changes were proposed in this pull request?

This PR proposes to add scripts to start and stop the Spark Connect server.

Why are the changes needed?

Currently, there is no proper way to start and stop the Spark Connect server. Right now you have to start it with, for example, a Spark shell:

# For development,
./bin/spark-shell \
  --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar` \
  --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
# For released Spark versions
./bin/spark-shell \
  --packages org.apache.spark:spark-connect_2.12:3.4.0 \
  --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin

which is awkward.

Does this PR introduce any user-facing change?

Yes, it adds new scripts to start and stop Spark Connect server.

How was this patch tested?

Manually tested:

# For released Spark versions,
#`sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0`
sbin/start-connect-server.sh --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar`
bin/pyspark --remote sc://localhost:15002
...
sbin/stop-connect-server.sh
ps -fe | grep Spark
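
For background on what such start/stop scripts typically do: Spark's sbin scripts follow a pid-file daemon pattern (launch in the background, record the pid, kill it on stop). Below is a minimal self-contained sketch of that pattern; all names here (PID_FILE, the `sleep` stand-in for the server) are illustrative and not taken from the actual scripts.

```shell
#!/bin/sh
# Minimal sketch of the pid-file daemon pattern used by sbin-style
# start/stop scripts. A long-running `sleep` stands in for the server.

PID_FILE="${TMPDIR:-/tmp}/demo-server.pid"

start() {
  # Launch the "server" in the background and record its pid.
  sleep 300 &
  echo $! > "$PID_FILE"
  echo "started pid $(cat "$PID_FILE")"
}

stop() {
  # Kill the recorded pid, if any, and clean up the pid file.
  if [ -f "$PID_FILE" ]; then
    kill "$(cat "$PID_FILE")" 2>/dev/null
    rm -f "$PID_FILE"
    echo "stopped"
  else
    echo "no pid file; nothing to stop"
  fi
}

start
stop
```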

@HyukjinKwon HyukjinKwon marked this pull request as draft February 7, 2023 10:31
@HyukjinKwon HyukjinKwon force-pushed the exec-script branch 3 times, most recently from 5ddcf4a to 1e6327a February 7, 2023 12:51
@HyukjinKwon HyukjinKwon changed the title [WIP][SPARK-42371][CONNECT] Add scripts to start and stop Spark Connect server [SPARK-42371][CONNECT] Add scripts to start and stop Spark Connect server Feb 7, 2023
@HyukjinKwon HyukjinKwon marked this pull request as ready for review February 7, 2023 12:51

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM


pan3793 commented Feb 8, 2023

Maybe sbin/start-connect-server.sh, sbin/stop-connect-server.sh?

Currently all scripts under sbin use `-` as a word separator, except for start-thriftserver.sh and stop-thriftserver.sh.

@HyukjinKwon
Member Author

sure sgtm

@HyukjinKwon
Member Author

Merged to master and branch-3.4.

Actually, CI doesn't verify any of the changes here (except the linter job, which passed now).

HyukjinKwon added a commit that referenced this pull request Feb 8, 2023
…rver

### What changes were proposed in this pull request?

This PR proposes to add scripts to start and stop the Spark Connect server.

### Why are the changes needed?

Currently, there is no proper way to start and stop the Spark Connect server. Right now you have to start it with, for example, a Spark shell:

```bash
# For development,
./bin/spark-shell \
  --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar` \
  --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
```

```bash
# For released Spark versions
./bin/spark-shell \
  --packages org.apache.spark:spark-connect_2.12:3.4.0 \
  --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
```

which is awkward.

### Does this PR introduce _any_ user-facing change?

Yes, it adds new scripts to start and stop Spark Connect server.

### How was this patch tested?

Manually tested:

```bash
# For released Spark versions,
#`sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0`
sbin/start-connect-server.sh --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar`
```

```bash
bin/pyspark --remote sc://localhost:15002
...
```

```bash
sbin/stop-connect-server.sh
ps -fe | grep Spark
```

Closes #39928 from HyukjinKwon/exec-script.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 1126031)
Signed-off-by: Hyukjin Kwon <[email protected]>

tgravescs commented Feb 13, 2023

It feels a bit odd to me to have a script in sbin where the user has to manually specify the location of the connect jars. What is the plan for distribution? Are the connect jars going to be in the main distribution's jars/ directory, so that this won't be needed at that point?
I don't think we do that with any of the other connectors, but I wasn't sure about the plan, and whether this was mostly for initial development.
@HyukjinKwon


HyukjinKwon commented Feb 13, 2023

it feels a bit odd to me to have a script in the sbin that user has to manually specify the location of the connect jars.

What is the plan for distribution, are connect jars going to be in the main distribution jars/ directory so it won't be needed at that point?

Yes, the eventual plan is to move the whole set of connect jars into the main distribution's jars/ around the next release (Apache Spark 3.5.0). For now, the Spark Connect project is kept as a separate module under connector/ - that was external/ before (see also #35874; I think we might actually have to revisit the top-level name here).

For a bit more context, there are a couple of plans, such as replacing Py4J with Spark Connect (so we can block arbitrary JVM access from the Python side for security purposes), and I personally am thinking about replacing the Thrift server in the far future (and not using Hive's Thrift server). There is also some more context in this blog post that I like: https://www.databricks.com/blog/2022/07/07/introducing-spark-connect-the-power-of-apache-spark-everywhere.html

I plan to send another email to explain the whole context on the dev mailing list right after the Spark 3.4 release.

snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
@HyukjinKwon HyukjinKwon deleted the exec-script branch January 15, 2024 00:48
HyukjinKwon added a commit that referenced this pull request Jul 11, 2024
…in module

### What changes were proposed in this pull request?

This PR proposes to move the Spark Connect server to a builtin module.

From:

```
connector/connect/server
connector/connect/common
```

To:

```
connect/server
connect/common
```

### Why are the changes needed?

So that end users do not have to specify `--packages` when they start the Spark Connect server. The Spark Connect client remains a separate module. This was also pointed out in #39928 (comment).

### Does this PR introduce _any_ user-facing change?

Yes, users don't have to specify `--packages` anymore.

### How was this patch tested?

CI in this PR should verify this.
Also manually tested several basic commands, such as:

- Maven build
- SBT build
- Running basic Scala client commands
   ```bash
   cd connector/connect
   bin/spark-connect
   bin/spark-connect-scala-client
   ```
- Running basic PySpark client commands

   ```bash
   bin/pyspark --remote local
   ```
- Connecting to the server launched by `./sbin/start-connect-server.sh`

   ```bash
   ./sbin/start-connect-server.sh
    bin/pyspark --remote "sc://localhost"
   ```
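
   When scripting the launch-then-connect flow above, the server may take a few seconds to open its port, so it can help to wait until the port actually accepts connections before starting the client. The helper below is a hypothetical sketch (it is not part of Spark's scripts); Spark Connect listens on 15002 by default, and the probe uses python3 so it works in plain sh.

   ```shell
   # Hypothetical helper: poll until host:port accepts TCP connections,
   # giving up after a timeout (seconds). Not part of Spark's scripts.
   wait_for_port() {
     host="$1"; port="$2"; timeout="${3:-30}"
     start=$(date +%s)
     # connect_ex returns 0 on a successful connection, an errno otherwise.
     while ! python3 -c "import socket, sys
   s = socket.socket()
   s.settimeout(1)
   sys.exit(0 if s.connect_ex(('$host', int('$port'))) == 0 else 1)"
     do
       if [ $(( $(date +%s) - start )) -ge "$timeout" ]; then
         echo "timed out waiting for $host:$port" >&2
         return 1
       fi
       sleep 1
     done
   }

   # Usage (against a real server):
   #   ./sbin/start-connect-server.sh
   #   wait_for_port localhost 15002 && bin/pyspark --remote "sc://localhost"
   ```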

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47157 from HyukjinKwon/move-connect-server-builtin.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024