Skip to content

Conversation

@philwalk
Copy link
Contributor

@philwalk philwalk commented Nov 22, 2024

What changes were proposed in this pull request?

The last action in bin/load-spark-env.sh performs a test to determine whether running in a terminal or not, and whether stdin is reading from a pipe. A more portable test is needed.

Why are the changes needed?

The current approach relies on ps with options that vary significantly between different Unix-like systems. Specifically, it prints an error message in both cygwin and msys2 (and by extension, in all of the variations of git-for-windows). It doesn't print an error message, but fails to detect a terminal session in Linux and Osx/Darwin homebrew (always thinks STDIN is a pipe).

Here's what the problem looks like in a cygwin64 session (with set -x just ahead of the section of interest):

If called directly:

$ bin/load-spark-env.sh
++ ps -o stat= -p 1947
ps: unknown option -- o
Try `ps --help' for more information.
+ [[ ! '' =~ \+ ]]
+ [[ -p /dev/stdin ]]
+ export 'SPARK_BEELINE_OPTS= -Djline.terminal=jline.UnsupportedTerminal'
+ SPARK_BEELINE_OPTS=' -Djline.terminal=jline.UnsupportedTerminal'

Interestingly, due to the 2-part test, it does the right thing w.r.t. the Terminal test, the main problem being the error message.
If called downstream from a pipe:

$ echo "yo" | bin/load-spark-env.sh
++ ps -o stat= -p 1955
ps: unknown option -- o
Try `ps --help' for more information.
+ [[ ! '' =~ \+ ]]
+ [[ -p /dev/stdin ]]

Again, it correctly detects the pipe environment, but with an error message.

In WSL2 Ubuntu, the test doesn't correctly detect a non-pipe terminal session:

# /opt/spark$ bin/load-spark-env.sh
++ ps -o stat= -p 1423
+ [[ ! S+ =~ \+ ]]
# echo "yo!" | bin/load-spark-env.sh
++ ps -o stat= -p 1416
+ [[ ! S+ =~ \+ ]]

In #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024, the same failure occurs (it doesn't recognize terminal environments).

Does this PR introduce any user-facing change?

This is a proposed bug fix, and, other than fixing the bug, should be invisible to users.

How was this patch tested?

The patch was verified to behave as intended in terminal sessions, both interactive and piped, in the following 5 environments.


- Linux quadd 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- Linux d5 5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- MINGW64_NT-10.0-22631 d5 3.5.4-0bc1222b.x86_64 2024-09-04 18:28 UTC x86_64 Msys
- CYGWIN_NT-10.0-22631 d5 3.5.3-1.x86_64 2024-04-03 17:25 UTC x86_64 Cygwin
- Darwin suemac.local 23.6.0 Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:21 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T8103 arm64

The test was to manually run the following script, verifying the expected response to both pipe and terminal sessions.

#!/bin/bash
if [ -e /usr/bin/tty -a "`tty`" != "not a tty" -a ! -p /dev/stdin ]; then
  echo "not a pipe"
else
  echo "is a pipe"
fi

The output of the manual test in all 5 tested environments.

philwalk@quadd:/opt/spark
$ isPipe
not a pipe
#
$ echo "yo" | isPipe
is a pipe
#

Was this patch authored or co-authored using generative AI tooling?

No

@philwalk
Copy link
Contributor Author

Not sure what "workflow run detection" is, or how to correct it.
The output of git remote -vv is:

# git remote -vv
origin  https://github.com/philwalk/spark.git (fetch)
origin  https://github.com/philwalk/spark.git (push)

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making a PR, @philwalk .

Do you enable GitHub Action in your repository?

There is a community documentation about Pull Request. Please check the section, Pull Request.

@philwalk
Copy link
Contributor Author

I enabled Build and Test on my forked repo.

@HyukjinKwon
Copy link
Member

Let's also file a JIRA, and add it into the PR description, see also https://spark.apache.org/contributing.html

@pan3793
Copy link
Member

pan3793 commented Nov 25, 2024

The touched code seems to be borrowed from Apache Hive, would you like to also report the issue to the Hive community?

@philwalk
Copy link
Contributor Author

Let's also file a JIRA, and add it into the PR description, see also https://spark.apache.org/contributing.html
I requested a JIRA account, not yet created ...

@dongjoon-hyun
Copy link
Member

Got it. I approved it a few minutes ago, @philwalk .

Screenshot 2024-11-25 at 09 39 33

@philwalk
Copy link
Contributor Author

jira bug report

@philwalk
Copy link
Contributor Author

philwalk commented Nov 25, 2024

The touched code seems to be borrowed from Apache Hive, would you like to also report the issue to the Hive community?
Not at this time ..
It seems that hive needs work to be compatible with cygwin or msys2 bash shell environments. There are a number of problems and I'm not able to work on it at this time.

By contrast, spark-shell starts right up (after applying this PR):

$ bin/spark-shell
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-preview2
      /_/

Using Scala version 2.13.14 (OpenJDK 64-Bit Server VM, Java 21.0.4)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context Web UI available at http://d5:4040
Spark context available as 'sc' (master = local[*], app id = local-1732575351925).
Spark session available as 'spark'.

scala>

@HyukjinKwon HyukjinKwon changed the title A more portable terminal / pipe test needed for bin/load-spark-env.sh [SPARK-50416][CORE] A more portable terminal / pipe test needed for bin/load-spark-env.sh Nov 26, 2024
@philwalk
Copy link
Contributor Author

philwalk commented Dec 5, 2024

Is this waiting on me to do something?

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Mar 16, 2025
@github-actions github-actions bot closed this Mar 17, 2025
@LuciferYang LuciferYang reopened this Mar 17, 2025
@LuciferYang LuciferYang removed the Stale label Mar 17, 2025
Copy link
Contributor Author

@philwalk philwalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@LuciferYang LuciferYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM

@LuciferYang
Copy link
Contributor

@dongjoon-hyun @HyukjinKwon @pan3793 Do you need to take another look?

@philwalk
Copy link
Contributor Author

philwalk commented Mar 18, 2025

@dongjoon-hyun @HyukjinKwon @pan3793 Do you need to take another look?

I re-tested the modified line in MINGW64_NT-10.0-22631 d5 3.5.4-0bc1222b.x86_64 2024-09-04 18:28 UTC x86_64 Msys and the new version appears to be functionally identical, and the changes appear to be recommended style.

@LuciferYang
Copy link
Contributor

LuciferYang commented Mar 18, 2025

I re-tested the modified line in MINGW64_NT-10.0-22631 d5 3.5.4-0bc1222b.x86_64 2024-09-04 18:28 UTC x86_64 Msys and the new version appears to be functionally identical, and the changes appear to be recommended style.

Thank you @philwalk , let's wait one more day. I will merge this one if there are no other new comments

LuciferYang pushed a commit that referenced this pull request Mar 20, 2025
…in/load-spark-env.sh

### What changes were proposed in this pull request?
The last action in [bin/load-spark-env.sh](https://github.com/apache/spark/blob/d5da49d56d7dec5f8a96c5252384d865f7efd4d9/bin/load-spark-env.sh#L68) performs a test to determine whether running in a terminal or not, and whether `stdin` is reading from a pipe.   A more portable test is needed.

### Why are the changes needed?
The current approach relies on `ps` with options that vary significantly between different Unix-like systems.  Specifically, it prints an error message in both `cygwin` and `msys2` (and by extension, in all of the variations of `git-for-windows`).   It doesn't print an error message, but fails to detect a terminal session in `Linux` and `Osx/Darwin homebrew` (always thinks STDIN is a pipe).

Here's what the problem looks like in a `cygwin64` session (with `set -x` just ahead of the section of interest):

If called directly:
```bash
$ bin/load-spark-env.sh
++ ps -o stat= -p 1947
ps: unknown option -- o
Try `ps --help' for more information.
+ [[ ! '' =~ \+ ]]
+ [[ -p /dev/stdin ]]
+ export 'SPARK_BEELINE_OPTS= -Djline.terminal=jline.UnsupportedTerminal'
+ SPARK_BEELINE_OPTS=' -Djline.terminal=jline.UnsupportedTerminal'
```
Interestingly, due to the 2-part test, it does the right thing w.r.t. the Terminal test, the main problem being the error message.
If called downstream from a pipe:
```bash
$ echo "yo" | bin/load-spark-env.sh
++ ps -o stat= -p 1955
ps: unknown option -- o
Try `ps --help' for more information.
+ [[ ! '' =~ \+ ]]
+ [[ -p /dev/stdin ]]
```
Again, it correctly detects the pipe environment, but with an error message.

In WSL2 Ubuntu, the test doesn't correctly detect a non-pipe terminal session:
```bash
# /opt/spark$ bin/load-spark-env.sh
++ ps -o stat= -p 1423
+ [[ ! S+ =~ \+ ]]
# echo "yo!" | bin/load-spark-env.sh
++ ps -o stat= -p 1416
+ [[ ! S+ =~ \+ ]]
```
In `#134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024`, the same failure occurs (it doesn't recognize terminal environments).

### Does this PR introduce _any_ user-facing change?
This is a proposed bug fix, and, other than fixing the bug,  should be invisible to users.

### How was this patch tested?
The patch was verified to behave as intended in terminal sessions, both interactive and piped, in the following 5 environments.
```

- Linux quadd 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- Linux d5 5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- MINGW64_NT-10.0-22631 d5 3.5.4-0bc1222b.x86_64 2024-09-04 18:28 UTC x86_64 Msys
- CYGWIN_NT-10.0-22631 d5 3.5.3-1.x86_64 2024-04-03 17:25 UTC x86_64 Cygwin
- Darwin suemac.local 23.6.0 Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:21 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T8103 arm64

```
The test was to manually run the following script, verifying the expected response to both pipe and terminal sessions.
```bash
#!/bin/bash
if [ -e /usr/bin/tty -a "`tty`" != "not a tty" -a ! -p /dev/stdin ]; then
  echo "not a pipe"
else
  echo "is a pipe"
fi
```
The output of the manual test in all 5 tested environments.
```
philwalkquadd:/opt/spark
$ isPipe
not a pipe
#
$ echo "yo" | isPipe
is a pipe
#
```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #48937 from philwalk/portability-fix-for-load-spark-env.sh.

Authored-by: philwalk <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
(cherry picked from commit 8d26008)
Signed-off-by: yangjie01 <[email protected]>
LuciferYang pushed a commit that referenced this pull request Mar 20, 2025
…in/load-spark-env.sh

### What changes were proposed in this pull request?
The last action in [bin/load-spark-env.sh](https://github.com/apache/spark/blob/d5da49d56d7dec5f8a96c5252384d865f7efd4d9/bin/load-spark-env.sh#L68) performs a test to determine whether running in a terminal or not, and whether `stdin` is reading from a pipe.   A more portable test is needed.

### Why are the changes needed?
The current approach relies on `ps` with options that vary significantly between different Unix-like systems.  Specifically, it prints an error message in both `cygwin` and `msys2` (and by extension, in all of the variations of `git-for-windows`).   It doesn't print an error message, but fails to detect a terminal session in `Linux` and `Osx/Darwin homebrew` (always thinks STDIN is a pipe).

Here's what the problem looks like in a `cygwin64` session (with `set -x` just ahead of the section of interest):

If called directly:
```bash
$ bin/load-spark-env.sh
++ ps -o stat= -p 1947
ps: unknown option -- o
Try `ps --help' for more information.
+ [[ ! '' =~ \+ ]]
+ [[ -p /dev/stdin ]]
+ export 'SPARK_BEELINE_OPTS= -Djline.terminal=jline.UnsupportedTerminal'
+ SPARK_BEELINE_OPTS=' -Djline.terminal=jline.UnsupportedTerminal'
```
Interestingly, due to the 2-part test, it does the right thing w.r.t. the Terminal test, the main problem being the error message.
If called downstream from a pipe:
```bash
$ echo "yo" | bin/load-spark-env.sh
++ ps -o stat= -p 1955
ps: unknown option -- o
Try `ps --help' for more information.
+ [[ ! '' =~ \+ ]]
+ [[ -p /dev/stdin ]]
```
Again, it correctly detects the pipe environment, but with an error message.

In WSL2 Ubuntu, the test doesn't correctly detect a non-pipe terminal session:
```bash
# /opt/spark$ bin/load-spark-env.sh
++ ps -o stat= -p 1423
+ [[ ! S+ =~ \+ ]]
# echo "yo!" | bin/load-spark-env.sh
++ ps -o stat= -p 1416
+ [[ ! S+ =~ \+ ]]
```
In `#134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024`, the same failure occurs (it doesn't recognize terminal environments).

### Does this PR introduce _any_ user-facing change?
This is a proposed bug fix, and, other than fixing the bug,  should be invisible to users.

### How was this patch tested?
The patch was verified to behave as intended in terminal sessions, both interactive and piped, in the following 5 environments.
```

- Linux quadd 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- Linux d5 5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- MINGW64_NT-10.0-22631 d5 3.5.4-0bc1222b.x86_64 2024-09-04 18:28 UTC x86_64 Msys
- CYGWIN_NT-10.0-22631 d5 3.5.3-1.x86_64 2024-04-03 17:25 UTC x86_64 Cygwin
- Darwin suemac.local 23.6.0 Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:21 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T8103 arm64

```
The test was to manually run the following script, verifying the expected response to both pipe and terminal sessions.
```bash
#!/bin/bash
if [ -e /usr/bin/tty -a "`tty`" != "not a tty" -a ! -p /dev/stdin ]; then
  echo "not a pipe"
else
  echo "is a pipe"
fi
```
The output of the manual test in all 5 tested environments.
```
philwalkquadd:/opt/spark
$ isPipe
not a pipe
#
$ echo "yo" | isPipe
is a pipe
#
```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #48937 from philwalk/portability-fix-for-load-spark-env.sh.

Authored-by: philwalk <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
(cherry picked from commit 8d26008)
Signed-off-by: yangjie01 <[email protected]>
@LuciferYang
Copy link
Contributor

Merged into master/branch-4.0/branch-3.5. Thanks @philwalk @dongjoon-hyun @HyukjinKwon @pan3793

jetoile pushed a commit to criteo-forks/spark that referenced this pull request Jul 15, 2025
…in/load-spark-env.sh

### What changes were proposed in this pull request?
The last action in [bin/load-spark-env.sh](https://github.com/apache/spark/blob/d5da49d56d7dec5f8a96c5252384d865f7efd4d9/bin/load-spark-env.sh#L68) performs a test to determine whether running in a terminal or not, and whether `stdin` is reading from a pipe.   A more portable test is needed.

### Why are the changes needed?
The current approach relies on `ps` with options that vary significantly between different Unix-like systems.  Specifically, it prints an error message in both `cygwin` and `msys2` (and by extension, in all of the variations of `git-for-windows`).   It doesn't print an error message, but fails to detect a terminal session in `Linux` and `Osx/Darwin homebrew` (always thinks STDIN is a pipe).

Here's what the problem looks like in a `cygwin64` session (with `set -x` just ahead of the section of interest):

If called directly:
```bash
$ bin/load-spark-env.sh
++ ps -o stat= -p 1947
ps: unknown option -- o
Try `ps --help' for more information.
+ [[ ! '' =~ \+ ]]
+ [[ -p /dev/stdin ]]
+ export 'SPARK_BEELINE_OPTS= -Djline.terminal=jline.UnsupportedTerminal'
+ SPARK_BEELINE_OPTS=' -Djline.terminal=jline.UnsupportedTerminal'
```
Interestingly, due to the 2-part test, it does the right thing w.r.t. the Terminal test, the main problem being the error message.
If called downstream from a pipe:
```bash
$ echo "yo" | bin/load-spark-env.sh
++ ps -o stat= -p 1955
ps: unknown option -- o
Try `ps --help' for more information.
+ [[ ! '' =~ \+ ]]
+ [[ -p /dev/stdin ]]
```
Again, it correctly detects the pipe environment, but with an error message.

In WSL2 Ubuntu, the test doesn't correctly detect a non-pipe terminal session:
```bash
# /opt/spark$ bin/load-spark-env.sh
++ ps -o stat= -p 1423
+ [[ ! S+ =~ \+ ]]
# echo "yo!" | bin/load-spark-env.sh
++ ps -o stat= -p 1416
+ [[ ! S+ =~ \+ ]]
```
In `#134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024`, the same failure occurs (it doesn't recognize terminal environments).

### Does this PR introduce _any_ user-facing change?
This is a proposed bug fix, and, other than fixing the bug,  should be invisible to users.

### How was this patch tested?
The patch was verified to behave as intended in terminal sessions, both interactive and piped, in the following 5 environments.
```

- Linux quadd 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- Linux d5 5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- MINGW64_NT-10.0-22631 d5 3.5.4-0bc1222b.x86_64 2024-09-04 18:28 UTC x86_64 Msys
- CYGWIN_NT-10.0-22631 d5 3.5.3-1.x86_64 2024-04-03 17:25 UTC x86_64 Cygwin
- Darwin suemac.local 23.6.0 Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:21 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T8103 arm64

```
The test was to manually run the following script, verifying the expected response to both pipe and terminal sessions.
```bash
#!/bin/bash
if [ -e /usr/bin/tty -a "`tty`" != "not a tty" -a ! -p /dev/stdin ]; then
  echo "not a pipe"
else
  echo "is a pipe"
fi
```
The output of the manual test in all 5 tested environments.
```
philwalkquadd:/opt/spark
$ isPipe
not a pipe
#
$ echo "yo" | isPipe
is a pipe
#
```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#48937 from philwalk/portability-fix-for-load-spark-env.sh.

Authored-by: philwalk <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
(cherry picked from commit 8d26008)
Signed-off-by: yangjie01 <[email protected]>
@pan3793
Copy link
Member

pan3793 commented Aug 5, 2025

@philwalk this causes behavior change.

Write a test.sh with content

#!/usr/bin/env bash
if [[ ( ! $(ps -o stat= -p $$) =~ "+" ) && ! ( -p /dev/stdin ) ]]; then
  echo 'before: hit'
fi

if [ -e /usr/bin/tty -a "`tty`" != "not a tty" -a ! -p /dev/stdin ]; then
  echo 'SPARK-50416: hit'
fi
$ ./test.sh
SPARK-50416: hit

$ ./test.sh &
[1] 1828302
before: hit
SPARK-50416: hit

@pan3793
Copy link
Member

pan3793 commented Aug 5, 2025

The last action in bin/load-spark-env.sh performs a test to determine whether running in a terminal or not, and whether stdin is reading from a pipe.

@philwalk The original intention here is to check whether the process is running in the background, you can check SPARK-8731 for more details.

@philwalk
Copy link
Contributor Author

philwalk commented Aug 5, 2025

@pan3793

(ps -o stat= -p $$)

You raise a valid point, the two tests aren't equivalent. However, ps -o stat isn't portable.
AFAICT, there is no portable way to detect whether running in the background. We could preserve current semantics where ps -o stat= is supported, and avoid printing error messages otherwise. But my impression is that what is needed is to know whether we have a terminal to write to or not.

philwalk@d5 MSYS /opt/ue
# ./test.sh
ps: unknown option -- o
Try `ps --help' for more information.
before: hit
SPARK-50416: hit

philwalk@d5 MSYS /opt/ue
# ./test.sh &
[4] 119467

philwalk@d5 MSYS /opt/ue
# ps: unknown option -- o
Try `ps --help' for more information.
before: hit
SPARK-50416: hit

BTW, a less verbose and more portable way to determine if we have a writeable terminal might be "if [ -t 0 ]; then".
[4]+ Done ./test.sh

@philwalk
Copy link
Contributor Author

philwalk commented Aug 5, 2025 via email

@philwalk
Copy link
Contributor Author

philwalk commented Aug 5, 2025

Here's a version that preserves the original semantics:

if [[ ( ! $(ps -o stat= -p $$ 2>/dev/null) =~ "+" ) && ! ( -p /dev/stdin ) ]]; then
  echo 'before: hit'
fi

@pan3793
Copy link
Member

pan3793 commented Aug 6, 2025

I'm not an expert in this area, the GPT suggested version is

if [[ ! $(ps -o stat= -p $$) == *"+"* ]] && [[ ! -p /dev/stdin ]]; then
  ...
fi

@philwalk, please send a new PR to fix it, and as the current change is already released in 3.5.6 and 4.0.0, you need create a new JIRA ticket ahead

@philwalk
Copy link
Contributor Author

philwalk commented Aug 6, 2025

The new PR is 51879
and the JIRA ticket is SPARK-53149

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants