-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[Spark-20044][Web UI] Support Spark UI behind front-end reverse proxy using a path prefix #17455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Allows to have a path prefix in the setting, like http://mydomain.com/path/to/spark and generate all WebUI URLs correctly with that prefix
|
I'm still taking some time looking at this, but while I do, @vanzin and @tgravescs you two reviewed the original reverse proxy pr, could you take a look? Also @gurvindersingh want to take a look? You wrote the original code. |
ajbozarth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a couple comments, but I'd prefer if someone else looked at the tests and docs to make sure they're correct. Overall I like the idea of this PR but it feels rough around the edges. The main issue I noticed was the constant use of stripSuffix I'd like to think there's a better way to handle the configs that would not require it, but I can't think of how myself.
| System.setProperty("spark.ui.proxyBase", proxyUrlNoSlash) | ||
| // If the master URL has a path component, it must end with a slash. | ||
| // Otherwise the browser generates incorrect relative links | ||
| masterWebUiUrl = proxyUrlNoSlash + "/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section of code seems like more work than necessary. Can't you just set masterWebUiUrl = reverseProxyUrl here and then move reverseProxyUrl.stripSuffix("/") into System.setProperty making this a two line function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have a front-end reverse proxy path like mydomain.com:80/path/to/spark, then the spark.ui.proxyBase property (prefix for URL generation) must not include a trailing slash, the way it's used in UiUtils, like prependBaseUri("/static/bootstrap.min.css").
However, the explicit URL address pointing to the master UI page (e.g. the back-lilnk from workers to master, which masterWebUiUrl feeds into) must include a trailing slash, if it has a path component, because the master UI page contains relative liks like "app?...".
Without a path component in the master URL (mydomain.com:8080), the trailing slash did not matter for resolving these links, but with a path component, they must resolve to mydomain.com:80/path/to/spark/app (not mydomain.com:80/path/to/app), therefore the base URL must have a trailing slash.
The code is intended to work regardless whether spark.ui.reverseProxyUrl was specified with or without a trailing slash, so the safe way to ensure a single trailing slash was to first strip an optional slash and then add one. Your suggestion would omit the slash if there is none specified in the config.
If there's a clean way to move the stripSuffix handling into the config itself, that would make the code prettier, though
| val baseUrl = | ||
| if (conf.getBoolean("spark.ui.reverseProxy", false)) { | ||
| s"/proxy/$workerId/logPage/?appId=$appId&executorId=$execId&logType=" | ||
| // TODO get from master? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this TODO for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, that was a leftover from testing. In fact, the code is simpler when consistently get the reverse proxy URL from the config along with the reverse proxy flag, requiring both settings to be consistently set on all nodes. I briefly considered a communication extension to send the master (reverse proxy) URL to the executors, but felt it didn't really help
|
Jenkins OK to test. |
|
@ajbozarth Any other comments on this PR? Why is it not testing even though it has an "ok to test"? |
|
Any update? It's very urgent case. |
|
Test build #3730 has finished for PR 17455 at commit
|
|
Any chance this can get into v2.2.0? I am using Spark in Docker with nginx reverse proxy and this would help immensely. |
|
Usually new features aren't added to branches after they've been cut, and almost never after an rc has been cut. So given branch-2.2 was cut over a month ago and we're about to cut rc3, I'm pretty sure this will not make it into 2.2.0 sorry. |
|
retest this please |
|
ok to test |
|
@vanzin I noticed this stalled after I reviewed it, mind taking a look? |
|
Would love to get this in for the next release, currently I'm using a variant of https://github.com/aseigneurin/spark-ui-proxy to work around this issue... |
|
#18499 was just merged, you'll need to update this to match the new code if you hope to get reinvigorate interest |
|
Hello, any update on this issue? is it going to be available on versions 2.1.x? Tks |
|
Before this can move forward it needs to be updated, but I can tell you now that since this is a new feature and not a bug fix this will never make it into 2.1.x or 2.2.x, the earliest you'll see it is 2.3.0 |
|
@okoethibm Have you got any time to rebase this PR to the latest master? |
|
@jiangxb1987 I've stopped working on Spark related efforts a while ago and will not be following up on this PR any more. Anyone interested is free to take over. |
|
I'm going to close this PR because it goes stale, please feel free to reopen it or open another PR if anyone have more thoughts on this issue. |
|
Such a pity this was PR was abandoned, I've just spend like 2 hours hacking nginx to get over this problem... |
|
@elgalu feel free to try picking it back up, I was a fan of it when reviewing it "back in the day" |
|
So I'm definitely confused by the spark docs everywhere… is this still not possible? I have a single domain with many spark clusters sitting behind it and I would like to reach them using a path like |
…ng a path prefix Revert proxy url ### What changes were proposed in this pull request? Allow to run the Spark web UI behind a reverse proxy with URLs prefixed by a context root, like www.mydomain.com/spark. In particular, this allows to access multiple Spark clusters through the same virtual host, only distinguishing them by context root, like www.mydomain.com/cluster1, www.mydomain.com/cluster2, and it allows to run the Spark UI in a common cookie domain (for SSO) with other services. ### Why are the changes needed? This PR is to take over #17455. After changes, Spark allows showing customized prefix URL in all the `href` links of the HTML pages. ### Does this PR introduce _any_ user-facing change? Yes, all the links of UI pages will be contains the value of `spark.ui.reverseProxyUrl` if it is configurated. ### How was this patch tested? New HTML Unit tests in MasterSuite Manual UI testing for master, worker and app UI with an nginx proxy Spark config: ``` spark.ui.port 8080 spark.ui.reverseProxy=true spark.ui.reverseProxyUrl=/path/to/spark/ ``` nginx config: ``` server { listen 9000; set $SPARK_MASTER http://127.0.0.1:8080; # split spark UI path into prefix and local path within master UI location ~ ^(/path/to/spark/) { # strip prefix when forwarding request rewrite /path/to/spark(/.*) $1 break; #rewrite /path/to/spark/ "/" ; # forward to spark master UI proxy_pass $SPARK_MASTER; proxy_intercept_errors on; error_page 301 302 307 = handle_redirects; } location handle_redirects { set $saved_redirect_location '$upstream_http_location'; proxy_pass $saved_redirect_location; } } ``` Closes #29820 from gengliangwang/revertProxyURL. Lead-authored-by: Gengliang Wang <[email protected]> Co-authored-by: Oliver Köth <[email protected]> Signed-off-by: Gengliang Wang <[email protected]>
What changes were proposed in this pull request?
Allow to run the Spark web UI behind a reverse proxy with URLs prefixed by a context root, like www.mydomain.com/spark. In particular, this allows to access multiple Spark clusters through the same virtual host, only distinguishing them by context root, like www.mydomain.com/cluster1, www.mydomain.com/cluster2, and it allows to run the Spark UI in a common cookie domain (for SSO) with other services.
How was this patch tested?
New HTML Unit tests in MasterSuite
Manual UI testing for master, worker and app UI with an nginx proxy
Spark config:
spark.ui.reverseProxy=true
spark.ui.reverseProxyUrl=/path/to/spark/
nginx config:
server {
listen 9000;
set $SPARK_MASTER http://:8080;
location ~ ^(?/path/to/spark$) {
return 302 $scheme://$host:$server_port$prefix/;
}
location ~ ^(?/path/to/spark)(?<local_path>/.*) {
rewrite ^ $local_path break;
proxy_pass $SPARK_MASTER;
proxy_redirect $SPARK_MASTER $prefix;
}
}