Skip to content

Conversation

@okoethibm
Copy link
Contributor

What changes were proposed in this pull request?

Allow to run the Spark web UI behind a reverse proxy with URLs prefixed by a context root, like www.mydomain.com/spark. In particular, this allows to access multiple Spark clusters through the same virtual host, only distinguishing them by context root, like www.mydomain.com/cluster1, www.mydomain.com/cluster2, and it allows to run the Spark UI in a common cookie domain (for SSO) with other services.

How was this patch tested?

New HTML Unit tests in MasterSuite
Manual UI testing for master, worker and app UI with an nginx proxy
Spark config:
spark.ui.reverseProxy=true
spark.ui.reverseProxyUrl=/path/to/spark/
nginx config:
server {
listen 9000;
set $SPARK_MASTER http://:8080;
location ~ ^(?/path/to/spark$) {
return 302 $scheme://$host:$server_port$prefix/;
}
location ~ ^(?/path/to/spark)(?<local_path>/.*) {
rewrite ^ $local_path break;
proxy_pass $SPARK_MASTER;
proxy_redirect $SPARK_MASTER $prefix;
}
}

Allows to have a path prefix in the setting, like
http://mydomain.com/path/to/spark and generate all WebUI URLs correctly
with that prefix
@ajbozarth
Copy link
Member

I'm still taking some time looking at this, but while I do, @vanzin and @tgravescs you two reviewed the original reverse proxy pr, could you take a look? Also @gurvindersingh want to take a look? You wrote the original code.

Copy link
Member

@ajbozarth ajbozarth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a couple comments, but I'd prefer if someone else looked at the tests and docs to make sure they're correct. Overall I like the idea of this PR but it feels rough around the edges. The main issue I noticed was the constant use of stripSuffix I'd like to think there's a better way to handle the configs that would not require it, but I can't think of how myself.

System.setProperty("spark.ui.proxyBase", proxyUrlNoSlash)
// If the master URL has a path component, it must end with a slash.
// Otherwise the browser generates incorrect relative links
masterWebUiUrl = proxyUrlNoSlash + "/"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section of code seems like more work than necessary. Can't you just set masterWebUiUrl = reverseProxyUrl here and then move reverseProxyUrl.stripSuffix("/") into System.setProperty making this a two line function?

Copy link
Contributor Author

@okoethibm okoethibm Apr 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have a front-end reverse proxy path like mydomain.com:80/path/to/spark, then the spark.ui.proxyBase property (prefix for URL generation) must not include a trailing slash, the way it's used in UiUtils, like prependBaseUri("/static/bootstrap.min.css").
However, the explicit URL address pointing to the master UI page (e.g. the back-lilnk from workers to master, which masterWebUiUrl feeds into) must include a trailing slash, if it has a path component, because the master UI page contains relative liks like "app?...".
Without a path component in the master URL (mydomain.com:8080), the trailing slash did not matter for resolving these links, but with a path component, they must resolve to mydomain.com:80/path/to/spark/app (not mydomain.com:80/path/to/app), therefore the base URL must have a trailing slash.

The code is intended to work regardless whether spark.ui.reverseProxyUrl was specified with or without a trailing slash, so the safe way to ensure a single trailing slash was to first strip an optional slash and then add one. Your suggestion would omit the slash if there is none specified in the config.
If there's a clean way to move the stripSuffix handling into the config itself, that would make the code prettier, though

val baseUrl =
if (conf.getBoolean("spark.ui.reverseProxy", false)) {
s"/proxy/$workerId/logPage/?appId=$appId&executorId=$execId&logType="
// TODO get from master?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this TODO for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, that was a leftover from testing. In fact, the code is simpler when consistently get the reverse proxy URL from the config along with the reverse proxy flag, requiring both settings to be consistently set on all nodes. I briefly considered a communication extension to send the master (reverse proxy) URL to the executors, but felt it didn't really help

@holdenk
Copy link
Contributor

holdenk commented Apr 5, 2017

Jenkins OK to test.

@okoethibm
Copy link
Contributor Author

@ajbozarth Any other comments on this PR? Why is it not testing even though it has an "ok to test"?

@ajbozarth
Copy link
Member

It seems it didn't take @holdenk ok, @vanzin mind okaying this to test?

@BartekH
Copy link
Contributor

BartekH commented May 19, 2017

Any update? It's very urgent case.

@SparkQA
Copy link

SparkQA commented May 19, 2017

Test build #3730 has finished for PR 17455 at commit af314fd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@agsimeonov
Copy link

Any chance this can get into v2.2.0? I am using Spark in Docker with nginx reverse proxy and this would help immensely.

@ajbozarth
Copy link
Member

Usually new features aren't added to branches after they've been cut, and almost never after an rc has been cut. So given branch-2.2 was cut over a month ago and we're about to cut rc3, I'm pretty sure this will not make it into 2.2.0 sorry.

@jiangxb1987
Copy link
Contributor

retest this please

@jiangxb1987
Copy link
Contributor

ok to test

@ajbozarth
Copy link
Member

@vanzin I noticed this stalled after I reviewed it, mind taking a look?

@sayeaud
Copy link

sayeaud commented Aug 9, 2017

Would love to get this in for the next release, currently I'm using a variant of https://github.com/aseigneurin/spark-ui-proxy to work around this issue...

@ajbozarth
Copy link
Member

#18499 was just merged, you'll need to update this to match the new code if you hope to get reinvigorate interest

@andrelu
Copy link

andrelu commented Aug 30, 2017

Hello, any update on this issue? is it going to be available on versions 2.1.x?

Tks

@ajbozarth
Copy link
Member

Before this can move forward it needs to be updated, but I can tell you now that since this is a new feature and not a bug fix this will never make it into 2.1.x or 2.2.x, the earliest you'll see it is 2.3.0

@jiangxb1987
Copy link
Contributor

@okoethibm Have you got any time to rebase this PR to the latest master?

@okoethibm
Copy link
Contributor Author

@jiangxb1987 I've stopped working on Spark related efforts a while ago and will not be following up on this PR any more. Anyone interested is free to take over.

@jiangxb1987
Copy link
Contributor

I'm going to close this PR because it goes stale, please feel free to reopen it or open another PR if anyone have more thoughts on this issue.

@asfgit asfgit closed this in ed1478c Nov 7, 2017
@elgalu
Copy link

elgalu commented Oct 16, 2018

Such a pity this was PR was abandoned, I've just spend like 2 hours hacking nginx to get over this problem...

@ajbozarth
Copy link
Member

@elgalu feel free to try picking it back up, I was a fan of it when reviewing it "back in the day"

@ms-lolo
Copy link

ms-lolo commented May 22, 2020

So I'm definitely confused by the spark docs everywhere… is this still not possible? I have a single domain with many spark clusters sitting behind it and I would like to reach them using a path like /spark/{some-id}. There are multiple settings in the config docs mentioning proxies but they just cause chaos and make no sense to me. Can anyone point me in the right direction or simply tell me this isn't possible? So far spark.ui.proxyBase seems like the closest thing to what I want but this setting isn't even in the docs even though it's mentioned as part of other configs (including brand new configs making their way into 3.0!)

gengliangwang added a commit that referenced this pull request Nov 1, 2020
…ng a path prefix Revert proxy url

### What changes were proposed in this pull request?

Allow to run the Spark web UI behind a reverse proxy with URLs prefixed by a context root, like www.mydomain.com/spark. In particular, this allows to access multiple Spark clusters through the same virtual host, only distinguishing them by context root, like www.mydomain.com/cluster1, www.mydomain.com/cluster2, and it allows to run the Spark UI in a common cookie domain (for SSO) with other services.

### Why are the changes needed?

This PR is to take over #17455.
After changes, Spark allows showing customized prefix URL in all the `href` links of the HTML pages.

### Does this PR introduce _any_ user-facing change?

Yes, all the links of UI pages will be contains the value of `spark.ui.reverseProxyUrl` if it is configurated.
### How was this patch tested?

New HTML Unit tests in MasterSuite
Manual UI testing for master, worker and app UI with an nginx proxy
Spark config:
```
spark.ui.port 8080
spark.ui.reverseProxy=true
spark.ui.reverseProxyUrl=/path/to/spark/
```
nginx config:
```
server {
    listen 9000;
    set $SPARK_MASTER http://127.0.0.1:8080;
    # split spark UI path into prefix and local path within master UI
    location ~ ^(/path/to/spark/) {
        # strip prefix when forwarding request
        rewrite /path/to/spark(/.*) $1  break;
        #rewrite /path/to/spark/ "/" ;
        # forward to spark master UI
        proxy_pass $SPARK_MASTER;
        proxy_intercept_errors on;
        error_page 301 302 307 = handle_redirects;
    }
    location handle_redirects {
        set $saved_redirect_location '$upstream_http_location';
        proxy_pass $saved_redirect_location;
    }
}
```

Closes #29820 from gengliangwang/revertProxyURL.

Lead-authored-by: Gengliang Wang <[email protected]>
Co-authored-by: Oliver Köth <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.