@gengliangwang
Member

@gengliangwang gengliangwang commented Sep 21, 2020

What changes were proposed in this pull request?

Allow running the Spark web UI behind a reverse proxy with URLs prefixed by a context root, such as www.mydomain.com/spark. In particular, this makes it possible to access multiple Spark clusters through the same virtual host, distinguished only by context root (for example www.mydomain.com/cluster1 and www.mydomain.com/cluster2), and to run the Spark UI in a common cookie domain (for SSO) with other services.

Why are the changes needed?

This PR takes over #17455.
After the changes, Spark can show a customized prefix URL in all the href links of the HTML pages.

Does this PR introduce any user-facing change?

Yes, all the links in the UI pages will contain the value of spark.ui.reverseProxyUrl if it is configured.

How was this patch tested?

New HTML Unit tests in MasterSuite
Manual UI testing for master, worker and app UI with an nginx proxy
Spark config:

spark.ui.port=8080
spark.ui.reverseProxy=true
spark.ui.reverseProxyUrl=/path/to/spark/

nginx config:

server {
    listen 9000;
    set $SPARK_MASTER http://127.0.0.1:8080;
    # split spark UI path into prefix and local path within master UI
    location ~ ^(/path/to/spark/) {
        # strip prefix when forwarding request
        rewrite /path/to/spark(/.*) $1  break;
        #rewrite /path/to/spark/ "/" ;
        # forward to spark master UI
        proxy_pass $SPARK_MASTER;
        proxy_intercept_errors on;
        error_page 301 302 307 = @handle_redirects;
    }
    location @handle_redirects {
        set $saved_redirect_location '$upstream_http_location';
        proxy_pass $saved_redirect_location;
    }
}
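To make the prefix-stripping step of the nginx config above concrete, here is a minimal Python sketch of what the `rewrite /path/to/spark(/.*) $1 break;` rule does to a request path. The function name `strip_prefix` is hypothetical, and the sketch only approximates nginx: it models the regex match and the replacement of the URI by the captured group, not the rest of nginx request handling.

```python
import re

# Same pattern as the nginx rewrite rule: capture everything after the prefix.
REWRITE_PATTERN = re.compile(r"/path/to/spark(/.*)")


def strip_prefix(request_path: str) -> str:
    """Return the path forwarded upstream; unmatched paths pass through unchanged."""
    match = REWRITE_PATTERN.search(request_path)
    if match is None:
        return request_path
    # Like "$1" in the nginx rule, the whole URI becomes the captured group.
    return match.group(1)


print(strip_prefix("/path/to/spark/"))
print(strip_prefix("/path/to/spark/static/webui.css"))
```

With this rule, a request for the prefixed root is forwarded to the master UI as `/`, which is why the Spark side needs spark.ui.reverseProxyUrl to put the prefix back into the links it generates.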

@gengliangwang
Member Author

This PR is to take over #17455.

@SparkQA

SparkQA commented Sep 21, 2020

Test build #128938 has finished for PR 29820 at commit d5a17c0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 21, 2020

Test build #128946 has finished for PR 29820 at commit 98687de.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Contributor

The JIRA for this is resolved as Won't Fix. If we are working on it, can you reopen the JIRA?

@gengliangwang
Member Author

@tgravescs Thanks for the reminder.
This is still in progress and I don't think the current changes will work.
I found that the link for "/stages" becomes "$prefix_url/proxy/$app_id/stages" after the changes. The reverse proxy should strip the "$prefix_url" part and send the rest of the URL, "proxy/$app_id/stages", to the driver, which won't handle it. The current changes only work for the worker and application links.

I will update the JIRA after investigation.
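The URL shapes described in this comment can be sketched as follows. This is a minimal Python illustration, not Spark's actual link-generation code: the helper names `make_link` and `after_front_proxy` and the application ID `app-0001` are hypothetical, and the prefix value is taken from the test config in the PR description.

```python
def drop_one_trailing_slash(value: str) -> str:
    """Remove a single trailing slash, if present."""
    return value[:-1] if value.endswith("/") else value


def make_link(prefix_url: str, app_id: str, path: str) -> str:
    """Build a link of the form $prefix_url/proxy/$app_id/<path>."""
    return f"{drop_one_trailing_slash(prefix_url)}/proxy/{app_id}{path}"


def after_front_proxy(prefix_url: str, url: str) -> str:
    """Return what remains once the front-end reverse proxy strips the prefix."""
    prefix = drop_one_trailing_slash(prefix_url)
    return url[len(prefix):] if url.startswith(prefix) else url


link = make_link("/path/to/spark/", "app-0001", "/stages")
print(link)                                        # link shown in the UI
print(after_front_proxy("/path/to/spark/", link))  # path seen behind the proxy
```

The second value is the path that has to be routed correctly once the prefix is gone, which is the case the comment identifies as unhandled at this stage of the PR.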

@gatorsmile
Member

What is the status of this PR?

@SparkQA

SparkQA commented Oct 20, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34652/

@SparkQA

SparkQA commented Oct 20, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34652/

@SparkQA

SparkQA commented Oct 22, 2020

Test build #130168 has finished for PR 29820 at commit ba05ca0.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 22, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34775/

@SparkQA

SparkQA commented Oct 22, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34775/

@SparkQA

SparkQA commented Oct 26, 2020

Test build #130274 has finished for PR 29820 at commit a38c1d4.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 26, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34874/

@SparkQA

SparkQA commented Oct 26, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34874/

@SparkQA

SparkQA commented Oct 27, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34933/

@SparkQA

SparkQA commented Oct 27, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34933/

@gengliangwang gengliangwang changed the title from "[WIP][SPARK-20044][UI] Support Spark UI behind front-end reverse proxy using a path prefix Revert proxy url" to "[SPARK-20044][UI] Support Spark UI behind front-end reverse proxy using a path prefix Revert proxy url" on Oct 27, 2020
@gengliangwang
Member Author

This is now ready for review.
cc @tgravescs @vanzin @ajbozarth @okoethibm

@SparkQA

SparkQA commented Oct 27, 2020

Test build #130331 has finished for PR 29820 at commit 0d12249.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Member Author

cc @srowen @HeartSaVioR @cloud-fan as well

// if reverseProxyUrl is not set, then we continue to generate relative URLs
// starting with "/" throughout the UI and do not use activeMasterWebUiUrl
val proxyUrl = conf.get(UI_REVERSE_PROXY_URL.key, "").stripSuffix("/")
System.setProperty("spark.ui.proxyBase", proxyUrl)
Contributor

@cloud-fan cloud-fan Oct 30, 2020

Shall we mention that the /proxy/... part will be added in UIUtils.makeHref?

@SparkQA

SparkQA commented Nov 1, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35102/

@SparkQA

SparkQA commented Nov 1, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35102/

@SparkQA

SparkQA commented Nov 1, 2020

Test build #130498 has finished for PR 29820 at commit 1f25d23.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Member Author

Merging to master

@drauschenbach

I don't understand this statement:

This setting affects all the workers and application UIs running in the cluster and must be set identically on all the workers, drivers and masters.

In an HA Spark Standalone setup, there are always two Spark Masters, and config files are all static, because that is the model. Workers switch from one master to the other during a failover. So how can they be expected to contain a static reference to one or the other master in their static config?

I expose both Spark Masters in my Nginx config (/spark-master.0/, /spark-master.1/), and I'm struggling to figure out how to roll out this change.

@gengliangwang
Member Author

@drauschenbach We can treat the prefix as the ID of one cluster. For example, let's say the prefix is "cluster1" and the reverse proxy is nginx, and both master 0 and master 1 are configured with the prefix "/cluster1".
Before failover, "/cluster1" maps to the IP of master 0. Accessing a master URL starting with "/cluster1" will be forwarded to master 0. All the worker UI links shown carry the prefix as well, so accessing a worker URL goes to nginx first, then to master 0, and then to the specific worker.
After failover, I think you can try simply mapping "/cluster1" to the IP of master 1 in nginx.
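The idea of treating the prefix as a cluster ID can be sketched in Python as below. This is only a model of the routing decision, under stated assumptions: the class `PrefixRouter`, the master names and the upstream addresses are all hypothetical, and how the nginx mapping is actually switched during a failover is not covered in this thread.

```python
from typing import Optional

# Hypothetical upstream addresses for the two masters of one HA cluster.
MASTERS = {
    "master0": "http://10.0.0.1:8080",
    "master1": "http://10.0.0.2:8080",
}


class PrefixRouter:
    """Maps one cluster prefix to whichever master is currently active."""

    def __init__(self, prefix: str, active: str) -> None:
        self.prefix = prefix.rstrip("/")
        self.active = active

    def fail_over(self, new_active: str) -> None:
        """Point the same prefix at a different master."""
        self.active = new_active

    def upstream_url(self, request_path: str) -> Optional[str]:
        """Strip the prefix and forward to the active master, or return None."""
        if request_path != self.prefix and not request_path.startswith(self.prefix + "/"):
            return None
        rest = request_path[len(self.prefix):] or "/"
        return MASTERS[self.active] + rest


router = PrefixRouter("/cluster1", active="master0")
print(router.upstream_url("/cluster1/app"))  # served by master 0
router.fail_over("master1")
print(router.upstream_url("/cluster1/app"))  # same public URL, now master 1
```

Because the public prefix never changes, the static Spark configuration on masters, workers and drivers can stay identical across a failover; only the proxy-side mapping moves.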

8 participants