-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-8691][WebUI]Enable GZip by default for Web UI #7072
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surely Jetty/netty already handle this internally? this is a hacky approach to supporting compression and I am all but certain modern containers (can be configured to) do this for you. Tomcat does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is we don't have a configuration file for Jetty. So to enable it, we need to set it in the code. Here is an example provided by Jetty: https://github.com/jetty-project/jetty-plugin-support/blob/378f5f691fc24c3f223e7239fc56b3568b6f816e/jetty-servlets/src/test/java/org/eclipse/jetty/servlets/GzipWithPipeliningTest.java#L58
|
Test build #35942 has finished for PR 7072 at commit
|
|
@zsxwing have you noticed any improvement in user-facing response time for the loading of the page? |
I tested in my local machine, the time of downloading the page only reduces about 300 milliseconds in this case. However, I think if the driver is in a remote node, the improvement will be significant. BTW, #7071 has much more improvement. |
|
Yes, I'm not sure this is worth the extra complexity, dependency, config param. I can't imagine it makes much difference unless the page size is >1MB, and if the page is so big that this has any impact, the underlying problem should be fixed by pagination. |
|
There's an existing JIRA somewhere... |
|
@zsxwing is this the same as https://issues.apache.org/jira/browse/SPARK-7716? I did something similar in #6248 but didn't notice significant gains in my benchmark. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to make this configurable? When would a user not want this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Spark supports the user to add custom Filters, I'm concerned that they may add a Filter that conflicts with GZipFilter. So I add this configuration to disable GZipFilter.
|
I would still argue for this change because in remote environments having a page with a 10MB payload is pretty bad. It's just good form to compress output for very large pages. There is a bit of extra complexity, but it doesn't seem too crazy since I think this is the standard Jetty mechanism for adding interceptors like this. |
|
@srowen are you strongly against this or just mildly? |
|
I think @andrewor14 's PR is much easier. I'm going to close this one. |
|
Mildly, mostly because I was surprised it required a servlet filter in jetty. Ancient experience says the servlet filter approach doesn't entirely work unless it accounts quite carefully for the J2EE lifecycle of a request (e.g., can't compress after flushing starts) but may not be an issue here. If there's a simpler way, that's even better. |
When there are massive tasks in the stage page (such as, running
sc.parallelize(1 to 100000, 10000).count()), the size of the stage page is large. Enabling GZip can reduce the size significantly. For example, in my environment, the size of a stage page with 10000 tasks reduces from 9.8MB to 138KB.I also added a configuration so that if the user wants to use an alternative
Filter, he can turn off it.