@zsxwing
Member

@zsxwing zsxwing commented Jun 28, 2015

When a stage has a very large number of tasks (for example, after running sc.parallelize(1 to 100000, 10000).count()), the stage page becomes very large. Enabling GZip reduces the size significantly. For example, in my environment, the size of a stage page with 10000 tasks drops from 9.8MB to 138KB.

I also added a configuration option so that users who want to use an alternative Filter can turn it off.
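As a rough illustration of why GZip helps so much on a page like this, here is a minimal sketch using only the JDK. The table markup in `fakeStagePage` is hypothetical and is not the real Spark stage page, so the exact numbers will differ from the 9.8MB and 138KB quoted above; the point is only that thousands of near-identical rows compress very well.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class GzipSizeSketch {
    // Builds a synthetic HTML table with one row per task.
    // Assumption: this markup is made up for illustration only.
    static String fakeStagePage(int tasks) {
        StringBuilder sb = new StringBuilder("<table>\n");
        for (int i = 0; i < tasks; i++) {
            sb.append("<tr><td>").append(i)
              .append("</td><td>SUCCESS</td><td>PROCESS_LOCAL</td>")
              .append("<td>localhost</td><td>1 ms</td></tr>\n");
        }
        return sb.append("</table>\n").toString();
    }

    // Compresses a byte array in memory with the JDK's GZIP stream.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = fakeStagePage(10000).getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(raw);
        System.out.println("raw bytes:  " + raw.length);
        System.out.println("gzip bytes: " + packed.length);
    }
}
```

Running `main` prints both sizes so the ratio can be compared against a real stage page.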

Member

Surely Jetty/Netty already handle this internally? This is a hacky approach to supporting compression, and I am all but certain modern containers do this for you (or can be configured to). Tomcat does.

Member Author

The problem is we don't have a configuration file for Jetty. So to enable it, we need to set it in the code. Here is an example provided by Jetty: https://github.com/jetty-project/jetty-plugin-support/blob/378f5f691fc24c3f223e7239fc56b3568b6f816e/jetty-servlets/src/test/java/org/eclipse/jetty/servlets/GzipWithPipeliningTest.java#L58

@SparkQA

SparkQA commented Jun 28, 2015

Test build #35942 has finished for PR 7072 at commit b01c9ed.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@pwendell
Contributor

@zsxwing have you noticed any improvement in user-facing response time for the loading of the page?

@zsxwing
Member Author

zsxwing commented Jun 29, 2015

> @zsxwing have you noticed any improvement in user-facing response time for the loading of the page?

I tested on my local machine, and the page download time only drops by about 300 milliseconds in this case. However, I think the improvement will be significant if the driver is on a remote node.
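To make the remote-node argument concrete, here is a back-of-the-envelope sketch. The link speeds are assumptions chosen for illustration, the sizes are read as decimal units (9.8MB = 9,800,000 bytes, 138KB = 138,000 bytes), and latency, protocol overhead, and compression CPU time are all ignored.

```java
class TransferTimeSketch {
    // Seconds needed to move `bytes` over a link of `megabitsPerSecond`,
    // ignoring latency and protocol overhead.
    static double seconds(double bytes, double megabitsPerSecond) {
        return bytes * 8.0 / (megabitsPerSecond * 1_000_000.0);
    }

    public static void main(String[] args) {
        double uncompressed = 9.8e6; // 9.8MB page from the PR description
        double compressed = 138e3;   // 138KB page from the PR description
        // Assumed link speeds in Mbit/s; not measured values.
        double[] links = {10.0, 100.0, 1000.0};
        for (double mbps : links) {
            System.out.printf("%6.0f Mbit/s: %.4f s uncompressed, %.4f s gzipped%n",
                    mbps, seconds(uncompressed, mbps), seconds(compressed, mbps));
        }
    }
}
```

On an assumed 10 Mbit/s link this works out to about 7.84 s uncompressed versus about 0.11 s gzipped, whereas on a fast local link the difference shrinks to a fraction of a second, which is in line with the small gain seen locally.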

BTW, #7071 gives a much bigger improvement.

@srowen
Member

srowen commented Jun 29, 2015

Yes, I'm not sure this is worth the extra complexity, dependency, and config param. I can't imagine it makes much difference unless the page size is >1MB, and if the page is so big that this has any impact, the underlying problem should be fixed by pagination.

@andrewor14
Contributor

There's an existing JIRA somewhere...

@andrewor14
Contributor

@zsxwing is this the same as https://issues.apache.org/jira/browse/SPARK-7716? I did something similar in #6248 but didn't notice significant gains in my benchmark.

Contributor

Is there a reason to make this configurable? When would a user not want this?

Member Author

Since Spark lets users add custom Filters, I'm concerned that they may add a Filter that conflicts with GZipFilter. So I added this configuration to let them disable GZipFilter.
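The idea of an opt-out switch can be sketched in plain Java, without any servlet or Jetty classes. This is not the code from this PR and not Jetty's GzipFilter: the `gzipEnabled` flag is a hypothetical stand-in for the configuration option mentioned above, and the header parsing is deliberately simplified (q-values such as `gzip;q=0` are not interpreted).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.zip.GZIPOutputStream;

class GzipGateSketch {
    // True when the Accept-Encoding header value lists the gzip coding.
    // Simplification: parameters after ';' are dropped, not evaluated.
    static boolean clientAcceptsGzip(String acceptEncoding) {
        if (acceptEncoding == null) {
            return false;
        }
        for (String token : acceptEncoding.toLowerCase(Locale.ROOT).split(",")) {
            int semi = token.indexOf(';');
            String coding = (semi >= 0 ? token.substring(0, semi) : token).trim();
            if (coding.equals("gzip")) {
                return true;
            }
        }
        return false;
    }

    // Returns the body unchanged when compression is switched off
    // (hypothetical flag) or the client did not ask for gzip.
    static byte[] encodeBody(byte[] body, boolean gzipEnabled, String acceptEncoding) {
        if (!gzipEnabled || !clientAcceptsGzip(acceptEncoding)) {
            return body;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

With the flag off, the body passes through untouched, so a user-supplied Filter that does its own encoding would not see already-compressed output. A real response would also need a `Content-Encoding: gzip` header whenever the compressed branch is taken.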

@pwendell
Contributor

I would still argue for this change because in remote environments having a page with a 10MB payload is pretty bad. It's just good form to compress output for very large pages. There is a bit of extra complexity, but it doesn't seem too crazy since I think this is the standard Jetty mechanism for adding interceptors like this.

@pwendell
Contributor

@srowen are you strongly against this or just mildly?

@zsxwing
Member Author

zsxwing commented Jun 30, 2015

I think @andrewor14's PR is much simpler. I'm going to close this one.

@zsxwing zsxwing closed this Jun 30, 2015
@srowen
Member

srowen commented Jun 30, 2015

Mildly, mostly because I was surprised it required a servlet filter in jetty. Ancient experience says the servlet filter approach doesn't entirely work unless it accounts quite carefully for the J2EE lifecycle of a request (e.g., can't compress after flushing starts) but may not be an issue here. If there's a simpler way, that's even better.

@zsxwing zsxwing deleted the gzip branch July 17, 2015 15:18