
Conversation

@JoshRosen
Contributor

This patch adds Selenium tests for Spark's web UI. To avoid adding extra
dependencies to the test environment, the tests use Selenium's HtmlUnitDriver,
which is pure-Java, instead of, say, ChromeDriver.
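
For illustration, a minimal suite in this style looks roughly like the
following sketch (the suite name, URL, and assertion are placeholders, not
the code in this patch):

  import org.openqa.selenium.WebDriver
  import org.openqa.selenium.htmlunit.HtmlUnitDriver
  import org.scalatest.FunSuite
  import org.scalatest.selenium.WebBrowser

  // Sketch of the testing pattern: drive a live web UI with the pure-Java
  // HtmlUnitDriver and assert on the rendered page. Assumes a UI is already
  // listening on the hypothetical URL below.
  class UISeleniumSketch extends FunSuite with WebBrowser {
    implicit val webDriver: WebDriver = new HtmlUnitDriver

    test("the stages page renders") {
      go to "http://localhost:4040/stages"
      assert(pageTitle.toLowerCase.contains("spark"))
    }
  }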

I added new tests to try to reproduce a few UI bugs reported on JIRA, namely
SPARK-3021, SPARK-2105, and SPARK-2527. I wasn't able to reproduce these bugs;
I suspect that the older ones might have been fixed by other patches.

In order to use HtmlUnitDriver, I added an explicit dependency on the
org.apache.httpcomponents version of httpclient in order to prevent jets3t's
older version from taking precedence on the classpath.
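
(In sbt terms, that pin amounts to something like the following sketch; the
actual change also touches the Maven poms, and 4.3.5 is the version discussed
below:)

  // Sketch: declare the HttpComponents client explicitly so it takes
  // precedence over the older client that arrives transitively.
  libraryDependencies += "org.apache.httpcomponents" % "httpclient" % "4.3.5"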

I also upgraded ScalaTest to 2.2.1.

@SparkQA

SparkQA commented Sep 20, 2014

QA tests have started for PR 2474 at commit fb2a0f9.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 21, 2014

QA tests have finished for PR 2474 at commit fb2a0f9.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging

@pwendell
Contributor

@JoshRosen how much time does this add to the test harness?

@JoshRosen
Contributor Author

@pwendell According to Jenkins, UISuite took ~11 seconds to run, compared to ~2.9 seconds before this PR (although three of the tests were previously disabled).

@pwendell
Contributor

Hmm - O(seconds) is a lot for new tests considering the quantity of tests we have. Is there a constant overhead in launching Selenium that gets amortized over future tests we add here? If that's the case, it seems fine.

@JoshRosen
Contributor Author

If the only issue here is test speed, maybe we can disable the slower tests by default on Jenkins.

@JoshRosen
Contributor Author

Note to self / reviewers: #2489 addresses another httpclient dependency issue and will probably conflict with this.

@aniketbhatnagar
Contributor

Agreed that this can indeed cause conflicts. @JoshRosen, do you know which version of httpclient this change effectively resolves to?

@JoshRosen
Contributor Author

@aniketbhatnagar It looks like both of our patches replace commons.httpclient with httpcomponents. In this PR, I configured Spark Core to use version 4.3.5 of HttpComponents HttpClient because it seems to be the latest version.

One question, though: how backwards-compatible is the HttpComponents version of HttpClient? Do different versions provide different compatibility guarantees? I found a blog post suggesting that it's not fully compatible: http://blog.teamextension.com/migrating-from-httpclient-3-1-to-4-0-34.

Are there newer versions of the dependencies that depend on HttpClient (e.g. jets3t) which use the newer version?

@JoshRosen
Contributor Author

It looks like we determine the jets3t version based on which Hadoop version we're using. Based on discussion in #629, we need to keep doing this rather than updating to 0.9.x across the board, since older Hadoop versions need the older jets3t. Unfortunately, jets3t 0.7 depends on commons-httpclient:3.1. @srowen, do you have any thoughts here?
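
(For illustration, the per-Hadoop selection amounts to something like this
sketch; hadoopVersion and the exact version strings here are hypothetical:)

  // Sketch: choose the jets3t release based on the Hadoop version in use,
  // since Hadoop 1.x + S3 needs the older jets3t line.
  val hadoopVersion = sys.props.getOrElse("hadoop.version", "1.0.4")
  val jets3tVersion = if (hadoopVersion.startsWith("1.")) "0.7.1" else "0.9.0"
  libraryDependencies += "net.java.dev.jets3t" % "jets3t" % jets3tVersion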

@srowen
Member

srowen commented Sep 25, 2014

Yes, as I recall, Hadoop 1 + S3 requires jets3t 0.7 to work correctly (or else we would have also updated it to 0.9). I also believe that 3.x and 4.x of the HTTP libraries are fairly incompatible. Are they actually different artifacts though? Can you let jets3t use 3.x of the old artifact and still freely adjust what version of the 4.x artifact you use? This part I forget: whether they made brand-new artifact names for 4.x.

@JoshRosen
Contributor Author

Are they actually different artifacts though? Can you let jets3t use 3.x of the old artifact and still freely adjust what version of the 4.x artifact you use?

Hmm, they are different artifacts. Maybe HttpClient itself isn't the problem here. Here's the actual exception that I get if I undo my Maven changes:

[error] Uncaught exception when running org.apache.spark.ui.UISuite: java.lang.NoSuchMethodError: org.apache.http.impl.cookie.BrowserCompatSpecFactory.create(Lorg/apache/http/protocol/HttpContext;)Lorg/apache/http/cookie/CookieSpec;
sbt.ForkMain$ForkError: org.apache.http.impl.cookie.BrowserCompatSpecFactory.create(Lorg/apache/http/protocol/HttpContext;)Lorg/apache/http/cookie/CookieSpec;
    at com.gargoylesoftware.htmlunit.CookieManager.<init>(CookieManager.java:61)
    at com.gargoylesoftware.htmlunit.WebClient.<init>(WebClient.java:132)
    at org.openqa.selenium.htmlunit.HtmlUnitDriver.newWebClient(HtmlUnitDriver.java:288)
    at org.openqa.selenium.htmlunit.HtmlUnitDriver.createWebClient(HtmlUnitDriver.java:262)
    at org.openqa.selenium.htmlunit.HtmlUnitDriver.<init>(HtmlUnitDriver.java:147)
    at org.openqa.selenium.htmlunit.HtmlUnitDriver.<init>(HtmlUnitDriver.java:191)
    at org.openqa.selenium.htmlunit.HtmlUnitDriver.<init>(HtmlUnitDriver.java:187)
    at org.apache.spark.ui.UISuite.<init>(UISuite.scala:40)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at java.lang.Class.newInstance(Class.java:374)
    at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:641)
    at sbt.ForkMain$Run$2.call(ForkMain.java:294)
    at sbt.ForkMain$Run$2.call(ForkMain.java:284)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

@JoshRosen
Contributor Author

When I wiped out my lib_managed/jars directory and ran sbt/sbt clean package followed by an interactive sbt session where I ran

project core
test-only org.apache.spark.ui.UISuite

then it looks like httpclient-4.1.3.jar somehow gets downloaded into lib_managed and ends up on the classpath during tests, even though nothing seems to depend on it, at least according to dependency-graph. This is puzzling.

@JoshRosen
Contributor Author

It looks like this isn't an issue in the Maven build. Investigating to see if I can figure out how the older JAR is winding up on the SBT test classpath.

When I run sbt/sbt "show core/test:full-classpath", httpclient-4.1.3.jar doesn't show up in the list, but it definitely seems to be getting used at runtime...

@JoshRosen
Contributor Author

Making a little more progress:

If I add

  import java.io.File
  import org.apache.http.impl.cookie.BrowserCompatSpecFactory

  // Print every classpath entry, then the jar that actually supplied the class.
  System.getProperty("java.class.path").split(":").foreach(println)
  val f = new File(classOf[BrowserCompatSpecFactory].getProtectionDomain().getCodeSource().getLocation().getPath())
  println(f.getAbsolutePath)

to the top of UISuite, then only /Users/joshrosen/Documents/spark/lib_managed/jars/httpclient-4.3.2.jar appears on java.class.path, yet BrowserCompatSpecFactory is loaded from /Users/joshrosen/Documents/spark/lib_managed/jars/httpclient-4.1.3.jar (according to that last print statement).

@JoshRosen
Contributor Author

I'm going to give up and return to this later. Even if I look at the actual classloader that loads BrowserCompatSpecFactory using

  import java.net.URLClassLoader
  classOf[BrowserCompatSpecFactory].getClassLoader.asInstanceOf[URLClassLoader].getURLs().foreach(println)

I still don't see the outdated httpclient on the classpath, nor do I see the entire lib_managed/jars directory or any other wildcards that might explain why classes are being loaded from the old jar.

@JoshRosen
Contributor Author

Actually, @ScrapCodes, do you have any ideas of what I should try here?

@srowen
Member

srowen commented Sep 26, 2014

@JoshRosen It looks like HttpClient 4.1.3 comes in from Thrift via Hive:

mvn dependency:tree
...
[INFO] org.apache.spark:spark-hive_2.10:jar:1.2.0-SNAPSHOT
...
[INFO] +- org.spark-project.hive:hive-serde:jar:0.12.0:compile
[INFO] |  +- org.spark-project.hive:hive-common:jar:0.12.0:compile
[INFO] |  |  +- org.spark-project.hive:hive-shims:jar:0.12.0:compile
[INFO] |  |  \- commons-cli:commons-cli:jar:1.2:compile
[INFO] |  +- org.mockito:mockito-all:jar:1.9.0:test (version managed from 1.8.2; scope managed from compile)
[INFO] |  \- org.apache.thrift:libfb303:jar:0.9.0:compile
[INFO] |     \- org.apache.thrift:libthrift:jar:0.9.0:compile
[INFO] |        +- org.apache.httpcomponents:httpclient:jar:4.1.3:compile
[INFO] |        \- org.apache.httpcomponents:httpcore:jar:4.1.3:compile

... but that's weird if it has some effect on core tests, of course. This could be a red herring, but that's where 4.1.3 may be coming from, and then sticking around and getting put on some classpath.

This method went in late November 2012:
https://apache.googlesource.com/httpclient/+blame/ff1acb48a875b54319fed9da87d1942d4d0d9db3/httpclient/src/main/java/org/apache/http/impl/cookie/BrowserCompatSpecFactory.java

So it would be in about version 4.2.3 or higher:
https://hc.apache.org/news.html

... which you probably already figured: 4.1.3 (or any copy earlier than about 4.2.3) must be the problem. (I think that the 3.x artifacts aren't an issue, because they're different artifacts.)

In your patch, you don't make httpclient a test-scope dependency. I would try that, since the transitive dependency rules are weird in test scope; maybe that actually forces the desired behavior. Try mvn dependency:tree as well to get its opinion.

@aniketbhatnagar
Contributor

@JoshRosen adding to what was mentioned above, here is a gist of the dependency tree before and after my patch is applied: https://gist.github.com/aniketbhatnagar/b39e587d4f4a014870d1

As @srowen noted, httpclient seems to be coming from Thrift via Hive; what my patch attempts to do is force a newer (and compatible) version of the library in the kinesis-asl profile only.

Hope it helps.

@JoshRosen
Contributor Author

@srowen

In your patch, you don't make httpclient a test scope dependency.

Actually, the code in this PR works fine (note that the SparkQA tests passed). The issue that I was referring to crops up when I remove the httpclient dependencies added by this PR and attempt to run the UISuite tests. In that case, it seems that the older JAR is somehow being used even though it doesn't appear on the classpath and doesn't appear in SBT's dependency-tree.

@JoshRosen
Contributor Author

Rather than bumping httpclient's version across the board (since there might be binary compatibility issues), I'm going to try changing this PR to specify the newer version only in test-scope.
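
(In sbt terms, that change would look roughly like this sketch; the scope
and version are shown for illustration:)

  // Sketch: confine the newer client to the test classpath so the compile
  // and runtime classpaths, and jets3t's expectations, stay untouched.
  libraryDependencies += "org.apache.httpcomponents" % "httpclient" % "4.3.5" % "test"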

@JoshRosen force-pushed the webui-selenium-tests branch from fb2a0f9 to 510e54a on October 25, 2014 at 22:56
@JoshRosen
Contributor Author

Thanks to the Hive 13 patch being merged, it's no longer necessary for me to add an explicit version of httpclient. I've pushed a new commit which moves these Selenium tests to their own suite, which isn't run by default. For now, the only way to run this suite is to manually comment out the @DoNotDiscover annotation; I've opened a ticket with ScalaTest to fix support for manually running this test in SBT (scalatest/scalatest#417).
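
(For reference, the opt-out pattern looks roughly like this; the suite name
and contents are a sketch, not the committed code:)

  import org.scalatest.{DoNotDiscover, FunSuite}

  // Sketch: ScalaTest's discovery mechanism skips suites carrying this
  // annotation, so the tests only run when it is removed or the suite is
  // invoked explicitly.
  @DoNotDiscover
  class UISeleniumSketchSuite extends FunSuite {
    // Selenium-backed UI tests would live here.
  }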

Once this passes tests, I'm going to merge this so that I can use this basic test framework in some subsequent PRs to fix a few display bugs.

@SparkQA

SparkQA commented Oct 25, 2014

Test build #22224 has started for PR 2474 at commit 510e54a.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 25, 2014

Test build #22224 has finished for PR 2474 at commit 510e54a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22224/
Test FAILed.

@JoshRosen
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Oct 26, 2014

Test build #22226 has started for PR 2474 at commit 510e54a.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 26, 2014

Test build #22226 has finished for PR 2474 at commit 510e54a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22226/
Test FAILed.

@JoshRosen
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Oct 26, 2014

Test build #22232 has started for PR 2474 at commit 510e54a.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 26, 2014

Test build #22233 has started for PR 2474 at commit 510e54a.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 26, 2014

Test build #22232 has finished for PR 2474 at commit 510e54a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22232/
Test FAILed.

@SparkQA

SparkQA commented Oct 26, 2014

Test build #22233 has finished for PR 2474 at commit 510e54a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22233/
Test FAILed.

@JoshRosen
Contributor Author

I tried downloading and building this locally and it looks like there's a legitimate build failure in Catalyst:

[info] Compiling 19 Scala sources to /Users/joshrosen/Documents/spark/sql/catalyst/target/scala-2.10/test-classes...
[error] /Users/joshrosen/Documents/spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala:26: object scalautils is not a member of package org
[error] import org.scalautils.TripleEqualsSupport.Spread
[error]            ^
[error] /Users/joshrosen/Documents/spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala:137: not found: type Spread
[error]   def checkDoubleEvaluation(expression: Expression, expected: Spread[Double], inputRow: Row = EmptyRow): Unit = {
[error]                                                               ^
[error] /Users/joshrosen/Documents/spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala:141: Cannot prove that Double <:< AnyRef.
[error]     actual.asInstanceOf[Double] shouldBe expected
[error]                                 ^
[error] three errors found
[error] (catalyst/test:compile) Compilation failed
[error] Total time: 64 s, completed Oct 25, 2014 11:07:55 PM

The failure looks legitimate, but I'm not sure how this PR could have caused it.

@JoshRosen
Contributor Author

Oh, I see: "scalautils" was renamed to "scalactic" in ScalaTest 2.2.0+. I'd still like to bump ScalaTest so that we're using the newest version of ScalaTest's Selenium integration and so that we benefit from the improvements to tagged tests:

Allow multiple -l and -n parameters passed to Runner. The driving use case for this change is to make it easier to write sbt code that will include or exclude different tags in different situations.

I'll push a fixup commit to port us over to the new ScalaTest.
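
(The tagging mechanism works roughly like this sketch; the tag name is
hypothetical:)

  import org.scalatest.{FunSuite, Tag}

  // Sketch: tag slow Selenium tests so a runner can exclude them with
  // -l org.apache.spark.tags.SlowUITest or select only them with -n.
  object SlowUITest extends Tag("org.apache.spark.tags.SlowUITest")

  class TaggedSketchSuite extends FunSuite {
    test("a slow Selenium check", SlowUITest) {
      // ...
    }
  }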

@SparkQA

SparkQA commented Oct 26, 2014

Test build #22236 has started for PR 2474 at commit fcc9e83.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 26, 2014

Test build #22236 has finished for PR 2474 at commit fcc9e83.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22236/
Test PASSed.

@JoshRosen
Contributor Author

I've merged this into master. I'm going to close out a few of the old, unreproducible UI JIRAs for which I've added Selenium regression tests.
