Conversation

@dragos (Contributor) commented Oct 26, 2015

See SPARK-10986 for details.

This fixes the ClassNotFoundException for Spark classes in the serializer.

I am not sure this is the right way to handle the class loader, but I couldn't find any documentation on how the context class loader is used or who relies on it. It seems that at least the serializer uses it to instantiate classes during deserialization.

I am open to suggestions (I tried this fix on a real Mesos cluster and it does fix the issue).
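
The gist of the change can be sketched as below. This is an illustrative Scala sketch of the idea, not the actual patch; `ContextClassLoaderFixSketch` and `ensureContextClassLoader` are hypothetical names.

```scala
object ContextClassLoaderFixSketch {
  // Illustrative only: make sure the thread that will later run
  // deserialization has a non-null context class loader that can see
  // Spark's classes; fall back to the loader that loaded this class.
  def ensureContextClassLoader(): Unit = {
    if (Thread.currentThread.getContextClassLoader == null) {
      Thread.currentThread.setContextClassLoader(getClass.getClassLoader)
    }
  }
}
```

Calling something like this before the executor's RPC environment is created is the shape of the fix proposed here.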

@tnachen @andrewor14

@kaysoky (Member) commented Oct 26, 2015

👍

Tested on a local Mesos cluster (fine-grained mode).

@SparkQA commented Oct 26, 2015

Test build #44364 has finished for PR 9282 at commit ec1c11b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dragos (Contributor, Author) commented Oct 27, 2015

@srowen, who could have a look at this one? It's a complete blocker for Mesos; fine-grained mode is the default deployment option right now.

@srowen (Member) commented Oct 27, 2015

Hm, I don't know much about this part. Does this mirror similar approaches in other code? That would be good evidence that it's a good idea. How is it handled elsewhere in similar code? Usually it's better to handle class loaders explicitly where it matters rather than to set the context class loader, but this could be the right approach.

@dragos (Contributor, Author) commented Oct 27, 2015

There's no clear answer, unfortunately. The serializer has a "default class loader" that is set explicitly only when using the REPL. In the other cases it is set through the context class loader, for example in SparkSubmit. In coarse-grained mode there seems to be some explicit class loader handling, but in the debugger I noticed it works only because of another context class loader (apparently set by a Hadoop class via runAsSparkUser).
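
To make that resolution order concrete, here is a rough sketch of the pattern described above. It is an approximation for illustration, not the verbatim Spark source, and `SketchDeserializationStream` is a hypothetical name:

```scala
import java.io.{InputStream, ObjectInputStream, ObjectStreamClass}

// An explicitly set default class loader wins; otherwise the stream falls
// back to the thread's context class loader.
class SketchDeserializationStream(
    in: InputStream,
    defaultClassLoader: Option[ClassLoader]) extends ObjectInputStream(in) {

  private val loader: ClassLoader =
    defaultClassLoader.getOrElse(Thread.currentThread.getContextClassLoader)

  // If `loader` is null here, Class.forName consults the primordial class
  // loader, which cannot see Spark classes; hence the ClassNotFoundException.
  override def resolveClass(desc: ObjectStreamClass): Class[_] =
    Class.forName(desc.getName, false, loader)
}
```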

@srowen (Member) commented Oct 27, 2015

OK, I'm tentatively fine with merging this if it clearly fixes a problem, doesn't cause others (at least, tests pass), you both think it's the right thing, and it resembles a similar approach elsewhere.

@tnachen (Contributor) commented Oct 27, 2015

Just curious why this suddenly became a problem; do you have any idea what caused it?

@tnachen (Contributor) commented Oct 27, 2015

Also, +1 to merging this to fix users' problems.

@dragos (Contributor, Author) commented Oct 27, 2015

The ticket links to the PR that added the Netty-based RPC protocol. That sounds plausible, since the exception is thrown while initializing the Netty RPC environment, but I didn't try reverting it to confirm that it's really that PR.

@andrewor14 (Contributor) commented

retest this please

@andrewor14 (Contributor) commented

Looks like this is a regression from 1.5.1, so we should definitely fix it. Even though this change is only one line, it could change a lot of things, so I'd prefer to err on the conservative side. Can we verify that it doesn't cause any new regressions? @dragos, can you explain to us the root cause of the issue?

@dragos (Contributor, Author) commented Oct 29, 2015

The serializer delegates to the context class loader to instantiate the classes it receives on the wire. When this class loader is missing (null), the JVM looks the class up in the primordial class loader, which usually contains only the JDK classes.
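
That failure mode is easy to reproduce in isolation. The following self-contained sketch (a hypothetical demo, not Spark code) shows that a null loader argument resolves JDK classes but not application classes:

```scala
object PrimordialLoaderDemo extends App {
  // With a null loader argument, Class.forName delegates to the primordial
  // (bootstrap) class loader, which only knows JDK classes.
  println(Class.forName("java.lang.String", false, null)) // resolves fine

  try {
    // An application class (this demo itself) is invisible to it:
    Class.forName("PrimordialLoaderDemo", false, null)
  } catch {
    case e: ClassNotFoundException => println(s"As described above: $e")
  }
}
```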

@SparkQA commented Oct 29, 2015

Test build #44591 has finished for PR 9282 at commit ec1c11b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member) commented Oct 30, 2015

Hm, I'm inclined to go ahead and merge this for master, though I'm also wary of the implications. The problem seems clear, the explanation seems clear and the change is targeted. Tests pass. We may not know more until interested parties here can try this out on Mesos.

@dragos (Contributor, Author) commented Oct 30, 2015

@srowen I agree. The only missing part is the opinion of someone who knows Spark's serializer and class loader handling well. Let's merge this, though, since #9027 can't be tested on a real cluster until this fix goes in.

@tnachen (Contributor) commented Oct 30, 2015

+1

@srowen (Member) commented Oct 30, 2015

Merged to master.

@klion26 (Member) commented Oct 26, 2017

I received a ClassNotFoundException in YARN cluster mode (Spark 1.6.2) and can't reproduce the problem.
The error message is as below:

[2017-10-26 16:53:18,274] ERROR Error while invoking RpcHandler#receive() for one-way message. (org.apache.spark.network.server.TransportRequestHandler)
java.lang.ClassNotFoundException: org.apache.spark.rpc.RpcAddress
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:274)
	at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:109)
	at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1$$anonfun$apply$1.apply(NettyRpcEnv.scala:267)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
	at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:319)
	at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1.apply(NettyRpcEnv.scala:266)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
	at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:265)
	at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:597)
	at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:586)
	at org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:176)
	at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:92)
	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
	at java.lang.Thread.run(Thread.java:745)
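
For anyone chasing a similar trace, a small diagnostic can show which loaders the deserializing thread would consult. This is a hypothetical helper (`LoaderDiagnostics` is not part of Spark), meant to be called from the code path that logs the error:

```scala
object LoaderDiagnostics {
  // Print the current thread's context class loader chain; a null context
  // class loader means lookups fall through to the primordial loader.
  def dumpContextLoaderChain(): Unit = {
    var loader = Thread.currentThread.getContextClassLoader
    if (loader == null) {
      println("context class loader is null -> primordial loader only")
    }
    while (loader != null) {
      println(s"loader: $loader")
      loader = loader.getParent
    }
  }
}
```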
