
Commit 7227bd5

HyukjinKwon authored and dongjoon-hyun committed
[SPARK-23517][PYTHON] Make pyspark.util._exception_message produce the trace from Java side by Py4JJavaError
## What changes were proposed in this pull request?

This PR proposes that `pyspark.util._exception_message` produce the trace from the Java side for `Py4JJavaError`. Currently, in Python 2, it uses the `message` attribute, which `Py4JJavaError` does not happen to have:

```python
>>> from pyspark.util import _exception_message
>>> try:
...     sc._jvm.java.lang.String(None)
... except Exception as e:
...     pass
...
>>> e.message
''
```

It seems we should use `str` instead for now (https://github.com/bartdag/py4j/blob/aa6c53b59027925a426eb09b58c453de02c21b7c/py4j-python/src/py4j/protocol.py#L412), but this does not address the problem with non-ASCII strings from the Java side (https://github.com/bartdag/py4j/issues/306).

So, we can call `__str__()` directly:

```python
>>> e.__str__()
u'An error occurred while calling None.java.lang.String.\n: java.lang.NullPointerException\n\tat java.lang.String.<init>(String.java:588)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:422)\n\tat py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)\n\tat py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\n\tat py4j.Gateway.invoke(Gateway.java:238)\n\tat py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)\n\tat py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:214)\n\tat java.lang.Thread.run(Thread.java:745)\n'
```

which does not coerce unicode to `str` in Python 2.
This can actually be a problem:

```python
from pyspark.sql.functions import udf

spark.conf.set("spark.sql.execution.arrow.enabled", True)
spark.range(1).select(udf(lambda x: [[]])()).toPandas()
```

**Before**

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/dataframe.py", line 2009, in toPandas
    raise RuntimeError("%s\n%s" % (_exception_message(e), msg))
RuntimeError:
Note: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true. Please set it to false to disable this.
```

**After**

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/dataframe.py", line 2009, in toPandas
    raise RuntimeError("%s\n%s" % (_exception_message(e), msg))
RuntimeError: An error occurred while calling o47.collectAsArrowToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 0.0 failed 1 times, most recent failure: Lost task 7.0 in stage 0.0 (TID 7, localhost, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/.../spark/python/pyspark/worker.py", line 245, in main
    process()
  File "/.../spark/python/pyspark/worker.py", line 240, in process
  ...
Note: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true. Please set it to false to disable this.
```

## How was this patch tested?

Manually tested and unit tests were added.

Author: hyukjinkwon <[email protected]>

Closes apache#20680 from HyukjinKwon/SPARK-23517.

(cherry picked from commit fab563b)

Signed-off-by: hyukjinkwon <[email protected]>
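The empty-`message` behavior the description relies on can be sketched without a JVM, using a stand-in class in place of a real `Py4JJavaError` (the class name and trace text below are illustrative only, not part of py4j):

```python
class FakePy4JJavaError(Exception):
    """Mimics Py4JJavaError: an empty 'message' attribute, with the
    Java-side trace only reachable through __str__."""
    message = ''  # code reading '.message' gets nothing useful

    def __str__(self):
        return ('An error occurred while calling None.java.lang.String.\n'
                ': java.lang.NullPointerException')


e = FakePy4JJavaError()
print(repr(e.message))                    # '' -- attribute exists but is empty
print('NullPointerException' in str(e))   # True -- the trace lives in __str__
```

This is why a helper that prefers `.message` silently drops the whole Java stack trace for such exceptions.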
1 parent 7f07685 commit 7227bd5

2 files changed: +18 −0

python/pyspark/tests.py

Lines changed: 11 additions & 0 deletions
```diff
@@ -2286,6 +2286,17 @@ def set(self, x=None, other=None, other_x=None):
         self.assertEqual(b._x, 2)
 
 
+class UtilTests(PySparkTestCase):
+    def test_py4j_exception_message(self):
+        from pyspark.util import _exception_message
+
+        with self.assertRaises(Py4JJavaError) as context:
+            # This attempts java.lang.String(null) which throws an NPE.
+            self.sc._jvm.java.lang.String(None)
+
+        self.assertTrue('NullPointerException' in _exception_message(context.exception))
+
+
 @unittest.skipIf(not _have_scipy, "SciPy not installed")
 class SciPyTests(PySparkTestCase):
```
python/pyspark/util.py

Lines changed: 7 additions & 0 deletions
```diff
@@ -15,6 +15,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
+from py4j.protocol import Py4JJavaError
 
 __all__ = []
 
@@ -33,6 +34,12 @@ def _exception_message(excp):
     >>> msg == _exception_message(excp)
     True
     """
+    if isinstance(excp, Py4JJavaError):
+        # 'Py4JJavaError' doesn't contain the stack trace available on the Java side in 'message'
+        # attribute in Python 2. We should call 'str' function on this exception in general but
+        # 'Py4JJavaError' has an issue about addressing non-ascii strings. So, here we work
+        # around by the direct call, '__str__()'. Please see SPARK-23517.
+        return excp.__str__()
     if hasattr(excp, "message"):
         return excp.message
     return str(excp)
```

0 commit comments