Description
How do you use Sentry?
Sentry Saas (sentry.io)
Version
2.5.1
Steps to Reproduce
Hello,
I've encountered an issue when using SparkIntegration with my PySpark application. I was following the Spark Driver Integration documentation and hit the following AttributeError:
```
sc._jsc.sc().addSparkListener(listener)
E   AttributeError: 'SparkContext' object has no attribute '_jsc'
```
Upon investigating, the issue seems to stem from the code at sentry-python/spark_driver.py#L50. The sc._jsc attribute is set only after the SparkContext is initialized, as seen in apache/spark/pyspark/context.py#L296. Consequently, _start_sentry_listener and _set_app_properties (referenced at spark_driver.py#L62-L63) should be invoked after spark_context_init has executed.
I have tested this modification using both local and yarn Spark masters, and the fixed version in my repo appears to be functioning correctly.
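The proposed change is purely a matter of ordering inside the patched initializer. A minimal, Spark-free sketch (with a hypothetical DummyContext standing in for SparkContext, and _attach_listener for _start_sentry_listener; these names are illustrative, not the real SDK code) shows why the listener must be attached only after the wrapped __init__ has run:

```python
# Hypothetical sketch of the ordering issue: DummyContext stands in for
# pyspark's SparkContext, _attach_listener for _start_sentry_listener.

class DummyContext:
    def __init__(self):
        # Like sc._jsc, this attribute only exists after __init__ completes.
        self._jsc = "java-gateway-handle"

def _attach_listener(ctx):
    return ctx._jsc  # raises AttributeError if called before __init__ has run

original_init = DummyContext.__init__

def patched_init_broken(self):
    _attach_listener(self)   # too early: self._jsc is not set yet
    original_init(self)

def patched_init_fixed(self):
    original_init(self)
    _attach_listener(self)   # safe: __init__ has already run

DummyContext.__init__ = patched_init_broken
err_msg = None
try:
    DummyContext()
except AttributeError as e:
    err_msg = str(e)
print("broken order:", err_msg)

DummyContext.__init__ = patched_init_fixed
ctx = DummyContext()
print("fixed order:", ctx._jsc)
```

The broken variant reproduces the reported AttributeError; the fixed variant mirrors the proposed ordering, where the listener is attached only once initialization has completed.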
Here is my test code:

```python
def test_initialize_spark_integration(sentry_init):
    # fails with the code at: https://github.com/getsentry/sentry-python/blob/2.5.1/sentry_sdk/integrations/spark/spark_driver.py#L53
    # succeeds with the code at: https://github.com/seyoon-lim/sentry-python/blob/fix-spark-driver-integration/sentry_sdk/integrations/spark/spark_driver.py#L53
    sentry_init(integrations=[SparkIntegration()])
    SparkContext.getOrCreate()
```
Looking forward to your feedback and suggestions for addressing this issue.
Thank you!
Expected Result
```python
from pyspark.sql import SparkSession

import sentry_sdk
from sentry_sdk.integrations.spark import SparkIntegration

if __name__ == "__main__":
    sentry_sdk.init(
        dsn=matrix_dsn,
        integrations=[SparkIntegration()],
    )

    spark = SparkSession.builder.getOrCreate()
    ...
```
Actual Result
```
Traceback (most recent call last):
  File "/Users/kakao/Desktop/shaun/workplace/my-repos/du-batch/entrypoint.py", line 17, in <module>
    spark = SparkSession.builder.getOrCreate()
  File "/Users/kakao/Desktop/shaun/workplace/my-repos/du-batch/venv/lib/python3.9/site-packages/pyspark/sql/session.py", line 477, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/Users/kakao/Desktop/shaun/workplace/my-repos/du-batch/venv/lib/python3.9/site-packages/pyspark/context.py", line 514, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/Users/kakao/Desktop/shaun/workplace/my-repos/du-batch/venv/lib/python3.9/site-packages/pyspark/context.py", line 201, in __init__
    self._do_init(
  File "/Users/kakao/Desktop/shaun/workplace/my-repos/du-batch/venv/lib/python3.9/site-packages/sentry_sdk/utils.py", line 1710, in runner
    return sentry_patched_function(*args, **kwargs)
  File "/Users/kakao/Desktop/shaun/workplace/my-repos/du-batch/venv/lib/python3.9/site-packages/sentry_sdk/integrations/spark/spark_driver.py", line 69, in _sentry_patched_spark_context_init
    _start_sentry_listener(self)
  File "/Users/kakao/Desktop/shaun/workplace/my-repos/du-batch/venv/lib/python3.9/site-packages/sentry_sdk/integrations/spark/spark_driver.py", line 55, in _start_sentry_listener
    sc._jsc.sc().addSparkListener(listener)
AttributeError: 'SparkContext' object has no attribute '_jsc'
```