Support SparkIntegration activation after SparkContext created #3410

Open
seyoon-lim opened this issue Aug 7, 2024 · 3 comments

Comments

@seyoon-lim
Contributor

Problem Statement

Hello.

In my use case of Sentry's SparkIntegration, I need to initialize Sentry after the SparkContext has already been created.

In such cases, I still want to be able to use the SparkIntegration feature.
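Roughly, the situation looks like this (the DSN and app name are placeholders, and the comment reflects my understanding of why the current integration does not pick up an existing context):

```python
import sentry_sdk
from pyspark import SparkContext
from sentry_sdk.integrations.spark import SparkIntegration

# The SparkContext already exists before Sentry is initialized...
sc = SparkContext(appName="example-app")  # placeholder app name

# ...so the integration, which instruments SparkContext creation,
# never gets a chance to hook this context.
sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    integrations=[SparkIntegration()],
)
```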

Solution Brainstorm

I have custom-built code that I want to submit as a PR. Some adjustments are still needed on the test side, so I will open it as a draft PR for now. Once the revisions are complete (soon), I will remove the draft status.

@szokeasaurusrex
Member

Hey @seyoon-lim, I am just wondering: why do you need to initialize the SDK after creating the SparkContext? Generally, we recommend initializing the SDK as soon as possible in the application lifecycle, and I believe many of our integrations depend on this being the case.

Figuring out how to initialize the SDK before initializing the Spark Context might be easier and more reliable than trying to get the integration to work when the SDK is only initialized after the Spark Context.
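Something along these lines is what I have in mind (placeholder DSN and app name): initialize the SDK first, then create the context, so the integration can attach as intended:

```python
import sentry_sdk
from pyspark import SparkContext
from sentry_sdk.integrations.spark import SparkIntegration

# Recommended ordering: initialize Sentry before the SparkContext exists,
# so SparkIntegration can hook the context as it is created.
sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    integrations=[SparkIntegration()],
)

sc = SparkContext(appName="example-app")  # placeholder app name
```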

@seyoon-lim
Contributor Author

seyoon-lim commented Aug 27, 2024

@szokeasaurusrex

I'm using a PySpark decorator through Airflow (https://airflow.apache.org/docs/apache-airflow-providers-apache-spark/stable/decorators/pyspark.html#example).

Additionally, I have created a custom decorator that works like spark-submit: the decorated function is written out as a script file and submitted to YARN.

In that case, the Spark context is created before sentry_init runs, because the sentry_init call lives inside my decorated Airflow task function, while the decorator creates the context beforehand (https://github.com/apache/airflow/blob/2.10.0/airflow/providers/apache/spark/decorators/pyspark.py#L105).

This case motivated me to add support for activating SparkIntegration after the SparkContext has been created.
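To make the ordering concrete, here is a rough sketch of what the Airflow side looks like; the connection id, DSN, and task body are placeholders, and the exact decorator arguments may differ between provider versions:

```python
import sentry_sdk
from airflow.decorators import task
from sentry_sdk.integrations.spark import SparkIntegration


@task.pyspark(conn_id="spark_default")  # placeholder connection id
def my_spark_task(spark, sc):
    # The pyspark decorator injects an already-created SparkSession (`spark`)
    # and SparkContext (`sc`) into the task function, so by the time
    # sentry_sdk.init runs here, the context already exists.
    sentry_sdk.init(
        dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
        integrations=[SparkIntegration()],
    )
    return sc.parallelize(range(10)).sum()
```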

@seyoon-lim
Contributor Author

@szokeasaurusrex
Sorry if my explanation was confusing. I've encountered situations where the Spark context is already created and injected, so I believe it would be more useful if SparkIntegration worked regardless of when sentry_init is called.
