Currently, local mode does not work for PySparkProcessor because YARN is not configured correctly for local setups. To enable local development, we created an enhanced version of PySparkProcessor that overrides the underlying functionality of the SageMaker SDK and runs Spark in local mode rather than on YARN. The enhanced version preserves the interface of the original PySparkProcessor. Note that this project is intended only as a stop-gap solution until local mode is natively supported in the SageMaker SDK.
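To illustrate the difference between the two execution modes, here is a minimal sketch that builds a `spark-submit` command line targeting either a local master or YARN. The helper name `build_spark_submit_command` and its parameters are hypothetical and are not part of this project or the SageMaker SDK; only the `--master` option and its `local[*]` and `yarn` values come from Spark itself. It does not show how this project performs the override internally.

```python
from typing import List, Optional


def build_spark_submit_command(
    app_path: str,
    local: bool = True,
    app_args: Optional[List[str]] = None,
) -> List[str]:
    """Return a spark-submit argument list (hypothetical helper, illustration only).

    local=True  -> run Spark in local mode using all available cores.
    local=False -> submit to a YARN cluster, which needs a configured YARN setup.
    """
    master = "local[*]" if local else "yarn"
    command = ["spark-submit", "--master", master, app_path]
    # Arguments after the application path are passed to the application itself.
    command.extend(app_args or [])
    return command


if __name__ == "__main__":
    # Print the command instead of running it, so no Spark install is required.
    print(" ".join(build_spark_submit_command("preprocess.py", local=True)))
```

In local mode the driver and executors run in a single JVM on the same machine, which is why no YARN configuration is needed.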
To install:

    pip install git+https://github.com/aws-samples/enhanced-pyspark-processor

Please refer to the notebook example for usage patterns.
The following versions have been tested for compatibility.
| SageMaker SDK | Spark | Compatible? |
|---|---|---|
| sagemaker >= 2.22.0, <= 2.61.0 | 2.4 | ✔️ |
| sagemaker >= 2.22.0, <= 2.61.0 | 3.0 | ✔️ |
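
As a convenience, the tested SageMaker SDK range above can be checked programmatically. The following is a minimal sketch, not part of this project: the function names are hypothetical, and the version parsing is deliberately simple (it reads only the leading digits of the first three components).

```python
from importlib import metadata
from typing import Optional, Tuple

# Tested range taken from the compatibility table above (inclusive bounds).
TESTED_MIN = (2, 22, 0)
TESTED_MAX = (2, 61, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse 'X.Y.Z' into a tuple of ints, ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '2.30' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_tested_sagemaker_version(version: str) -> bool:
    """Return True if the version falls inside the tested range."""
    return TESTED_MIN <= parse_version(version) <= TESTED_MAX


def installed_sagemaker_is_tested() -> Optional[bool]:
    """Check the installed sagemaker package; None if it is not installed."""
    try:
        return is_tested_sagemaker_version(metadata.version("sagemaker"))
    except metadata.PackageNotFoundError:
        return None
```

A version outside this range is not necessarily incompatible; it has simply not been tested.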
See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License.