
upserting a pipeline kills caching #2736

@neilmcguigan

Description

Describe the bug
Upserting a pipeline always causes cache misses.

To reproduce
Create a pipeline with caching enabled. Run the pipeline. Upsert the pipeline. Run the pipeline again. You'll get a cache miss every time (the full sequence is sketched after the snippet below).

from sagemaker.spark.processing import PySparkProcessor
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import CacheConfig, ProcessingStep

# enable step caching; expire_after is an ISO 8601 duration
cache_config = CacheConfig(
    enable_caching=True,
    expire_after="P30M"
)

# define the pre-processor:
processor = PySparkProcessor(
    role=role,
    instance_type="ml.m5.large",
    instance_count=1,
    framework_version="3.0"
)

step1 = ProcessingStep(
    name="step1",
    processor=processor,
    code="processor1.py",
    cache_config=cache_config
)

# define the pipeline
pipeline = Pipeline(
    name="pipeline1",
    steps=[step1]
)

# create or update the pipeline definition in SageMaker
pipeline.upsert(role_arn="...")

# first run: the steps execute and populate the cache
execution = pipeline.start()
execution.wait()
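
To finish the repro described above, upsert the unchanged pipeline again and start a second run. A minimal sketch continuing the snippet (assuming the same pipeline object; role ARN elided as above):

# upsert again without changing any step definition
pipeline.upsert(role_arn="...")

# second run: expected to reuse cached results, but every step misses
execution = pipeline.start()
execution.wait()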

Expected behavior
I expect a cache hit, even after upserting the pipeline, since the step definitions have not changed.
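
One way to observe the misses is to inspect the step records of the second execution. This is only a sketch, assuming execution.list_steps() returns the PipelineExecutionSteps entries of the ListPipelineExecutionSteps API, where a cached step carries a CacheHitResult:

# a cached step has a CacheHitResult pointing at the execution whose
# result was reused; a miss has no CacheHitResult at all
for step in execution.list_steps():
    hit = step.get("CacheHitResult", {}).get("SourcePipelineExecutionArn")
    print(step["StepName"], "cache hit from " + hit if hit else "cache miss")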

System information

  • SageMaker Python SDK version: 2.66.2 (latest)
  • Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): any
  • Framework version: any
  • Python version: 3.8.10
  • CPU or GPU: any
  • Custom Docker image (Y/N): no

