Description
Describe the bug
When you create a model monitor and attach a schedule using create_monitoring_schedule(), and the call fails with a ValidationException, the schedule is never created, but the monitor object still retains the schedule name and related attributes.
This leaves the object stuck: you can't delete the schedule with delete_monitoring_schedule() because it doesn't exist, and you can't create a new one because the object believes it already has a schedule.
To reproduce
Create a Model Monitor
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat
from sagemaker import get_execution_role

role = get_execution_role()
my_monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)
my_monitor.suggest_baseline(
    baseline_dataset='s3://grayjh/player_data/player_data.csv',
    dataset_format=DatasetFormat.csv(header=True),
)
Create a bad schedule:
from sagemaker.model_monitor import CronExpressionGenerator

my_monitor.create_monitoring_schedule(
    monitor_schedule_name='my-monitoring-schedule',
    endpoint_input='mlops-bia-xgboost-2019-09-23-18-44-06-Prod',
    statistics=my_monitor.baseline_statistics(),
    constraints=my_monitor.suggested_constraints(),
    schedule_cron_expression="Bad Cron",
)
This fails, as expected, because of the invalid cron expression:
ClientError: An error occurred (ValidationException) when calling the CreateMonitoringSchedule operation: InvalidParameter: 1 validation error(s) found.
- format cron(0 \d+(/12)? *|? * *|? *), Bad Cron, CreateMonitoringScheduleInput.MonitoringScheduleConfig.ScheduleConfig.ScheduleExpression.
Then try to create a valid monitoring schedule on the same object:
my_monitor.create_monitoring_schedule(
    monitor_schedule_name='my-monitoring-schedule1',
    endpoint_input='mlops-bia-xgboost-2019-09-23-18-44-06-Prod',
    statistics=my_monitor.baseline_statistics(),
    constraints=my_monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
This fails with an error saying that a schedule already exists:
It seems that this object was already used to create an Amazon Model Monitoring Schedule. To create another, first delete the existing one using my_monitor.delete_monitoring_schedule().
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-11-63036cc6383a> in <module>()
4 statistics=my_monitor.baseline_statistics(),
5 constraints=my_monitor.suggested_constraints(),
----> 6 schedule_cron_expression=CronExpressionGenerator.hourly(),
7 )
~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/model_monitor/model_monitoring.py in create_monitoring_schedule(self, endpoint_input, record_preprocessor_script, post_analytics_processor_script, output_s3_uri, constraints, statistics, monitor_schedule_name, schedule_cron_expression, enable_cloudwatch_metrics)
1213 )
1214 print(message)
-> 1215 raise ValueError(message)
1216
1217 self.monitoring_schedule_name = self._generate_monitoring_schedule_name(
ValueError: It seems that this object was already used to create an Amazon Model Monitoring Schedule. To create another, first delete the existing one using my_monitor.delete_monitoring_schedule().
Try to delete that schedule:
my_monitor.delete_monitoring_schedule()
This also fails:
ResourceNotFound: An error occurred (ResourceNotFound) when calling the DeleteMonitoringSchedule operation: Monitoring Schedule arn:aws:sagemaker:us-east-1:210829804582:monitoring-schedule/my-monitoring-schedule1 not found
The workaround is to manually reset the schedule name to None:
my_monitor.monitoring_schedule_name = None
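Until this is fixed in the SDK, the workaround above could be wrapped in a small helper. This is only a sketch: create_schedule_safely is a hypothetical name, not part of the SageMaker SDK, and it assumes that restoring monitoring_schedule_name is enough to make the monitor reusable, which is what I observed on 1.65.1.

```python
def create_schedule_safely(monitor, **schedule_kwargs):
    """Call monitor.create_monitoring_schedule and undo local state on failure.

    Hypothetical helper, not part of the SageMaker SDK. `monitor` is any
    object exposing `monitoring_schedule_name` and
    `create_monitoring_schedule(**kwargs)`.
    """
    # Remember the name the monitor had before the attempt (normally None).
    previous_name = monitor.monitoring_schedule_name
    try:
        return monitor.create_monitoring_schedule(**schedule_kwargs)
    except Exception:
        # The call failed, so no new schedule was created remotely.
        # Restore the old name so the same monitor object can be reused.
        monitor.monitoring_schedule_name = previous_name
        raise


# Intended usage with the monitor from the repro steps (not executed here):
# create_schedule_safely(
#     my_monitor,
#     monitor_schedule_name='my-monitoring-schedule',
#     endpoint_input='mlops-bia-xgboost-2019-09-23-18-44-06-Prod',
#     schedule_cron_expression="Bad Cron",
# )
```

The original exception is re-raised, so the caller still sees the ValidationException; only the stale local state is cleaned up.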
Expected behavior
I would expect that if create_monitoring_schedule() fails, the object's attributes remain None, so that a new schedule can be created without resetting them manually.
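The traceback above suggests that monitoring_schedule_name is assigned (line 1217) right after the "already used" check, before the service call is made. One possible ordering that would give the expected behavior is sketched below with a toy class. MonitorSketch and create_schedule_api are hypothetical names for illustration and are not the SDK's actual implementation.

```python
class MonitorSketch:
    """Toy stand-in for a monitor class; not the SDK's actual code."""

    def __init__(self, create_schedule_api):
        # create_schedule_api is a hypothetical callable that performs the
        # remote call and raises if the service rejects the request.
        self._create_schedule_api = create_schedule_api
        self.monitoring_schedule_name = None

    def create_monitoring_schedule(self, monitor_schedule_name, schedule_cron_expression):
        if self.monitoring_schedule_name is not None:
            raise ValueError("A schedule already exists; delete it first.")
        # Make the remote call first. If it raises, no attribute is touched.
        self._create_schedule_api(monitor_schedule_name, schedule_cron_expression)
        # Record the name only after the service has accepted the request.
        self.monitoring_schedule_name = monitor_schedule_name
```

With this ordering, a failed call leaves monitoring_schedule_name as None, so a second create_monitoring_schedule() call on the same object can succeed without any manual reset.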
Screenshots or logs
I will provide an example notebook with logs and repro steps.
System information
A description of your system. Please provide:
- SageMaker Python SDK version: 1.65.1
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): N/A
- Framework version: N/A
- Python version: Python 3 (tested using the default conda_python3 kernel on a SageMaker notebook with an updated sagemaker-python-sdk)
- CPU or GPU: CPU
- Custom Docker image (Y/N): N
Additional context