Closed
Description
System Information
- Framework: PyTorch
- Framework Version: 1.0.0
- Python Version: 3
- CPU or GPU: CPU
- Python SDK Version: 1.18.4
- Are you using a custom image: No
Describe the problem
Model deployment fails with cryptic errors. See the logs below. The command issued to deploy the model is the following:
MODEL_PATH = 's3:///sagemaker-us-east-2-971148336196/improved-ner-training-0-25-0/output/model.tar.gz'
MODEL_NAME = 'improved-ner-model-model-' + os.environ['ENVIRONMENT']
ENDPOINT_NAME = 'improved-ner-model-sagemaker-endpoint-' + os.environ['ENVIRONMENT']
DEPLOY_INSTANCE = 'ml.m5.large'
model = PyTorchModel(model_data=MODEL_PATH, role=ROLE, entry_point='train_model.py',
                     sagemaker_session=sm_session, py_version='py3', framework_version='1.0.0',
                     name=ENDPOINT_NAME)
model.deploy(initial_instance_count=1, instance_type=DEPLOY_INSTANCE, endpoint_name=ENDPOINT_NAME)
The model is publicly available here:
https://s3.us-east-2.amazonaws.com/sagemaker-us-east-2-971148336196/improved-ner-training-0-25-0/output/model.tar.gz
It contains a directory called flair, which contains final_model.pt.
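To double-check the archive layout before deploying, a small sketch like the following can list what is actually inside model.tar.gz. It uses only the standard library. The helper names (`list_model_files`, `find_by_basename`) and the local path in the usage note are hypothetical and not part of the SageMaker SDK. It may be worth running because the file name is spelled final_model.pt in the description above, while model_fn loads final-model.pt.

```python
import tarfile


def list_model_files(archive_path):
    """Return the names of all regular files inside a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return [member.name for member in tar.getmembers() if member.isfile()]


def find_by_basename(names, basename):
    """Return the archive entries whose final path component equals `basename`."""
    return [name for name in names if name.rsplit("/", 1)[-1] == basename]
```

Usage, assuming the archive has been downloaded to a local (hypothetical) path: `names = list_model_files('model.tar.gz')`, then compare `find_by_basename(names, 'final-model.pt')` with `find_by_basename(names, 'final_model.pt')` to see which spelling is present.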
The (relevant) part of the train_model.py script is the following:
def model_fn(model_dir):
    f_out = os.path.join(model_dir, 'flair')
    m = SequenceTagger.load_from_file(os.path.join(f_out, 'final-model.pt'))
    return m

def input_fn(request_body, request_content_type):
    if request_content_type.lower() != 'application/json':
        raise ValueError('Content type must be application/json')
    if 'sentence' not in request_body:
        raise ValueError('Request must be JSON formatted with key: sentence')
    return request_body['sentence']

def predict_fn(input_data, model):
    return model.predict(input_data)

if __name__ == "__main__":
    args, _ = parse_args()
    flair_out = os.path.join(args.model_dir, 'flair')
    trainer(flair_out)  # This trains a model using flair.trainer.ModelTrainer
    model = SequenceTagger.load_from_file(os.path.join(flair_out, 'final-model.pt'))
    # create example sentence
    sentence = Sentence('I love Berlin')
    # predict tags and print
    model.predict(sentence)
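One detail in input_fn above may be worth ruling out separately from the deployment failure. If the serving container passes the raw request body (a str or bytes) rather than an already parsed dict, then `'sentence' not in request_body` is a substring check and `request_body['sentence']` raises a TypeError. That is an assumption about the container's behavior, not something the logs confirm. Under that assumption, a minimal sketch of a variant that decodes the JSON first:

```python
import json


def input_fn(request_body, request_content_type):
    """Parse a JSON request body and return the value stored under 'sentence'."""
    if request_content_type.lower() != 'application/json':
        raise ValueError('Content type must be application/json')
    # Assumption: the body arrives serialized (str or bytes), not as a dict.
    if isinstance(request_body, (bytes, bytearray)):
        request_body = request_body.decode('utf-8')
    payload = json.loads(request_body)
    if not isinstance(payload, dict) or 'sentence' not in payload:
        raise ValueError('Request must be JSON formatted with key: sentence')
    return payload['sentence']
```

This can be exercised locally without an endpoint, for example with `input_fn('{"sentence": "I love Berlin"}', 'application/json')`.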
Minimal repro / logs
The CloudWatch logs are very opaque. One of the errors is the following:
sagemaker_containers._errors.ClientError: [Errno 30] Read-only file system: '/opt/ml/model/flair/final-model.pt'
Then, much later, these errors pop up:
Processing /opt/ml/code
Could not install packages due to an EnvironmentError: [Errno 2] No such file or directory: '/tmp/pip-req-tracker-27gca9by/35241637574d11bf9bde50616c67372a334f94fa8356bc7164af8ca3'
You are using pip version 18.1, however version 19.0.3 is available.
[2019-04-12 03:49:26 +0000] [25] [ERROR] Error handling request /ping
Any ideas of what is actually causing the error, or some other steps to take to make it easier to debug?