
Internal server error when using OpenNMT-tf #1057

@anasir-ureed

Description

Reference: 0411850995

System Information

  • Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): TensorFlow, OpenNMT-tf
  • Framework Version: 1.14
  • Python Version: py3
  • CPU or GPU: GPU
  • Python SDK Version: latest
  • Are you using a custom image: No, using script mode training

Describe the problem

I am using OpenNMT-tf for training. When training reaches the export step, the model is not exported correctly and the job fails with an error that reads only "Internal server error".

Minimal repro / logs

Unfortunately, the logs contain nothing that points to the cause of the issue!

  • Exact command to reproduce:
# Imports required to run the snippet (SageMaker Python SDK v1):
import sagemaker
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(entry_point=local_training_script_path,
                       dependencies=['model.py'],
                       train_instance_type='ml.p3.2xlarge',
                       train_instance_count=1,
                       checkpoint_s3_uri=checkpoint_path,
                       output_path=model_artifacts_location,
                       code_location=custom_code_upload_location,
                       role=sagemaker.get_execution_role(),
                       framework_version='1.14',
                       py_version='py3',
                       script_mode=True,
                       train_use_spot_instances=train_use_spot_instances,
                       train_max_run=train_max_run,
                       train_max_wait=train_max_wait,
                       train_volume_size=75)
estimator.fit(training_inputs_location, job_name=job_name_sagemaker, wait=True)

The entry-point shell script contains:

pip install OpenNMT-tf
onmt-main ...

Please advise,
