Description
Hey there!
I'm having trouble getting my SageMaker TensorFlow code to work after moving my training script to another directory.
Previously, I had the following directory structure:
submit_notebook.ipynb
train.py
setup.py
my_package/
    other modules
And it worked with source_dir="." and entry_point="train.py".
I recently moved my training script into one of my package directories, as follows:
submit_notebook.ipynb
setup.py
src/
    my_package/
        train.py
        other modules
When running estimator.fit with source_dir="." and entry_point="src/my_package/train.py", I get an ImportError: "No module named src/my_package/train".
Higher up in the logs, I spotted:
"Invoking script with the following command:
/usr/bin/python -m src/my_package/train <some_args>"
Digging in, starting from sagemaker-tensorflow-container, I saw that sagemaker_containers._entry_point_type checks for a "setup.py" file and, if one is present, sets the entry point type to PYTHON_PACKAGE.
Later, in sagemaker_containers._process, any user-given entry point of type PYTHON_PACKAGE has its .py extension removed, and the result appears to be what gets passed to python -m.
That works if your entry_point is "train.py", but when the entry point contains directories the result is a path with slashes ("src/my_package/train") rather than an importable module name, hence the ImportError.
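To make the mismatch concrete, here is a minimal sketch. It is not the actual sagemaker_containers code: both helper names are hypothetical, and the first one only mimics the ".py"-stripping behaviour as I understand it from the logs. The second shows what a dotted module name derived from the same path would look like.

```python
import posixpath


def strip_py_extension(entry_point: str) -> str:
    """Hypothetical stand-in for the observed behaviour: drop a trailing '.py' only."""
    return entry_point[:-3] if entry_point.endswith(".py") else entry_point


def to_module_name(entry_point: str) -> str:
    """Hypothetical alternative: turn a relative script path into a dotted module name."""
    root, _ext = posixpath.splitext(posixpath.normpath(entry_point))
    return root.replace("/", ".")


print(strip_py_extension("train.py"))                 # train
print(strip_py_extension("src/my_package/train.py"))  # src/my_package/train
print(to_module_name("src/my_package/train.py"))      # src.my_package.train
```

Whether "src.my_package.train" is actually importable inside the container depends on how setup.py installs the package, so treat this only as an illustration of why the slash-separated string fails with python -m.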
I describe my proposed fix in the PR.