Closed
Description
System Information
- Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): MXNet / BYOM
- Framework Version: 1.1
- Python Version: 3.6
- CPU or GPU: both
- Python SDK Version: 1.4.1
- Are you using a custom image: Yes, a custom algorithm on the MXNet estimator.
Describe the problem
My remote deployment log shows four or more simultaneous executions of the same code, one of which ends in a segmentation fault. The model is served eventually. Should I be concerned?
The segfault originates in MXNet and appears immediately after the "container_support.serving - importing user module" log line. However, because of the unexplained parallel execution, I am not sure that this sequence of events comes from a single process.
The only plausible explanation I can think of is that model serving keeps launching Docker instances until one of them segfaults. However, I could not find this behavior described in the documentation.
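One way to check whether the log lines come from the same process is to tag every message with the process ID and to ask Python to dump a traceback when the process crashes. The sketch below uses only the standard library (`logging` and `faulthandler`); the function names and the logger name `user_module` are hypothetical, and it assumes the snippet can be placed at the top of the user module that the serving container imports.

```python
import faulthandler
import logging
import os
import sys


def enable_crash_diagnostics(stream=sys.stderr):
    """Dump the Python traceback of all threads if this process gets SIGSEGV.

    `stream` must be a real file (it needs a file descriptor) and must stay
    open for as long as the handler is enabled.
    """
    faulthandler.enable(file=stream, all_threads=True)


def make_pid_logger(name="user_module", stream=sys.stderr):
    """Return a logger whose messages carry the emitting process ID."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated imports
        handler = logging.StreamHandler(stream)
        handler.setFormatter(
            logging.Formatter(
                "%(asctime)s [pid %(process)d] %(levelname)s - %(name)s - %(message)s"
            )
        )
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    # Hypothetical usage at the top of the user module.
    enable_crash_diagnostics()
    log = make_pid_logger()
    log.info("importing user module in pid %d", os.getpid())
```

With the process ID in every line, interleaved output from parallel processes can be separated, and the `faulthandler` dump should show which Python call was active when the native crash happened.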
Minimal repro / logs
[2018-06-27 07:20:09 +0000] [103] [INFO] Booting worker with pid: 103
/usr/local/lib/python3.5/dist-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016
monkey.patch_all(subprocess=True)
2018-06-27 07:20:09,981 INFO - container_support.serving - creating Server instance
2018-06-27 07:20:10,005 INFO - container_support.serving - importing user module
Segmentation fault: 11
Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x2a9e78) [0x7f531f03be78]
[bt] (1) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x2909e8e) [0x7f532169be8e]
[bt] (2) /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7f53530414b0]
[bt] (3) /usr/bin/python() [0x513ef4]
[bt] (4) /usr/bin/python() [0x57d7e8]
[bt] (5) /usr/bin/python() [0x518248]
[bt] (6) /usr/bin/python() [0x464e67]
[bt] (7) /usr/bin/python(_PyObject_GC_NewVar+0xbe) [0x518d0e]
[bt] (8) /usr/bin/python(PyTuple_New+0xff) [0x58456f]
[bt] (9) /usr/bin/python() [0x52116b]
[2018-06-27 07:20:10 +0000] [110] [INFO] Booting worker with pid: 110
/usr/local/lib/python3.5/dist-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016
monkey.patch_all(subprocess=True)
mxnet 1.1.0
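To see how many workers were started and how many segfaults occurred, the log can be summarized with a small script. This is a minimal sketch that assumes the line format shown in the excerpt above ("Booting worker with pid: N" lines and a "Segmentation fault" marker); the function names are hypothetical. It does not attribute a segfault to a particular worker, because the log does not say which process printed it.

```python
import re

# Matches lines such as:
# [2018-06-27 07:20:09 +0000] [103] [INFO] Booting worker with pid: 103
BOOT_RE = re.compile(
    r"^\[([^\]]+)\] \[(\d+)\] \[INFO\] Booting worker with pid: (\d+)\s*$"
)


def worker_boots(lines):
    """Return (timestamp, pid) for every 'Booting worker' line, in log order."""
    boots = []
    for line in lines:
        match = BOOT_RE.match(line.strip())
        if match:
            boots.append((match.group(1), int(match.group(3))))
    return boots


def count_segfaults(lines):
    """Count lines that start with the 'Segmentation fault' marker."""
    return sum(1 for line in lines if line.strip().startswith("Segmentation fault"))


if __name__ == "__main__":
    sample = [
        "[2018-06-27 07:20:09 +0000] [103] [INFO] Booting worker with pid: 103",
        "Segmentation fault: 11",
        "[2018-06-27 07:20:10 +0000] [110] [INFO] Booting worker with pid: 110",
    ]
    print(worker_boots(sample))
    print(count_segfaults(sample))
```

If a new "Booting worker" line appears shortly after each segfault, that would be consistent with a crashed worker being replaced, but confirming this needs the full log.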