Does model serving imply spamming docker instances until seg faults? #260

@yifeim

System Information

  • Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): MXNet / BYOM
  • Framework Version: 1.1
  • Python Version: 3.6
  • CPU or GPU: both
  • Python SDK Version: 1.4.1
  • Are you using a custom image: Yes - custom algorithm on mxnet estimator.

Describe the problem

My remote deployment log shows four or more simultaneous executions of the same code, one of which ends in a segmentation fault. The model is eventually served. Should I be concerned?

The segmentation fault originates in MXNet and appears immediately after the "container_support.serving - importing user module" log line. However, because the output of the parallel executions is interleaved, I am not sure whether these events come from the same process.

The only explanation I can think of is that model serving keeps launching Docker instances until a segmentation fault occurs. However, I could not find any description of such behavior in the documentation.

Minimal repro / logs

[2018-06-27 07:20:09 +0000] [103] [INFO] Booting worker with pid: 103
/usr/local/lib/python3.5/dist-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016
monkey.patch_all(subprocess=True)
2018-06-27 07:20:09,981 INFO - container_support.serving - creating Server instance
2018-06-27 07:20:10,005 INFO - container_support.serving - importing user module
Segmentation fault: 11
Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x2a9e78) [0x7f531f03be78]
[bt] (1) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x2909e8e) [0x7f532169be8e]
[bt] (2) /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7f53530414b0]
[bt] (3) /usr/bin/python() [0x513ef4]
[bt] (4) /usr/bin/python() [0x57d7e8]
[bt] (5) /usr/bin/python() [0x518248]
[bt] (6) /usr/bin/python() [0x464e67]
[bt] (7) /usr/bin/python(_PyObject_GC_NewVar+0xbe) [0x518d0e]
[bt] (8) /usr/bin/python(PyTuple_New+0xff) [0x58456f]
[bt] (9) /usr/bin/python() [0x52116b]
[2018-06-27 07:20:10 +0000] [110] [INFO] Booting worker with pid: 110
/usr/local/lib/python3.5/dist-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016
monkey.patch_all(subprocess=True)
mxnet 1.1.0
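
One way to check how many separate worker processes contributed to the interleaved output is to pull the PIDs out of the gunicorn boot lines. This is a minimal sketch, assuming the log is available as plain text and that each "Booting worker with pid" line marks one newly started gunicorn worker; the helper name `worker_pids` and the shortened `sample` log are hypothetical, made up for illustration.

```python
import re

# Matches gunicorn boot lines such as:
# [2018-06-27 07:20:09 +0000] [103] [INFO] Booting worker with pid: 103
BOOT_RE = re.compile(r"\[(\d+)\] \[INFO\] Booting worker with pid: (\d+)")


def worker_pids(log_text):
    """Return the PIDs from worker boot lines, in order of appearance."""
    return [int(match.group(2)) for match in BOOT_RE.finditer(log_text)]


# Shortened excerpt of the log above (assumption: same line format).
sample = """\
[2018-06-27 07:20:09 +0000] [103] [INFO] Booting worker with pid: 103
2018-06-27 07:20:10,005 INFO - container_support.serving - importing user module
Segmentation fault: 11
[2018-06-27 07:20:10 +0000] [110] [INFO] Booting worker with pid: 110
"""

print(worker_pids(sample))
```

Lines without a PID prefix, such as the container_support.serving messages and the stack trace, cannot be attributed to a worker this way, so the sketch only counts worker starts and does not untangle the interleaving.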
