debugpy listen silently crashing #1749

@koenlek

Environment data

  • debugpy version: 1.8.8
  • OS and version: A k8s pod running an Ubuntu 20.04.6 based container
  • Python version (& distribution if applicable, e.g. Anaconda): 3.9
  • Using VS Code or Visual Studio: VS Code

Actual behavior

I'm using the Ray Distributed Debugger with Ray on K8s. It calls debugpy.listen, but when I check the port it should be listening on, nothing is bound to it (sudo lsof -i :$LISTEN_PORT). I set DEBUGPY_LOG_DIR to get more detailed logs, and debugpy.pydevd.NNNN.log contains the following near the end, which indicates that it did crash:

Traceback (most recent call last):
  File "/my_app/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 422, in _on_run
    cmd.send(self.sock)
  File "/my_app/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_net_command.py", line 109, in send
    sock.sendall(as_bytes)
BrokenPipeError: [Errno 32] Broken pipe

I searched the debugpy, pydevd, and Ray issue trackers and did some googling, but couldn't find much. The only lead I found is that the error may point to a broken connection between the local processes involved in running debugpy on the application side (there appear to be a client, a server, a "debug server", and some incoming client connection). I found this snippet in debugpy.adapter.NNNN.log:

I+00000.071: Listening for incoming Client connections on 10.40.0.130:51507...

I+00000.071: Listening for incoming Server connections on 127.0.0.1:39415...

I+00000.071: Sending endpoints info to debug server at localhost:60997:
             {
                 "client": {
                     "host": "10.40.0.130",
                     "port": 51507
                 },
                 "server": {
                     "host": "127.0.0.1",
                     "port": 39415
                 }
             }

I+00000.076: Accepted incoming Server connection from 127.0.0.1:43864.

Lastly, I noticed this in debugpy.{adapter,server}.NNNN.log, but it seems harmless, as I also saw it in healthy local runs:

I+00000.049: Error while enumerating installed packages.
             
Traceback (most recent call last):
  File "/my_app/debugpy/adapter/../../debugpy/common/log.py", line 362, in get_environment_description
    report("    {0}=={1}\n", pkg.name, pkg.version)
AttributeError: 'PathDistribution' object has no attribute 'name'

Stack where logged:
  File "/my_app/python3_x86_64/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/my_app/python3_x86_64/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/my_app/debugpy/adapter/__main__.py", line 227, in <module>
    main(_parse_argv(sys.argv))
  File "/my_app/debugpy/adapter/__main__.py", line 50, in main
    log.describe_environment("debugpy.adapter startup environment:")
  File "/my_app/debugpy/adapter/../../debugpy/common/log.py", line 372, in describe_environment
    info("{0}", get_environment_description(header))
  File "/my_app/debugpy/adapter/../../debugpy/common/log.py", line 364, in get_environment_description
    swallow_exception(
  File "/my_app/debugpy/adapter/../../debugpy/common/log.py", line 215, in swallow_exception
    _exception(format_string, *args, **kwargs)

The crash happens before I even try to connect to the debugger.

I was also able to reproduce this without the Ray Distributed Debugger. I connect to the k8s pod and create a small Python script:

import debugpy
debugpy.listen(5678)
print("before wait_for_client")
debugpy.wait_for_client()
print("after wait_for_client")
print("before breakpoint")
debugpy.breakpoint()
print("after breakpoint")

When I run it and check the log files, I see the same crash (BrokenPipeError: [Errno 32] Broken pipe) in the pydevd logs.
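For anyone trying to reproduce this, the log check can be automated. This is a minimal sketch, not something debugpy provides: the helper name find_broken_pipe_logs is hypothetical, and it assumes the pydevd logs land in the directory named by DEBUGPY_LOG_DIR with names of the form debugpy.pydevd.NNNN.log, as described above.

```python
import os
from pathlib import Path


def find_broken_pipe_logs(log_dir, pattern="debugpy.pydevd.*.log"):
    """Return the pydevd log files in log_dir that mention BrokenPipeError.

    Hypothetical helper: the file name pattern is an assumption based on
    the log names seen in this report.
    """
    hits = []
    for path in sorted(Path(log_dir).glob(pattern)):
        # Replace undecodable bytes so a damaged log never aborts the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        if "BrokenPipeError" in text:
            hits.append(path)
    return hits


if __name__ == "__main__":
    log_dir = os.environ.get("DEBUGPY_LOG_DIR")
    if log_dir:
        for path in find_broken_pipe_logs(log_dir):
            print(f"BrokenPipeError found in {path}")
```

It only does a plain text search, so it reports that the error was logged, not why it happened.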

When I run all of this locally, everything works fine. The issue only shows up when running with Ray on k8s.

These are the full, lightly redacted, logs:

Questions:

  • Is there a way to detect a crashed listen from code? If so, how?
  • Any ideas on what makes this crash?
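To make the first question concrete, the kind of check I have in mind is a plain TCP probe, the code equivalent of the lsof check above. This is only a sketch under assumptions: is_port_listening is a hypothetical helper, not a debugpy API, and the probe itself opens a short-lived connection that the adapter will see as an incoming client, which may or may not be harmless.

```python
import socket


def is_port_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper: it only shows that something accepts
    connections on the port, not that it is a healthy debug adapter.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


# Possible usage after debugpy.listen, which returns the (host, port)
# pair the adapter reports it is listening on:
#
#     host, port = debugpy.listen(5678)
#     if not is_port_listening(host, port):
#         raise RuntimeError("debugpy is not accepting connections")
```

A built-in way to ask debugpy whether the listener is still alive would be preferable to probing the port from outside.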

Expected behavior

debugpy.listen should accept incoming connections on its endpoint.

Steps to reproduce:

I'm afraid it will be hard to reproduce this in an environment other than our "ray on k8s" setup. But details are in the "Actual behavior" section.
