Skip to content

Conversation

@ashgti
Copy link
Contributor

@ashgti ashgti commented Oct 29, 2025

We currently use a background thread to read the DAP output. This means the test thread and the background thread can race at times and we may have inconsistent timing due to these races.

To improve the consistency I've removed the reader thread and instead switched to using the selectors module that wraps select in a platform independent way.

We currently use a background thread to read the DAP output. This means the test thread and the background thread can race at times and we may have inconsistent timing due to these races.

To improve the consistency I've removed the reader thread and instead switched to using the `selectors` module that wraps `select` in a platform independent way.
@llvmbot
Copy link
Member

llvmbot commented Oct 29, 2025

@llvm/pr-subscribers-lldb

Author: John Harrison (ashgti)

Changes

We currently use a background thread to read the DAP output. This means the test thread and the background thread can race at times and we may have inconsistent timing due to these races.

To improve the consistency I've removed the reader thread and instead switched to using the selectors module that wraps select in a platform independent way.


Patch is 23.96 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/165496.diff

8 Files Affected:

  • (modified) lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py (+80-128)
  • (modified) lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py (+1-1)
  • (modified) lldb/test/API/tools/lldb-dap/breakpoint-events/TestDAP_breakpointEvents.py (+9-17)
  • (modified) lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py (+1-1)
  • (modified) lldb/test/API/tools/lldb-dap/module-event/TestDAP_module_event.py (+39-45)
  • (modified) lldb/test/API/tools/lldb-dap/module/TestDAP_module.py (+3-5)
  • (modified) lldb/test/API/tools/lldb-dap/restart/TestDAP_restart_console.py (+17-7)
  • (modified) lldb/test/API/tools/lldb-dap/send-event/TestDAP_sendEvent.py (+1-1)
diff --git a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
index d892c01f0bc71..bbb0eb0198d84 100644
--- a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
+++ b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
@@ -10,8 +10,8 @@
 import subprocess
 import signal
 import sys
-import threading
 import warnings
+import selectors
 import time
 from typing import (
     Any,
@@ -139,35 +139,6 @@ def dump_memory(base_addr, data, num_per_line, outfile):
         outfile.write("\n")
 
 
-def read_packet(
-    f: IO[bytes], trace_file: Optional[IO[str]] = None
-) -> Optional[ProtocolMessage]:
-    """Decode a JSON packet that starts with the content length and is
-    followed by the JSON bytes from a file 'f'. Returns None on EOF.
-    """
-    line = f.readline().decode("utf-8")
-    if len(line) == 0:
-        return None  # EOF.
-
-    # Watch for line that starts with the prefix
-    prefix = "Content-Length: "
-    if line.startswith(prefix):
-        # Decode length of JSON bytes
-        length = int(line[len(prefix) :])
-        # Skip empty line
-        separator = f.readline().decode()
-        if separator != "":
-            Exception("malformed DAP content header, unexpected line: " + separator)
-        # Read JSON bytes
-        json_str = f.read(length).decode()
-        if trace_file:
-            trace_file.write("from adapter:\n%s\n" % (json_str))
-        # Decode the JSON bytes into a python dictionary
-        return json.loads(json_str)
-
-    raise Exception("unexpected malformed message from lldb-dap: " + line)
-
-
 def packet_type_is(packet, packet_type):
     return "type" in packet and packet["type"] == packet_type
 
@@ -199,16 +170,8 @@ def __init__(
         self.log_file = log_file
         self.send = send
         self.recv = recv
-
-        # Packets that have been received and processed but have not yet been
-        # requested by a test case.
-        self._pending_packets: List[Optional[ProtocolMessage]] = []
-        # Received packets that have not yet been processed.
-        self._recv_packets: List[Optional[ProtocolMessage]] = []
-        # Used as a mutex for _recv_packets and for notify when _recv_packets
-        # changes.
-        self._recv_condition = threading.Condition()
-        self._recv_thread = threading.Thread(target=self._read_packet_thread)
+        self.selector = selectors.DefaultSelector()
+        self.selector.register(recv, selectors.EVENT_READ)
 
         # session state
         self.init_commands = init_commands
@@ -234,9 +197,6 @@ def __init__(
         # keyed by breakpoint id
         self.resolved_breakpoints: dict[str, Breakpoint] = {}
 
-        # trigger enqueue thread
-        self._recv_thread.start()
-
     @classmethod
     def encode_content(cls, s: str) -> bytes:
         return ("Content-Length: %u\r\n\r\n%s" % (len(s), s)).encode("utf-8")
@@ -252,17 +212,46 @@ def validate_response(cls, command, response):
                 f"seq mismatch in response {command['seq']} != {response['request_seq']}"
             )
 
-    def _read_packet_thread(self):
-        try:
-            while True:
-                packet = read_packet(self.recv, trace_file=self.trace_file)
-                # `packet` will be `None` on EOF. We want to pass it down to
-                # handle_recv_packet anyway so the main thread can handle unexpected
-                # termination of lldb-dap and stop waiting for new packets.
-                if not self._handle_recv_packet(packet):
-                    break
-        finally:
-            dump_dap_log(self.log_file)
+    def _read_packet(
+        self,
+        timeout: float = DEFAULT_TIMEOUT,
+    ) -> Optional[ProtocolMessage]:
+        """Decode a JSON packet that starts with the content length and is
+        followed by the JSON bytes from self.recv. Returns None on EOF.
+        """
+
+        ready = self.selector.select(timeout)
+        if not ready:
+            warnings.warn(
+                "timeout occurred waiting for a packet, check if the test has a"
+                " negative assertion and see if it can be inverted.",
+                stacklevel=4,
+            )
+            return None  # timeout
+
+        line = self.recv.readline().decode("utf-8")
+        if len(line) == 0:
+            return None  # EOF.
+
+        # Watch for line that starts with the prefix
+        prefix = "Content-Length: "
+        if line.startswith(prefix):
+            # Decode length of JSON bytes
+            length = int(line[len(prefix) :])
+            # Skip empty line
+            separator = self.recv.readline().decode()
+            if separator != "":
+                Exception("malformed DAP content header, unexpected line: " + separator)
+            # Read JSON bytes
+            json_str = self.recv.read(length).decode()
+            if self.trace_file:
+                self.trace_file.write(
+                    "%s from adapter:\n%s\n" % (time.time(), json_str)
+                )
+            # Decode the JSON bytes into a python dictionary
+            return json.loads(json_str)
+
+        raise Exception("unexpected malformed message from lldb-dap: " + line)
 
     def get_modules(
         self, start_module: Optional[int] = None, module_count: Optional[int] = None
@@ -310,34 +299,6 @@ def collect_output(
             output += self.get_output(category, clear=clear)
         return output
 
-    def _enqueue_recv_packet(self, packet: Optional[ProtocolMessage]):
-        with self.recv_condition:
-            self.recv_packets.append(packet)
-            self.recv_condition.notify()
-
-    def _handle_recv_packet(self, packet: Optional[ProtocolMessage]) -> bool:
-        """Handles an incoming packet.
-
-        Called by the read thread that is waiting for all incoming packets
-        to store the incoming packet in "self._recv_packets" in a thread safe
-        way. This function will then signal the "self._recv_condition" to
-        indicate a new packet is available.
-
-        Args:
-            packet: A new packet to store.
-
-        Returns:
-            True if the caller should keep calling this function for more
-            packets.
-        """
-        with self._recv_condition:
-            self._recv_packets.append(packet)
-            self._recv_condition.notify()
-            # packet is None on EOF
-            return packet is not None and not (
-                packet["type"] == "response" and packet["command"] == "disconnect"
-            )
-
     def _recv_packet(
         self,
         *,
@@ -361,46 +322,34 @@ def _recv_packet(
             The first matching packet for the given predicate, if specified,
             otherwise None.
         """
-        assert (
-            threading.current_thread != self._recv_thread
-        ), "Must not be called from the _recv_thread"
-
-        def process_until_match():
-            self._process_recv_packets()
-            for i, packet in enumerate(self._pending_packets):
-                if packet is None:
-                    # We need to return a truthy value to break out of the
-                    # wait_for, use `EOFError` as an indicator of EOF.
-                    return EOFError()
-                if predicate and predicate(packet):
-                    self._pending_packets.pop(i)
-                    return packet
-
-        with self._recv_condition:
-            packet = self._recv_condition.wait_for(process_until_match, timeout)
-            return None if isinstance(packet, EOFError) else packet
-
-    def _process_recv_packets(self) -> None:
+        deadline = time.time() + timeout
+
+        while time.time() < deadline:
+            packet = self._read_packet(timeout=deadline - time.time())
+            if packet is None:
+                return None
+            self._process_recv_packet(packet)
+            if not predicate or predicate(packet):
+                return packet
+
+    def _process_recv_packet(self, packet) -> None:
         """Process received packets, updating the session state."""
-        with self._recv_condition:
-            for packet in self._recv_packets:
-                if packet and ("seq" not in packet or packet["seq"] == 0):
-                    warnings.warn(
-                        f"received a malformed packet, expected 'seq != 0' for {packet!r}"
-                    )
-                # Handle events that may modify any stateful properties of
-                # the DAP session.
-                if packet and packet["type"] == "event":
-                    self._handle_event(packet)
-                elif packet and packet["type"] == "request":
-                    # Handle reverse requests and keep processing.
-                    self._handle_reverse_request(packet)
-                # Move the packet to the pending queue.
-                self._pending_packets.append(packet)
-            self._recv_packets.clear()
+        if packet and ("seq" not in packet or packet["seq"] == 0):
+            warnings.warn(
+                f"received a malformed packet, expected 'seq != 0' for {packet!r}"
+            )
+        # Handle events that may modify any stateful properties of
+        # the DAP session.
+        if packet and packet["type"] == "event":
+            self._handle_event(packet)
+        elif packet and packet["type"] == "request":
+            # Handle reverse requests and keep processing.
+            self._handle_reverse_request(packet)
 
     def _handle_event(self, packet: Event) -> None:
         """Handle any events that modify debug session state we track."""
+        self.events.append(packet)
+
         event = packet["event"]
         body: Optional[Dict] = packet.get("body", None)
 
@@ -453,6 +402,8 @@ def _handle_event(self, packet: Event) -> None:
             self.invalidated_event = packet
         elif event == "memory":
             self.memory_event = packet
+        elif event == "module":
+            self.module_events.append(packet)
 
     def _handle_reverse_request(self, request: Request) -> None:
         if request in self.reverse_requests:
@@ -521,18 +472,14 @@ def send_packet(self, packet: ProtocolMessage) -> int:
 
         Returns the seq number of the request.
         """
-        # Set the seq for requests.
-        if packet["type"] == "request":
-            packet["seq"] = self.sequence
-            self.sequence += 1
-        else:
-            packet["seq"] = 0
-
+        packet["seq"] = self.sequence
+        self.sequence += 1
+        
         # Encode our command dictionary as a JSON string
         json_str = json.dumps(packet, separators=(",", ":"))
 
         if self.trace_file:
-            self.trace_file.write("to adapter:\n%s\n" % (json_str))
+            self.trace_file.write("%s to adapter:\n%s\n" % (time.time(), json_str))
 
         length = len(json_str)
         if length > 0:
@@ -913,6 +860,8 @@ def request_restart(self, restartArguments=None):
         if restartArguments:
             command_dict["arguments"] = restartArguments
 
+        # Clear state, the process is about to restart...
+        self._process_continued(True)
         response = self._send_recv(command_dict)
         # Caller must still call wait_for_stopped.
         return response
@@ -1479,8 +1428,10 @@ def request_testGetTargetBreakpoints(self):
 
     def terminate(self):
         self.send.close()
-        if self._recv_thread.is_alive():
-            self._recv_thread.join()
+        self.recv.close()
+        self.selector.close()
+        if self.log_file:
+            dump_dap_log(self.log_file)
 
     def request_setInstructionBreakpoints(self, memory_reference=[]):
         breakpoints = []
@@ -1577,6 +1528,7 @@ def launch(
             stdout=subprocess.PIPE,
             stderr=sys.stderr,
             env=adapter_env,
+            bufsize=0,
         )
 
         if connection is None:
diff --git a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
index 29935bb8046ff..fd07324d2ddda 100644
--- a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
+++ b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
@@ -416,7 +416,7 @@ def continue_to_next_stop(self):
         return self.dap_server.wait_for_stopped()
 
     def continue_to_breakpoint(self, breakpoint_id: str):
-        self.continue_to_breakpoints((breakpoint_id))
+        self.continue_to_breakpoints([breakpoint_id])
 
     def continue_to_breakpoints(self, breakpoint_ids):
         self.do_continue()
diff --git a/lldb/test/API/tools/lldb-dap/breakpoint-events/TestDAP_breakpointEvents.py b/lldb/test/API/tools/lldb-dap/breakpoint-events/TestDAP_breakpointEvents.py
index beab4d6c1f5a6..5d5e75b98d45a 100644
--- a/lldb/test/API/tools/lldb-dap/breakpoint-events/TestDAP_breakpointEvents.py
+++ b/lldb/test/API/tools/lldb-dap/breakpoint-events/TestDAP_breakpointEvents.py
@@ -81,24 +81,16 @@ def test_breakpoint_events(self):
                 breakpoint["verified"], "expect foo breakpoint to not be verified"
             )
 
-        # Flush the breakpoint events.
-        self.dap_server.wait_for_breakpoint_events()
-
         # Continue to the breakpoint
-        self.continue_to_breakpoints(dap_breakpoint_ids)
+        self.continue_to_breakpoint(foo_bp_id)
+        self.continue_to_next_stop()  # foo_bp2
+        self.continue_to_breakpoint(main_bp_id)
+        self.continue_to_exit()
 
-        verified_breakpoint_ids = []
-        unverified_breakpoint_ids = []
-        for breakpoint_event in self.dap_server.wait_for_breakpoint_events():
-            breakpoint = breakpoint_event["body"]["breakpoint"]
-            id = breakpoint["id"]
-            if breakpoint["verified"]:
-                verified_breakpoint_ids.append(id)
-            else:
-                unverified_breakpoint_ids.append(id)
+        bp_events = [e for e in self.dap_server.events if e["event"] == "breakpoint"]
 
-        self.assertIn(main_bp_id, unverified_breakpoint_ids)
-        self.assertIn(foo_bp_id, unverified_breakpoint_ids)
+        main_bp_events = [e for e in bp_events if e['body']['breakpoint']['id'] == main_bp_id]
+        foo_bp_events = [e for e in bp_events if e['body']['breakpoint']['id'] == foo_bp_id]
 
-        self.assertIn(main_bp_id, verified_breakpoint_ids)
-        self.assertIn(foo_bp_id, verified_breakpoint_ids)
+        self.assertTrue(main_bp_events)
+        self.assertTrue(foo_bp_events)
diff --git a/lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py b/lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py
index 8db2316e73fc8..dc6bf38303204 100644
--- a/lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py
+++ b/lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py
@@ -156,6 +156,7 @@ def test_debuggerRoot(self):
         self.build_and_launch(
             program, debuggerRoot=program_parent_dir, initCommands=commands
         )
+        self.continue_to_exit()
         output = self.get_console()
         self.assertTrue(output and len(output) > 0, "expect console output")
         lines = output.splitlines()
@@ -171,7 +172,6 @@ def test_debuggerRoot(self):
                     % (program_parent_dir, line[len(prefix) :]),
                 )
         self.assertTrue(found, "verified lldb-dap working directory")
-        self.continue_to_exit()
 
     def test_sourcePath(self):
         """
diff --git a/lldb/test/API/tools/lldb-dap/module-event/TestDAP_module_event.py b/lldb/test/API/tools/lldb-dap/module-event/TestDAP_module_event.py
index 1f4afabbd161e..862b32ec9ac54 100644
--- a/lldb/test/API/tools/lldb-dap/module-event/TestDAP_module_event.py
+++ b/lldb/test/API/tools/lldb-dap/module-event/TestDAP_module_event.py
@@ -1,9 +1,10 @@
-import dap_server
+"""
+Test 'module' events for dynamically loaded libraries.
+"""
+
 from lldbsuite.test.decorators import *
 from lldbsuite.test.lldbtest import *
-from lldbsuite.test import lldbutil
 import lldbdap_testcase
-import re
 
 
 class TestDAP_module_event(lldbdap_testcase.DAPTestCaseBase):
@@ -11,48 +12,41 @@ class TestDAP_module_event(lldbdap_testcase.DAPTestCaseBase):
     def test_module_event(self):
         program = self.getBuildArtifact("a.out")
         self.build_and_launch(program)
+        self.continue_to_exit()
 
-        source = "main.cpp"
-        breakpoint1_line = line_number(source, "// breakpoint 1")
-        breakpoint2_line = line_number(source, "// breakpoint 2")
-        breakpoint3_line = line_number(source, "// breakpoint 3")
-
-        breakpoint_ids = self.set_source_breakpoints(
-            source, [breakpoint1_line, breakpoint2_line, breakpoint3_line]
+        # Module 'remove' events will only contain the 'id' not the 'name',
+        # first lookup the module id to find all the events.
+        a_out_id = next(
+            e
+            for e in self.dap_server.module_events
+            if e["body"]["module"]["name"] == "a.out"
+        )["body"]["module"]["id"]
+        a_out_events = [
+            e
+            for e in self.dap_server.module_events
+            if e["body"]["module"]["id"] == a_out_id
+        ]
+
+        self.assertIn(
+            "new",
+            [e["body"]["reason"] for e in a_out_events],
+            "Expected a.out to load during the debug session.",
         )
-        self.continue_to_breakpoints(breakpoint_ids)
-
-        # We're now stopped at breakpoint 1 before the dlopen. Flush all the module events.
-        event = self.dap_server.wait_for_event(["module"])
-        while event is not None:
-            event = self.dap_server.wait_for_event(["module"])
-
-        # Continue to the second breakpoint, before the dlclose.
-        self.continue_to_breakpoints(breakpoint_ids)
-
-        # Make sure we got a module event for libother.
-        event = self.dap_server.wait_for_event(["module"])
-        self.assertIsNotNone(event, "didn't get a module event")
-        module_name = event["body"]["module"]["name"]
-        module_id = event["body"]["module"]["id"]
-        self.assertEqual(event["body"]["reason"], "new")
-        self.assertIn("libother", module_name)
 
-        # Continue to the third breakpoint, after the dlclose.
-        self.continue_to_breakpoints(breakpoint_ids)
-
-        # Make sure we got a module event for libother.
-        event = self.dap_server.wait_for_event(["module"])
-        self.assertIsNotNone(event, "didn't get a module event")
-        reason = event["body"]["reason"]
-        self.assertEqual(reason, "removed")
-        self.assertEqual(event["body"]["module"]["id"], module_id)
-
-        # The removed module event should omit everything but the module id and name
-        # as they are required fields.
-        module_data = event["body"]["module"]
-        required_keys = ["id", "name"]
-        self.assertListEqual(list(module_data.keys()), required_keys)
-        self.assertEqual(module_data["name"], "", "expects empty name.")
-
-        self.continue_to_exit()
+        libother_id = next(
+            e
+            for e in self.dap_server.module_events
+            if e["body"]["module"]["name"].startswith("libother.")
+        )["body"]["module"]["id"]
+        libother_events = [
+            e
+            for e in self.dap_server.module_events
+            if e["body"]["module"]["id"] == libother_id
+        ]
+
+        self.assertTrue(libother_events, "Expected libother to produce module events.")
+        self.assertEqual(
+            [e["body"]["reason"] for e in libother_events],
+            ["new", "removed"],
+            "Expected libother to be loaded then unloaded during the debug session.",
+        )
diff --git a/lldb/test/API/tools/lldb-dap/module/TestDAP_module.py b/lldb/test/API/tools/lldb-dap/module/TestDAP_module.py
index 0ed53dac5d869..2d00c512721c6 100644
--- a/lldb/test/API/tools/lldb-dap/module/TestDAP_module.py
+++ b/lldb/test/API/tools/lldb-dap/module/TestDAP_module.py
@@ -64,19 +64,18 @@ def check_symbols_loaded_with_size():
         self.assertEqual(program, program_module["path"])
         self.assertIn("addressRange", program_module)
 
+        self.continue_to_exit()
+
         # Collect all the module names we saw as events.
         module_new_names = []
         module_changed_names =...
[truncated]

@github-actions
Copy link

github-actions bot commented Oct 29, 2025

✅ With the latest revision this PR passed the Python code formatter.

Copy link
Member

@walter-erquinigo walter-erquinigo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what a good idea

Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The reliability of the DAP tests is a recurring topic at the Developer Meeting this week. Anything we can do in that space is greatly appreciated.

@da-viper
Copy link
Contributor

To Confirm what I read.

Now if you for example send a breakpoint request.

It blocks until the default timeout or matches the search predicate ?. Instead of continuously polling on a different thread.

@ashgti
Copy link
Contributor Author

ashgti commented Oct 29, 2025

Now if you for example send a breakpoint request.

It blocks until the default timeout or matches the search predicate ?. Instead of continuously polling on a different thread.

Correct, we do not process any input from lldb-dap until DebugCommunication._recv_packet is called. That is the function that will read 1 packet then process that packet and check the current predicate.

This does change some timings because some requests will trigger events before they send a response. So there are cases where we need to check the existing state of the debug session instead of waiting for an event.

Previously, we had a mechanism to read messages in 1 thread then set a condition variable. In the test thread we would wait for the condition variable to be set, then process any messages we had. This lead to the two threads having different timing.

With this change, I think things should be more stable because we'll only ever read 1 message at a time and we won't be racing the reader thread with the test thread.

@ashgti ashgti merged commit a49bfbe into llvm:main Oct 29, 2025
10 checks passed
@ashgti ashgti deleted the lldb-dap-test-refactor branch October 29, 2025 21:23
@DavidSpickett
Copy link
Collaborator

This has actually reduced the consistency of tests on Arm 32-bit: https://lab.llvm.org/buildbot/#/builders/18/builds/22356

Unresolved Tests (10):
  lldb-api :: tools/lldb-dap/attach-commands/TestDAP_attachCommands.py
  lldb-api :: tools/lldb-dap/breakpoint-assembly/TestDAP_breakpointAssembly.py
  lldb-api :: tools/lldb-dap/breakpoint/TestDAP_logpoints.py
  lldb-api :: tools/lldb-dap/completions/TestDAP_completions.py
  lldb-api :: tools/lldb-dap/exception/TestDAP_exception.py
  lldb-api :: tools/lldb-dap/server/TestDAP_server.py
  lldb-api :: tools/lldb-dap/terminated-event/TestDAP_terminatedEvent.py
  lldb-api :: tools/lldb-dap/threads/TestDAP_threads.py
  lldb-api :: tools/lldb-dap/variables/TestDAP_variables.py
  lldb-api :: tools/lldb-dap/variables/children/TestDAP_variables_children.py

At least 9 of those are due to a truncated response:

$ cat stdio\ -\ 2025-10-30T100358.441 | grep JSONDecodeError
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 8169 (char 8168)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 367 (char 366)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 5696 (char 5695)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 8136 (char 8135)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 8144 (char 8143)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 5696 (char 5695)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 8148 (char 8147)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 8160 (char 8159)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 8163 (char 8162)

However the inconsistency is...inconsistent 🥁 It failed again 2 builds later:

Unresolved Tests (19):
  lldb-api :: tools/lldb-dap/breakpoint/TestDAP_setBreakpoints.py
  lldb-api :: tools/lldb-dap/breakpoint/TestDAP_setFunctionBreakpoints.py
  lldb-api :: tools/lldb-dap/completions/TestDAP_completions.py
  lldb-api :: tools/lldb-dap/console/TestDAP_console.py
  lldb-api :: tools/lldb-dap/console/TestDAP_redirection_to_console.py
  lldb-api :: tools/lldb-dap/databreakpoint/TestDAP_setDataBreakpoints.py
  lldb-api :: tools/lldb-dap/disassemble/TestDAP_disassemble.py
  lldb-api :: tools/lldb-dap/disconnect/TestDAP_disconnect.py
  lldb-api :: tools/lldb-dap/evaluate/TestDAP_evaluate.py
  lldb-api :: tools/lldb-dap/exception/TestDAP_exception.py
  lldb-api :: tools/lldb-dap/exception/cpp/TestDAP_exception_cpp.py
  lldb-api :: tools/lldb-dap/instruction-breakpoint/TestDAP_instruction_breakpoint.py
  lldb-api :: tools/lldb-dap/invalidated-event/TestDAP_invalidatedEvent.py
  lldb-api :: tools/lldb-dap/launch/TestDAP_launch.py
  lldb-api :: tools/lldb-dap/locations/TestDAP_locations.py
  lldb-api :: tools/lldb-dap/memory/TestDAP_memory.py
  lldb-api :: tools/lldb-dap/optimized/TestDAP_optimized.py
  lldb-api :: tools/lldb-dap/output/TestDAP_output.py
  lldb-api :: tools/lldb-dap/restart/TestDAP_restart.py

https://lab.llvm.org/buildbot/#/builders/18/builds/22358

And has been doing so on and off since.

I'm going to revert this. I admit I haven't looked at the content of the PR and am very busy so I won't have time to test it on the machine soon. So please see if you can spot anything that could break due to scheduling or lack of synchronisation, and I'll find the time eventually to try to reproduce it.

I don't think this is to do with the host OS, it's just that the 32-bit container is generally lower performance. So running stress -c in the background might be enough to trigger the problem.

DavidSpickett added a commit that referenced this pull request Oct 30, 2025
…ency." (#165688)

Reverts #165496

Due to flaky failures on Arm 32-bit since this change. Detailed in
#165496 (comment).
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 30, 2025
…ing concurrency." (#165688)

Reverts llvm/llvm-project#165496

Due to flaky failures on Arm 32-bit since this change. Detailed in
llvm/llvm-project#165496 (comment).
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025
…lvm#165496)

We currently use a background thread to read the DAP output. This means
the test thread and the background thread can race at times and we may
have inconsistent timing due to these races.

To improve the consistency I've removed the reader thread and instead
switched to using the `selectors` module that wraps `select` in a
platform independent way.
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025
…ency." (llvm#165688)

Reverts llvm#165496

Due to flaky failures on Arm 32-bit since this change. Detailed in
llvm#165496 (comment).
@DavidSpickett
Copy link
Collaborator

I haven't been able to trigger the problem but I think what's happening is the select is finding some data but not a whole JSON string. Something is delaying the write of the rest (this bot runs on a busy system). It's not always at the JSON part either, just happened to be most of the time in that run.

I think what you've done here is an improvement, and making it loop waiting for a complete parseable packet could A: lead to looping forever and B: hide situations where we do send an invalid packet.

So this is a good change, but I need some time to figure out how to give the Arm bot a more stable environment.

@DavidSpickett
Copy link
Collaborator

I've relanded this with dap tests disabled on Arm Linux to give me time to fix our bot.

@JDevlieghere
Copy link
Member

One idea @adrian-prantl and I have been discussion to improve the reliability of the test is to introduce random delays in lldb-dap for testing. We should totally be able to handle an incomplete packet (and I assume/hope that's what you're referring to when you say "fix our bot"). Maybe that would help debug this issue.

@ashgti
Copy link
Contributor Author

ashgti commented Oct 30, 2025

I wasn’t sure if this would stick yet or not, I did some testing on a Debian Linux x84_64 host and macOS on arm64. If we need to revert it to look at other arch and platforms that’s fine by me, I also think the Python version can have an impact here as well.

@DavidSpickett
Copy link
Collaborator

We should totally be able to handle an incomplete packet (and I assume/hope that's what you're referring to when you say "fix our bot").

I should have been clearer: our as in Linaro's bot.

I mean that I'm going to give lldb-arm-ubuntu fixed resources so it doesn't have so many random delays. Separate from whatever you decide to do with lldb-dap. It's something I should do anyway, we used to run a lot of bots on shared machines but over time more and more have fixed resources because of situations like this.

I think it's actually unfair to require lldb-dap tests to be stable on that bot because it can vary so much. On the AArch64 Linux and Windows bot where they have the whole machine, it's more fair to expect it. Took me a while to realise that but got there eventually :)

Trying random delays to destabalise them, yes that sounds like a good experiment to do. I have tried this with the "stress" utility, not to much success but I'm sure there are better ways.

luciechoi pushed a commit to luciechoi/llvm-project that referenced this pull request Nov 1, 2025
…ency." (llvm#165688)

Reverts llvm#165496

Due to flaky failures on Arm 32-bit since this change. Detailed in
llvm#165496 (comment).
DEBADRIBASAK pushed a commit to DEBADRIBASAK/llvm-project that referenced this pull request Nov 3, 2025
…lvm#165496)

We currently use a background thread to read the DAP output. This means
the test thread and the background thread can race at times and we may
have inconsistent timing due to these races.

To improve the consistency I've removed the reader thread and instead
switched to using the `selectors` module that wraps `select` in a
platform independent way.
DEBADRIBASAK pushed a commit to DEBADRIBASAK/llvm-project that referenced this pull request Nov 3, 2025
…ency." (llvm#165688)

Reverts llvm#165496

Due to flaky failures on Arm 32-bit since this change. Detailed in
llvm#165496 (comment).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants