[lldb-dap] Improving consistency of tests by removing concurrency. #165496
Conversation
We currently use a background thread to read the DAP output. This means the test thread and the background thread can race at times and we may have inconsistent timing due to these races. To improve the consistency I've removed the reader thread and instead switched to using the `selectors` module that wraps `select` in a platform independent way.
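For context, the primitive this relies on is `selectors.DefaultSelector`: register the adapter's stdout for `EVENT_READ`, and `select(timeout)` reports whether a read would block. A minimal sketch, assuming a POSIX host where pipes are selectable and that `lldb-dap` is on `PATH` (the test suite builds the real command line):

```python
import selectors
import subprocess

# Hypothetical invocation; only illustrates the select-with-timeout pattern.
proc = subprocess.Popen(
    ["lldb-dap"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, bufsize=0
)

sel = selectors.DefaultSelector()
sel.register(proc.stdout, selectors.EVENT_READ)

# Block for at most 5 seconds; an empty list means the timeout expired
# before any data became readable.
if sel.select(timeout=5.0):
    header = proc.stdout.readline()
else:
    print("timed out waiting for adapter output")
```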
@llvm/pr-subscribers-lldb

Author: John Harrison (ashgti)

Changes

We currently use a background thread to read the DAP output. This means the test thread and the background thread can race at times and we may have inconsistent timing due to these races. To improve the consistency I've removed the reader thread and instead switched to using the `selectors` module that wraps `select` in a platform independent way.

Patch is 23.96 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/165496.diff

8 Files Affected:
diff --git a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
index d892c01f0bc71..bbb0eb0198d84 100644
--- a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
+++ b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
@@ -10,8 +10,8 @@
import subprocess
import signal
import sys
-import threading
import warnings
+import selectors
import time
from typing import (
Any,
@@ -139,35 +139,6 @@ def dump_memory(base_addr, data, num_per_line, outfile):
outfile.write("\n")
-def read_packet(
- f: IO[bytes], trace_file: Optional[IO[str]] = None
-) -> Optional[ProtocolMessage]:
- """Decode a JSON packet that starts with the content length and is
- followed by the JSON bytes from a file 'f'. Returns None on EOF.
- """
- line = f.readline().decode("utf-8")
- if len(line) == 0:
- return None # EOF.
-
- # Watch for line that starts with the prefix
- prefix = "Content-Length: "
- if line.startswith(prefix):
- # Decode length of JSON bytes
- length = int(line[len(prefix) :])
- # Skip empty line
- separator = f.readline().decode()
- if separator != "":
- Exception("malformed DAP content header, unexpected line: " + separator)
- # Read JSON bytes
- json_str = f.read(length).decode()
- if trace_file:
- trace_file.write("from adapter:\n%s\n" % (json_str))
- # Decode the JSON bytes into a python dictionary
- return json.loads(json_str)
-
- raise Exception("unexpected malformed message from lldb-dap: " + line)
-
-
def packet_type_is(packet, packet_type):
return "type" in packet and packet["type"] == packet_type
@@ -199,16 +170,8 @@ def __init__(
self.log_file = log_file
self.send = send
self.recv = recv
-
- # Packets that have been received and processed but have not yet been
- # requested by a test case.
- self._pending_packets: List[Optional[ProtocolMessage]] = []
- # Received packets that have not yet been processed.
- self._recv_packets: List[Optional[ProtocolMessage]] = []
- # Used as a mutex for _recv_packets and for notify when _recv_packets
- # changes.
- self._recv_condition = threading.Condition()
- self._recv_thread = threading.Thread(target=self._read_packet_thread)
+ self.selector = selectors.DefaultSelector()
+ self.selector.register(recv, selectors.EVENT_READ)
# session state
self.init_commands = init_commands
@@ -234,9 +197,6 @@ def __init__(
# keyed by breakpoint id
self.resolved_breakpoints: dict[str, Breakpoint] = {}
- # trigger enqueue thread
- self._recv_thread.start()
-
@classmethod
def encode_content(cls, s: str) -> bytes:
return ("Content-Length: %u\r\n\r\n%s" % (len(s), s)).encode("utf-8")
@@ -252,17 +212,46 @@ def validate_response(cls, command, response):
f"seq mismatch in response {command['seq']} != {response['request_seq']}"
)
- def _read_packet_thread(self):
- try:
- while True:
- packet = read_packet(self.recv, trace_file=self.trace_file)
- # `packet` will be `None` on EOF. We want to pass it down to
- # handle_recv_packet anyway so the main thread can handle unexpected
- # termination of lldb-dap and stop waiting for new packets.
- if not self._handle_recv_packet(packet):
- break
- finally:
- dump_dap_log(self.log_file)
+ def _read_packet(
+ self,
+ timeout: float = DEFAULT_TIMEOUT,
+ ) -> Optional[ProtocolMessage]:
+ """Decode a JSON packet that starts with the content length and is
+ followed by the JSON bytes from self.recv. Returns None on EOF.
+ """
+
+ ready = self.selector.select(timeout)
+ if not ready:
+ warnings.warn(
+ "timeout occurred waiting for a packet, check if the test has a"
+ " negative assertion and see if it can be inverted.",
+ stacklevel=4,
+ )
+ return None # timeout
+
+ line = self.recv.readline().decode("utf-8")
+ if len(line) == 0:
+ return None # EOF.
+
+ # Watch for line that starts with the prefix
+ prefix = "Content-Length: "
+ if line.startswith(prefix):
+ # Decode length of JSON bytes
+ length = int(line[len(prefix) :])
+ # Skip empty line
+ separator = self.recv.readline().decode()
+ if separator != "":
+ Exception("malformed DAP content header, unexpected line: " + separator)
+ # Read JSON bytes
+ json_str = self.recv.read(length).decode()
+ if self.trace_file:
+ self.trace_file.write(
+ "%s from adapter:\n%s\n" % (time.time(), json_str)
+ )
+ # Decode the JSON bytes into a python dictionary
+ return json.loads(json_str)
+
+ raise Exception("unexpected malformed message from lldb-dap: " + line)
def get_modules(
self, start_module: Optional[int] = None, module_count: Optional[int] = None
@@ -310,34 +299,6 @@ def collect_output(
output += self.get_output(category, clear=clear)
return output
- def _enqueue_recv_packet(self, packet: Optional[ProtocolMessage]):
- with self.recv_condition:
- self.recv_packets.append(packet)
- self.recv_condition.notify()
-
- def _handle_recv_packet(self, packet: Optional[ProtocolMessage]) -> bool:
- """Handles an incoming packet.
-
- Called by the read thread that is waiting for all incoming packets
- to store the incoming packet in "self._recv_packets" in a thread safe
- way. This function will then signal the "self._recv_condition" to
- indicate a new packet is available.
-
- Args:
- packet: A new packet to store.
-
- Returns:
- True if the caller should keep calling this function for more
- packets.
- """
- with self._recv_condition:
- self._recv_packets.append(packet)
- self._recv_condition.notify()
- # packet is None on EOF
- return packet is not None and not (
- packet["type"] == "response" and packet["command"] == "disconnect"
- )
-
def _recv_packet(
self,
*,
@@ -361,46 +322,34 @@ def _recv_packet(
The first matching packet for the given predicate, if specified,
otherwise None.
"""
- assert (
- threading.current_thread != self._recv_thread
- ), "Must not be called from the _recv_thread"
-
- def process_until_match():
- self._process_recv_packets()
- for i, packet in enumerate(self._pending_packets):
- if packet is None:
- # We need to return a truthy value to break out of the
- # wait_for, use `EOFError` as an indicator of EOF.
- return EOFError()
- if predicate and predicate(packet):
- self._pending_packets.pop(i)
- return packet
-
- with self._recv_condition:
- packet = self._recv_condition.wait_for(process_until_match, timeout)
- return None if isinstance(packet, EOFError) else packet
-
- def _process_recv_packets(self) -> None:
+ deadline = time.time() + timeout
+
+ while time.time() < deadline:
+ packet = self._read_packet(timeout=deadline - time.time())
+ if packet is None:
+ return None
+ self._process_recv_packet(packet)
+ if not predicate or predicate(packet):
+ return packet
+
+ def _process_recv_packet(self, packet) -> None:
"""Process received packets, updating the session state."""
- with self._recv_condition:
- for packet in self._recv_packets:
- if packet and ("seq" not in packet or packet["seq"] == 0):
- warnings.warn(
- f"received a malformed packet, expected 'seq != 0' for {packet!r}"
- )
- # Handle events that may modify any stateful properties of
- # the DAP session.
- if packet and packet["type"] == "event":
- self._handle_event(packet)
- elif packet and packet["type"] == "request":
- # Handle reverse requests and keep processing.
- self._handle_reverse_request(packet)
- # Move the packet to the pending queue.
- self._pending_packets.append(packet)
- self._recv_packets.clear()
+ if packet and ("seq" not in packet or packet["seq"] == 0):
+ warnings.warn(
+ f"received a malformed packet, expected 'seq != 0' for {packet!r}"
+ )
+ # Handle events that may modify any stateful properties of
+ # the DAP session.
+ if packet and packet["type"] == "event":
+ self._handle_event(packet)
+ elif packet and packet["type"] == "request":
+ # Handle reverse requests and keep processing.
+ self._handle_reverse_request(packet)
def _handle_event(self, packet: Event) -> None:
"""Handle any events that modify debug session state we track."""
+ self.events.append(packet)
+
event = packet["event"]
body: Optional[Dict] = packet.get("body", None)
@@ -453,6 +402,8 @@ def _handle_event(self, packet: Event) -> None:
self.invalidated_event = packet
elif event == "memory":
self.memory_event = packet
+ elif event == "module":
+ self.module_events.append(packet)
def _handle_reverse_request(self, request: Request) -> None:
if request in self.reverse_requests:
@@ -521,18 +472,14 @@ def send_packet(self, packet: ProtocolMessage) -> int:
Returns the seq number of the request.
"""
- # Set the seq for requests.
- if packet["type"] == "request":
- packet["seq"] = self.sequence
- self.sequence += 1
- else:
- packet["seq"] = 0
-
+ packet["seq"] = self.sequence
+ self.sequence += 1
+
# Encode our command dictionary as a JSON string
json_str = json.dumps(packet, separators=(",", ":"))
if self.trace_file:
- self.trace_file.write("to adapter:\n%s\n" % (json_str))
+ self.trace_file.write("%s to adapter:\n%s\n" % (time.time(), json_str))
length = len(json_str)
if length > 0:
@@ -913,6 +860,8 @@ def request_restart(self, restartArguments=None):
if restartArguments:
command_dict["arguments"] = restartArguments
+ # Clear state, the process is about to restart...
+ self._process_continued(True)
response = self._send_recv(command_dict)
# Caller must still call wait_for_stopped.
return response
@@ -1479,8 +1428,10 @@ def request_testGetTargetBreakpoints(self):
def terminate(self):
self.send.close()
- if self._recv_thread.is_alive():
- self._recv_thread.join()
+ self.recv.close()
+ self.selector.close()
+ if self.log_file:
+ dump_dap_log(self.log_file)
def request_setInstructionBreakpoints(self, memory_reference=[]):
breakpoints = []
@@ -1577,6 +1528,7 @@ def launch(
stdout=subprocess.PIPE,
stderr=sys.stderr,
env=adapter_env,
+ bufsize=0,
)
if connection is None:
diff --git a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
index 29935bb8046ff..fd07324d2ddda 100644
--- a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
+++ b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
@@ -416,7 +416,7 @@ def continue_to_next_stop(self):
return self.dap_server.wait_for_stopped()
def continue_to_breakpoint(self, breakpoint_id: str):
- self.continue_to_breakpoints((breakpoint_id))
+ self.continue_to_breakpoints([breakpoint_id])
def continue_to_breakpoints(self, breakpoint_ids):
self.do_continue()
diff --git a/lldb/test/API/tools/lldb-dap/breakpoint-events/TestDAP_breakpointEvents.py b/lldb/test/API/tools/lldb-dap/breakpoint-events/TestDAP_breakpointEvents.py
index beab4d6c1f5a6..5d5e75b98d45a 100644
--- a/lldb/test/API/tools/lldb-dap/breakpoint-events/TestDAP_breakpointEvents.py
+++ b/lldb/test/API/tools/lldb-dap/breakpoint-events/TestDAP_breakpointEvents.py
@@ -81,24 +81,16 @@ def test_breakpoint_events(self):
breakpoint["verified"], "expect foo breakpoint to not be verified"
)
- # Flush the breakpoint events.
- self.dap_server.wait_for_breakpoint_events()
-
# Continue to the breakpoint
- self.continue_to_breakpoints(dap_breakpoint_ids)
+ self.continue_to_breakpoint(foo_bp_id)
+ self.continue_to_next_stop() # foo_bp2
+ self.continue_to_breakpoint(main_bp_id)
+ self.continue_to_exit()
- verified_breakpoint_ids = []
- unverified_breakpoint_ids = []
- for breakpoint_event in self.dap_server.wait_for_breakpoint_events():
- breakpoint = breakpoint_event["body"]["breakpoint"]
- id = breakpoint["id"]
- if breakpoint["verified"]:
- verified_breakpoint_ids.append(id)
- else:
- unverified_breakpoint_ids.append(id)
+ bp_events = [e for e in self.dap_server.events if e["event"] == "breakpoint"]
- self.assertIn(main_bp_id, unverified_breakpoint_ids)
- self.assertIn(foo_bp_id, unverified_breakpoint_ids)
+ main_bp_events = [e for e in bp_events if e['body']['breakpoint']['id'] == main_bp_id]
+ foo_bp_events = [e for e in bp_events if e['body']['breakpoint']['id'] == foo_bp_id]
- self.assertIn(main_bp_id, verified_breakpoint_ids)
- self.assertIn(foo_bp_id, verified_breakpoint_ids)
+ self.assertTrue(main_bp_events)
+ self.assertTrue(foo_bp_events)
diff --git a/lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py b/lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py
index 8db2316e73fc8..dc6bf38303204 100644
--- a/lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py
+++ b/lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py
@@ -156,6 +156,7 @@ def test_debuggerRoot(self):
self.build_and_launch(
program, debuggerRoot=program_parent_dir, initCommands=commands
)
+ self.continue_to_exit()
output = self.get_console()
self.assertTrue(output and len(output) > 0, "expect console output")
lines = output.splitlines()
@@ -171,7 +172,6 @@ def test_debuggerRoot(self):
% (program_parent_dir, line[len(prefix) :]),
)
self.assertTrue(found, "verified lldb-dap working directory")
- self.continue_to_exit()
def test_sourcePath(self):
"""
diff --git a/lldb/test/API/tools/lldb-dap/module-event/TestDAP_module_event.py b/lldb/test/API/tools/lldb-dap/module-event/TestDAP_module_event.py
index 1f4afabbd161e..862b32ec9ac54 100644
--- a/lldb/test/API/tools/lldb-dap/module-event/TestDAP_module_event.py
+++ b/lldb/test/API/tools/lldb-dap/module-event/TestDAP_module_event.py
@@ -1,9 +1,10 @@
-import dap_server
+"""
+Test 'module' events for dynamically loaded libraries.
+"""
+
from lldbsuite.test.decorators import *
from lldbsuite.test.lldbtest import *
-from lldbsuite.test import lldbutil
import lldbdap_testcase
-import re
class TestDAP_module_event(lldbdap_testcase.DAPTestCaseBase):
@@ -11,48 +12,41 @@ class TestDAP_module_event(lldbdap_testcase.DAPTestCaseBase):
def test_module_event(self):
program = self.getBuildArtifact("a.out")
self.build_and_launch(program)
+ self.continue_to_exit()
- source = "main.cpp"
- breakpoint1_line = line_number(source, "// breakpoint 1")
- breakpoint2_line = line_number(source, "// breakpoint 2")
- breakpoint3_line = line_number(source, "// breakpoint 3")
-
- breakpoint_ids = self.set_source_breakpoints(
- source, [breakpoint1_line, breakpoint2_line, breakpoint3_line]
+ # Module 'remove' events will only contain the 'id' not the 'name',
+ # first lookup the module id to find all the events.
+ a_out_id = next(
+ e
+ for e in self.dap_server.module_events
+ if e["body"]["module"]["name"] == "a.out"
+ )["body"]["module"]["id"]
+ a_out_events = [
+ e
+ for e in self.dap_server.module_events
+ if e["body"]["module"]["id"] == a_out_id
+ ]
+
+ self.assertIn(
+ "new",
+ [e["body"]["reason"] for e in a_out_events],
+ "Expected a.out to load during the debug session.",
)
- self.continue_to_breakpoints(breakpoint_ids)
-
- # We're now stopped at breakpoint 1 before the dlopen. Flush all the module events.
- event = self.dap_server.wait_for_event(["module"])
- while event is not None:
- event = self.dap_server.wait_for_event(["module"])
-
- # Continue to the second breakpoint, before the dlclose.
- self.continue_to_breakpoints(breakpoint_ids)
-
- # Make sure we got a module event for libother.
- event = self.dap_server.wait_for_event(["module"])
- self.assertIsNotNone(event, "didn't get a module event")
- module_name = event["body"]["module"]["name"]
- module_id = event["body"]["module"]["id"]
- self.assertEqual(event["body"]["reason"], "new")
- self.assertIn("libother", module_name)
- # Continue to the third breakpoint, after the dlclose.
- self.continue_to_breakpoints(breakpoint_ids)
-
- # Make sure we got a module event for libother.
- event = self.dap_server.wait_for_event(["module"])
- self.assertIsNotNone(event, "didn't get a module event")
- reason = event["body"]["reason"]
- self.assertEqual(reason, "removed")
- self.assertEqual(event["body"]["module"]["id"], module_id)
-
- # The removed module event should omit everything but the module id and name
- # as they are required fields.
- module_data = event["body"]["module"]
- required_keys = ["id", "name"]
- self.assertListEqual(list(module_data.keys()), required_keys)
- self.assertEqual(module_data["name"], "", "expects empty name.")
-
- self.continue_to_exit()
+ libother_id = next(
+ e
+ for e in self.dap_server.module_events
+ if e["body"]["module"]["name"].startswith("libother.")
+ )["body"]["module"]["id"]
+ libother_events = [
+ e
+ for e in self.dap_server.module_events
+ if e["body"]["module"]["id"] == libother_id
+ ]
+
+ self.assertTrue(libother_events, "Expected libother to produce module events.")
+ self.assertEqual(
+ [e["body"]["reason"] for e in libother_events],
+ ["new", "removed"],
+ "Expected libother to be loaded then unloaded during the debug session.",
+ )
diff --git a/lldb/test/API/tools/lldb-dap/module/TestDAP_module.py b/lldb/test/API/tools/lldb-dap/module/TestDAP_module.py
index 0ed53dac5d869..2d00c512721c6 100644
--- a/lldb/test/API/tools/lldb-dap/module/TestDAP_module.py
+++ b/lldb/test/API/tools/lldb-dap/module/TestDAP_module.py
@@ -64,19 +64,18 @@ def check_symbols_loaded_with_size():
self.assertEqual(program, program_module["path"])
self.assertIn("addressRange", program_module)
+ self.continue_to_exit()
+
# Collect all the module names we saw as events.
module_new_names = []
module_changed_names =...
[truncated]
✅ With the latest revision this PR passed the Python code formatter.
walter-erquinigo left a comment:
what a good idea
JDevlieghere left a comment:
LGTM. The reliability of the DAP tests is a recurring topic at the Developer Meeting this week. Anything we can do in that space is greatly appreciated.
To confirm what I read: now if you, for example, send a breakpoint request, it blocks until the default timeout or until a packet matches the search predicate, instead of continuously polling on a different thread?
Correct, we do not process any input from lldb-dap until the test asks for the next packet.

This does change some timings, because some requests will trigger events before they send a response. So there are cases where we need to check the existing state of the debug session instead of waiting for an event.

Previously, we had a mechanism to read messages in one thread and then set a condition variable; in the test thread we would wait for the condition variable to be set, then process any messages we had. This led to the two threads having different timing. With this change, I think things should be more stable because we'll only ever read one message at a time and we won't be racing the reader thread against the test thread.
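As an illustration of that model, here is a condensed sketch of the receive loop from the patch. The helpers `_read_packet` and `_process_recv_packet` come from `dap_server.py`; the wrapper function itself is simplified for exposition:

```python
import time
from typing import Callable, Optional

def recv_packet(dap, predicate: Optional[Callable[[dict], bool]] = None,
                timeout: float = 10.0) -> Optional[dict]:
    """Synchronously read packets until one matches `predicate`, the
    deadline passes, or the adapter closes its output."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        # _read_packet() blocks on selector.select() for the remaining
        # time and returns None on timeout or EOF.
        packet = dap._read_packet(timeout=deadline - time.time())
        if packet is None:
            return None
        # Update session state (events, reverse requests) as we go.
        dap._process_recv_packet(packet)
        if predicate is None or predicate(packet):
            return packet
    return None
```

Because everything now happens on the test thread, the order in which events are observed follows directly from the requests the test sends.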
This has actually reduced the consistency of tests on Arm 32-bit: https://lab.llvm.org/buildbot/#/builders/18/builds/22356

At least 9 of those failures are due to a truncated response. However, the inconsistency is... inconsistent 🥁 It failed again 2 builds later: https://lab.llvm.org/buildbot/#/builders/18/builds/22358 and has been doing so on and off since.

I'm going to revert this. I admit I haven't looked at the content of the PR and am very busy, so I won't have time to test it on the machine soon. So please see if you can spot anything that could break due to scheduling or lack of synchronisation, and I'll find the time eventually to try to reproduce it.

I don't think this is to do with the host OS, it's just that the 32-bit container is generally lower performance. So running
…ency." (#165688) Reverts #165496 Due to flaky failures on Arm 32-bit since this change. Detailed in #165496 (comment).
I haven't been able to trigger the problem, but I think what's happening is that select is finding some data but not a whole JSON string. Something is delaying the write of the rest (this bot runs on a busy system). It's not always at the JSON part either, it just happened to be most of the time in that run.

I think what you've done here is an improvement, and making it loop waiting for a complete parseable packet could (a) lead to looping forever and (b) hide situations where we do send an invalid packet. So this is a good change, but I need some time to figure out how to give the Arm bot a more stable environment.
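For reference, one way to tolerate a slow writer without either looping forever or hiding genuinely malformed output (a sketch of the trade-off discussed above, not something the patch implements) is to keep reading under a hard deadline until a complete `Content-Length`-framed body has arrived, and fail loudly otherwise. The `stream` and `select_ready` parameters are assumptions of the sketch, not the test-harness API:

```python
import json
import time

def read_framed_packet(stream, select_ready, deadline: float):
    """Read one Content-Length framed DAP packet, retrying short reads
    until `deadline` (a time.time() value) passes.

    `stream` is a binary file object; `select_ready(t)` returns True if
    data is readable within `t` seconds (e.g. a selector wrapper).
    """
    # Read the header a byte at a time until the blank line; assumes a
    # single Content-Length field precedes the body.
    header = b""
    while not header.endswith(b"\r\n\r\n"):
        if time.time() >= deadline or not select_ready(deadline - time.time()):
            raise TimeoutError("incomplete DAP header: %r" % header)
        chunk = stream.read(1)
        if not chunk:
            return None  # EOF
        header += chunk

    length = int(header.split(b"Content-Length:")[1].split(b"\r\n")[0])

    # Keep reading until the full body has arrived or the deadline passes,
    # so a slow writer is tolerated but a hung one still fails loudly.
    body = b""
    while len(body) < length:
        if time.time() >= deadline or not select_ready(deadline - time.time()):
            raise TimeoutError(
                "truncated DAP body: got %d of %d bytes" % (len(body), length)
            )
        chunk = stream.read(length - len(body))
        if not chunk:
            return None  # EOF mid-body
        body += chunk
    return json.loads(body)
```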
I've relanded this with dap tests disabled on Arm Linux to give me time to fix our bot.
One idea @adrian-prantl and I have been discussing to improve the reliability of the tests is to introduce random delays in lldb-dap for testing. We should totally be able to handle an incomplete packet (and I assume/hope that's what you're referring to when you say "fix our bot"). Maybe that would help debug this issue.
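A hypothetical way to simulate such delays purely on the test side, without modifying lldb-dap, would be to pump the adapter's output through a wrapper that re-emits it in small, randomly delayed chunks. Every name and constant below is made up for illustration:

```python
import os
import random
import threading
import time

def jittery_pipe(src_fd: int, max_delay: float = 0.05) -> int:
    """Return a read fd that re-emits `src_fd`'s bytes in small, randomly
    delayed chunks. Purely illustrative; not part of the test suite."""
    read_fd, write_fd = os.pipe()

    def pump():
        while True:
            data = os.read(src_fd, 4096)
            if not data:
                os.close(write_fd)
                return
            # Forward 16 bytes at a time with a random pause in between,
            # which forces the reader to cope with partial packets.
            for i in range(0, len(data), 16):
                os.write(write_fd, data[i : i + 16])
                time.sleep(random.uniform(0.0, max_delay))

    threading.Thread(target=pump, daemon=True).start()
    return read_fd
```

Feeding the returned fd in as the `recv` stream could exercise the short-read paths the Arm bot is hitting naturally.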
I wasn't sure if this would stick yet or not; I did some testing on a Debian Linux x86_64 host and macOS on arm64. If we need to revert it to look at other architectures and platforms, that's fine by me. I also think the Python version can have an impact here as well.
I should have been clearer: "our" as in Linaro's bot. I mean that I'm going to give lldb-arm-ubuntu fixed resources so it doesn't have so many random delays, separate from whatever you decide to do with lldb-dap. It's something I should do anyway; we used to run a lot of bots on shared machines, but over time more and more have fixed resources because of situations like this.

I think it's actually unfair to require lldb-dap tests to be stable on that bot because it can vary so much. On the AArch64 Linux and Windows bots, where they have the whole machine, it's more fair to expect it. Took me a while to realise that, but got there eventually :)

Trying random delays to destabilise them, yes, that sounds like a good experiment to do. I have tried this with the "stress" utility, without much success, but I'm sure there are better ways.
We currently use a background thread to read the DAP output. This means the test thread and the background thread can race at times and we may have inconsistent timing due to these races.
To improve the consistency I've removed the reader thread and instead switched to using the `selectors` module that wraps `select` in a platform independent way.