More precise stdout capture #273

misrasaurabh1 · 2025-06-03T08:15:12Z

User description

Each test invocation id now has its precise associated stdout which can be accurately compared across runs
This solves the problem with test fixtures printing non-deterministic stuff on stdout.

PR Type

Enhancement, Tests

Description

Introduce test_stdout_tag for distinct stdout markers
Use new $######...######$ tag format
Update regex and parsing logic to extract output
Adjust tests to expect new stdout tags

Changes walkthrough 📝

Relevant files

Enhancement

instrument_existing_tests.py `Add stdout tagging and print enhancements` codeflash/code_utils/instrument_existing_tests.py Add AST assignment for `test_stdout_tag` variable Switch print statements to `$######` tag format Simplify printing logic for performance mode	+51/-74
parse_test_output.py `Enhance test output parsing logic` codeflash/verification/parse_test_output.py Split regex into start/end match patterns Implement pairing of begin/end match segments Extract runtime and iteration_id correctly	+35/-21

Tests

test_instrument_all_and_run.py `Update instrument_all_and_run tests` tests/test_instrument_all_and_run.py Update expected prints to use `$######` tags Adjust multi-line string assertions for new tags	+23/-45
test_instrument_tests.py `Adjust instrumentation test templates` tests/test_instrument_tests.py Modify instrumentation strings to define `test_stdout_tag` Replace print calls with new `$######` tag format	+172/-82
test_instrumentation_run_results_aiservice.py `Revise aiservice instrumentation stdout tags` tests/test_instrumentation_run_results_aiservice.py Revise behavior logging code to define `test_stdout_tag` Replace prints to use `$######` tag format Remove redundant imports	+35/-24
test_test_runner.py `Fix spacing in test_runner assertion` tests/test_test_runner.py Fix equality assertion spacing	+1/-3

Need help?
Type /help how to ... in the comments thread for any questions about PR-Agent usage.
Check out the documentation for more information.

github-actions · 2025-06-03T08:16:41Z

PR Reviewer Guide 🔍

(Review updated until commit `1070815`)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review Regex robustness New regex for parsing start/end tags may not correctly handle edge cases like colons in names or unexpected whitespace; verify patterns against valid and invalid tag formats. matches_re_start = re.compile(r"!\$######(.?):(.?)([^\.:]?):(.?):(.?):(.?)######\$!\n") matches_re_end = re.compile(r"!######(.?):(.?)([^\.:]?):(.?):(.?):(.?)######!") AST instrumentation fragility Hard-coded lineno offsets in AST node insertion could break if wrapper function structure changes; consider using relative positions or validating offsets dynamically. ), ( [ ast.Assign( targets=[ast.Name(id="test_stdout_tag", ctx=ast.Store())], value=ast.JoinedStr( values=[ ast.FormattedValue(value=ast.Name(id="test_module_name", ctx=ast.Load()), conversion=-1), ast.Constant(value=":"), ast.FormattedValue( value=ast.IfExp( test=ast.Name(id="test_class_name", ctx=ast.Load()), body=ast.BinOp( left=ast.Name(id="test_class_name", ctx=ast.Load()), op=ast.Add(), right=ast.Constant(value="."), ), orelse=ast.Constant(value=""), ), conversion=-1, ), ast.FormattedValue(value=ast.Name(id="test_name", ctx=ast.Load()), conversion=-1), ast.Constant(value=":"), ast.FormattedValue(value=ast.Name(id="function_name", ctx=ast.Load()), conversion=-1), ast.Constant(value=":"), ast.FormattedValue(value=ast.Name(id="loop_index", ctx=ast.Load()), conversion=-1), ast.Constant(value=":"), ast.FormattedValue(value=ast.Name(id="invocation_id", ctx=ast.Load()), conversion=-1), ] ), lineno=lineno + 9, ), Performance overhead* Printing detailed stdout tags for each invocation adds extra I/O; validate that performance impact is acceptable for large test suites. codeflash_wrap.index[test_id] = 0 codeflash_test_index = codeflash_wrap.index[test_id] invocation_id = f'{{line_id}}_{{codeflash_test_index}}' test_stdout_tag = f"{{test_module_name}}:{{(test_class_name + '.' if test_class_name else '')}}{{test_name}}:{{function_name}}:{{loop_index}}:{{invocation_id}}" print(f"!$######{{test_stdout_tag}}######$!")

github-actions · 2025-06-03T08:17:56Z

PR Code Suggestions ✨

Latest suggestions up to 1070815
Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Fix missing end markers check Replace the redundant `or not begin_matches` check with a check against missing end markers so that invocations are skipped only when either start or end markers are absent. codeflash/verification/parse_test_output.py [277] -if not begin_matches or not begin_matches: +if not begin_matches or not end_matches: Suggestion importance[1-10]: 8 __ Why: The condition `if not begin_matches or not begin_matches` redundantly checks `begin_matches` twice; replacing the second check with `or not end_matches` correctly skips when end markers are missing, avoiding parsing errors.	Medium
General	Use positional matching for end markers Avoid using the groups tuple as a dict key since multiple invocations may share the same tag and overwrite entries. Instead, collect end matches in a list and select the next appropriate end marker by comparing positions. codeflash/verification/parse_test_output.py [269-275] -end_matches = {} -for match in matches_re_end.finditer(sys_stdout): +end_matches = list(matches_re_end.finditer(sys_stdout)) + +for match_index, match in enumerate(begin_matches): groups = match.groups() - if len(groups[5].split(":")) > 1: - iteration_id = groups[5].split(":")[0] - groups = groups[:5] + (iteration_id,) - end_matches[groups] = match + # find the first end match after this start + end_match = next((em for em in end_matches if em.start() > match.end()), None) + iteration_id, runtime = groups[5], None + if end_match: + stdout = sys_stdout[match.end():end_match.start()] + # parse iteration_id and runtime as before... Suggestion importance[1-10]: 7 __ Why: Using the tuple of groups as a dict key can lead to overwrites when tags repeat; collecting end matches in a list and matching by position ensures each start marker pairs with the correct end marker and preserves all entries.	Medium
General	Anchor end regex to newline Anchor the end‐marker regex to include the trailing newline (as done for the start regex) to avoid capturing unintended trailing characters. codeflash/verification/parse_test_output.py [40] -matches_re_end = re.compile(r"!######(.?):(.?)([^\.:]?):(.?):(.?):(.?)######!") +matches_re_end = re.compile(r"!######(.?):(.?)([^\.:]?):(.?):(.?):(.?)######!\n") Suggestion importance[1-10]: 5 __ Why: Adding a `\n` anchor to the end‐marker regex aligns it with the start pattern and prevents stray characters from being captured, improving match precision without major side effects.	Low

Previous suggestions

Suggestions up to commit bc47662

Category	Suggestion	Impact
Possible issue	Fix missing end-match check The condition mistakenly repeats `not begin_matches` instead of checking for missing end markers. Change the second part to `not end_matches` to detect incomplete capture pairs. codeflash/verification/parse_test_output.py [277] -if not begin_matches or not begin_matches: +if not begin_matches or not end_matches: test_results.add( FunctionTestInvocation( ..., stdout="", ) ) Suggestion importance[1-10]: 9 __ Why: The duplicated `not begin_matches` condition skips checking `end_matches`, breaking the pair matching logic; correcting it addresses a real bug in incomplete capture detection.	High
General	Include newline in end regex The end-marker regex doesn’t account for the newline that `print` appends, causing slicing offsets to be off by one. Include `\n` after `######!` so the match spans the full marker. codeflash/verification/parse_test_output.py [40] -matches_re_end = re.compile(r"!######(.?):(.?)([^\.:]?):(.?):(.?):(.?)######!") +matches_re_end = re.compile(r"!######(.?):(.?)([^\.:]?):(.?):(.?):(.?)######!\n") Suggestion importance[1-10]: 6 __ Why: Incorporating `\n` into `matches_re_end` ensures the trailing newline from `print` is matched, preventing off-by-one slicing errors in extracted `stdout`.	Low

misrasaurabh1 · 2025-06-03T09:10:30Z

will fix some remaining tests soon. The changes should be ready to review now though. I want to deploy this today

github-actions · 2025-06-03T09:11:49Z

Persistent review updated to latest commit 1070815

openhands-ai · 2025-06-03T23:35:02Z

Looks like there are a few issues preventing this PR from being merged!

GitHub Actions are failing:
- end-to-end-test

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #273

Feel free to include any additional details that might help me get this PR into a better state.

_{^{You can manage your notification settings}}

misrasaurabh1 added 6 commits May 30, 2025 20:03

first version of stdout capture improvement

f6fd450

progress

6dd2fee

Merge branch 'refs/heads/main' into better-stdout-capture

26f48b6

seems to be working

8b2f948

Fix some bugs and tests

22c62ca

remove spare comments

bc47662

github-actions bot added the Review effort 4/5 label Jun 3, 2025

misrasaurabh1 added 3 commits June 3, 2025 01:27

fix more tests

80baa78

fix more tests

29b2f21

make ruff happy

1070815

misrasaurabh1 requested a review from a team June 3, 2025 09:10

misrasaurabh1 marked this pull request as ready for review June 3, 2025 09:10

misrasaurabh1 added 6 commits June 3, 2025 13:42

try one fix attempt

38f1ab5

fix one test

3a6e481

fix one test

537ca01

fix one more test

3cd3f0a

fix one more test

6c00fb3

fix one more test

0e5f79f

KRRT7 approved these changes Jun 4, 2025

View reviewed changes

misrasaurabh1 merged commit 1ee8c8a into main Jun 4, 2025
15 of 17 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

More precise stdout capture #273

More precise stdout capture #273

Uh oh!

misrasaurabh1 commented Jun 3, 2025 •

edited by github-actions bot

Loading

Uh oh!

github-actions bot commented Jun 3, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Jun 3, 2025 •

edited

Loading

Uh oh!

misrasaurabh1 commented Jun 3, 2025

Uh oh!

github-actions bot commented Jun 3, 2025

Uh oh!

openhands-ai bot commented Jun 3, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

More precise stdout capture #273

More precise stdout capture #273

Uh oh!

Conversation

misrasaurabh1 commented Jun 3, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User description

PR Type

Description

Changes walkthrough 📝

Uh oh!

github-actions bot commented Jun 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit 1070815)

Uh oh!

github-actions bot commented Jun 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

misrasaurabh1 commented Jun 3, 2025

Uh oh!

github-actions bot commented Jun 3, 2025

Uh oh!

openhands-ai bot commented Jun 3, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

misrasaurabh1 commented Jun 3, 2025 •

edited by github-actions bot

Loading

github-actions bot commented Jun 3, 2025 •

edited

Loading

(Review updated until commit `1070815`)

github-actions bot commented Jun 3, 2025 •

edited

Loading