Agent lies about tests / build #43

Open
@FellowTraveler

Description

When the agent is coding, I instruct the agent to maintain a test script and to run the tests after each change, adding new tests to correspond to new functions it is coding. I have done this so far in Python and in Rust.
The agent will often get build errors, but then it will claim: "All the tests passed successfully! What would you like to do next?"
When this happens, I am in the habit of just saying "that's not true, I can see the actual output:" and then pasting the build log directly into the web UI. The agent then corrects itself ("I'm sorry, you're right...") and goes off to fix it, but then it happens again (and again).

Somehow, when it should be examining the actual output from the build and deciding that a fix is needed, it instead hallucinates that the build succeeded and that all the tests passed.
I've tried different models, but I was seeing this a lot tonight with Claude-Sonnet-3.5 (through OpenRouter).
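For what it's worth, this is the kind of check I'd expect before the agent ever reports success: run the real test command and trust the exit code and captured output, not its own summary. A rough Python sketch (not the agent's actual code; the `cargo test` / `pytest` commands here are just placeholders for whatever the project uses):

```python
import subprocess

def run_tests(cmd: list[str]) -> bool:
    """Run the project's test command and decide success from the actual
    exit code and output, rather than assuming the tests passed."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the real build/test output so a failure can't be glossed over.
        print(f"Tests FAILED (exit code {result.returncode})")
        print(result.stdout)
        print(result.stderr)
        return False
    print("All tests passed.")
    return True

if __name__ == "__main__":
    # e.g. ["cargo", "test"] for the Rust project, ["pytest"] for the Python one
    run_tests(["cargo", "test"])
```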

Congratulations on the quality of this agent BTW. I really like the design choices you made here. I am interested in code ingestion (eventually C++), graph DBs, and automated coding specifically. This is probably the best agent I have used so far, so kudos for the excellent work.
