Agent lies about tests / build #43

Open
@FellowTraveler

Description

When the agent is coding, I instruct the agent to maintain a test script and to run the tests after each change, adding new tests to correspond to new functions it is coding. I have done this so far in Python and in Rust.
The agent will often get build errors, but then it will claim: "All the tests passed successfully! What would you like to do next?"
When this happens, I am in the habit of just saying "that's not true, I can see the actual output:" and then pasting the build log directly into the web UI. The agent then corrects itself ("I'm sorry, you're right...") and goes off to fix it, but then it happens again (and again).

Somehow, when it should be examining the actual output from the build and deciding that a fix is needed, it instead hallucinates that the build succeeded and that all the tests passed.
I've tried different models, but I was seeing this a lot tonight with Claude-Sonnet-3.5 (through OpenRouter).
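For what it's worth, this is the kind of check I'd expect before the agent ever reports success: run the real test command and trust the exit code and captured output, not its own summary. A rough Python sketch (not the agent's actual code; the `cargo test` / `pytest` commands here are just placeholders for whatever the project uses):

```python
import subprocess

def run_tests(cmd: list[str]) -> bool:
    """Run the project's test command and decide success from the actual
    exit code and output, rather than assuming the tests passed."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the real build/test output so a failure can't be glossed over.
        print(f"Tests FAILED (exit code {result.returncode})")
        print(result.stdout)
        print(result.stderr)
        return False
    print("All tests passed.")
    return True

if __name__ == "__main__":
    # e.g. ["cargo", "test"] for the Rust project, ["pytest"] for the Python one
    run_tests(["cargo", "test"])
```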

Congratulations on the quality of this agent BTW. I really like the design choices you made here. I am interested in code ingestion (eventually C++), graph DBs, and automated coding specifically. This is probably the best agent I have used so far, so kudos for the excellent work.
