-
Notifications
You must be signed in to change notification settings - Fork 61
Security analyzer framework #305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| default="UNKNOWN", description=SECURITY_RISK_DESC | ||
| security_risk: SecurityRisk = Field( | ||
| default=SecurityRisk.UNKNOWN, | ||
| description="The LLM's assessment of the safety risk of this action.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not need to be done in this PR, but i wonder if there's way to dynamically add this field to action schema only when LLM analyzer is enabled?
BUT this probably means more dark magin for ActionSchema 😓
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little scared about the dynamic generation tbh. We can go there later if it's needed though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 already created an issue here to track
#300
it'll be good if we can have a better mechanism to keep track of all these :(
- Reduced from 260 lines to 74 lines (71% reduction) - Removed verbose custom confirmation handling - Eliminated repetitive test scenarios - Focused on essential functionality: creating and using LLMSecurityAnalyzer - Follows the same concise pattern as other examples - Still demonstrates key concepts: automatic security risk evaluation Co-authored-by: openhands <[email protected]>
|
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment, like Feel free to include any additional details that might help me get this PR into a better state. You can manage your notification settings |
|
Holding off on separate |
|
@xingyaoww and @malhotra5 Fixed up all the changes if you want to re-review. Will probably merge in the morning if there aren't any outstanding issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just a few nits -- happy to get this merged once it is resolved
| risk, state.confirmation_mode | ||
| ): | ||
| state.agent_status = AgentExecutionStatus.WAITING_FOR_CONFIRMATION | ||
| return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@malhotra5 actually, related to this, it just occurs to me if we can implement the "confirmation mode" as a different security analyzer that require confirmation for EVERY action?
always require confirmation & llm risk confirmation seems to be two things that are orthogonal that can actually be broken into two analyzers, and confirmation_mode can essentially be thrown away -- all we need to check is whether security_analyzer is None or not. Will it help eliminate a lot of cases in CLI for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
☝️ we can merge this PR -- this is non-blocking
|
Agent Server image for this PR Pull (multi-arch manifest): docker pull ghcr.io/all-hands-ai/agent-server:726cea1Run: docker run -it --rm \
-p 8000:8000 \
--name agent-server-726cea1 \
ghcr.io/all-hands-ai/agent-server:726cea1This tag is a multi-arch manifest (amd64/arm64). Your client pulls the right arch automatically. |
This PR addresses #59 by porting over a version of the security analyzer framework.
The main change from v0 to v1 is where the security analyzer is being called. Previously, it hooked into the agent controller and intercepted pending actions. However, since the agent is now responsible for executing the actions and will do so unless confirmation mode is set we now pass the security analyzer to the agent directly.