Create GHA to run the claude-code-harness #39
Open
tgrunnagle wants to merge 11 commits into main from claude-code-harness-gha_2025-10-30
+105 −0
11 commits
4dfce2f Create GHA to run the claude-code-harness (tgrunnagle)
22dabae Address claude PR feedback (tgrunnagle)
159f7e5 Add trigger to run the action for testing before merge (tgrunnagle)
953feb0 Specify claude-code-harness branch (tgrunnagle)
3a408b2 Export ANTHROPIC_API_KEY to run the harness (tgrunnagle)
98d1485 Add 'Install Claude CLI' step (tgrunnagle)
48adf71 Test on claude-code-harness feature branch (tgrunnagle)
c84bc07 Upload mcp-optimizer logs to action artifacts (tgrunnagle)
6f80522 Persist servers to get logs (tgrunnagle)
b4ead84 Start thv server as part of action (tgrunnagle)
b18dfc4 Add artifact for `thv list` query (tgrunnagle)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,105 @@ | ||
| # Runs the claude-code-test-harness to gather metrics on performance with Claude Code. | ||
| name: Claude Code Test Harness | ||
| | ||
| on: | ||
| workflow_call: | ||
| workflow_dispatch: | ||
| pull_request: | ||
| # TODO: remove this trigger once the workflow is well tested | ||
| types: [synchronize] | ||
| | ||
| permissions: | ||
| contents: read | ||
| | ||
| jobs: | ||
| claude-code-test-harness: | ||
| name: Claude Code Test Harness | ||
| runs-on: ubuntu-latest | ||
| timeout-minutes: 20 | ||
| | ||
| steps: | ||
| # pull and build mcp-optimizer for deployment in the test harness | ||
| - name: Checkout mcp-optimizer code | ||
| uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 | ||
| | ||
| - name: Set up Docker Buildx | ||
| uses: docker/setup-buildx-action@e468171a9de216ec08956ac3ada2f0791b6bd435 # v3.11.1 | ||
| | ||
| - name: Build mcp-optimizer Docker image | ||
| uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0 | ||
| with: | ||
| context: . | ||
| platforms: linux/amd64 | ||
| push: false | ||
| load: true | ||
| cache-from: type=gha | ||
| tags: mcp-optimizer:latest | ||
| | ||
| # install dependencies | ||
| - name: Install uv | ||
| uses: astral-sh/setup-uv@85856786d1ce8acfbcc2f13a5f3fbd6b938f9f41 # v7.1.2 | ||
| with: | ||
| enable-cache: true | ||
| python-version: '3.13' | ||
| | ||
| - name: Install ToolHive | ||
| uses: StacklokLabs/toolhive-actions/install@6a095f99aa2fd6cd92cf0bb94bdf509b99820c06 # v0.0.3 | ||
| | ||
| - name: Install Claude CLI | ||
| run: | | ||
| npm install -g @anthropic-ai/claude-code | ||
| | ||
| # Start toolhive server | ||
| - name: Run ToolHive server | ||
| run: | | ||
| thv serve --host 0.0.0.0 --port 9090 & | ||
| echo "Waiting for ToolHive server to start..." | ||
| sleep 5 | ||
| echo "Checking ToolHive API is accessible..." | ||
| curl -v http://localhost:9090/api/v1beta/version || echo "Failed to connect to ToolHive API" | ||
| echo "Checking from Docker bridge IP..." | ||
| curl -v http://172.17.0.1:9090/api/v1beta/version || echo "Failed to connect via Docker bridge IP" | ||
| | ||
| # pull the claude-code-harness code | ||
| - name: Checkout claude-code-harness code | ||
| uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 | ||
| with: | ||
| repository: StacklokLabs/claude-code-harness | ||
| ref: wait-for-running_2025-10-30 | ||
| # PAT with read-only access to the claude-code-harness repo | ||
| token: ${{ secrets.GHA_CLAUDE_CODE_HARNESS_READ_PAT }} | ||
| path: claude-code-harness | ||
| | ||
| # Run the test harness, capture mcp-optimizer server logs | ||
| - name: Run Claude Code Test Harness | ||
| run: | | ||
| cd claude-code-harness | ||
| export ANTHROPIC_API_KEY="${{ secrets.ANTHROPIC_API_KEY }}" | ||
| uv run python -m src ./configs/test/gha.json --setup ./configs/test/gha_server_setup.json --persist-servers | ||
| thv logs mcp-optimizer > ./mcp-optimizer-server.log || echo "Failed to get mcp-optimizer logs" | ||
| thv list --format json > ./thv-list.json || echo "Failed to list thv servers" | ||
| continue-on-error: true | ||
| | ||
| # Upload the results as an artifact | ||
| - name: Upload Test Harness Run Logs | ||
| uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0 | ||
| with: | ||
| name: claude-code-harness-logs | ||
| path: claude-code-harness/logs/*.jsonl | ||
| if-no-files-found: warn | ||
| | ||
| # upload mcp-optimizer server logs | ||
| - name: Upload mcp-optimizer Server Logs | ||
| uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0 | ||
| with: | ||
| name: mcp-optimizer-server-logs | ||
| path: ./claude-code-harness/mcp-optimizer-server.log | ||
| if-no-files-found: warn | ||
| | ||
| # upload thv list output | ||
| - name: Upload thv list output | ||
| uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0 | ||
| with: | ||
| name: thv-list | ||
| path: ./claude-code-harness/thv-list.json | ||
| if-no-files-found: warn | ||
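
The `Run ToolHive server` step in the diff waits a fixed `sleep 5` and only echoes a message if the API is unreachable, so a slow start could go unnoticed until the harness fails. One possible variant, shown here as a sketch and not as part of this PR, polls the same version endpoint until it responds. The 30-attempt limit, the one-second interval, and the choice to fail the step on timeout are assumptions; the `thv serve` flags and the URL are taken from the diff.

```yaml
# Sketch only: a drop-in variant of the "Run ToolHive server" step.
# The retry count (30) and interval (1s) are illustrative assumptions.
- name: Run ToolHive server
  run: |
    thv serve --host 0.0.0.0 --port 9090 &
    echo "Waiting for ToolHive server to start..."
    # Poll the version endpoint instead of sleeping for a fixed time.
    for attempt in $(seq 1 30); do
      if curl --silent --fail http://localhost:9090/api/v1beta/version > /dev/null; then
        echo "ToolHive API is reachable after ${attempt} attempt(s)"
        exit 0
      fi
      sleep 1
    done
    # Assumption: failing the step here is preferable to continuing silently.
    echo "ToolHive API did not become reachable within 30 attempts" >&2
    exit 1
```

Failing fast here would surface startup problems in the step that caused them; if the softer behavior of the original step is preferred, the final `exit 1` could be replaced with a plain warning message.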
Would be good to call this workflow from `code-checks.yml`.
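
Since the workflow in this PR already declares a `workflow_call` trigger, the suggestion could be sketched roughly as below. The path `.github/workflows/claude-code-test-harness.yml` and the job id are assumptions, because the page does not show the new file's name, and the existing contents of `code-checks.yml` are not shown here either.

```yaml
# Sketch of a job that could be added to code-checks.yml.
# Assumption: the new workflow is saved at the path below; adjust to the real file name.
jobs:
  claude-code-test-harness:
    # Call the reusable workflow from the same repository.
    uses: ./.github/workflows/claude-code-test-harness.yml
    # The called workflow reads secrets.ANTHROPIC_API_KEY and
    # secrets.GHA_CLAUDE_CODE_HARNESS_READ_PAT, so the caller has to pass them on.
    secrets: inherit
```

If the workflow is called this way on pull requests, the temporary `pull_request` trigger marked with a TODO in the diff would presumably need to be removed first, otherwise the harness might run twice per push.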