Add VPS deployment monitoring scripts and documentation #26

developerisnow · 2025-11-12T07:17:11Z

Problem

Reading issue #25 first is strongly recommended.
When running CodeMachine on remote servers (VPS, SSH), users encounter two issues:

Process termination on SSH disconnect
UI disappearance after reconnect (confusing - looks like crash)

These issues make long-running workflows on remote servers problematic.

Solution

This PR adds pragmatic deployment utilities with clear observability:

📦 What's Included

Monitoring Scripts (scripts/deployment/):

codemachine-persistent.sh - Launch with SSH disconnect survival (nohup)
codemachine-status.sh - Enhanced monitoring showing:
- Running/completed/failed agents
- Recent log activity
- Process status and uptime
codemachine-stop.sh - Graceful shutdown
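
A minimal sketch of how a launcher like codemachine-persistent.sh might work, assuming it wraps the command in nohup and records the PID for later status checks. The function name `start_detached` and the log and PID file arguments are illustrative and are not taken from the PR's scripts.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the PR's codemachine-persistent.sh.
# Usage: start_detached LOG_FILE PID_FILE COMMAND [ARGS...]
start_detached() {
  local log_file="$1" pid_file="$2"
  shift 2
  # nohup makes the command ignore SIGHUP, the signal sent when the SSH
  # session's terminal goes away. Redirecting stdin, stdout and stderr
  # removes the remaining ties to that terminal.
  nohup "$@" >"$log_file" 2>&1 </dev/null &
  # nohup execs the command, so $! is the PID of the command itself.
  echo "$!" >"$pid_file"
}
```

For example, `start_detached persistent-output.log codemachine.pid codemachine` would start the CLI in the background, assuming `codemachine` is on the PATH; the PID file name here is an assumption.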

Documentation:

docs/vps-deployment.md - Complete deployment guide
scripts/deployment/README.md - Script usage and troubleshooting
README.md - Quick start section

✅ Key Features

SSH Disconnect Survival: Process continues running using nohup
Clear Status Visibility: Know if workflow is actually working
Understanding UI Detachment: Documentation explaining this is expected (Ink framework limitation)
Production-Ready: Generic scripts, no hardcoded paths

🎯 Value Proposition

Before:

Users think CodeMachine crashed (UI gone = broken?)
No way to verify process is working
Manual nohup setup needed

After:

Clear status: "Process running, UI detached (expected)"
One command to check: ./scripts/deployment/codemachine-status.sh
Ready-to-use scripts with documentation

📊 Testing

Tested on Ubuntu 20.04 VPS:

✅ Process survives SSH disconnect (60+ minutes)
✅ Status script shows accurate agent progress
✅ Logs update correctly in headless mode
✅ Graceful shutdown works
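
Graceful shutdown is commonly implemented as SIGTERM first, with SIGKILL only after a timeout. The sketch below works under that assumption; the PR text does not show the contents of codemachine-stop.sh, and `stop_gracefully` with its 10-second default is illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the PR's codemachine-stop.sh.
# Usage: stop_gracefully PID [TIMEOUT_SECONDS]
# Returns 0 if the process exited after SIGTERM (or was already gone),
# 1 if it had to be killed with SIGKILL.
stop_gracefully() {
  local pid="$1" timeout="${2:-10}" waited=0
  # Ask the process to exit; if the signal cannot be sent, it is already gone.
  kill -TERM "$pid" 2>/dev/null || return 0
  # kill -0 sends no signal, it only checks that the PID still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      kill -KILL "$pid" 2>/dev/null
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```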

🔗 Related

Closes #25

📝 Technical Notes

Why UI detaches:

CodeMachine uses Ink (React for CLI)
Ink requires a persistent TTY connection
SSH reconnect = new TTY = UI can't reconnect
This is an architectural limitation, not a bug
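
Whether a process still has a terminal can be observed from the shell. The sketch below is an illustration of the TTY distinction described above, not code from the PR; the function name `describe_tty` and the two labels it prints are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether stdin and stdout are both terminals.
describe_tty() {
  # [ -t N ] is true only when file descriptor N refers to a terminal.
  if [ -t 0 ] && [ -t 1 ]; then
    echo "interactive"
  else
    echo "headless"
  fi
}
```

A process started under nohup with its streams redirected to files falls into the second case, which matches the headless behavior the notes describe.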

Our approach:

Don't fix the UI (would require major refactoring)
Provide observability tools
Document expected behavior clearly

🚀 Usage Example

# Start persistently
./scripts/deployment/codemachine-persistent.sh

# Close laptop, go to coffee shop, reconnect

# Check status
./scripts/deployment/codemachine-status.sh

Output:

✅ Process: RUNNING (PID: 123456)
   Uptime: 02:34:56

📊 Agent Status:
   🔄 code-generation-step-9: Code Generation Agent
   ✅ context-manager-step-8: Context Manager Agent

📝 Recently Updated Files (last 5):
   Nov 12 07:04: persistent-output.log
   Nov 12 07:04: agent-67-context-manager.log
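
The process line of such a report can be produced from a PID file with standard tools. This is a minimal sketch assuming the launcher recorded a PID; `process_status` and the PID file argument are illustrative, and the PR's codemachine-status.sh may work differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the PR's codemachine-status.sh.
# Usage: process_status PID_FILE
process_status() {
  local pid_file="$1" pid
  if [ ! -r "$pid_file" ]; then
    echo "Process: NOT RUNNING (no PID file)"
    return 1
  fi
  pid=$(cat "$pid_file")
  # kill -0 sends no signal; it only tests that the PID exists and
  # can be signalled by the current user.
  if kill -0 "$pid" 2>/dev/null; then
    echo "Process: RUNNING (PID: $pid)"
    # etime is the elapsed time since the process started.
    echo "   Uptime: $(ps -o etime= -p "$pid" | tr -d ' ')"
  else
    echo "Process: NOT RUNNING (stale PID file)"
    return 1
  fi
}
```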

Impact: Makes CodeMachine viable for VPS/remote deployment with clear observability.

Approach: Pragmatic - provide tools and documentation, not architectural changes.

…d executions The profile parameter was redundant as the agentId is sufficient for authentication purposes. This simplifies the API surface and reduces unnecessary configuration options across all engine providers (Claude, Cursor, Codex) and their execution flows. Fixing json codex starting bug "can't find 'git-commit' profile"

…chitecture output paths - Split tracking functions into template.ts and steps.ts for better organization - Add new functions for tracking not completed steps - Update all references from .codemachine/plan/ to .codemachine/artifacts/ - Add new output format templates for architecture and plan generation - Implement glob pattern support in placeholder resolution

- Add plan-fallback and task-fallback agents to handle incomplete executions - Increase default timeout from 10 to 30 minutes across all providers - Refactor prompt placeholder processing into modular components - Implement structured task output with manifest files - Add support for optional placeholders with fallback behavior

Add markdown templates for codemachine workflows, agents, output formats, and fallback agents. These templates define the structure and behavior for various components in the codemachine system including task verification, git commit workflows, and agent-specific instructions. The templates provide consistent formatting and guidelines for: - Workflow definitions - Agent roles and responsibilities - Output format specifications - Fallback agent behavior

…uration - Standardize agent prompt file naming and paths - Add context manager agent to workflow - Update task verification workflow with detailed signaling - Consolidate output format placeholders - Adjust workflow loop steps for new context manager

…tructions - Clarify the issues detected section with more specific error details - Improve fix instructions with concrete action steps - Update success workflow to use task list file instead of direct JSON output

Enable resuming long workflows by starting from the last incomplete step tracked in .codemachine/template.json when resumeFromLastStep is true. Add getResumeStartIndex to compute the starting index based on notCompletedSteps. Update workflow runner to begin iteration from the computed index and log a resume notice. Expose the helper via shared workflows index. This improves reliability after interruptions without breaking existing behavior (feature disabled by default).

…ete step - Change resume behavior to use first incomplete step instead of last - Add default values for tracking fields in template and step functions - Add reasoning level validation and spinner logging to step command

- Bump package version to 0.3.0 - Replace 'plan' directory with 'artifacts' in workspace initialization - Add promptPath validation for fallback agents - Improve tool call handling in cursor engine runner - Update workflow templates to use cursor engine by default - Refactor loop behavior to use behavior.json instead of triggers - Update README with new installation instructions and features

- Convert synchronous loop handling functions to async/await pattern - Add detailed error logging for template tracking operations - Improve file path handling with path.join() for cross-platform compatibility - Enhance documentation for timestamp format in tracking interfaces - Update tests to handle async loop evaluation

…tion - Introduce trigger behavior type and evaluation logic - Implement trigger controller and execution handling - Add iteration-checker module with trigger capability - Support dynamic agent triggering during workflow execution

Update workflow steps to use 'cursor' engine with 'grok' model for plan-agent, task-breakdown, code-generation, runtime-prep, task-sanity-check, and git-commit steps. Change context-manager to use 'ccr' engine. Remove engine and fallback from plan-agent.

- Add loopReason field to display task progress (e.g., "Cycle 2 - Task 5/50") - Update task verification workflow to report progress as "Task X/Y" - Pass loop reason from workflow execution to UI state - Fix bug where loopReason was cleared during agent reset

Integrate OpenCode CLI as a native engine provider alongside Claude, Codex, Cursor, and CCR. This enables users to run `codemachine opencode run` and use `--engine opencode` in workflow steps. Key features: - JSON event streaming with consistent log markers - Non-interactive execution via environment guardrails (OPENCODE_PERMISSION, OPENCODE_DISABLE_LSP_DOWNLOAD) - Registry-based provider pattern matching existing engines - Comprehensive documentation updates across README, CLI reference, and workflow guides 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

- Refactor auth system to use XDG-compliant paths under OPENCODE_HOME - Add support for multiple authentication providers in OpenCode CLI - Improve user guidance during auth flow with interactive prompts - Centralize all OpenCode data under ~/.codemachine/opencode - Update README to credit contributor for OpenCode integration

Add deployment utilities for running CodeMachine on remote servers:

Scripts:
- codemachine-persistent.sh: Launch with SSH disconnect survival (nohup)
- codemachine-status.sh: Enhanced monitoring (agents, logs, errors)
- codemachine-stop.sh: Graceful shutdown with cleanup

Documentation:
- docs/vps-deployment.md: Complete VPS deployment guide
- scripts/deployment/README.md: Script usage and troubleshooting
- README.md: Quick start section for remote servers

Addresses common issues:
- Process termination on SSH disconnect
- UI detachment after reconnect (Ink framework limitation)
- Monitoring headless workflows

Closes moazbuilds#25

moazbuilds · 2025-11-13T20:38:39Z

Thanks for adding the VPS deployment scripts. We’re planning to move to the Bun runtime and make some major adjustments; once that transition is complete, we can revisit this PR as a strong foundation for long-running VPS deployments.

moazbuilds and others added 30 commits October 10, 2025 03:05

Update README.md

c10796e

updating definition

chore: bump version to 0.2.3

ad075cd

feat(workflows): add fallback agent support for incomplete steps

872f85b

implement fallback agent execution when a step is marked as incomplete add notCompletedFallback field to workflow step types introduce fallback execution logic before original step runs

fix: correct task_fallback path and template variable names

49af87a

Update placeholder configuration and template files to use consistent naming for task fallback files and variables. This ensures proper file resolution and variable interpolation in the task breakdown process.

feat(loop): make trigger optional and control behavior via config file

2810377

refactor(validation): update task verification to use behavior.json docs: update agent prompts to use context and code_fallback placeholders build: add new placeholder paths for context and code_fallback files

feat(cli): add update notifier and dim color to palette

39358ae

- Add update-notifier package to check for updates daily - Include dim color in chalk palette for CLI styling - Update version from 0.1.0 to 0.3.0 - Enhance error messages with resume workflow instructions

Create LICENSE

7bf8ee9

docs: add contributing guide

fee1cf4

- Add comprehensive CONTRIBUTING.md with contribution guidelines and workflow documentation

refactor(codemachine): simplify agent config and workflow engine sett…

0c922b5

…ings - Rename architecture agent to standard naming convention - Streamline workflow engine configurations by removing redundant model specifications

Merge branch 'main' of https://github.com/Mo33aazz/codemachine

6986d41

docs: update readme content and remove docs from gitignore

4e1535f

Update README with improved formatting, updated statistics, and added comparison table Remove docs directory from gitignore as documentation is now tracked

docs: improve readability of README section header

21558b7

docs: update plan artifacts directory path in templates

a5089b6

Update documentation templates to reflect the new plan artifacts directory path from `.codemachine/artifacts/` to `.codemachine/artifacts/plan/` for consistency

chore: bump version to 0.3.1

2446926

refactor(engines): remove outdated profile-based authentication and l…

94964e8

…oopTrigger remove outdated engine authentication by removing profile-specific configuration Remove deprecated loopTrigger from module metadata and workflow configs

refactor(workflows): remove force option from workflow execution

5269fc7

- Simplify start command to run workflow queue directly in headless mode

feat(cli): add --spec flag for global CLI usage

57a41f4

- improve DIM and update npm notifier - make CLI path in main menu dynamic

refactor(workspace): rename fallback directory to prompt for clarity

79c9586

refactor(memory): change memory system to write-only across all execu…

106d00e

…tion paths

moazbuilds and others added 24 commits November 5, 2025 18:47

fix(monitoring): use correct type for process.emit in cleanup

8828b34

The previous type assertion was incorrect and could cause runtime errors. Using NodeJS.EventEmitter ensures proper type safety when emitting events.

refactor(sub-agents): remove redundant engine field from agent configs

3806191

docs: update git-commit-workflow.md with additional gitignore rules

afb1284

Add more specific patterns to exclude from version control including cache files, logs and temporary files

fix(registryLock): ensure file exists before locking to prevent corru…

aa49bdc

…ption Proper-lockfile requires the target file to exist. Added file creation logic and changed error handling to throw instead of degrade silently to prevent data corruption from concurrent access.

fix(registry): prevent race conditions in registry file handling

34f65e5

Replace async file operations with sync ones during registry file creation to ensure atomicity Add validation for empty/invalid registry files to handle edge cases

chore: bump version to 0.4.2

7c402cd

refactor(workflow): update engine configurations in codemachine workflow

c052a43

Switch engine assignments for various steps to improve consistency and reliability

feat(claude): add analyzing status message on session init…

3559da7

chore: bump version to 0.4.3

860e9a2

docs(anchor-insertion): update anchor insertion protocol with detaile…

1063df6

…d rules Expand the anchor insertion documentation with specific formatting requirements, placement rules, and workflow instructions. Add verification steps and example to ensure compliance.

refactor: improve argument handling and update eslint config

a2d26e4

- Increment index before accessing next argument in publish script for clarity - Replace tseslint.config with defineConfig for better type safety in eslint config

fix(process): ensure proper cleanup of child processes on abort

8d419e6

Add process group killing for Unix systems to handle Node.js wrapper scripts Improve logging and error handling for process termination Handle race conditions between abort signals and process events

Merge pull request moazbuilds#23 from TheMightyDman/main

26f53e6

feat: add OpenCode CLI as first-class engine integration

refactor(telemetry): implement provider-specific telemetry parsers

61af6b4

Move telemetry parsing logic into dedicated provider modules for better maintainability Improve telemetry capture by handling provider-specific formats consistently

fix(opencode): merge process.env with provided env in resolveRunnerEnv

ee6653d

Ensure environment variables from process.env are included when resolving runner environment to prevent missing variables

refactor(engines): update engine names to be more descriptive

dbbb010

Update CCR engine name to 'Claude Code Router' and Claude engine name to 'Claude Code' to better reflect their purpose and functionality

refactor(ccr): simplify ccr auth handling by using .enable marker

230ceaa

Remove special logout handling for CCR in CLI command Replace .credentials.json with simple .enable marker file Update related auth functions to reflect new simpler approach

refactor(telemetry): improve type safety in telemetry parsers

b032d4c

- Replace 'any' with 'unknown' and add proper type guards - Update eslint config to use tseslint.config - Rename CCR engine display name to 'Claude Code Router'

chore: bump version to 0.5.0

b77062d

developerisnow mentioned this pull request Nov 12, 2025

codemachine crashes on SSH disconnect on Ubuntu VPS #25

Closed

moazbuilds self-requested a review November 13, 2025 20:23

moazbuilds removed their request for review November 13, 2025 20:38

moazbuilds added the enhancement New feature or request label Nov 18, 2025

moazbuilds force-pushed the main branch from 657ec85 to 0646554 Compare November 19, 2025 20:40

5 participants
