|
| 1 | +# ADR: Provisioning Strategy - Minimal Cloud-init + Ansible |
| 2 | + |
| 3 | +## Status |
| 4 | + |
| 5 | +**Proposed** - Based on comprehensive analysis of current PoC limitations and production requirements |
| 6 | + |
| 7 | +## Context |
| 8 | + |
| 9 | +The current PoC uses a cloud-init + shell script approach for VM provisioning and application |
| 10 | +deployment. While this approach works for demonstration purposes, it presents significant |
| 11 | +challenges for production use and testing automation: |
| 12 | + |
| 13 | +### Current Approach Limitations |
| 14 | + |
| 15 | +**Cloud-init Heavy Approach**: |
| 16 | + |
| 17 | +- Complex debugging when provisioning fails |
| 18 | +- Limited conditional logic capabilities |
| 19 | +- Difficult to test without full VM lifecycle |
| 20 | +- Shell script brittleness and maintenance overhead |
| 21 | +- Poor CI/CD integration due to VM dependencies |
| 22 | + |
| 23 | +**Testing Challenges**: |
| 24 | + |
| 25 | +- 8-12 minute test cycles including VM provisioning |
| 26 | +- Requires KVM/libvirt support for testing |
| 27 | +- Many hosted CI runners don't support nested virtualization |
| 28 | +- Infrastructure failures obscure application issues |
| 29 | +- High resource requirements (CPU, memory, storage) |
| 30 | + |
| 31 | +**Technology Stack Complexity**: |
| 32 | + |
| 33 | +- 4-technology stack: Terraform + Cloud-init + Docker + Shell scripts |
| 34 | +- Complex orchestration between different tooling approaches |
| 35 | +- Inconsistent error handling and logging across tools |
| 36 | + |
| 37 | +## Decision |
| 38 | + |
| 39 | +**Adopt a minimal cloud-init + Ansible hybrid approach** for the production redesign: |
| 40 | + |
| 41 | +### Cloud-init Role (Minimal) |
| 42 | + |
| 43 | +Cloud-init will handle only essential system initialization: |
| 44 | + |
| 45 | +- Basic system setup (users, SSH keys, network) |
| 46 | +- Package manager configuration and essential packages |
| 47 | +- Docker installation and daemon configuration |
| 48 | +- Security configuration (firewall, fail2ban, SSH hardening) |
| 49 | +- Ansible prerequisites (Python, pip, ansible-core) |
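|  | + |
|  | +As a rough illustration of this scope, a minimal user-data file might look like the sketch below. It assumes an Ubuntu/Debian image; the `deploy` user name, the package list, and the SSH key are placeholders, not values taken from the PoC. |
|  | + |
|  | +```yaml |
|  | +#cloud-config |
|  | +# Sketch only: names, packages, and the key below are illustrative assumptions. |
|  | +groups: |
|  | +  - docker                      # create the group before the user references it |
|  | +users: |
|  | +  - name: deploy                # hypothetical service account used by Ansible |
|  | +    groups: [sudo, docker] |
|  | +    shell: /bin/bash |
|  | +    sudo: "ALL=(ALL) NOPASSWD:ALL" |
|  | +    ssh_authorized_keys: |
|  | +      - ssh-ed25519 AAAA...placeholder deploy@example |
|  | +ssh_pwauth: false               # disable password logins over SSH |
|  | +package_update: true |
|  | +packages: |
|  | +  - docker.io                   # Debian/Ubuntu package name (assumption) |
|  | +  - fail2ban |
|  | +  - ufw |
|  | +  - python3                     # required on the host for Ansible modules |
|  | +runcmd: |
|  | +  - ufw allow OpenSSH |
|  | +  - ufw --force enable |
|  | +  - systemctl enable --now docker |
|  | +``` |
|  | + |
|  | +Everything beyond this point (application configuration, service deployment, validation) would be left to Ansible. |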
| 50 | + |
| 51 | +### Ansible Role (Primary) |
| 52 | + |
| 53 | +Ansible will handle all application-level configuration and deployment: |
| 54 | + |
| 55 | +- Application configuration management |
| 56 | +- Service deployment and orchestration |
| 57 | +- Health checks and validation |
| 58 | +- Environment-specific customization |
| 59 | +- Operational procedures (backups, monitoring, updates) |
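|  | + |
|  | +The responsibilities above could be sketched as a single playbook. This is an assumption-laden example, not the project's actual playbook: the `app_servers` group, the `/opt/app` path, the `env.j2` template, and the health URL are hypothetical, and it assumes the `community.docker` collection is installed and a Compose file already exists in the project directory. |
|  | + |
|  | +```yaml |
|  | +# deploy.yml (hypothetical): host group, paths, and URL are assumptions |
|  | +- name: Deploy application stack |
|  | +  hosts: app_servers |
|  | +  become: true |
|  | +  vars: |
|  | +    app_dir: /opt/app |
|  | +    health_url: "http://localhost:8080/health" |
|  | +  tasks: |
|  | +    - name: Render the environment file from a Jinja2 template |
|  | +      ansible.builtin.template: |
|  | +        src: env.j2 |
|  | +        dest: "{{ app_dir }}/.env" |
|  | +        mode: "0600" |
|  | + |
|  | +    - name: Start services with Docker Compose |
|  | +      community.docker.docker_compose_v2: |
|  | +        project_src: "{{ app_dir }}" |
|  | +        state: present |
|  | + |
|  | +    - name: Wait until the service reports healthy |
|  | +      ansible.builtin.uri: |
|  | +        url: "{{ health_url }}" |
|  | +        status_code: 200 |
|  | +      register: health |
|  | +      until: health.status == 200 |
|  | +      retries: 10 |
|  | +      delay: 5 |
|  | +``` |
|  | + |
|  | +The `until`/`retries` loop gives the containers time to start before the play is marked as failed. |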
| 60 | + |
| 61 | +### Technology Stack Simplification |
| 62 | + |
| 63 | +**Target Stack**: |
| 64 | + |
| 65 | +- **Infrastructure**: OpenTofu/Terraform |
| 66 | +- **Configuration Management**: Ansible |
| 67 | +- **Services**: Docker Compose |
| 68 | +- **Testing**: Container-first with minimal VM validation |
| 69 | + |
| 70 | +## Rationale |
| 71 | + |
| 72 | +### 1. Improved Testability |
| 73 | + |
| 74 | +**Container-Based Testing**: Ansible roles and playbooks can be tested using Molecule with the |
| 75 | +Docker driver, eliminating VM dependencies for most test scenarios: |
| 76 | + |
| 77 | +- **Speed**: Container startup in seconds vs. minutes for VMs |
| 78 | +- **CI/CD Native**: Standard CI platforms support Docker containers |
| 79 | +- **Resource Efficiency**: Lower CPU, memory, and storage requirements |
| 80 | +- **Debugging**: Direct access to application logs and state |
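|  | + |
|  | +A minimal Molecule scenario for this setup might look like the following sketch. It assumes the Docker driver is installed (for example via the `molecule-plugins` package) and uses a community image that ships with Python preinstalled; the image and instance name are illustrative choices, not project decisions. |
|  | + |
|  | +```yaml |
|  | +# molecule/default/molecule.yml (sketch under the assumptions stated above) |
|  | +dependency: |
|  | +  name: galaxy |
|  | +driver: |
|  | +  name: docker |
|  | +platforms: |
|  | +  - name: instance |
|  | +    image: geerlingguy/docker-ubuntu2204-ansible:latest  # assumed test image |
|  | +    pre_build_image: true       # use the image as-is instead of building one |
|  | +provisioner: |
|  | +  name: ansible |
|  | +verifier: |
|  | +  name: ansible |
|  | +``` |
|  | + |
|  | +Running `molecule test` in the role directory would then create the container, apply the role, and run the verifier. |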
| 81 | + |
| 82 | +### 2. Enhanced Maintainability |
| 83 | + |
| 84 | +**Declarative Configuration**: Ansible's YAML-based declarative syntax is more maintainable |
| 85 | +than shell scripts: |
| 86 | + |
| 87 | +- Clear, readable configuration management |
| 88 | +- Idempotent built-in modules (ad hoc `command`/`shell` tasks still need explicit guards) |
| 89 | +- Comprehensive error handling and logging |
| 90 | +- Large ecosystem of community modules |
| 91 | + |
| 92 | +### 3. Production Readiness |
| 93 | + |
| 94 | +**Operational Excellence**: Ansible provides production-grade capabilities: |
| 95 | + |
| 96 | +- Role-based organization for reusability |
| 97 | +- Inventory management for multi-environment deployments |
| 98 | +- Vault integration for secret management |
| 99 | +- Comprehensive logging and audit trails |
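|  | + |
|  | +As an example of multi-environment inventory management, a YAML inventory could separate staging from production as sketched below. The host names, the addresses (taken from the documentation range 192.0.2.0/24), and the `deploy` user are placeholders. |
|  | + |
|  | +```yaml |
|  | +# inventory/hosts.yml (hypothetical layout) |
|  | +all: |
|  | +  vars: |
|  | +    ansible_user: deploy        # assumed SSH user created during system setup |
|  | +  children: |
|  | +    staging: |
|  | +      hosts: |
|  | +        staging-1: |
|  | +          ansible_host: 192.0.2.10 |
|  | +    production: |
|  | +      hosts: |
|  | +        prod-1: |
|  | +          ansible_host: 192.0.2.20 |
|  | +``` |
|  | + |
|  | +A run can then be scoped to one environment, for instance `ansible-playbook -i inventory/hosts.yml site.yml --limit staging`, where `site.yml` stands in for whatever playbook the project defines. |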
| 100 | + |
| 101 | +### 4. CI/CD Compatibility |
| 102 | + |
| 103 | +**Testing Strategy**: Container-first approach enables efficient CI/CD pipelines: |
| 104 | + |
| 105 | +- Unit tests: Individual components in containers (seconds) |
| 106 | +- Integration tests: Multi-service Docker Compose (1-3 minutes) |
| 107 | +- E2E tests: Reserved for critical scenarios with real infrastructure (5-10 minutes) |
| 108 | + |
| 109 | +## Implementation Strategy |
| 110 | + |
| 111 | +### Phase 1: Core Infrastructure |
| 112 | + |
| 113 | +1. **Minimal Cloud-init Templates**: Create lean cloud-init configurations focused on system initialization |
| 114 | +2. **Ansible Playbook Structure**: Develop role-based playbooks for application deployment |
| 115 | +3. **Container Testing**: Implement Molecule-based testing for Ansible roles |
| 116 | + |
| 117 | +### Phase 2: Application Integration |
| 118 | + |
| 119 | +1. **Service Orchestration**: Migrate Docker Compose management to Ansible |
| 120 | +2. **Configuration Management**: Replace envsubst templating with Ansible Jinja2 |
| 121 | +3. **Health Checks**: Implement comprehensive service validation |
| 122 | + |
| 123 | +### Phase 3: Testing and Validation |
| 124 | + |
| 125 | +1. **Container Test Suite**: Comprehensive Docker-based testing |
| 126 | +2. **Integration Validation**: Multi-service container testing |
| 127 | +3. **Minimal E2E**: Strategic VM testing for infrastructure validation |
| 128 | + |
| 129 | +## Consequences |
| 130 | + |
| 131 | +### Positive |
| 132 | + |
| 133 | +- **Faster Development Cycles**: Container-based testing reduces feedback loops |
| 134 | +- **Better CI/CD Integration**: Standard CI platforms support Docker natively |
| 135 | +- **Improved Debugging**: Clear error messages and logging from Ansible |
| 136 | +- **Enhanced Maintainability**: Declarative configuration over imperative scripts |
| 137 | +- **Production Readiness**: Industry-standard configuration management practices |
| 138 | +- **Reduced Complexity**: 3-technology core stack (OpenTofu/Terraform, Ansible, Docker Compose) with cloud-init kept minimal, vs. the current 4-technology approach |
| 139 | + |
| 140 | +### Negative |
| 141 | + |
| 142 | +- **Learning Curve**: Team needs Ansible expertise |
| 143 | +- **Migration Effort**: Requires refactoring existing shell script logic |
| 144 | +- **Initial Complexity**: Setting up the Molecule testing framework takes upfront effort |
| 145 | + |
| 146 | +### Risks and Mitigation |
| 147 | + |
| 148 | +**Risk**: Ansible playbook complexity could become unwieldy |
| 149 | +**Mitigation**: Use role-based organization and follow Ansible best practices |
| 150 | + |
| 151 | +**Risk**: Container testing might miss infrastructure-specific issues |
| 152 | +**Mitigation**: Maintain strategic E2E testing for critical infrastructure scenarios |
| 153 | + |
| 154 | +## Alternative Approaches Considered |
| 155 | + |
| 156 | +### 1. Pure Cloud-init Approach |
| 157 | + |
| 158 | +**Rejected**: Retains the current testing challenges and offers limited flexibility for complex logic |
| 159 | + |
| 160 | +### 2. Ansible-Only (No Cloud-init) |
| 161 | + |
| 162 | +**Rejected**: Requires more complex initial connectivity setup and provider-specific handling |
| 163 | + |
| 164 | +### 3. Shell Script Enhancement |
| 165 | + |
| 166 | +**Rejected**: Doesn't address fundamental testing and maintainability issues |
| 167 | + |
| 168 | +## References |
| 169 | + |
| 170 | +- [Ansible Best Practices](https://docs.ansible.com/ansible/latest/user_guide/playbooks_best_practices.html) |
| 171 | +- [Molecule Testing Framework](https://molecule.readthedocs.io/) |
| 172 | +- [Testcontainers Documentation](https://www.testcontainers.org/) |
| 173 | +- [Docker Compose Testing Strategies](https://docs.docker.com/compose/) |
| 174 | + |
| 175 | +## Related Decisions |
| 176 | + |
| 177 | +- **Testing Strategy**: Three-layer architecture with container-first approach |
| 178 | +- **Configuration Management**: Ansible Jinja2 templating over envsubst |
| 179 | +- **Technology Stack**: Simplified 3-component architecture |