This repository was archived by the owner on Oct 10, 2025. It is now read-only.

Commit b272f1b

docs: organize SSH bug documentation into structured archive
- Create infrastructure/docs/bugs/ directory for systematic bug documentation
- Move SSH authentication failure documentation to 001-ssh-authentication-failure/
- Organize content into logical structure:
  - README.md: Bug overview and quick reference
  - SSH_BUG_ANALYSIS.md: Initial investigation and analysis
  - SSH_BUG_SUMMARY.md: Complete timeline and resolution
  - test-configs/: All 17 test configurations used during debugging
- Add comprehensive README.md for bugs directory explaining:
  - Purpose and scope of bug documentation archive
  - Directory structure and naming conventions
  - Content guidelines and quality standards
  - Usage examples for contributors and maintainers
- Fix markdown linting issues in all documentation files
- Add markdownlint disable for technical content with long lines

This establishes a systematic approach for documenting infrastructure bugs with complete investigation trails, test artifacts, and lessons learned. Future bugs can follow this template for consistent documentation quality.
1 parent c292adb commit b272f1b

20 files changed, +242 -0 lines changed
Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
# SSH Authentication Failure Bug - #001

**Date Resolved:** July 4, 2025
**Status:** ✅ Resolved
**Impact:** High - Blocked VM access completely
**Root Cause:** YAML document start marker (`---`) breaking cloud-init parsing

## Problem Summary

The full cloud-init configuration (`user-data.yaml.tpl`) for the Torrust Tracker
Demo VM caused both SSH key and password authentication to fail, preventing
users from accessing deployed VMs.

## Root Cause

The cloud-init configuration file began with the YAML document start marker
(`---`) instead of the required `#cloud-config` header, which cloud-init
expects on the first line. This caused cloud-init to misprocess the entire
configuration, resulting in:

- Empty SSH `authorized_keys` (SSH key variable not templated)
- Broken password authentication setup
- Schema validation errors in cloud-init

## The Fix

**Simple but Critical Change:**

```yaml
# BEFORE (BROKEN):
---
# cloud-config

# AFTER (FIXED):
#cloud-config
```

**File Changed:** `infrastructure/cloud-init/user-data.yaml.tpl`
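
The header requirement can be checked mechanically before a deployment. The following is a minimal sketch and not part of the repository: the function name `has_cloud_config_header` is hypothetical, and it assumes the rendered user-data is available as a string. The check is deliberately strict: only a first line that reads `#cloud-config` (ignoring trailing whitespace) passes.

```python
CLOUD_CONFIG_HEADER = "#cloud-config"


def has_cloud_config_header(user_data: str) -> bool:
    """Return True if the first line of the user-data is '#cloud-config'.

    Hypothetical helper for illustration; trailing whitespace on the
    first line is tolerated, anything else on that line is rejected.
    """
    lines = user_data.splitlines()
    return bool(lines) and lines[0].rstrip() == CLOUD_CONFIG_HEADER


if __name__ == "__main__":
    broken = "---\n# cloud-config\nusers: []\n"
    fixed = "#cloud-config\nusers: []\n"
    print("broken passes:", has_cloud_config_header(broken))
    print("fixed passes:", has_cloud_config_header(fixed))
```

Applied to the two variants shown in the fix, the check rejects the file that starts with `---` and accepts the one that starts with `#cloud-config`.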

## Investigation Process

This bug was resolved through systematic incremental testing:

1. **Incremental Testing**: Created 15+ test configurations, adding features one by one
2. **Root Cause Isolation**: Compared working vs. broken configurations using diff analysis
3. **Hypothesis Formation**: Identified YAML header as the key difference
4. **Validation**: Deployed fresh VM with corrected header and confirmed fix
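
The comparison in step 2 can be approximated with Python's standard `difflib` module. This is an illustrative sketch only: the investigation may have used a different diff tool, and the function name `diff_configs` and the inline sample configurations are assumptions made for the example.

```python
import difflib


def diff_configs(working: str, broken: str) -> list[str]:
    """Return a unified diff between a working and a broken configuration.

    Hypothetical helper: both arguments are the full text of a
    cloud-init user-data file.
    """
    return list(
        difflib.unified_diff(
            working.splitlines(),
            broken.splitlines(),
            fromfile="working",
            tofile="broken",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Sample inputs invented for illustration, mirroring the header difference.
    working = "#cloud-config\nusers: []\n"
    broken = "---\n# cloud-config\nusers: []\n"
    for line in diff_configs(working, broken):
        print(line)
```

When two configurations differ only in the header, the diff reduces to the first lines of the file, which is the kind of signal that points to the header as the key difference.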

## Validation Results

After applying the fix:

- ✅ SSH Key Authentication: Works perfectly
- ✅ Password Authentication: Works perfectly
- ✅ All Cloud-Init Features: Docker, UFW, packages, etc. - ALL WORKING
- ✅ Integration Tests: Complete test suite passes
- ✅ Make Commands: Standard workflow (`make init`, `make plan`, `make apply`) works

## Files in This Directory

### Core Documentation

- `SSH_BUG_ANALYSIS.md` - Initial analysis and hypothesis formation
- `SSH_BUG_SUMMARY.md` - Complete investigation summary with detailed timeline

### Test Artifacts

- `test-configs/` - All 16 test configurations used during incremental testing
  - `user-data-test-1.1.yaml.tpl` through `user-data-test-15.1.yaml.tpl`
  - `user-data-test-header.yaml.tpl` - Final test that confirmed the fix

### Validation

- `validation/` - (Currently empty, reserved for future validation scripts)

## Lessons Learned

1. **Cloud-init requires a specific header**: `#cloud-config` must be the first line of the file; `---` is not a substitute
2. **Incremental testing is powerful**: Systematic approach isolated the issue effectively
3. **Template variable validation**: Always verify that template variables are being substituted correctly
4. **Integration testing is crucial**: End-to-end testing revealed the full scope of the issue

## Prevention

To prevent similar issues:

- Always use `#cloud-config` as the first line in cloud-init files
- Test template variable substitution in Terraform plans
- Run integration tests after any cloud-init configuration changes
- Use the documented make workflow for deployments
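
One way to test template variable substitution is to scan the rendered user-data for placeholders that were never filled in. The sketch below is hypothetical and rests on two assumptions: that the template uses `${name}` interpolation, and that the rendered output can be obtained as text. The variable name `ssh_public_key` is invented for the example.

```python
import re

# Matches ${...} placeholders; assumes the template uses this syntax.
PLACEHOLDER = re.compile(r"\$\{[^}]+\}")


def unsubstituted_placeholders(rendered: str) -> list[str]:
    """Return every ${...} placeholder still present in rendered text."""
    return PLACEHOLDER.findall(rendered)


if __name__ == "__main__":
    # Invented sample: the key variable was left unrendered.
    rendered = "#cloud-config\nssh_authorized_keys:\n  - ${ssh_public_key}\n"
    leftovers = unsubstituted_placeholders(rendered)
    if leftovers:
        print("Unsubstituted template variables:", ", ".join(leftovers))
```

Shell snippets embedded in a cloud-init file may legitimately contain `${VAR}` expansions, so any matches should be treated as candidates to review rather than definite errors.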

## Related Issues

This fix resolves SSH access problems that were preventing users from following
the integration testing guide and deploying the Torrust Tracker Demo
successfully.

## Technical Details

For complete technical details, the debugging methodology, and the step-by-step
investigation process, see:

- [SSH_BUG_ANALYSIS.md](SSH_BUG_ANALYSIS.md) - Initial investigation
- [SSH_BUG_SUMMARY.md](SSH_BUG_SUMMARY.md) - Comprehensive analysis with timeline

infrastructure/cloud-init/SSH_BUG_ANALYSIS.md renamed to infrastructure/docs/bugs/001-ssh-authentication-failure/SSH_BUG_ANALYSIS.md

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+<!-- markdownlint-disable MD013 -->
+
 # SSH Authentication Bug Analysis - Cloud-Init Configuration

 ## Problem Summary

infrastructure/cloud-init/SSH_BUG_SUMMARY.md renamed to infrastructure/docs/bugs/001-ssh-authentication-failure/SSH_BUG_SUMMARY.md

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+<!-- markdownlint-disable MD013 -->
+
 # SSH Authentication Bug Analysis Summary

 **Date:** July 4, 2025
