This repository was archived by the owner on Oct 10, 2025. It is now read-only.

Commit 309fc30

feat: [#31] add project redesign documentation
This commit introduces the complete project redesign documentation, covering
Phase 0 (Goals), Phase 2 (PoC Analysis), and Phase 3 (New Design). It
establishes the foundation for the greenfield implementation by defining
project goals, analyzing the existing proof-of-concept, and specifying the
new architecture.

Key additions include:

- Phase 0: Project Goals and Scope
- Phase 2: Detailed analysis of the PoC's architecture, automation,
  configuration, testing, and documentation.
- Phase 3: High-level design, component-level design, data models, and UX
  for the new implementation.

This documentation provides a clear roadmap for the development of the new
Torrust Tracker deployment solution, ensuring that lessons learned from the
PoC are carried forward into a more robust, scalable, and maintainable
product.
1 parent 51106dc commit 309fc30

16 files changed (+823, -8 lines)

docs/redesign/phase0-goals/project-goals-and-scope.md

Lines changed: 45 additions & 8 deletions
@@ -66,14 +66,25 @@ the barrier to tracker adoption.**
 - **Not included**: Ongoing maintenance automation
 - **Alternative**: Users handle maintenance through standard system administration practices

-### Dynamic Scaling
-
-**Rationale**: Torrust tracker does not support horizontal scaling architecturally.
-
-- **Not included**: Auto-scaling based on load
-- **Not included**: Multi-instance load balancing
-- **Not included**: Automatic migration to larger servers
-- **Alternative**: Manual migration by deploying to new infrastructure and migrating data
+### Dynamic Scaling and High Availability
+
+**Rationale**: The installer is intentionally focused on a single-node deployment
+for two primary reasons:
+
+1. **Application Architecture**: The Torrust tracker application itself does not
+   natively support horizontal scaling. Peer data is managed in memory on a
+   single instance, meaning that true high availability or load balancing would
+   require significant changes to the core tracker application, which is beyond
+   the scope of this installer project.
+2. **Target Audience**: The primary users are often hobbyists or small groups
+   who require a simple, cost-effective, single-server deployment. The current
+   architecture meets this need directly.
+
+- **Not included**: Auto-scaling based on load.
+- **Not included**: Multi-instance load balancing or high-availability clusters.
+- **Not included**: Automatic migration to larger servers.
+- **Alternative**: Users can manually migrate to a more powerful server by
+  provisioning new infrastructure and transferring their data.

 ### Migration Between Providers

@@ -98,6 +109,32 @@ the barrier to tracker adoption.**
 **Rationale**: Provider-level resource isolation requires complex provider-specific
 implementation that varies significantly across cloud providers.

+### Multi-User Deployment Management
+
+**Rationale**: The project is designed for a single system administrator to perform a one-time
+deployment. It is not intended to be a multi-user platform for managing different
+environments.
+
+- **Not included**: Remote state management for team collaboration (e.g., Terraform Cloud, S3 backend)
+- **Not included**: Role-based access control for infrastructure changes
+- **Not included**: Environment management for multiple users
+- **Alternative**: The system uses local state files, which is sufficient for the
+  single-administrator use case. Disaster recovery relies on data and configuration backups,
+  not on collaborative state management.
+
+### Generic Infrastructure Abstraction Layer
+
+**Rationale**: Building a custom abstraction layer to normalize infrastructure resources across
+different cloud providers (e.g., creating a generic "server" or "network" concept) is a
+significant engineering effort that replicates the core functionality of tools like OpenTofu
+and Terraform. The project's goal is to leverage these existing IaC tools, not to reinvent
+them.
+
+- **Not included**: A custom, intermediate API or schema for defining infrastructure.
+- **Alternative**: Directly use provider-specific configurations within OpenTofu, mapping
+  project needs to the native capabilities of each provider. This approach is more maintainable
+  and aligns with industry best practices.
+
 - **Not included**: Resource name prefixes for environment isolation
 - **Not included**: Private network creation for environment separation
 - **Not included**: Provider-specific isolation mechanisms (VPCs, resource groups, etc.)
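To ground the "Alternative" under "Generic Infrastructure Abstraction Layer" above: a minimal sketch (not part of this commit) of how an installer could translate project-level choices into provider-native OpenTofu inputs, with no intermediate abstraction layer. The provider settings, variable names, and file layout are illustrative assumptions, not the project's actual schema.

```python
"""Sketch: map project choices to provider-native OpenTofu variables.

Illustrative only; the settings below are assumptions, not the PoC's schema.
"""

import json
from pathlib import Path

# Each provider is described in its own vocabulary rather than through a
# custom generic "server" concept.
PROVIDER_SETTINGS = {
    "hetzner": {"server_type": "cx22", "location": "fsn1"},   # hypothetical values
    "libvirt": {"vcpus": 2, "memory_mb": 4096},               # hypothetical values
}


def write_tfvars(provider: str, out_dir: Path) -> Path:
    """Write a .tfvars.json file that OpenTofu can consume directly."""
    settings = PROVIDER_SETTINGS[provider]
    path = out_dir / f"{provider}.tfvars.json"
    path.write_text(json.dumps(settings, indent=2))
    return path


if __name__ == "__main__":
    print(write_tfvars("hetzner", Path(".")))
```

OpenTofu reads `*.tfvars.json` files natively (via `-var-file`), so no custom schema sits between the installer and the provider.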
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+# High-Level Architecture Analysis
+
+This document synthesizes the architectural analysis of the PoC.
+
+## Core Architectural Principles
+
+The Torrust Tracker Demo project is a Proof of Concept (PoC) that successfully
+demonstrates a production-ready deployment of the Torrust Tracker. Its
+architecture is built on several strong, modern principles:
+
+- **Twelve-Factor App Methodology**: The project adheres to the twelve-factor app principles,
+  promoting portability, scalability, and clean deployment practices. There is a clear and
+  well-executed distinction between the build, release, and run stages.
+- **Separation of Concerns**: There is an excellent separation between the `infrastructure` and
+  `application` layers. This is a solid foundation that makes it easier to manage different
+  parts of the system independently. The two-stage deployment process (`make infra-apply`
+  followed by `make app-deploy`, sketched after this list) is a direct and beneficial result
+  of this separation.
+- **Infrastructure as Code (IaC)**: The use of OpenTofu/Terraform for infrastructure
+  management is a modern and robust approach. It ensures that infrastructure is reproducible,
+  version-controlled, and documented.
+- **Immutable Infrastructure Philosophy**: The design encourages treating infrastructure as
+  immutable. VMs can be destroyed and recreated easily without manual intervention, which is a
+  core tenet of modern cloud-native development.
+
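A minimal sketch of that two-stage flow, assuming only that the PoC's `make infra-apply` and `make app-deploy` targets exist as described; the Python wrapper itself is hypothetical, not part of this commit.

```python
"""Sketch: the two-stage deployment flow, driven from Python."""

import subprocess
import sys


def run_stage(target: str) -> None:
    """Run a single make target, aborting the flow on failure."""
    print(f"==> make {target}")
    subprocess.run(["make", target], check=True)


def deploy() -> None:
    # Stage 1: provision infrastructure (VMs, networking) via OpenTofu.
    run_stage("infra-apply")
    # Stage 2: deploy the Docker Compose application stack onto the VM.
    run_stage("app-deploy")


if __name__ == "__main__":
    try:
        deploy()
    except subprocess.CalledProcessError as exc:
        sys.exit(exc.returncode)
```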
+## Key Architectural Layers
+
+- **Infrastructure Layer (`/infrastructure`)**: Manages the provisioning of virtual
+  machines (VMs) and underlying network resources using **OpenTofu/Terraform** and
+  **cloud-init**. It is designed to be modular, with support for different providers
+  (e.g., libvirt for local, Hetzner for cloud).
+- **Application Layer (`/application`)**: Contains the application services, which are
+  orchestrated using **Docker Compose**. This includes the Torrust Tracker itself, a MySQL
+  database, an Nginx reverse proxy, and monitoring tools like Prometheus and Grafana.
+- **Automation Layer (`Makefile`)**: A root `Makefile` serves as the primary, user-friendly
+  entry point for all development and deployment tasks, orchestrating the complex scripts
+  required for provisioning and deployment.
+
+## Areas for Improvement
+
+While the foundation is strong, several areas have been identified for improvement in the
+greenfield redesign:
+
+- **Monolithic Repository**: The current repository contains the PoC code, extensive
+  documentation, and the new redesign plans. This can be confusing for newcomers. The plan to
+  split the new implementation into a separate, clean repository is a step in the right
+  direction.
+- **Over-reliance on Shell Scripts**: The automation is heavily dependent on a large
+  collection of bash scripts. While effective for a PoC, this approach can be brittle and
+  hard to maintain for a production-grade system.
+- **Provider Configuration Strategy**: The system supports multiple providers, such as libvirt
+  for local development and Hetzner for cloud deployments, which can be used concurrently. The
+  design avoids creating a custom, generic abstraction layer for infrastructure providers, as
+  this would replicate the functionality already present in OpenTofu. Instead, the project's
+  strategy is to directly map provider-specific characteristics (e.g., instance sizes,
+  regions) to concrete OpenTofu configuration values. This approach leverages the power of the
+  underlying IaC tool without adding unnecessary complexity.
+- **State Management**: The PoC uses local OpenTofu/Terraform state files. While this model
+  does not support team collaboration, it aligns with the project's intended use case: a
+  single system administrator performing an initial one-time deployment. For disaster
+  recovery, the emphasis is on backing up application data and configurations, allowing for
+  manual restoration, rather than on collaborative infrastructure management through remote
+  state.
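The disaster-recovery model in that last bullet rests on backups rather than remote state. A minimal sketch of such a backup step, assuming hypothetical paths (the PoC's actual directories may differ):

```python
"""Sketch: bundle application data and generated configs into one archive
for manual disaster recovery. Paths are illustrative assumptions."""

import tarfile
import time
from pathlib import Path


def backup(data_dirs: list[Path], dest: Path) -> Path:
    """Create a timestamped tar.gz archive of the given directories."""
    archive = dest / f"torrust-backup-{time.strftime('%Y%m%d-%H%M%S')}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for d in data_dirs:
            tar.add(d, arcname=d.name)
    return archive


if __name__ == "__main__":
    # Hypothetical data and config locations.
    print(backup([Path("application/storage"), Path("application/config")],
                 Path("/tmp")))
```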
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+# Automation and Tooling Analysis
+
+This document synthesizes the analysis of the automation and tooling.
+
+## Strengths of the Current Automation
+
+The project is heavily and effectively automated, which is a major strength for
+ensuring consistency and reproducibility.
+
+- **Centralized Entry Point (`Makefile`)**: The root `Makefile` is an excellent feature,
+  providing a simple and user-friendly interface for the entire project. Complex,
+  multi-step workflows are simplified into single, memorable commands like `make dev-deploy`,
+  `make test-e2e`, and `make lint`.
+- **Comprehensive Automation**: The PoC automates nearly the entire project lifecycle, from
+  initial dependency installation (`make install-deps`) to infrastructure provisioning,
+  application deployment, health checks, and resource cleanup.
+- **Well-Organized Shell Scripts**: The project uses a collection of well-organized,
+  POSIX-compliant shell scripts located in `/scripts`, `/infrastructure/scripts`, and
+  `/application/scripts`. These scripts handle the core logic for:
+  - **Configuration Generation**: `configure-env.sh` and `configure-app.sh` process
+    templates to create environment-specific configuration files.
+  - **Deployment**: `provision-infrastructure.sh` and `deploy-app.sh` orchestrate the
+    twelve-factor build, release, and run stages.
+  - **Utilities**: `shell-utils.sh` provides a library of common functions for logging, error
+    handling, and user-friendly sudo password management.
+- **Integrated Linting**: The project enforces strict code quality standards through a
+  comprehensive linting script (`/scripts/lint.sh`). This script integrates multiple
+  linters, providing a single command to validate the entire codebase:
+  - `shellcheck` for shell scripts.
+  - `yamllint` for YAML files.
+  - `markdownlint` for documentation.
+  - `tflint` for Terraform code.
+
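For illustration, that single-command validation could be reproduced with a thin orchestrator like the following sketch; the exact flags and paths used by `/scripts/lint.sh` may differ, and those shown here are assumptions.

```python
"""Sketch: aggregate the linters named above behind one command,
mirroring what /scripts/lint.sh does in bash."""

import subprocess
import sys

# Typical invocations; paths and arguments are illustrative.
LINTERS = [
    ["shellcheck", "scripts/lint.sh"],  # shell scripts
    ["yamllint", "."],                  # YAML files
    ["markdownlint", "docs/"],          # documentation
    ["tflint"],                         # Terraform/OpenTofu code
]


def main() -> int:
    failed = False
    for cmd in LINTERS:
        print("==>", " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            failed = True  # keep going so all findings are reported
    return 1 if failed else 0


if __name__ == "__main__":
    sys.exit(main())
```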
+## Weaknesses and Areas for Improvement
+
+- **Over-reliance on Bash for Complex Logic**: The heavy use of bash for complex
+  automation logic is a significant drawback. Bash scripts can be brittle, difficult to
+  test, and hard to maintain as complexity grows. They lack the robust error handling,
+  data structures, and testing frameworks available in higher-level languages.
+- **Lack of Idempotency in Some Scripts**: While the goal is idempotency, some scripts may
+  not be fully idempotent. For example, running `app-deploy` multiple times could have
+  unintended side effects if not carefully managed. A production-grade tool should
+  guarantee the same result no matter how many times it is run.
+
44+
45+
## Recommendations for the Redesign
46+
47+
1. **Adopt a Higher-Level Language for Automation**: This is the most critical
48+
recommendation. The new installer should be written in a language like **Python**, **Go**,
49+
or **Rust**.
50+
- **Benefits**: This would provide superior error handling, mature testing frameworks,
51+
better dependency management, and access to official cloud provider SDKs. It would
52+
make the entire system more robust, maintainable, and easier to extend.
53+
- **Trade-offs**: While it might introduce a new language dependency for contributors, the
54+
long-term benefits for a project of this scale far outweigh this initial cost.
55+
2. **Use a Dedicated Configuration Tooling**: Instead of relying on `envsubst` and custom
56+
shell scripts for templating, the new system should adopt a more powerful and standard
57+
configuration management tool or a language-native templating engine, such as:
58+
- Jinja2 (if using Python).
59+
- Go's `text/template` package (if using Go).
60+
- Tools like Ansible for more complex configuration and orchestration tasks.
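As a sketch of the Jinja2 option from the list above: a template rendered with strict handling of missing variables, a failure mode `envsubst` silently ignores. The template text and variable names are illustrative assumptions.

```python
"""Sketch: replacing envsubst with Jinja2 for config templating."""

from jinja2 import Environment, StrictUndefined

# Illustrative template line; not taken from the PoC's .tpl files.
TEMPLATE = 'db_connect_url = "mysql://{{ db_user }}@{{ db_host }}:3306/{{ db_name }}"\n'


def render(values: dict) -> str:
    # StrictUndefined raises on missing variables instead of silently
    # emitting empty strings, which envsubst cannot catch.
    env = Environment(undefined=StrictUndefined)
    return env.from_string(TEMPLATE).render(**values)


if __name__ == "__main__":
    print(render({"db_user": "torrust", "db_host": "mysql",
                  "db_name": "torrust_tracker"}))
```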
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+# Configuration Management Analysis
+
+This document synthesizes the analysis of the configuration management system.
+
+## Strengths of the Current System
+
+Configuration management is a standout feature of the Torrust Tracker Demo PoC,
+demonstrating a mature and secure approach.
+
+- **Hybrid Approach (Files vs. Environment Variables - ADR-004)**: The project makes a
+  pragmatic decision to use configuration files for stable, non-sensitive application
+  behavior (e.g., timeouts, feature flags in `tracker.toml`) and environment variables
+  for secrets and environment-specific values (e.g., database credentials, domain
+  names). This aligns well with operational best practices and twelve-factor principles.
+- **Two-Level Environment Variable Structure (ADR-007)**: This is an excellent security
+  practice. The system separates variables into two distinct levels (see the sketch after
+  this list):
+  1. **Level 1 (Main Environment)**: Located in `infrastructure/config/environments/`,
+     these files contain the complete set of variables for a deployment, including
+     infrastructure secrets, API tokens, and application settings.
+  2. **Level 2 (Docker Compose Environment)**: This is a filtered subset of the main
+     environment, generated at deploy time into `application/.env`. It contains _only_ the
+     variables required by the running containers. This practice adheres to the principle
+     of least privilege and significantly reduces the attack surface of the application
+     containers.
+- **Template-Based Configuration**: The use of `.tpl` files for all major configuration
+  files (e.g., `cloud-init`, `tracker.toml`, `prometheus.yml`, `nginx.conf`) is a strong
+  practice. It allows the application and infrastructure code to remain
+  environment-agnostic, with environment-specific details injected during the
+  deployment's release stage.
+- **Per-Environment Application Configuration Storage (ADR-008)**: This ADR specifies that
+  final, generated application configuration files are stored in per-environment
+  directories (`application/config/{environment}/`). This allows for version-controlled,
+  auditable, and environment-specific application behavior.
+- **Centralized Configuration Script (`configure-app.sh`)**: This script acts as the
+  engine for the configuration system. It sources the appropriate environment variables
+  and uses `envsubst` to process all templates, generating the final configuration files
+  that will be deployed to the server.
+
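A minimal sketch of the Level 1 to Level 2 filtering described under ADR-007, assuming a simple whitelist. Only the `infrastructure/config/environments/` and `application/.env` locations come from the ADR; the variable names and file name are illustrative.

```python
"""Sketch: derive the Level 2 Docker Compose environment as a filtered
subset of the Level 1 main environment (ADR-007). Whitelist is illustrative."""

from pathlib import Path

# Only variables the containers actually need; infrastructure secrets
# (e.g., cloud API tokens) never reach application/.env.
CONTAINER_VARS = {"MYSQL_ROOT_PASSWORD", "MYSQL_DATABASE", "USER_DOMAIN"}


def filter_env(level1: Path, level2: Path) -> None:
    """Copy whitelisted KEY=VALUE lines from the main env file."""
    kept = []
    for raw in level1.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key = line.split("=", 1)[0]
        if key in CONTAINER_VARS:
            kept.append(line)
    level2.write_text("\n".join(kept) + "\n")


if __name__ == "__main__":
    filter_env(Path("infrastructure/config/environments/local.env"),
               Path("application/.env"))
```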
+## Weaknesses and Areas for Improvement
+
+- **Manual Secret Management**: The current system requires developers to manually copy
+  template files (e.g., `local.env.tpl`) and populate the secret values. This is
+  acceptable for a PoC but is not a secure or scalable practice for production
+  environments where secrets should be managed by a dedicated system.
+- **Custom Scripting for Templating**: While `envsubst` is clever and effective, relying
+  on custom shell scripting for configuration management can be less robust than using
+  industry-standard tools.
+
+## Recommendations for the Redesign
+
+1. **Integrate a Secure Secrets Management System**: This is a non-negotiable requirement
+   for the new production-grade installer. Secrets should never be stored in plaintext
+   files, even if they are git-ignored. The new system must integrate with a solution
+   like:
+
+   - HashiCorp Vault
+   - AWS Secrets Manager, GCP Secret Manager, or Azure Key Vault
+   - Encrypted files using a tool like `sops`
+
+   Secrets should be fetched and injected into the environment at runtime.
+
+2. **Implement Schema-Based Configuration Validation**: To prevent misconfigurations, the
+   new system should implement schema-based validation for all configuration files. This
+   could be done using JSON Schema, YAML schema validation libraries, or type-safe
+   configuration objects in a high-level language like Python (with Pydantic) or Go.
+   This catches errors early and ensures that all required configuration values are
+   present and correctly formatted.
+
+3. **Consider More Powerful Configuration Tooling**: While the current system works, the
+   redesign could benefit from adopting more powerful, industry-standard tools for
+   configuration management, which would reduce the amount of custom scripting required.
+   This could include:
+   - Using a dedicated configuration management tool like Ansible.
+   - Leveraging the native templating engines of a higher-level language (e.g.,
+     Jinja2 for Python).

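To illustrate recommendation 2, a small Pydantic sketch; the field names and constraints are assumptions, not the project's actual configuration schema.

```python
"""Sketch: schema-based configuration validation with Pydantic."""

from pydantic import BaseModel, Field, ValidationError


class TrackerConfig(BaseModel):
    # Illustrative fields; the real schema would mirror tracker.toml
    # and the environment files.
    domain: str
    mysql_database: str
    mysql_port: int = Field(default=3306, ge=1, le=65535)
    enable_https: bool = True


if __name__ == "__main__":
    try:
        cfg = TrackerConfig(domain="tracker.example.com",
                            mysql_database="torrust_tracker")
        print(cfg)
    except ValidationError as err:
        # Misconfigurations are caught before any deployment step runs.
        print(err)
```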