This repository was archived by the owner on Oct 10, 2025. It is now read-only.

Commit 309fc30

feat: [#31] add project redesign documentation
This commit introduces the complete project redesign documentation, covering
Phase 0 (Goals), Phase 2 (PoC Analysis), and Phase 3 (New Design). It
establishes the foundation for the greenfield implementation by defining
project goals, analyzing the existing proof-of-concept, and specifying the
new architecture.

Key additions include:

- Phase 0: Project Goals and Scope
- Phase 2: Detailed analysis of the PoC's architecture, automation,
  configuration, testing, and documentation.
- Phase 3: High-level design, component-level design, data models, and UX
  for the new implementation.

This documentation provides a clear roadmap for the development of the new
Torrust Tracker deployment solution, ensuring that lessons learned from the
PoC are carried forward into a more robust, scalable, and maintainable
product.
1 parent 51106dc commit 309fc30

16 files changed (+823, -8 lines)

docs/redesign/phase0-goals/project-goals-and-scope.md

Lines changed: 45 additions & 8 deletions
@@ -66,14 +66,25 @@ the barrier to tracker adoption.**
 - **Not included**: Ongoing maintenance automation
 - **Alternative**: Users handle maintenance through standard system administration practices

-### Dynamic Scaling
-
-**Rationale**: Torrust tracker does not support horizontal scaling architecturally.
-
-- **Not included**: Auto-scaling based on load
-- **Not included**: Multi-instance load balancing
-- **Not included**: Automatic migration to larger servers
-- **Alternative**: Manual migration by deploying to new infrastructure and migrating data
+### Dynamic Scaling and High Availability
+
+**Rationale**: The installer is intentionally focused on a single-node deployment
+for two primary reasons:
+
+1. **Application Architecture**: The Torrust tracker application itself does not
+   natively support horizontal scaling. Peer data is managed in memory on a
+   single instance, meaning that true high availability or load balancing would
+   require significant changes to the core tracker application, which is beyond
+   the scope of this installer project.
+2. **Target Audience**: The primary users are often hobbyists or small groups
+   who require a simple, cost-effective, single-server deployment. The current
+   architecture meets this need directly.
+
+- **Not included**: Auto-scaling based on load.
+- **Not included**: Multi-instance load balancing or high-availability clusters.
+- **Not included**: Automatic migration to larger servers.
+- **Alternative**: Users can manually migrate to a more powerful server by
+  provisioning new infrastructure and transferring their data.

 ### Migration Between Providers

@@ -98,6 +109,32 @@ the barrier to tracker adoption.**
 **Rationale**: Provider-level resource isolation requires complex provider-specific
 implementation that varies significantly across cloud providers.

+### Multi-User Deployment Management
+
+**Rationale**: The project is designed for a single system administrator to perform a one-time
+deployment. It is not intended to be a multi-user platform for managing different
+environments.
+
+- **Not included**: Remote state management for team collaboration (e.g., Terraform Cloud, S3 backend)
+- **Not included**: Role-based access control for infrastructure changes
+- **Not included**: Environment management for multiple users
+- **Alternative**: The system uses local state files, which is sufficient for the
+  single-administrator use case. Disaster recovery relies on data and configuration backups,
+  not on collaborative state management.
+
+### Generic Infrastructure Abstraction Layer
+
+**Rationale**: Building a custom abstraction layer to normalize infrastructure resources across
+different cloud providers (e.g., creating a generic "server" or "network" concept) is a
+significant engineering effort that replicates the core functionality of tools like OpenTofu
+and Terraform. The project's goal is to leverage these existing IaC tools, not to reinvent
+them.
+
+- **Not included**: A custom, intermediate API or schema for defining infrastructure.
+- **Alternative**: Directly use provider-specific configurations within OpenTofu, mapping
+  project needs to the native capabilities of each provider. This approach is more maintainable
+  and aligns with industry best practices.
+
 - **Not included**: Resource name prefixes for environment isolation
 - **Not included**: Private network creation for environment separation
 - **Not included**: Provider-specific isolation mechanisms (VPCs, resource groups, etc.)
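To ground the "Alternative" under "Generic Infrastructure Abstraction Layer" above: a minimal sketch (not part of this commit) of how an installer could translate project-level choices into provider-native OpenTofu inputs, with no intermediate abstraction layer. The provider settings, variable names, and file layout are illustrative assumptions, not the project's actual schema.

```python
"""Sketch: map project choices to provider-native OpenTofu variables.

Illustrative only; the settings below are assumptions, not the PoC's schema.
"""

import json
from pathlib import Path

# Each provider is described in its own vocabulary rather than through a
# custom generic "server" concept.
PROVIDER_SETTINGS = {
    "hetzner": {"server_type": "cx22", "location": "fsn1"},   # hypothetical values
    "libvirt": {"vcpus": 2, "memory_mb": 4096},               # hypothetical values
}


def write_tfvars(provider: str, out_dir: Path) -> Path:
    """Write a .tfvars.json file that OpenTofu can consume directly."""
    settings = PROVIDER_SETTINGS[provider]
    path = out_dir / f"{provider}.tfvars.json"
    path.write_text(json.dumps(settings, indent=2))
    return path


if __name__ == "__main__":
    print(write_tfvars("hetzner", Path(".")))
```

OpenTofu reads `*.tfvars.json` files natively (via `-var-file`), so no custom schema sits between the installer and the provider.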
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+# High-Level Architecture Analysis
+
+This document synthesizes the architectural analysis of the PoC.
+
+## Core Architectural Principles
+
+The Torrust Tracker Demo project is a Proof of Concept (PoC) that successfully
+demonstrates a production-ready deployment of the Torrust Tracker. Its
+architecture is built on several strong, modern principles:
+
+- **Twelve-Factor App Methodology**: The project adheres to the twelve-factor app principles,
+  promoting portability, scalability, and clean deployment practices. There is a clear and
+  well-executed distinction between the build, release, and run stages.
+- **Separation of Concerns**: There is an excellent separation between the `infrastructure` and
+  `application` layers. This is a solid foundation that makes it easier to manage different
+  parts of the system independently. The two-stage deployment process (`make infra-apply`
+  followed by `make app-deploy`, sketched after this list) is a direct and beneficial result
+  of this separation.
+- **Infrastructure as Code (IaC)**: The use of OpenTofu/Terraform for infrastructure
+  management is a modern and robust approach. It ensures that infrastructure is reproducible,
+  version-controlled, and documented.
+- **Immutable Infrastructure Philosophy**: The design encourages treating infrastructure as
+  immutable. VMs can be destroyed and recreated easily without manual intervention, which is a
+  core tenet of modern cloud-native development.
+
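A minimal sketch of that two-stage flow, assuming only that the PoC's `make infra-apply` and `make app-deploy` targets exist as described; the Python wrapper itself is hypothetical, not part of this commit.

```python
"""Sketch: the two-stage deployment flow, driven from Python."""

import subprocess
import sys


def run_stage(target: str) -> None:
    """Run a single make target, aborting the flow on failure."""
    print(f"==> make {target}")
    subprocess.run(["make", target], check=True)


def deploy() -> None:
    # Stage 1: provision infrastructure (VMs, networking) via OpenTofu.
    run_stage("infra-apply")
    # Stage 2: deploy the Docker Compose application stack onto the VM.
    run_stage("app-deploy")


if __name__ == "__main__":
    try:
        deploy()
    except subprocess.CalledProcessError as exc:
        sys.exit(exc.returncode)
```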
+## Key Architectural Layers
+
+- **Infrastructure Layer (`/infrastructure`)**: Manages the provisioning of virtual
+  machines (VMs) and underlying network resources using **OpenTofu/Terraform** and
+  **cloud-init**. It is designed to be modular, with support for different providers
+  (e.g., libvirt for local, Hetzner for cloud).
+- **Application Layer (`/application`)**: Contains the application services, which are
+  orchestrated using **Docker Compose**. This includes the Torrust Tracker itself, a MySQL
+  database, an Nginx reverse proxy, and monitoring tools like Prometheus and Grafana.
+- **Automation Layer (`Makefile`)**: A root `Makefile` serves as the primary, user-friendly
+  entry point for all development and deployment tasks, orchestrating the complex scripts
+  required for provisioning and deployment.
+
+## Areas for Improvement
+
+While the foundation is strong, several areas have been identified for improvement in the
+greenfield redesign:
+
+- **Monolithic Repository**: The current repository contains the PoC code, extensive
+  documentation, and the new redesign plans. This can be confusing for newcomers. The plan to
+  split the new implementation into a separate, clean repository is a step in the right
+  direction.
+- **Over-reliance on Shell Scripts**: The automation is heavily dependent on a large
+  collection of bash scripts. While effective for a PoC, this approach can be brittle and
+  hard to maintain for a production-grade system.
+- **Provider Configuration Strategy**: The system supports multiple providers, such as libvirt
+  for local development and Hetzner for cloud deployments, which can be used concurrently. The
+  design avoids creating a custom, generic abstraction layer for infrastructure providers, as
+  this would replicate the functionality already present in OpenTofu. Instead, the project's
+  strategy is to directly map provider-specific characteristics (e.g., instance sizes,
+  regions) to concrete OpenTofu configuration values. This approach leverages the power of the
+  underlying IaC tool without adding unnecessary complexity.
+- **State Management**: The PoC uses local OpenTofu/Terraform state files. While this model
+  does not support team collaboration, it aligns with the project's intended use case: a
+  single system administrator performing an initial one-time deployment. For disaster
+  recovery, the emphasis is on backing up application data and configurations, allowing for
+  manual restoration, rather than on collaborative infrastructure management through remote
+  state.
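The disaster-recovery model in that last bullet rests on backups rather than remote state. A minimal sketch of such a backup step, assuming hypothetical paths (the PoC's actual directories may differ):

```python
"""Sketch: bundle application data and generated configs into one archive
for manual disaster recovery. Paths are illustrative assumptions."""

import tarfile
import time
from pathlib import Path


def backup(data_dirs: list[Path], dest: Path) -> Path:
    """Create a timestamped tar.gz archive of the given directories."""
    archive = dest / f"torrust-backup-{time.strftime('%Y%m%d-%H%M%S')}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for d in data_dirs:
            tar.add(d, arcname=d.name)
    return archive


if __name__ == "__main__":
    # Hypothetical data and config locations.
    print(backup([Path("application/storage"), Path("application/config")],
                 Path("/tmp")))
```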
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+# Automation and Tooling Analysis
+
+This document synthesizes the analysis of the automation and tooling.
+
+## Strengths of the Current Automation
+
+The project is heavily and effectively automated, which is a major strength for
+ensuring consistency and reproducibility.
+
+- **Centralized Entry Point (`Makefile`)**: The root `Makefile` is an excellent feature,
+  providing a simple and user-friendly interface for the entire project. Complex,
+  multi-step workflows are simplified into single, memorable commands like `make dev-deploy`,
+  `make test-e2e`, and `make lint`.
+- **Comprehensive Automation**: The PoC automates nearly the entire project lifecycle, from
+  initial dependency installation (`make install-deps`) to infrastructure provisioning,
+  application deployment, health checks, and resource cleanup.
+- **Well-Organized Shell Scripts**: The project uses a collection of well-organized,
+  POSIX-compliant shell scripts located in `/scripts`, `/infrastructure/scripts`, and
+  `/application/scripts`. These scripts handle the core logic for:
+  - **Configuration Generation**: `configure-env.sh` and `configure-app.sh` process
+    templates to create environment-specific configuration files.
+  - **Deployment**: `provision-infrastructure.sh` and `deploy-app.sh` orchestrate the
+    twelve-factor build, release, and run stages.
+  - **Utilities**: `shell-utils.sh` provides a library of common functions for logging, error
+    handling, and user-friendly sudo password management.
+- **Integrated Linting**: The project enforces strict code quality standards through a
+  comprehensive linting script (`/scripts/lint.sh`). This script integrates multiple
+  linters, providing a single command to validate the entire codebase:
+  - `shellcheck` for shell scripts.
+  - `yamllint` for YAML files.
+  - `markdownlint` for documentation.
+  - `tflint` for Terraform code.
+
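For illustration, that single-command validation could be reproduced with a thin orchestrator like the following sketch; the exact flags and paths used by `/scripts/lint.sh` may differ, and those shown here are assumptions.

```python
"""Sketch: aggregate the linters named above behind one command,
mirroring what /scripts/lint.sh does in bash."""

import subprocess
import sys

# Typical invocations; paths and arguments are illustrative.
LINTERS = [
    ["shellcheck", "scripts/lint.sh"],  # shell scripts
    ["yamllint", "."],                  # YAML files
    ["markdownlint", "docs/"],          # documentation
    ["tflint"],                         # Terraform/OpenTofu code
]


def main() -> int:
    failed = False
    for cmd in LINTERS:
        print("==>", " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            failed = True  # keep going so all findings are reported
    return 1 if failed else 0


if __name__ == "__main__":
    sys.exit(main())
```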
+## Weaknesses and Areas for Improvement
+
+- **Over-reliance on Bash for Complex Logic**: The heavy use of bash for complex
+  automation logic is a significant drawback. Bash scripts can be brittle, difficult to
+  test, and hard to maintain as complexity grows. They lack the robust error handling,
+  data structures, and testing frameworks available in higher-level languages.
+- **Lack of Idempotency in Some Scripts**: While the goal is idempotency, some scripts may
+  not be fully idempotent. For example, running `app-deploy` multiple times could have
+  unintended side effects if not carefully managed. A production-grade tool should
+  guarantee the same result no matter how many times it is run.
+
44+
45+
## Recommendations for the Redesign
46+
47+
1. **Adopt a Higher-Level Language for Automation**: This is the most critical
48+
recommendation. The new installer should be written in a language like **Python**, **Go**,
49+
or **Rust**.
50+
- **Benefits**: This would provide superior error handling, mature testing frameworks,
51+
better dependency management, and access to official cloud provider SDKs. It would
52+
make the entire system more robust, maintainable, and easier to extend.
53+
- **Trade-offs**: While it might introduce a new language dependency for contributors, the
54+
long-term benefits for a project of this scale far outweigh this initial cost.
55+
2. **Use a Dedicated Configuration Tooling**: Instead of relying on `envsubst` and custom
56+
shell scripts for templating, the new system should adopt a more powerful and standard
57+
configuration management tool or a language-native templating engine, such as:
58+
- Jinja2 (if using Python).
59+
- Go's `text/template` package (if using Go).
60+
- Tools like Ansible for more complex configuration and orchestration tasks.
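As a sketch of the Jinja2 option from the list above: a template rendered with strict handling of missing variables, a failure mode `envsubst` silently ignores. The template text and variable names are illustrative assumptions.

```python
"""Sketch: replacing envsubst with Jinja2 for config templating."""

from jinja2 import Environment, StrictUndefined

# Illustrative template line; not taken from the PoC's .tpl files.
TEMPLATE = 'db_connect_url = "mysql://{{ db_user }}@{{ db_host }}:3306/{{ db_name }}"\n'


def render(values: dict) -> str:
    # StrictUndefined raises on missing variables instead of silently
    # emitting empty strings, which envsubst cannot catch.
    env = Environment(undefined=StrictUndefined)
    return env.from_string(TEMPLATE).render(**values)


if __name__ == "__main__":
    print(render({"db_user": "torrust", "db_host": "mysql",
                  "db_name": "torrust_tracker"}))
```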
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+# Configuration Management Analysis
+
+This document synthesizes the analysis of the configuration management system.
+
+## Strengths of the Current System
+
+Configuration management is a standout feature of the Torrust Tracker Demo PoC,
+demonstrating a mature and secure approach.
+
+- **Hybrid Approach (Files vs. Environment Variables - ADR-004)**: The project makes a
+  pragmatic decision to use configuration files for stable, non-sensitive application
+  behavior (e.g., timeouts, feature flags in `tracker.toml`) and environment variables
+  for secrets and environment-specific values (e.g., database credentials, domain
+  names). This aligns well with operational best practices and twelve-factor principles.
+- **Two-Level Environment Variable Structure (ADR-007)**: This is an excellent security
+  practice. The system separates variables into two distinct levels (see the sketch after
+  this list):
+  1. **Level 1 (Main Environment)**: Located in `infrastructure/config/environments/`,
+     these files contain the complete set of variables for a deployment, including
+     infrastructure secrets, API tokens, and application settings.
+  2. **Level 2 (Docker Compose Environment)**: This is a filtered subset of the main
+     environment, generated at deploy time into `application/.env`. It contains _only_ the
+     variables required by the running containers. This practice adheres to the principle
+     of least privilege and significantly reduces the attack surface of the application
+     containers.
+- **Template-Based Configuration**: The use of `.tpl` files for all major configuration
+  files (e.g., `cloud-init`, `tracker.toml`, `prometheus.yml`, `nginx.conf`) is a strong
+  practice. It allows the application and infrastructure code to remain
+  environment-agnostic, with environment-specific details injected during the
+  deployment's release stage.
+- **Per-Environment Application Configuration Storage (ADR-008)**: This ADR specifies that
+  final, generated application configuration files are stored in per-environment
+  directories (`application/config/{environment}/`). This allows for version-controlled,
+  auditable, and environment-specific application behavior.
+- **Centralized Configuration Script (`configure-app.sh`)**: This script acts as the
+  engine for the configuration system. It sources the appropriate environment variables
+  and uses `envsubst` to process all templates, generating the final configuration files
+  that will be deployed to the server.
+
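A minimal sketch of the Level 1 to Level 2 filtering described under ADR-007, assuming a simple whitelist. Only the `infrastructure/config/environments/` and `application/.env` locations come from the ADR; the variable names and file name are illustrative.

```python
"""Sketch: derive the Level 2 Docker Compose environment as a filtered
subset of the Level 1 main environment (ADR-007). Whitelist is illustrative."""

from pathlib import Path

# Only variables the containers actually need; infrastructure secrets
# (e.g., cloud API tokens) never reach application/.env.
CONTAINER_VARS = {"MYSQL_ROOT_PASSWORD", "MYSQL_DATABASE", "USER_DOMAIN"}


def filter_env(level1: Path, level2: Path) -> None:
    """Copy whitelisted KEY=VALUE lines from the main env file."""
    kept = []
    for raw in level1.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key = line.split("=", 1)[0]
        if key in CONTAINER_VARS:
            kept.append(line)
    level2.write_text("\n".join(kept) + "\n")


if __name__ == "__main__":
    filter_env(Path("infrastructure/config/environments/local.env"),
               Path("application/.env"))
```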
+## Weaknesses and Areas for Improvement
+
+- **Manual Secret Management**: The current system requires developers to manually copy
+  template files (e.g., `local.env.tpl`) and populate the secret values. This is
+  acceptable for a PoC but is not a secure or scalable practice for production
+  environments where secrets should be managed by a dedicated system.
+- **Custom Scripting for Templating**: While `envsubst` is clever and effective, relying
+  on custom shell scripting for configuration management can be less robust than using
+  industry-standard tools.
+
+## Recommendations for the Redesign
+
+1. **Integrate a Secure Secrets Management System**: This is a non-negotiable requirement
+   for the new production-grade installer. Secrets should never be stored in plaintext
+   files, even if they are git-ignored. The new system must integrate with a solution
+   like:
+
+   - HashiCorp Vault
+   - AWS Secrets Manager, GCP Secret Manager, or Azure Key Vault
+   - Encrypted files using a tool like `sops`
+
+   Secrets should be fetched and injected into the environment at runtime.
+
+2. **Implement Schema-Based Configuration Validation**: To prevent misconfigurations, the
+   new system should implement schema-based validation for all configuration files. This
+   could be done using JSON Schema, YAML schema validation libraries, or type-safe
+   configuration objects in a high-level language like Python (with Pydantic) or Go.
+   This catches errors early and ensures that all required configuration values are
+   present and correctly formatted.
+
+3. **Consider More Powerful Configuration Tooling**: While the current system works, the
+   redesign could benefit from adopting more powerful, industry-standard tools for
+   configuration management, which would reduce the amount of custom scripting required.
+   This could include:
+   - Using a dedicated configuration management tool like Ansible.
+   - Leveraging the native templating engines of a higher-level language (e.g.,
+     Jinja2 for Python).

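To illustrate recommendation 2, a small Pydantic sketch; the field names and constraints are assumptions, not the project's actual configuration schema.

```python
"""Sketch: schema-based configuration validation with Pydantic."""

from pydantic import BaseModel, Field, ValidationError


class TrackerConfig(BaseModel):
    # Illustrative fields; the real schema would mirror tracker.toml
    # and the environment files.
    domain: str
    mysql_database: str
    mysql_port: int = Field(default=3306, ge=1, le=65535)
    enable_https: bool = True


if __name__ == "__main__":
    try:
        cfg = TrackerConfig(domain="tracker.example.com",
                            mysql_database="torrust_tracker")
        print(cfg)
    except ValidationError as err:
        # Misconfigurations are caught before any deployment step runs.
        print(err)
```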