Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions .vault-pass.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
#!/bin/bash
# This script retrieves the password for the current git repository from a Bitwarden Vault.
# If bw is not available, it should error out
set -e
bw get password "concat-datalab" || { read -sp "Password or Bitwarden CLI not found. Enter vault password: " password; echo "$password"; }
46 changes: 46 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
# Makefile that passes the CLI args to ansible for the given playbook tags
BW_AVAILABLE = $(shell command -v bw 2> /dev/null)
VAULT_PASS_ARGS = $(if $(BW_AVAILABLE),--vault-password-file ./.vault-pass.sh, --ask-vault-pass)
PLAYBOOK_CMD = uv run ansible-playbook $(VAULT_PASS_ARGS) -i ansible/inventory.yml ansible/playbook.yml
VAULT_FILES := $(shell find ansible/vaults/ -type f);

# Extract tags directly from YAML (faster but less accurate)
VALID_TAGS := $(shell grep -r "tags:" ansible/playbook.yml | sed 's/.*tags://g' | tr -d '[]"' | tr ',' '\n' | tr -d ' ' | sort -u | grep -v '^$$')

all:
$(PLAYBOOK_CMD)

list:
uv run ansible-playbook --list-tags ansible/playbook.yml

inventory:
uv run ansible-vault edit $(VAULT_PASS_ARGS) ansible/inventory.yml

install-ansible: requirements.txt
uv venv --python=3.13; uv pip install -r requirements.txt

encrypt-vaults:
@echo "Encrypting all vault files in ansible/vaults/ directory: $(VAULT_FILES)";
uv run ansible-vault encrypt $(VAULT_FILES)

vaults:
@echo "Select a vault file to edit:"; \
select file in $$(find ansible/inventory.yml ansible/vaults/ -type f); do \
if [ -n "$$file" ]; then \
uv run ansible-vault edit $(VAULT_PASS_ARGS) "$$file"; \
break; \
fi; \
done

%:
@if echo "$(VALID_TAGS)" | grep -wq "$@"; then \
echo "Running playbook with tag: $@"; \
$(PLAYBOOK_CMD) --tags=$@; \
else \
echo "Error: '$@' is not a valid tag."; \
echo "Valid tags are: $(VALID_TAGS)"; \
exit 1; \
fi


.PHONY: list
212 changes: 200 additions & 12 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@
This repository contains tools and rules for automatically deploying datalab instances using Terraform/OpenTofu and Ansible.
It can be used as a template for deploying your own datalab instance, and optionally periodically resynced on new releases.

Use of Terraform for cloud provisioning is OPTIONAL; the Ansible playbooks are sufficient to set up a datalab instance on a existing hardware.
Use of Terraform/OpenTofu for cloud provisioning is OPTIONAL; the Ansible playbooks are sufficient to set up a datalab instance on a existing hardware.
The Ansible playbooks can even be used to deploy datalab on shared hardware; all mandatory services will be deployed within containers, so it is possible to set up the full NGINX + datalab stack alongside an existing reverse proxy managing other services, although this will need to be configured by the user.

Ideally, various aspects of the configuration will be transferrable, and thus only instance-specific configuration will need to be provided (e.g., usernames, domain names), in which case your instance version of this repository can be kept fairly in-sync with the main branch (which will continue to be updated as datalab's requirements grow and change).
Expand All @@ -21,12 +21,22 @@ Attempts will be made to tabulate the supported versions of datalab with each re

## Supported versions

<div align="center">

| This repository version | *datalab* version |
|---|---|
| v0.1.x | v0.4.x |
| v0.2.x | v0.4.x |
| v0.3.x | v0.5.x+ |
| v0.3.x | v0.5.0-rc.x |
| v0.4.x | v0.5.x |
| v0.5.x | v0.6.x |
| v0.6.x | v0.6.x |

</div>

## Changelog

The changelog for this repository can be found in the [release notes](https://github.com/datalab-industries/datalab-ansible-terraform/releases) on GitHub.

## Overview

Expand All @@ -53,7 +63,7 @@ Stack:
- Docker Compose for API, app and database containers
- Simple filesystem for filestore

### Datalab versioning
### *datalab* versioning

The Ansible playbooks will deploy the datalab version that is included as a `git
submodule` to this repository.
Expand All @@ -64,19 +74,36 @@ repository state (i.e., no uncommited changes) to ensure reproducibility.

### Ansible automation

#### First-time setup

These instructions assume you have prepared the server on which you would like
to deploy *datalab*, and that it is:

- accessible via SSH (using your local SSH config),
- running Ubuntu 22.04 or a similar distro with `apt` available.

It also assumes you have set up Ansible on your local machine.
You can find instructions for this in the [Ansible documentation](https://docs.ansible.com/ansible/latest/getting_started/get_started_ansible.html).
It also assumes that your local machine is running a Unix-like OS (Linux, WSL, macOS) with `git` and `bash`,
`make` and `sed` available.

You can find more information on the requirements of the server and control node in the [Ansible documentation](https://docs.ansible.com/ansible/latest/getting_started/get_started_ansible.html).

The first step is to clone this repository (or your fork) with submodules:
The first step is to clone this repository (or your fork/templated version) with submodules and then install Ansible and its dependencies.
We recommend using [uv](https://astral.sh/uv) for this, as the included [Makefile] will use it to run the playbooks in a virtual
environment.

```shell
git clone --recurse-submodules [email protected]:datalab-org/datalab-ansible-terraform
cd datalab-ansible-terraform
uv venv --python 3.12
uv pip install -r requirements.txt
```

or alternatively,

```shell
git clone --recurse-submodules [email protected]:datalab-org/datalab-ansible-terraform
cd datalab-ansible-terraform
make install-ansible
```

You can then navigate to the to the `./ansible` directory to begin configuring
Expand Down Expand Up @@ -105,6 +132,9 @@ ungrouped:
borg_encryption_passphrase: <the passphrase for the borg encryption>
borg_remote_path: <the command to run borg on the repository (e.g., borg1 vs borg2)>
borg_repository: <the path to the borg repository, either local or remote>
prometheus_remote_write_url: <your_prometheus_instance_url, e.g., https://grafana.datalab.industries/prometheus/api/v1/write>
prometheus_user: <your_prometheus_username>
prometheus_password: <your_prometheus_password>
```

where `<hostname>` and the various setting should be configured with your chosen
Expand All @@ -130,10 +160,20 @@ To encrypt them, you can run
ansible-vault encrypt inventory.yml vaults/datalab/prod_config.json vaults/datalab/.env vaults/datalab/.env_server vaults/datalab/.ssh/* vaults/borg/.ssh/*
```

or simply

```shell
make encrypt-vaults
```

and provide a password when prompted (which will then need to be kept safe and
used every time the Ansible playbook is run). Omit the optional SSH wildcards if no
SSH keys are required.
You should never commit these files directly without encryption.
used every time the Ansible playbook is run).
You should never commit these files directly without encryption, even to a
private git repository.

If you are using your own domain (configured via `app_url` and `api_url` in the inventory),
then you will need to update your domain's DNS settings so that your subdomains point to the IP
of the server as given in your inventory file.

Once all these configuration steps have been performed, we can try to execute
the Ansible "playbook" that will install all the pre-requisite services, build
Expand All @@ -146,10 +186,158 @@ This is achieved by running:
ansible-playbook --ask-vault-pass -i inventory.yml playbook.yml
```

If completed successfully, the server should now be running a *datalab*
instance at your configured URL!
or simply

```
make
```

If completed successfully, the server should now be running a *datalab* instance at your configured URLs!

#### Keeping things up to date

To update the *datalab* version, you simply update the git submodule in
`src/datalab` and rerun the playbook. This can be pinned to your fork and accomodate any custom changes
you desire (though you may also need to test and maintain your own set of
ansible rules and configuration for this).

Once you have chosen the *datalab* version, it can be redployed with `make
deploy` or

```shell
ansible-playbook --ask-vault-pass -i inventory.yml playbook.yml --tags deploy
```

To update the ansible playbooks themselves with any changes from the upstream
repository, you can similarly maintain the submodule in
`src/datalab-ansible-terraform` and either manually sync changes across, or use
the helper script:

```shell
chmod u+x sync-ansible-upstream.sh && ./sync-ansible-upstream.sh
```

which will copy just the changed playbooks across, and commit them. You should
be careful to review these changes before committing them to your fork,
especially if you have made any custom changes to the playbooks.
Be sure to also commit the changes to your submodule so you know precisely which versions
of the playbooks are running.

The [`Makefile`] also contains a number of other useful commands, such as:

```shell
make vaults
```

to edit the encrypted vault files, and

```shell
make list
```

to list all of the available Ansible tags that can be run individually.

#### Bitwarden integration

Constantly entering the vault password for every attempted deployment can be a
bit tedious, so by default, the `Makefile` will attempt to retrieve the vault
password from a local [Bitwarden CLI](https://bitwarden.com/help/cli/) installation,
which either only requires you to enter your Bitwarden password once per
session, or can be configured to remain logged in.

To use this feature, you will need to store your vault password in Bitwarden
using the same name as the cloned repository as reported locally by

```shell
$ git remote get-url origin
[email protected]:datalab-org/datalab-demo-deployment
{^^^^^^^^^^^^^^^^^^^^^}
repository name
```

If you intially cloned this repo and then renamed it, you can update your local
version to use the same name using:

```shell
git remote set-url origin <my-git-repo-url>
```

You can also simply edit the `.vault-pass.sh` script to return your vault
password in another way if you prefer.

You may also need to set the script to be executable with

```shell
chmod u+x .vault-pass.sh
```

#### Backups

If you are using your own domain, you will need to update your DNS settings so that your domain name points to the IP of the server as given in your inventory file.
##### Native backups

By default, *datalab* will take native snapshot backups at a certain frequency
and save them into `/data/backups` on the server with the configured retention
rules (typically keeping 7 daily copies, 6 monthly and 4 yearly).
These backups can be synced with a remote system to avoid data loss in the event
of a hardware failure on your *datalab* server.

More information on backups can be found in the
[datalab documentation](https://docs.datalab-org.io/en/stable/deployment/#backups).

##### Borg backups (recommended)

This repository contains playbooks for automating much more robust backups
using [Borg](https://www.borgbackup.org/en/stable/) and
[borgmatic](https://torsion.org/borgmatic/).

These backups are encrypted, de-duplicated and compressed, requiring significantly
less space than the native backup option, and can be synced easily with remote
Borg instances over SSH.
You will need to have an appropriate remote server (ideally separate from the *datalab* server itself)
running Borg (we recommend [rsync.net](https://www.rsync.net/products/borg.html) which has excellent borg support).

To activate the Borg playbooks, you must first provide an encrypted SSH key pair for accessing
your remote Borg server in the `./vaults/borg/.ssh` directory.
It should be named `./vaults/borg/.ssh/id_ed25519` and contain the private key for
passwordless SSH connection to the remote Borg server configured in the
inventory.
Any other encrypted files in this `.ssh` vault will be mounted in the Borg
container during the backup procedure. You may wish to include an `.ssh/config`
to set up any extra settings (e.g., proxies, host checking), or an `.ssh/known_hosts` to enable strict host checking.

> If you are running a *datalab* instance yourself and are looking for a place
> for your encrypted Borg backups, feel free to reach out to us as we may have
> enough of an overhead to be a secondary backup host for you.

#### Server monitoring

One basic option for uptime monitoring is to use a free GitHub Actions based
service like [Upptime](https://github.com/upptime/upptime).
For example, this is used for simple services in the central *datalab* organisation at [datalab-org/datalab-org-status](https://github.com/datalab-org/datalab-org-status).

For more advanced monitoring, the Ansible playbooks contain a role tagged as
`monitoring`, which will install and configure metrics harvesters using
[Prometheus](https://prometheus.io/) (with [Node Exporter](https://github.com/prometheus/node_exporter) and
[cAdvisor](https://github.com/google/cadvisor)) to monitor the host system and
containers.

To make use of this monitoring, you will need your own [Grafana instance](https://grafana.com/oss/grafana) (also running Prometheus as a harvester of the remote metrics) to visualise the metrics.

Alternatively, you can use a hosted Grafana service such as [Grafana Cloud](https://grafana.com/products/cloud/), or request to use our central
*datalab* Grafana instance by reaching out to us on Slack or over email.

This integration can be enabled by adding the following variables to your inventory:

```yaml
prometheus_remote_write_url: <your_prometheus_instance_url, e.g., https://grafana.datalab.industries/prometheus/api/v1/write>
prometheus_user: <your_prometheus_username>
prometheus_password: <your_prometheus_password>
```
and then running the playbook with the `monitoring` tag:

```shell
make monitoring
```

### Cloud provisioning

Expand Down
17 changes: 0 additions & 17 deletions ansible/Makefile

This file was deleted.

Loading
Loading