Andreas' Blog

Adventures of a software engineer/architect

Hosting Gitea and Drone with Docker

2019-03-24 8 min read anoff

This post will walk you through setting up a self hosted git based continuous integration environment on a two machine setup - assuming you already have two virtual machines at your disposal. Using Gitea for git hosting and contribution management and Drone for docker-based build jobs, this will guide you through creating docker-compose files as well as configuring the individual services and getting SSL certificates via traefik. Docker and docker-compose knowledge is required for this tutorial. It mostly focuses on the correct configuration of all the services at play here and not explaining their basic functionality.

This tutorial uses Azure resources so some of the aspects might not be 100% applicable if you chose another infrastructure provider.

Infrastructure setup
Figure 1. Required infrastructure setup

The post will keep referring to the gitea vm or the drone machine, it might be useful to add DNS records to easily identify which machine you are looking at.

Prepare PostgreSQL

This is suggested if you want to have a shared PostgreSQL server with other applications accessing it too. Administration is easiest via Azure Cloud shell after activating Allow access to Azure services in the Connection Security pane of the Postgres server. The Azure Shell already has the psql and other CLIs installed by default.

psql --host=mydb.postgres.database.azure.com --port=5432 [email protected] --dbname=postgres
# interactive password login
CREATE EXTENSION pg_trgm;
CREATE ROLE gitea with LOGIN CREATEDB PASSWORD 'supersexypassword';
CREATE DATABASE gitea;
GRANT ALL PRIVILEGES ON DATABASE gitea to gitea;

Configure the Virtual Machines

For each virtual machine do the following steps

  1. Install Docker & docker-compose
  2. Format & auto mount data disk
  3. Create a Docker user
  4. Change SSH port to 2022

Install Docker & docker-compose

# Docker
#   https://docs.docker.com/install/linux/docker-ce/ubuntu/
sudo apt-get remove docker docker-engine docker.io containerd runc
sudo apt-get update
sudo apt-get -y install \
  apt-transport-https \
  ca-certificates \
  curl \
  gnupg-agent \
  software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
sudo apt-get update
sudo apt-get -y install docker-ce docker-ce-cli containerd.io

# Compose
#   https://docs.docker.com/compose/install/
sudo curl -L "https://github.com/docker/compose/releases/download/1.23.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

Mount data disk

lsblk: Find available disks and check for the device mountpoint of your data disk

The following script will format the device as ext4 and mount it under /data. If you prefer your data to reside anywhere else keep in mind that this tutorial uses /data later on again.

sudo -s # enter root mode
DATA_DISK=/dev/sdc # CHANGE TO YOUR <MOUNTPOINT>
# Mount data disk, xfs will not encrypt on Azure
mkfs.ext4 ${DATA_DISK}

# identify UUID: blkid -s UUID
DATA_DISK_UUID=$(blkid -s UUID -o value ${DATA_DISK})
mkdir /data
# Add line to /etc/fstab
echo "UUID=${DATA_DISK_UUID} /data auto defaults 0 0" >> /etc/fstab
mount -a
exit

For more information refer to the [Azure help](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/attach-disk-portal).

Create a Docker user

To isolate the docker process create a separate unix user that belongs to the docker group and owns the data directory. This user will later be used to run the docker containers.

sudo -s
DOCKER_USER=minion
mkdir /data/home
mkdir /data/volumes
useradd -d /data/home ${DOCKER_USER}
chown ${DOCKER_USER}: /data/home -R
chown ${DOCKER_USER}: /data/volumes/ -R
# give docker access by adding it to the group
gpasswd -a ${DOCKER_USER} docker
exit
# find user ID
id ${DOCKER_USER}

Change SSH port

Changing the SSH port only reduces the attack vector but does not make your system any more secure. It is still a practice I follow for all my systems because it is minimal effort and makes it harder for port scanners to find the machine. In the case of the Gitea server this is also recommended as it easily allows you to bind the SSH port for git access.

sudo nano /etc/ssh/sshd_config
# uncomment and change L3: Port 2222

Reverse Proxy setup

Even though Gitea and Drone come with integrated Let’s Encrypt support to generate SSL certificates I chose another path. For one reason I could not get them working properly and adding a dedicated reverse proxy to handle SSL and routing makes any migrations or changes to the setup easier.

In this setup traefik is used as a low profile router. The picture below shows an example setup how traefik can be used within docker to make two different services A and service B accessible from the outside, both via HTTP on port 80 as well as auto generated SSL certificates on HTTPS 443. Traefik by default forwards HTTP requests to HTTPS which should be what you want. By configuring traefik within the docker-compose you can expose both services under different DNS names with a single public IP address.

Proxy setup
Figure 2. Reverse proxy in docker using traefik

Setting up Gitea

Connect to the gitea VM, switch to the user that was previously created using sudo su - minion and create the docker-compose.yml. Even though it is possible to configure Gitea to some extend using docker-compose I suggest only using minimal definitions in the compose file itself and then modify the app.ini after an initial start. There are certain settings that are only possible in the ini file itself.

Compose file

docker-compose.yml
version: "2"

networks:
  gitea:
    external: false

services:
  server:
    image: gitea/gitea:1 # :latest runs dev builds https://hub.docker.com/r/gitea/gitea/tags
    environment:
      - USER_UID=1001 1
      - USER_GID=1001
      - DB_TYPE=postgres
      - DB_HOST=mydb.postgres.database.azure.com:5432 2
      - DB_NAME=gitea
      - DB_USER=gitea
      - DB_PASSWD=supersexypassword
      - SSH_DOMAIN=gitea.mydomain.com 3
      - HTTP_PORT=80
      - ROOT_URL=""
    networks:
      - gitea
    volumes:
      - /data/volumes/gitea:/data 4
    ports:
      - "3000:3000" 5
      - "22:22"
    labels:
      - "traefik.enabled=true"
      - "traefik.backend=gitea"
      - "traefik.frontend.rule=Host:gitea.mydomain.com" 3
      - "traefik.docker.network=gitea"
      - "traefik.port=3000" 5
    container_name: gitea
    restart: always

  traefik:
    image: traefik:latest
    command: --docker
    ports:
      - "80:80" 6
      - "443:443"
    labels:
      - "traefik.enable=true"
      - "traefik.backend=dashboard"
      - "traefik.frontend.rule=Host:traefik-gitea.mydomain.com"
      - "traefik.port=8080"
    networks:
      - gitea
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /data/volumes/traefik/traefik.toml:/traefik.toml 7
      - /data/volumes/traefik/acme.json:/acme.json 7
    container_name: traefik
    restart: always
  1. add the ID of the user that was created previously for running the docker commands
  2. replace with your own PostgreSQL credentials
  3. Gitea needs to know its domain to generate correct links, Traefik needs to know it to generate correct SSL certs and route correctly
  4. stores any data written by Gitea onto /data (data disk)
  5. tell Gitea which custom port to run on and traefik where to route the requests to
  6. Traefik needs to be reachable via HTTP to run the Let’s encrypt challenge for domain verification
  7. Two custom files need to be passed to the traefik container

Traefik configuration

Before starting the containers add the traefik configuration files under /data/volumes/traefik/

traefik.toml
#Traefik Global Configuration
debug = false
checkNewVersion = true
logLevel = "ERROR"

#Define the EntryPoint for HTTP and HTTPS
defaultEntryPoints = ["https","http"]
[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.https]
address = ":443"
#Enable automatically redirect HTTP to HTTPS
[entryPoints.http.redirect]
entryPoint = "https"
[entryPoints.https.tls]

#Enable Traefik Dashboard on port 8080
#with basic authentication method
[entryPoints.dash]
address=":8080"
[entryPoints.dash.auth]
[entryPoints.dash.auth.basic]
    users = [
        "minion:<base64encodedpassword>",
    ]

[api]
entrypoint="dash"
dashboard = true

#Enable retry sending a request if the network error
[retry]

#Define Docker Backend Configuration
[docker]
endpoint = "unix:///var/run/docker.sock"
domain = "mydomain.com"
watch = true
exposedbydefault = false

#Define the Letsencrypt ACME HTTP challenge
[acme]
email = "[email protected]"
storage = "acme.json"
entryPoint = "https"
OnHostRule = true
  [acme.httpChallenge]
  entryPoint = "http"

See the howtoforge.com/ubuntu-docker-traefik-proxy blog post for more details.

The acme.json file just needs before starting docker.

touch /data/volumes/traefik/acme.json
chmod 600 touch /data/volumes/traefik/acme.json

Starting the Gitea container

Finally start the services on the gitea VM from the home directory

cd $HOME
docker-compose up -d

This is the setup we just rolled out on the Gitea VM.

Docker images for Gitea VM
Figure 3. Gitea docker setup

Customize Gitea

After the initial start of the Gitea container, stop it again docker-compose stop and modify the configuration as needed. Refer to the Gitea Config Cheatsheet for a full list of available settings.

nano /data/volumes/gitea/gitea/config/app.ini

Setting up Drone

The drone VM will also use traefik as a reverse proxy and SSL provider, refer to the Gitea setup for the generic traefik steps. SSH into the Drone VM and change to the minion user.

Drone compose file

Place the following file into the home directory:

docker-compose.yml
version: "2"

networks:
  internal:
    external: false

services:
  drone-server:
    # https://hub.docker.com/r/drone/drone/tags
    image: drone/drone:1.0.0
    ports:
      - "3000"
    networks:
      - internal
    volumes:
      - /data/volumes/drone-server:/var/lib/drone/
      - /var/run/docker.sock:/var/run/docker.sock
    restart: always
    environment:
      DRONE_SERVER_PORT: "3000" 3
      DRONE_SERVER_HOST: "drone.mydomain.com"
      DRONE_SERVER_PROTO: "https"
      # DRONE_DEBUG: "true"
      DRONE_SECRET: "somethingverysecret" 1
      DRONE_DATABASE_DRIVER: sqlite3
      DRONE_DATABASE_DATASOURCE: /var/lib/drone/drone.sqlite
      DRONE_RUNNER_CAPACITY: 2
      DRONE_TLS_AUTOCERT: "false"
      # DRONE_ORGS: ""
      DRONE_ADMIN: myusername
      DRONE_ADMIN_ALL: "false"
      # GITEA params
      DRONE_GITEA_SERVER: "https://gitea.mydomain.com" 2
      DRONE_GITEA_SKIP_VERIFY: "false"
      DRONE_GIT_USERNAME: "drone-runner" 4
      DRONE_GIT_PASSWORD: "anothersecretpassword"
      DRONE_GIT_ALWAYS_AUTH : "true" 5
    labels:
      - "traefik.enabled=true"
      - "traefik.backend=drone"
      - "traefik.frontend.rule=Host:traefik-drone.mydomain.com"
      - "traefik.docker.network=internal"
      - "traefik.port=3000" 3

  drone-agent:
    image: drone/agent:1.0.0
    command: agent
    depends_on:
      - drone-server
    networks:
      - internal
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    restart: always
    environment:
      DRONE_SERVER: ws://drone-server/ws/broker
      DRONE_DEBUG: "true"
      DRONE_SECRET: "somethingverysecret" 1

  traefik:
    image: traefik:latest
    command: --docker
    ports:
      - "80:80"
      - "443:443"
    labels:
      - "traefik.enable=true"
      - "traefik.backend=dashboard"
      - "traefik.frontend.rule=Host:drone.mydomain.com"
      - "traefik.port=8080"
    networks:
      - internal
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /data/volumes/traefik/traefik.toml:/traefik.toml
      - /data/volumes/traefik/acme.json:/acme.json
    container_name: traefik
    restart: always
  1. the drone server and agents need to use the same secret to communicate
  2. Drone needs to know its domain to generate correct links, Traefik needs to know it to generate correct SSL certs and route correctly
  3. tell Drone which custom port to run on and traefik where to route the requests to
  4. valid Gitea user credentials for Drone to use when fetching repositories
  5. Always authenticate when running git commands against Gitea - required if you use private repositories

Starting the Gitea container

Caution:

Make sure traefik is correctly configured on this VM as well

Start the docker containers from the home directory

cd $HOME
docker-compose up -d

The final setup for Drone looks like this

Docker images for Drone VM
Figure 4. Drone docker setup

Summary

Gitea and Drone are running on individual machines but are fully configured to talk to each other. This separation beyond the container level guarantees that the performance of the git server will not be influenced by any builds running in the CI/CD pipeline. If that is not a concern the both services could also be set up on a single virtual machine with either one or two docker-compose files.

I hope this tutorial helped you and if you stumble upon any problems or errors let me know in the comments or via Twitter 👋

Automated dev workflow for using Data Science VM on Azure

2018-03-22 11 min read anoff
tl;dr; I put together a bunch of scripts on Github that let you deploy a VM from your command line as well as sync code from your local directory to the VM easily to be able to use local IDE and git but execute on the powerful remote machine. Perfect for Data Science applications based around jupyter notebook. In my previous blog post I explained how to do Terraform deployment of an Azure Data Science Virtual Machine. Continue reading

Deploy Datascience infrastructure on Azure using Terraform

2018-01-23 6 min read anoff
In this article I will talk about my experience building my first infrastructure deployment using Terraform that does (a little) more than combining off-the-shelf resources. The stack we will deploy 📦 Lately I’ve been looking at a lot of Microsoft Azure services in the big data area. I am looking for something to replace a Hadoop based 🐘 data analytics environment consisting mainly of HDFS, Spark & Jupyter. The most obvious solution is to use a HDInsight cluster which is basically a managed Hadoop that you can pick in different flavours. Continue reading