Traefik — Reverse Proxy & TLS Terminator

Project: Traefik Proxy
Host: traefik / traefik.home.lab
IP: 10.69.20.40
VLAN: 20 (SERVERS)
Web UI: https://traefik.home.helix9.org (LAN-only or Authentik)
Role path: roles/traefik/
Playbook: playbooks/traefik.yml


Overview

Traefik is the single ingress point for every HTTP(S) service in the home network. It terminates TLS, fetches and renews ACME certificates via Cloudflare DNS-01, and routes requests to backends across the SERVERS, MGMT, and DMZ zones. It also fronts the Authentik forward-auth flow for services that need SSO.

Request flow:

client                                                   backend
  │                                                         ▲
  │  *.home.helix9.org → VyOS DNS forwarder → 10.69.20.40   │
  │                                                         │
  ▼                                                         │
┌──────────────────┐   middleware   ┌─────────────────┐     │
│  Traefik :443    │ ─ local-only ─►│ Authentik fwd   │     │
│  TLS terminator  │ ─ authentik  ─►│ auth (9000)     │     │
└────────┬─────────┘                └─────────────────┘     │
         │  HTTP / HTTPS (proxmox-transport)                │
         └──────────────────────────────────────────────────┘

Three properties define the architecture:

  1. File provider only — no Docker label discovery, no Kubernetes CRDs. Routers and services are declared statically in templated YAML rendered by Ansible.
  2. DNS-01 ACME — certificates are issued via Cloudflare's API, so Traefik does not need to be reachable from the public internet to renew. Wildcards (*.home.helix9.org) are possible with no firewall holes.
  3. Split-horizon DNS: *.home.helix9.org resolves to 10.69.20.40 for clients inside the network (via static host mappings on VyOS), and to the public VPS IP outside. The same Traefik instance handles both.

Infrastructure

LXC Container

| Setting | Value |
| --- | --- |
| Node | pve02 |
| IP | 10.69.20.40/24 |
| Gateway | 10.69.20.1 |
| CPU | 2 cores |
| RAM | 1024 MB |
| Swap | 512 MB |
| Disk | 8 GB |
| Template | rockylinux-10-default_20251001_amd64 |
| OS family | RedHat (uses dnf + firewalld) |
| Unprivileged | yes |
| Tags | servers, traefik, ingress |

Provisioned via Terraform from inventory/host_vars/traefik/vars.yml. See New Host for the standard LXC-creation workflow.

Ansible

| Item | Path |
| --- | --- |
| Role | roles/traefik/ |
| Playbook | playbooks/traefik.yml |
| Host vars | inventory/host_vars/traefik/vars.yml |
| Vault keys | vault_traefik_acme_email, vault_traefik_cf_dns_token |

Deploy:

ansible-playbook playbooks/traefik.yml

The role is idempotent. It downloads the pinned Traefik binary only when the installed version does not match traefik_version, then renders three template files and reloads the systemd unit.
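The version check described above could look roughly like the following Ansible tasks. This is a sketch, not the role's actual task file: the task names are made up, it assumes traefik_version holds a bare version number without a leading "v", and the release URL follows the naming used on Traefik's GitHub releases page.

```yaml
# Sketch only: task names are hypothetical, traefik_version is the role variable.
- name: Read the installed Traefik version
  ansible.builtin.command: /usr/local/bin/traefik version
  register: traefik_installed
  changed_when: false
  failed_when: false   # the binary does not exist on a fresh host

- name: Download and unpack the pinned release when versions differ
  ansible.builtin.unarchive:
    src: "https://github.com/traefik/traefik/releases/download/v{{ traefik_version }}/traefik_v{{ traefik_version }}_linux_amd64.tar.gz"
    dest: /usr/local/bin
    remote_src: true
    include:
      - traefik          # extract only the binary from the tarball
  when: traefik_version not in (traefik_installed.stdout | default(''))
  notify: Restart traefik
```

Because the download is guarded by the version comparison, a second run against an up-to-date host reports no change for these tasks.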


Installation Layout

| Path | Purpose |
| --- | --- |
| /usr/local/bin/traefik | Binary (capability cap_net_bind_service=+ep) |
| /etc/traefik/traefik.yml | Static config (rendered) |
| /etc/traefik/dynamic/dynamic.yml | Middlewares + TLS options (rendered) |
| /etc/traefik/dynamic/services.yml | Routers + backends (rendered) |
| /etc/traefik/acme/acme.json | ACME state (mode 0600, owned by traefik) |
| /var/log/traefik/traefik.log | Daemon log |
| /var/log/traefik/access.log | HTTP access log |
| /etc/systemd/system/traefik.service | Hardened unit |

The binary is run as the unprivileged traefik user with cap_net_bind_service so it can bind 80/443 without root. The systemd unit applies NoNewPrivileges, ProtectSystem=full, ProtectHome, PrivateTmp, and PrivateDevices. Only the ACME and log directories are writable.


Static Configuration

Rendered from roles/traefik/templates/traefik.yml.j2.

Entrypoints

| Name | Address | Behaviour |
| --- | --- | --- |
| web | :80 | Permanent redirect to websecure (HTTPS) |
| websecure | :443 | Primary TLS entrypoint |
| metrics | :8082 | Prometheus scrape (internal-only) |
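For orientation, the three entrypoints in the table map onto Traefik's static configuration roughly as below. This is a minimal sketch using standard Traefik keys; the rendered traefik.yml may carry extra options not shown here.

```yaml
# traefik.yml (static config), sketch of the entrypoints section
entryPoints:
  web:
    address: ":80"
    http:
      redirections:
        entryPoint:
          to: websecure      # send every plain-HTTP request to :443
          scheme: https
          permanent: true    # 301/308 instead of a temporary redirect
  websecure:
    address: ":443"
  metrics:
    address: ":8082"         # scraped by Prometheus, not exposed by firewalld
```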

Providers

Only the file provider is enabled, pointed at /etc/traefik/dynamic/ with watch: false. Config changes are picked up when the Ansible handler (Restart traefik) restarts the service, not through file inotify. This is deliberate: changes go through Git → Ansible, not hand-edits.
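In static-config terms this provider setup is only a few lines. A sketch, assuming nothing beyond the directory and watch setting stated above:

```yaml
# traefik.yml (static config), sketch of the providers section
providers:
  file:
    directory: /etc/traefik/dynamic   # dynamic.yml and services.yml live here
    watch: false                      # no inotify reloads; restart to apply
```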

Certificate Resolver

| Setting | Value |
| --- | --- |
| Resolver name | letsencrypt |
| Challenge | DNS-01 |
| DNS provider | cloudflare |
| Storage | /etc/traefik/acme/acme.json |
| DNS resolvers (propagation check) | 1.1.1.1:53, 9.9.9.9:53 |

The Cloudflare API token is injected at runtime via systemd Environment=CF_DNS_API_TOKEN=..., sourced from the Ansible vault key vault_traefik_cf_dns_token. The token requires only Zone:DNS:Edit scope on helix9.org.
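Put together, the resolver settings would render to something like the block below. It is a sketch built from the values in the table; the Jinja reference to vault_traefik_acme_email is an assumption about how the template wires in the vault key.

```yaml
# traefik.yml.j2 (static config), sketch of the certificate resolver
certificatesResolvers:
  letsencrypt:
    acme:
      email: "{{ vault_traefik_acme_email }}"   # assumed template wiring
      storage: /etc/traefik/acme/acme.json
      dnsChallenge:
        provider: cloudflare      # reads CF_DNS_API_TOKEN from the environment
        resolvers:                # used to check TXT propagation
          - "1.1.1.1:53"
          - "9.9.9.9:53"
```

The token itself never appears in this file; it stays in the systemd unit's environment.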

Metrics & Dashboard

  • Prometheus metrics: enabled, scraped on :8082 with entry-point and service labels.
  • Dashboard: enabled, served on the api@internal service, exposed only via the traefik-dashboard router (see below) — never on :8080 insecurely.
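A sketch of how the two bullets above could translate into configuration. The first document is static config, the second is a dynamic router; the router body is assumed from the router inventory on this page rather than copied from services.yml.j2.

```yaml
# traefik.yml (static config)
api:
  dashboard: true            # served by the api@internal service
metrics:
  prometheus:
    entryPoint: metrics      # the :8082 entrypoint
    addEntryPointsLabels: true
    addServicesLabels: true
---
# services.yml (dynamic config), assumed shape of the dashboard router
http:
  routers:
    traefik-dashboard:
      rule: "Host(`traefik.home.helix9.org`)"
      entryPoints:
        - websecure
      service: api@internal
      middlewares:
        - local-or-auth
      tls:
        certResolver: letsencrypt
```

Leaving api.insecure unset keeps the dashboard off :8080.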

Dynamic — Middlewares & TLS

Rendered from roles/traefik/templates/dynamic.yml.j2.

Middlewares

| Name | Type | Purpose |
| --- | --- | --- |
| secure-headers | headers | HSTS 1y + preload, no-sniff, XSS filter, strict-origin referrer |
| local-only | ipAllowList | Permit 10.69.0.0/16 and 192.168.178.0/24 only |
| authentik | forwardAuth | Delegate auth to Authentik outpost on :9000/outpost.goauthentik.io/auth/traefik |
| local-or-auth | chain | local-only first, then authentik; used for the dashboard |

The authentik middleware passes a fixed set of X-authentik-* headers (username, groups, email, name, uid, JWT, meta) downstream so backends can read identity claims without re-authenticating.
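The four middlewares could be declared along these lines. Treat it as a sketch: the exact referrerPolicy value and the list of X-authentik-meta-* headers are assumptions (the meta names follow Authentik's published Traefik example), and the real dynamic.yml.j2 may differ.

```yaml
# dynamic.yml (dynamic config), sketch of the middleware section
http:
  middlewares:
    secure-headers:
      headers:
        stsSeconds: 31536000          # HSTS for one year
        stsPreload: true
        contentTypeNosniff: true
        browserXssFilter: true
        referrerPolicy: strict-origin # assumed exact policy string
    local-only:
      ipAllowList:
        sourceRange:
          - 10.69.0.0/16
          - 192.168.178.0/24
    authentik:
      forwardAuth:
        address: "http://authentik:9000/outpost.goauthentik.io/auth/traefik"
        trustForwardHeader: true
        authResponseHeaders:          # copied onto the request sent to the backend
          - X-authentik-username
          - X-authentik-groups
          - X-authentik-email
          - X-authentik-name
          - X-authentik-uid
          - X-authentik-jwt
          - X-authentik-meta-jwks     # meta header names are assumed
          - X-authentik-meta-outpost
          - X-authentik-meta-provider
          - X-authentik-meta-app
          - X-authentik-meta-version
    local-or-auth:
      chain:
        middlewares:
          - local-only
          - authentik
```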

TLS options (default)

| Setting | Value |
| --- | --- |
| minVersion | VersionTLS12 |
| sniStrict | true |
| Cipher suites | TLS 1.3 AEAD suites + ECDHE-ECDSA / ECDHE-RSA AES-256-GCM |

sniStrict: true rejects connections that don't present an SNI matching a known router — limits scanner noise.
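As a sketch, the default TLS options might be written as follows. Only the two TLS 1.2 suites are listed because Go's TLS stack, which Traefik uses, does not allow TLS 1.3 suites to be selected; they are always available when TLS 1.3 is negotiated.

```yaml
# dynamic.yml (dynamic config), sketch of the default TLS options
tls:
  options:
    default:
      minVersion: VersionTLS12
      sniStrict: true
      cipherSuites:                 # applies to TLS 1.2 handshakes only
        - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
        - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
```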


Routers & Services

Rendered from roles/traefik/templates/services.yml.j2. Hostnames mostly resolve from Ansible inventory hostvars[<name>].ansible_host, so service IPs follow inventory/hosts.yml. A few are hardcoded (e.g. mediastack at 10.69.20.6x) because those hosts are not yet in inventory.

Routing rule patterns

There are five exposure patterns in use. They are not interchangeable — each has a different security posture:

| Pattern | How | Where used |
| --- | --- | --- |
| Public | Bare Host() rule, no middleware | paperless, pulse, onedev, docusaurus, dns, uptimekuma, pbs |
| Public + Authentik | Host() + middlewares: [authentik] | proxmox-pve01/02, copyparty, openclaw |
| LAN-only via router rule | Host() AND ClientIP(...) in the same rule | seerr, sonarr, jellyfin-local |
| LAN-only via middleware | Host() + middlewares: [local-only] | radarr, sabnzbd |
| LAN bypass + external Authentik | Two routers, different priorities | jellyfin |

The two LAN-only patterns are functionally similar but differ in failure mode:

  • Router rule (ClientIP) — non-matching clients get 404 Not Found. The router doesn't even match.
  • Middleware (local-only) — non-matching clients hit the router but get 403 Forbidden. The router matches, the middleware rejects.

The 404 form is preferable because it leaks less information about which services exist. Prefer ClientIP in the router rule for any new LAN-only entry.
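Side by side, the two LAN-only patterns look roughly like this. The hosts and networks come from this page; the service names (sonarr, radarr) are assumed for illustration.

```yaml
http:
  routers:
    # Router-rule pattern: non-LAN clients never match, so they get 404.
    sonarr-local:
      rule: "Host(`sonarr.home.helix9.org`) && (ClientIP(`10.69.0.0/16`) || ClientIP(`192.168.178.0/24`))"
      service: sonarr        # assumed service name
    # Middleware pattern: the router matches, then local-only answers 403.
    radarr:
      rule: "Host(`radarr.home.helix9.org`)"
      middlewares:
        - local-only
      service: radarr        # assumed service name
```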

Dual-router pattern (Jellyfin)

Two routers share the same Host() and differ on priority:

jellyfin-local:
  rule: "Host(`jellyfin.home.helix9.org`) && (ClientIP(`10.69.0.0/16`) || ClientIP(`192.168.178.0/24`))"
  priority: 100
  # no middleware

jellyfin:
  rule: "Host(`jellyfin.home.helix9.org`)"
  priority: 50
  middlewares: [authentik]

Higher priority + more specific match wins for LAN clients (no Authentik prompt — the Jellyfin app handles its own auth). External clients fall through to the lower-priority router and are gated by Authentik.

Path-priority pattern (OpenClaw)

openclaw-api:
  rule: "Host(`openclaw.helix9.org`) && PathPrefix(`/__openclaw__/`)"
  priority: 200
  # no middleware (API path)

openclaw:
  rule: "Host(`openclaw.helix9.org`)"
  priority: 100
  middlewares: [authentik]

The API prefix bypasses Authentik (machine-to-machine), the rest of the host requires it.

Authentik outpost — priority 1000

authentik-outpost:
  rule: "PathPrefix(`/outpost.goauthentik.io/`)"
  priority: 1000

This must outrank every per-service router. Any host with the Authentik forward-auth middleware redirects unauthenticated users to /outpost.goauthentik.io/... on the same host — and the outpost router needs to claim that path on every host. Hence the very high priority and the lack of Host() constraint.

Proxmox transport

Proxmox uses self-signed certificates on :8006 (PVE) and :8007 (PBS). A dedicated serversTransport named proxmox-transport sets insecureSkipVerify: true for those backends only — no other backend opts into TLS skip.

serversTransports:
  proxmox-transport:
    insecureSkipVerify: true
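A backend opts into the transport by naming it on its load balancer. The sketch below shows the full nesting under http: for one Proxmox node; the service name proxmox-pve02 is assumed to match the router name.

```yaml
http:
  serversTransports:
    proxmox-transport:
      insecureSkipVerify: true        # accept the node's self-signed cert
  services:
    proxmox-pve02:                    # assumed service name
      loadBalancer:
        serversTransport: proxmox-transport
        servers:
          - url: "https://pve02:8006"
```

Services that omit serversTransport keep normal certificate verification.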

Current router inventory

Note: the dns router pointing at technitium:5380 is currently inactive. The Technitium LXC at 10.69.20.53 is still running but is not in the live DNS path — VyOS forwards directly to upstream resolvers. The router stays in services.yml.j2 so the path is one playbook run away from being live again, but treat the entry as dormant.

| Router | Host | Backend | Middleware |
| --- | --- | --- | --- |
| paperless | paperless.home.helix9.org | paperless:8000 | |
| pulse | pulse.home.helix9.org | pulse:7655 | |
| uptimekuma | status.home.helix9.org | uptime-kuma:3001 | |
| onedev | onedev.home.helix9.org | onedev:6610 | |
| docusaurus | docs.home.helix9.org | docusaurus:8080 | |
| dns (inactive) | dns.home.helix9.org | technitium:5380 | |
| pbs | pbs01.home.helix9.org | pbs01:8007 (proxmox-transport) | |
| proxmox-pve01 | pve01.home.helix9.org | 10.69.10.5:8006 (proxmox-transport) | authentik |
| proxmox-pve02 | pve02.home.helix9.org | pve02:8006 (proxmox-transport) | authentik |
| authentik | auth.home.helix9.org | authentik:9000 | |
| authentik-outpost | PathPrefix(/outpost.goauthentik.io/) | authentik:9000 | |
| copyparty | copyparty.home.helix9.org | copyparty:3923 | authentik |
| openclaw | openclaw.helix9.org | openclaw:18789 | authentik |
| openclaw-api | openclaw.helix9.org/__openclaw__/ | openclaw:18789 | |
| jellyfin-local | jellyfin.home.helix9.org (LAN) | 10.69.20.64:8096 | |
| jellyfin | jellyfin.home.helix9.org (external) | 10.69.20.64:8096 | authentik |
| seerr-local | seerr.home.helix9.org (LAN) | 10.69.20.65:5055 | |
| sonarr-local | sonarr.home.helix9.org (LAN) | 10.69.20.62:8989 | |
| radarr | radarr.home.helix9.org | 10.69.20.63:7878 | local-only |
| sabnzbd | sabnzbd.home.helix9.org | 10.69.20.61:8080 | local-only |
| traefik-dashboard | traefik.home.helix9.org | api@internal | local-or-auth |

Adding a New Service

The end-to-end checklist for exposing a new backend (e.g. foo.home.helix9.org → 10.69.20.99:8000):

  1. Inventory: add the host to inventory/hosts.yml with the right ansible_host IP. Run ansible-playbook playbooks/vyos_dns.yml to push the shortname → IP mapping into VyOS.
  2. DNS: add foo.home.helix9.org → 10.69.20.40 to the static-host mappings on VyOS (currently hand-curated; see Home Router — Static Host Mappings). External access also needs a Cloudflare record pointing to the VPS public IP.
  3. Firewall: confirm the SERVERS zone reaches the backend port. For non-SERVERS backends (DMZ, MGMT) add a specific rule in the relevant cross-zone policy on VyOS.
  4. Traefik router: append a router + service to roles/traefik/templates/services.yml.j2. Pick the routing pattern from the table above. For LAN-only, prefer ClientIP in the router rule.
  5. Deploy: run ansible-playbook playbooks/traefik.yml. The handler restarts Traefik, which requests the certificate once it loads the new router.
  6. Verify: run curl -I https://foo.home.helix9.org from a TRUSTED client and check journalctl -u traefik for ACME success. The acme.json file should grow on disk.

If the service does its own SSO (Authelia-style backends, Authentik itself), skip the authentik middleware.
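For step 4 of the checklist, a LAN-only entry for the hypothetical foo backend might look like the sketch below. The inventory name foo and port 8000 come from the example above; the entrypoint and resolver names are the ones listed under Static Configuration.

```yaml
# services.yml.j2 (dynamic config), sketch of a new LAN-only service
http:
  routers:
    foo:
      rule: "Host(`foo.home.helix9.org`) && (ClientIP(`10.69.0.0/16`) || ClientIP(`192.168.178.0/24`))"
      entryPoints:
        - websecure
      service: foo
      tls:
        certResolver: letsencrypt
  services:
    foo:
      loadBalancer:
        servers:
          # IP is resolved from inventory at render time
          - url: "http://{{ hostvars['foo'].ansible_host }}:8000"
```

For an externally reachable service, drop the ClientIP clauses and add middlewares: [authentik] instead.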


ACME Certificates

Storage

Certificates and account keys live in /etc/traefik/acme/acme.json (0600, owned by traefik). This is the single most important file on the host: losing it forces Traefik to re-issue every certificate, which works but counts against Let's Encrypt's rate limit of 50 certificates per registered domain per week.

Back it up. The role supports seeding a fresh install from a controller-side copy via traefik_acme_seed_src:

# in host_vars/traefik/vars.yml or extra-vars
traefik_acme_seed_src: "/path/to/acme.json"
traefik_acme_seed_force: false # true to overwrite existing

This was used to migrate existing certs from the old podman host without re-issuing.
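One plausible way for the role to implement the seed is a guarded copy task. This is a sketch, not the role's actual task; it relies on the copy module's force option, which skips the transfer when the destination already exists and force is false.

```yaml
# Sketch only: the task name is hypothetical, the two variables are the role's.
- name: Seed acme.json from a controller-side copy
  ansible.builtin.copy:
    src: "{{ traefik_acme_seed_src }}"
    dest: /etc/traefik/acme/acme.json
    owner: traefik
    group: traefik
    mode: "0600"
    force: "{{ traefik_acme_seed_force | default(false) }}"
  when: traefik_acme_seed_src is defined
```

With the default force: false, an existing acme.json on the host is never overwritten.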

Renewal

Traefik renews automatically 30 days before expiry. No cron, no hooks. The DNS-01 challenge runs on every renewal, so the Cloudflare API token must remain valid.

DNS-01 mechanics

For each cert request:

  1. Traefik calls Cloudflare API to create a _acme-challenge.<host> TXT record.
  2. Let's Encrypt's CA queries that TXT record from the authoritative Cloudflare nameservers.
  3. Traefik deletes the TXT record on success.

The configured resolvers (1.1.1.1, 9.9.9.9) are used by Traefik to verify that the TXT record has propagated before notifying Let's Encrypt, so they need to remain reachable from the SERVERS zone.

Wildcard certs

Currently every service has its own cert. To switch to a wildcard *.home.helix9.org, add a tls.domains block to one router and Traefik will request the wildcard once, then reuse it. Worth doing if the service count grows past ~30 (rate-limit headroom).
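A sketch of the tls.domains block mentioned above. Attaching it to the traefik-dashboard router is an arbitrary choice for illustration; any one router using the letsencrypt resolver would do.

```yaml
http:
  routers:
    traefik-dashboard:              # arbitrary choice of router
      tls:
        certResolver: letsencrypt
        domains:
          - main: "home.helix9.org"
            sans:
              - "*.home.helix9.org" # wildcard, possible because of DNS-01
```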


Authentik Integration

Forward-auth is configured at the middleware level (see dynamic.yml.j2). The flow:

  1. Client requests https://copyparty.home.helix9.org.
  2. Traefik calls http://authentik:9000/outpost.goauthentik.io/auth/traefik with the original request headers.
  3. If the user has a valid Authentik session cookie, the outpost returns 200 and the request continues to copyparty with X-authentik-* identity headers.
  4. If not, the outpost returns 302 to /outpost.goauthentik.io/start?rd=<original-url>.
  5. The high-priority authentik-outpost router (priority 1000) catches /outpost.goauthentik.io/* on every host and proxies to the same outpost backend, which renders the login UI.
  6. After login, the outpost redirects back to the original URL.

Backends that read the identity headers (e.g. X-authentik-email) get user info without doing their own auth. Backends that don't are still gated — the request only reaches them after the outpost approves.

Common pitfall: if you forget to add the Authentik middleware to a new router, the service is publicly accessible at *.home.helix9.org (which resolves externally via the VPS). LAN-only services should be restricted with ClientIP in the router rule or with the local-only middleware; never rely on "nobody knows the URL".


Operations

Reload / restart

systemctl reload traefik # SIGUSR1 — log rotation only
systemctl restart traefik # full restart, ACME state preserved

Config changes require a full restart (watch: false). The Ansible handler does this automatically.
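The render-then-restart wiring could be sketched as a template task that notifies the handler. The handler name Restart traefik is the one used by the role; the file ownership and mode shown are assumptions.

```yaml
# tasks (sketch): render both dynamic files and flag a restart on change
- name: Render dynamic configuration
  ansible.builtin.template:
    src: "{{ item }}.j2"
    dest: "/etc/traefik/dynamic/{{ item }}"
    owner: traefik      # assumed ownership
    group: traefik
    mode: "0640"        # assumed mode
  loop:
    - dynamic.yml
    - services.yml
  notify: Restart traefik

# handlers (sketch): runs once at the end of the play, only if notified
- name: Restart traefik
  ansible.builtin.systemd:
    name: traefik
    state: restarted
```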

Logs

| Stream | Path |
| --- | --- |
| Daemon | /var/log/traefik/traefik.log |
| Access | /var/log/traefik/access.log |
| systemd | journalctl -u traefik -f |

Log level is INFO. Bump to DEBUG for ACME troubleshooting only (very noisy).
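In the static config these log settings would look something like the sketch below, using the paths from the table above.

```yaml
# traefik.yml (static config), sketch of the logging section
log:
  level: INFO                 # switch to DEBUG while chasing ACME problems
  filePath: /var/log/traefik/traefik.log
accessLog:
  filePath: /var/log/traefik/access.log
```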

Health checks

curl -sk https://localhost/ -H "Host: traefik.home.helix9.org" # dashboard ping
curl -s http://localhost:8082/metrics | head # Prometheus metrics
ss -tlnp | grep traefik # 80, 443, 8082 bound

Firewall (host)

Rocky's firewalld opens 80 and 443 only. The Prometheus endpoint on :8082 is only reachable from the local container — Pulse / Prometheus would need an explicit firewalld rule to scrape it, currently not opened.
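If the metrics port is ever opened, a narrowly scoped rule is one option. The tasks below are a sketch using the ansible.posix.firewalld module; traefik_metrics_scraper_ip is a hypothetical variable for the Prometheus or Pulse host, not something the role defines today.

```yaml
- name: Open HTTP and HTTPS in firewalld
  ansible.posix.firewalld:
    service: "{{ item }}"
    permanent: true
    immediate: true
    state: enabled
  loop:
    - http
    - https

# Hypothetical: allow exactly one scraper to reach :8082.
- name: Allow a single scraper to reach the metrics entrypoint
  ansible.posix.firewalld:
    rich_rule: 'rule family="ipv4" source address="{{ traefik_metrics_scraper_ip }}" port port="8082" protocol="tcp" accept'
    permanent: true
    immediate: true
    state: enabled
```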

Dashboard access

traefik.home.helix9.org is gated by the local-or-auth chain: any LAN client (10.69.0.0/16, 192.168.178.0/24) gets straight through; anything else must authenticate via Authentik.


Known Gotchas

  • watch: false — editing /etc/traefik/dynamic/*.yml directly does nothing until restart. This is intentional, but surprising the first time. Always go through Ansible.
  • pve01 hardcoded: traefik_pve01_ip (10.69.10.5) is set in defaults/main.yml because pve01 is not yet in inventory. Move to hostvars[pve01].ansible_host once it is.
  • Mediastack hardcoded IPs — Jellyfin, Sonarr, Radarr, Sabnzbd, Seerr use literal 10.69.20.6x addresses. Same fix: add them to inventory and switch to hostvars[].ansible_host.
  • Cloudflare token scope — must be Zone:DNS:Edit on helix9.org, not the global API key. Rotating the token requires re-running the playbook (the env var is read at process start).
  • Outpost priority — if you ever add a router with priority: >= 1000, double-check it doesn't shadow the authentik-outpost PathPrefix matcher. Login flows will silently break.
  • sniStrict — clients that don't send SNI (rare, ancient clients, some monitoring probes) get a TLS error before any router matches. Disable per-router only if you have a known offender.
  • ACME storage permissions — if acme.json becomes group-readable for any reason, Traefik refuses to start. Re-run the role to fix mode.
  • Restart drops connections — Traefik is not graceful on restart; in-flight long-poll connections (Home Assistant, Synapse) will drop. Restart during low-traffic windows or accept brief blips.