MatrixRTC Backend (Element Call)
Projects: LiveKit · lk-jwt-service
Host: matrixrtc / 10.69.70.31 (LXC 731, DMZ / VLAN 70)
OS: Rocky Linux 10 (LXC, unprivileged, keyctl=1)
Public URL: https://matrix-rtc.helix9.org
Source role: roles/matrixrtc/ in the Ansible repo
Overview
Backend for Element Call (group + 1:1 voice/video) on matrix.helix9.org. Two
podman Quadlet containers:
- LiveKit SFU — the WebRTC media server (selective forwarding unit).
- lk-jwt-service (MatrixRTC Authorization Service) — validates a caller's Matrix OpenID token against the homeserver and issues a LiveKit JWT scoped to the room.
Clients discover this backend via the org.matrix.msc4143.rtc_foci key in Synapse's
.well-known/matrix/client. The call UI itself is the embedded call.element.io
SPA — it is not self-hosted; only the backend is.
Architecture
                ┌─ wss 443 ─ Traefik ─ Host(matrix-rtc.helix9.org) ─┐
client ─────────┤ /livekit/jwt/* → lk-jwt-service :8080 (StripPrefix)
                │ /livekit/sfu/* → livekit :7880 (StripPrefix, ws)
                └───────────────────────────────────────────────────┘
client ── UDP 50000-60000 ─► home line 79.246.151.97 (pppoe0 DNAT) ─► livekit (DIRECT, bypasses VPS+Traefik)
client ── TCP 7881 (ICE fallback) ─► home line ─► livekit
lk-jwt-service ── OpenID validation ──► matrix.helix9.org (federation API)
Asymmetric path (intentional):
- Signaling (wss + JWT) enters via the VPS 152.53.173.192 → IPsec → Traefik (matrix-rtc.helix9.org resolves to the VPS, like all other ingress).
- Media (RTP/UDP) flows directly over the home line pppoe0 (79.246.151.97), not through the VPS. matrixrtc's outbound default route is the home line, so LiveKit's use_external_ip: true auto-detects 79.246.151.97 and advertises it as the ICE candidate, which is exactly where the home-line DNAT delivers media. This keeps real-time video off the VPS/IPsec tunnel (better latency, no VPS bandwidth cost, no MTU 1400 issue).
⚠️ Dynamic-IP caveat: the home line IP is dynamic. On a reconnect/IP change LiveKit keeps advertising the old IP until restarted → media breaks (signaling via the static VPS keeps working). A nightly livekit-restart.timer (05:00, deployed by the role) restarts LiveKit so use_external_ip re-detects the current IP, covering Telekom's overnight forced reconnect. For an off-cycle change, restart manually: ssh matrixrtc 'systemctl restart livekit'. NAT/firewall need no change: the DNAT rules key on inbound-interface pppoe0 and the filter rules on the internal address 10.69.70.31, not on the public IP.
Containers (Quadlets)
| Unit | Image | Listens | Notes |
|---|---|---|---|
| livekit.service | docker.io/livekit/livekit-server:latest | :7880 (ws), :7881/tcp, 50000-60000/udp | Config /opt/matrixrtc/livekit.yaml; use_external_ip: true |
| lk-jwt-service.service | ghcr.io/element-hq/lk-jwt-service:latest | :8080 | Requires=livekit.service; env-configured (see below) |
Host network; root podman; AutoUpdate=registry.
Systemd timer — livekit-restart.timer
A plain systemd unit (not a quadlet), deployed by the role to /etc/systemd/system/:
| Unit | Schedule | Action |
|---|---|---|
| livekit-restart.timer | OnCalendar=*-*-* 05:00:00 (Europe/Berlin), Persistent=true | triggers livekit-restart.service |
| livekit-restart.service | oneshot | systemctl try-restart livekit.service |
Purpose: LiveKit's use_external_ip detects the public IP once at startup. The home
line (pppoe0) is dynamic, so after an overnight Telekom reconnect LiveKit would otherwise
keep advertising a stale ICE candidate. The nightly restart re-detects the current IP.
Schedule is configurable via matrixrtc_livekit_restart_oncalendar in the role defaults.
A restart drops any in-progress call (clients auto-reconnect); 05:00 avoids active use.
If the IP hasn't changed, LiveKit simply re-detects and advertises the same address.
lk-jwt-service env
| Var | Value |
|---|---|
| LIVEKIT_URL | wss://matrix-rtc.helix9.org/livekit/sfu |
| LIVEKIT_KEY / LIVEKIT_SECRET | from vault (vault_livekit_api_*) |
| LIVEKIT_JWT_BIND | :8080 |
| LIVEKIT_FULL_ACCESS_HOMESERVERS | matrix.helix9.org |
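A quick way to catch a half-populated environment before starting the unit is to list which variables are empty. The sketch below is only an illustration: the function name is hypothetical, and treating exactly these three variables as mandatory is an assumption of this sketch, not something the service documents here.

```shell
# Print the names of documented lk-jwt-service variables that are unset or empty.
# Assumption: LIVEKIT_URL, LIVEKIT_KEY and LIVEKIT_SECRET are treated as mandatory.
missing_lk_jwt_env() {
    missing=""
    for name in LIVEKIT_URL LIVEKIT_KEY LIVEKIT_SECRET; do
        # Indirect lookup of the variable named by $name; empty counts as missing.
        eval "value=\${$name:-}"
        [ -n "$value" ] || missing="$missing $name"
    done
    # Drop the leading space; prints an empty line when nothing is missing.
    echo "${missing# }"
}
```

Run it in a shell where the variables from the table have been loaded; an empty result means all three are present.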
Traefik configuration
roles/traefik/templates/services.yml.j2 + dynamic.yml.j2:
- Routers (Host matrix-rtc.helix9.org, entrypoint websecure):
  - matrix-rtc-jwt: PathPrefix(/livekit/jwt) → service lk-jwt, middleware matrix-rtc-jwt-strip
  - matrix-rtc-sfu: PathPrefix(/livekit/sfu) → service livekit, middleware matrix-rtc-sfu-strip
- StripPrefix middlewares remove /livekit/jwt and /livekit/sfu (upstreams expect root paths).
- Services: lk-jwt → 10.69.70.31:8080, livekit → 10.69.70.31:7880.
- TLS via existing letsencrypt resolver (Cloudflare DNS-01).
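The effect of the two routers plus StripPrefix can be sketched as a plain path mapping. This is a simplified model, not Traefik's actual matcher: the function name is hypothetical, it only matches on path-segment boundaries, and it assumes a fully stripped path is forwarded as /.

```shell
# Map a public request path on matrix-rtc.helix9.org to the upstream it would reach.
# Simplified model of PathPrefix + StripPrefix; upstream addresses are from the list above.
upstream_for() {
    path="$1"
    case "$path" in
        /livekit/jwt|/livekit/jwt/*)
            rest="${path#/livekit/jwt}"          # strip the routing prefix
            echo "10.69.70.31:8080${rest:-/}"    # lk-jwt-service sees a root path
            ;;
        /livekit/sfu|/livekit/sfu/*)
            rest="${path#/livekit/sfu}"
            echo "10.69.70.31:7880${rest:-/}"    # LiveKit signaling (ws)
            ;;
        *)
            echo "no-route"
            return 1
            ;;
    esac
}
```

For example, upstream_for /livekit/jwt/healthz prints 10.69.70.31:8080/healthz, which is why the public healthcheck URL carries the prefix while the service itself only knows /healthz.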
Synapse changes
- .well-known/matrix/client (roles/synapse/templates/element-nginx.conf.j2) now advertises the focus:
    {"m.homeserver": {"base_url": "https://matrix.helix9.org"},
     "org.matrix.msc4143.rtc_foci": [{"type": "livekit", "livekit_service_url": "https://matrix-rtc.helix9.org/livekit/jwt"}]}
- homeserver.yaml (hand-edited, not templated; lives in /opt/matrix/synapse-data/):
    experimental_features:
      msc3266_enabled: true
      msc4222_enabled: true
    max_event_delay_duration: 24h
    rc_message: { per_second: 0.5, burst_count: 30 }
    rc_delayed_event_mgmt: { per_second: 1, burst_count: 20 }
  Then ssh synapse 'systemctl restart synapse'.
Firewall / NAT
Media reaches the LXC directly over the home line (pppoe0), not the VPS. Rules live in three places:
1. vyos-fw — destination NAT (pppoe0 inbound):
set nat destination rule 200 inbound-interface name 'pppoe0'
set nat destination rule 200 protocol 'udp'
set nat destination rule 200 destination port '50000-60000'
set nat destination rule 200 translation address '10.69.70.31'
set nat destination rule 210 inbound-interface name 'pppoe0'
set nat destination rule 210 protocol 'tcp'
set nat destination rule 210 destination port '7881'
set nat destination rule 210 translation address '10.69.70.31'
2. vyos-fw — WAN-DMZ zone ruleset (filter sees the DNAT'd dest 10.69.70.31):
set firewall ipv4 name WAN-DMZ rule 200 action 'accept'
set firewall ipv4 name WAN-DMZ rule 200 destination address '10.69.70.31'
set firewall ipv4 name WAN-DMZ rule 200 destination port '50000-60000'
set firewall ipv4 name WAN-DMZ rule 200 protocol 'udp'
set firewall ipv4 name WAN-DMZ rule 210 action 'accept'
set firewall ipv4 name WAN-DMZ rule 210 destination address '10.69.70.31'
set firewall ipv4 name WAN-DMZ rule 210 destination port '7881'
set firewall ipv4 name WAN-DMZ rule 210 protocol 'tcp'
3. vyos-fw — SERVERS-SCAN rule 260 — Traefik → matrixrtc signaling (8080,7880/tcp),
see Home Router.
Signaling (443) needs no new NAT — the VPS already forwards all :443 → Traefik, SNI-routed.
Public DNS: matrix-rtc.helix9.org A → 152.53.173.192 (VPS), DNS-only in Cloudflare.
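When reading a packet capture or a firewall log it helps to know which ingress a given flow should have taken. The sketch below restates the rules of this section as a lookup; the function name and the "unlisted" label are hypothetical, and it says nothing about ports this page does not describe.

```shell
# Classify inbound traffic by protocol and destination port, following the rules above.
# Prints "home-line" (pppoe0 DNAT to 10.69.70.31), "vps" (443 via the VPS to Traefik),
# or "unlisted" for anything this page does not cover.
ingress_path() {
    proto="$1"
    port="$2"
    # Reject empty or non-numeric ports before doing integer comparisons.
    case "$port" in
        ''|*[!0-9]*) echo "unlisted"; return 0 ;;
    esac
    if [ "$proto" = "udp" ] && [ "$port" -ge 50000 ] && [ "$port" -le 60000 ]; then
        echo "home-line"    # RTP media, NAT rule 200 / WAN-DMZ rule 200
    elif [ "$proto" = "tcp" ] && [ "$port" -eq 7881 ]; then
        echo "home-line"    # ICE TCP fallback, NAT rule 210 / WAN-DMZ rule 210
    elif [ "$proto" = "tcp" ] && [ "$port" -eq 443 ]; then
        echo "vps"          # signaling, SNI-routed by Traefik
    else
        echo "unlisted"
    fi
}
```

A media flow that shows up as "vps", or signaling that shows up as "home-line", would point at a DNS or candidate problem rather than a firewall one.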
Ansible
ansible-playbook playbooks/matrixrtc.yml
Secrets in inventory/group_vars/all/vault.yml: vault_livekit_api_key,
vault_livekit_api_secret. Defaults in roles/matrixrtc/defaults/main.yml.
Manual operations
| Task | Command |
|---|---|
| Tail LiveKit | ssh matrixrtc 'journalctl -fu livekit' |
| Tail JWT service | ssh matrixrtc 'journalctl -fu lk-jwt-service' |
| JWT healthcheck | curl https://matrix-rtc.helix9.org/livekit/jwt/healthz |
| Verify foci | curl https://matrix.helix9.org/.well-known/matrix/client |
| Force IP re-detect | ssh matrixrtc 'systemctl restart livekit' |
| Check nightly timer | ssh matrixrtc 'systemctl status livekit-restart.timer' (want active (waiting) + next Trigger:) |
| Confirm advertised IP | ssh matrixrtc 'podman logs livekit 2>&1 | grep -i candidate | tail -1' |
Troubleshooting
Call connects but no audio/video
UDP media not reaching the LXC, or LiveKit advertising the wrong IP. Confirm the
pppoe0 DNAT + WAN-DMZ rules above, and check the candidate LiveKit hands out:
podman logs livekit 2>&1 | grep -i "candidate"
# want: [selected] udp4 host <home-line-IP>:5000x (matches pppoe0's current IP)
If the home line IP changed since LiveKit started, it'll advertise a stale IP →
systemctl restart livekit.
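The stale-IP check can be scripted roughly as follows. This is a sketch under assumptions: both function names are hypothetical, the log line is assumed to look like the "want" example above, and how the current pppoe0 address is obtained (from the router, for instance) is left open.

```shell
# Print the first IPv4 address found in the log text read from stdin.
candidate_ip() {
    grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | head -n 1
}

# Succeed (exit 0) only when both addresses are known and they differ.
ip_is_stale() {
    advertised="$1"
    current="$2"
    [ -n "$advertised" ] && [ -n "$current" ] && [ "$advertised" != "$current" ]
}
```

On matrixrtc this could be used as advertised=$(podman logs livekit 2>&1 | grep -i candidate | tail -1 | candidate_ip), followed by ip_is_stale "$advertised" "$current" && systemctl restart livekit, where $current holds the home line's present address.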
"Failed to join call" / token errors
lk-jwt-service can't validate the OpenID token. Check that it can reach the
matrix.helix9.org federation API and that LIVEKIT_FULL_ACCESS_HOMESERVERS matches the
server name. Logs: ssh matrixrtc 'journalctl -u lk-jwt-service -n 50'.
Call button missing in Element
.well-known not advertising rtc_foci, or Synapse missing msc3266/msc4222.
Re-check both Synapse changes above.
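To check the .well-known side without reading the JSON by eye, the focus URL can be pulled out with standard tools. A minimal sketch, assuming grep and sed are all that is available (jq would be more robust); the function name is hypothetical.

```shell
# Read a .well-known/matrix/client document on stdin and print every
# livekit_service_url it advertises, one per line (nothing if there is none).
foci_urls() {
    grep -oE '"livekit_service_url"[[:space:]]*:[[:space:]]*"[^"]*"' \
        | sed -E 's/^"livekit_service_url"[[:space:]]*:[[:space:]]*"([^"]*)"$/\1/'
}
```

Usage: curl -fsS https://matrix.helix9.org/.well-known/matrix/client | foci_urls. Empty output means rtc_foci is not being advertised; the expected value here is https://matrix-rtc.helix9.org/livekit/jwt.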