MatrixRTC Backend (Element Call)

Projects: LiveKit · lk-jwt-service
Host: matrixrtc / 10.69.70.31 (LXC 731, DMZ / VLAN 70)
OS: Rocky Linux 10 (LXC, unprivileged, keyctl=1)
Public URL: https://matrix-rtc.helix9.org
Source role: roles/matrixrtc/ in the Ansible repo

Overview

Backend for Element Call (group + 1:1 voice/video) on matrix.helix9.org. Two podman Quadlet containers:

  • LiveKit SFU — the WebRTC media server (selective forwarding unit).
  • lk-jwt-service (MatrixRTC Authorization Service) — validates a caller's Matrix OpenID token against the homeserver and issues a LiveKit JWT scoped to the room.

Clients discover this backend via the org.matrix.msc4143.rtc_foci key in Synapse's .well-known/matrix/client. The call UI itself is the embedded call.element.io SPA — it is not self-hosted; only the backend is.

Architecture

                ┌─ wss 443 ─ Traefik ─ Host(matrix-rtc.helix9.org) ─┐
client ─────────┤ /livekit/jwt/* → lk-jwt-service :8080 (StripPrefix)
                │ /livekit/sfu/* → livekit :7880 (StripPrefix, ws)
                └───────────────────────────────────────────────────┘
client ── UDP 50000-60000 ─► home line 79.246.151.97 (pppoe0 DNAT) ─► livekit (DIRECT, bypasses VPS+Traefik)
client ── TCP 7881 (ICE fallback) ─► home line ─► livekit

lk-jwt-service ── OpenID validation ──► matrix.helix9.org (federation API)

Asymmetric path (intentional):

  • Signaling (wss + JWT) enters via the VPS 152.53.173.192 → IPsec → Traefik (matrix-rtc.helix9.org resolves to the VPS, like all other ingress).
  • Media (RTP/UDP) flows directly over the home line pppoe0 (79.246.151.97), not through the VPS. matrixrtc's outbound default route is the home line, so LiveKit's use_external_ip: true auto-detects 79.246.151.97 and advertises it as the ICE candidate — which is exactly where the home-line DNAT delivers media. Keeps real-time video off the VPS/IPsec tunnel (better latency, no VPS bandwidth cost, no MTU 1400 issue).
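
The LiveKit settings this split relies on can be sketched as a small config renderer. This is a minimal sketch, not the role's actual template: the YAML keys follow LiveKit's config file layout, while the `render_livekit_yaml` helper and its key/secret arguments are hypothetical.

```shell
#!/bin/sh
# Sketch only: print a minimal livekit.yaml matching the ports on this page.
# render_livekit_yaml is a hypothetical helper; the real file is templated
# by the Ansible role and may contain more settings.
render_livekit_yaml() {
  api_key=$1
  api_secret=$2
  cat <<EOF
port: 7880                  # signaling (ws), reached via Traefik
rtc:
  tcp_port: 7881            # ICE-over-TCP fallback
  port_range_start: 50000   # UDP media range forwarded from pppoe0
  port_range_end: 60000
  use_external_ip: true     # discover the public IP once at startup
keys:
  ${api_key}: ${api_secret}
EOF
}
```

Usage (placeholder credentials): `render_livekit_yaml APIexample secretexample > /opt/matrixrtc/livekit.yaml`.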

⚠️ Dynamic-IP caveat: the home line IP is dynamic. On a reconnect/IP change LiveKit keeps advertising the old IP until restarted → media breaks (signaling via the static VPS keeps working). A nightly livekit-restart.timer (05:00, deployed by the role) restarts LiveKit so use_external_ip re-detects the current IP, covering Telekom's overnight forced reconnect. For an off-cycle change, restart manually: ssh matrixrtc 'systemctl restart livekit'. NAT/firewall need no change — rules key on inbound-interface pppoe0, not the IP.

Containers (Quadlets)

Unit                    Image                                     Listens                                 Notes
livekit.service         docker.io/livekit/livekit-server:latest   :7880 (ws), :7881/tcp, 50000-60000/udp  Config /opt/matrixrtc/livekit.yaml; use_external_ip: true
lk-jwt-service.service  ghcr.io/element-hq/lk-jwt-service:latest  :8080                                   Requires=livekit.service; env-configured (see below)

Host network; root podman; AutoUpdate=registry.
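
For orientation, the two Quadlet files might look roughly like the output of the sketch below. The Quadlet keys are standard podman ones, but the exact contents are assumptions: the role's templates may differ, and the container-side config path, the env file path, the `[Install]` target, and the `render_quadlets` helper are all hypothetical. `ContainerName=livekit` is inferred from the `podman logs livekit` commands used on this page.

```shell
#!/bin/sh
# Sketch only: write illustrative Quadlet units into the directory given as $1.
render_quadlets() {
  dir=$1
  cat > "$dir/livekit.container" <<'EOF'
[Unit]
Description=LiveKit SFU

[Container]
ContainerName=livekit
Image=docker.io/livekit/livekit-server:latest
Network=host
AutoUpdate=registry
# Container-side path is an assumption.
Volume=/opt/matrixrtc/livekit.yaml:/etc/livekit.yaml:ro
Exec=--config /etc/livekit.yaml

[Install]
WantedBy=multi-user.target
EOF
  cat > "$dir/lk-jwt-service.container" <<'EOF'
[Unit]
Description=MatrixRTC Authorization Service
Requires=livekit.service
After=livekit.service

[Container]
ContainerName=lk-jwt-service
Image=ghcr.io/element-hq/lk-jwt-service:latest
Network=host
AutoUpdate=registry
# Hypothetical env file holding the LIVEKIT_* variables.
EnvironmentFile=/opt/matrixrtc/lk-jwt-service.env

[Install]
WantedBy=multi-user.target
EOF
}
```

For rootful podman, Quadlet reads such files from /etc/containers/systemd/; after `systemctl daemon-reload` they appear as livekit.service and lk-jwt-service.service.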

Systemd timer — livekit-restart.timer

A plain systemd unit (not a quadlet), deployed by the role to /etc/systemd/system/:

Unit                     Schedule                                                    Action
livekit-restart.timer    OnCalendar=*-*-* 05:00:00 (Europe/Berlin), Persistent=true  triggers livekit-restart.service
livekit-restart.service  oneshot                                                     systemctl try-restart livekit.service

Purpose: LiveKit's use_external_ip detects the public IP once at startup. The home line (pppoe0) is dynamic, so after an overnight Telekom reconnect LiveKit would otherwise keep advertising a stale ICE candidate. The nightly restart re-detects the current IP. Schedule is configurable via matrixrtc_livekit_restart_oncalendar in the role defaults. A restart drops any in-progress call (clients auto-reconnect) — 05:00 avoids active use, and it's a no-op for media if the IP hasn't changed.
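
A minimal sketch of what the timer/service pair could look like, written as a helper that renders both units into a directory. The unit contents are reconstructed from the table above and are an assumption about what the role deploys; `render_restart_units` is a hypothetical name, and the schedule is assumed to be evaluated in the host's local timezone.

```shell
#!/bin/sh
# Sketch only: write an illustrative livekit-restart timer + service into $1.
render_restart_units() {
  dir=$1
  cat > "$dir/livekit-restart.timer" <<'EOF'
[Unit]
Description=Nightly LiveKit restart (re-detect external IP)

[Timer]
# Assumed to run in the host's local timezone (Europe/Berlin on this host).
OnCalendar=*-*-* 05:00:00
# Catch up after boot if the host was down at 05:00.
Persistent=true

[Install]
WantedBy=timers.target
EOF
  cat > "$dir/livekit-restart.service" <<'EOF'
[Unit]
Description=Restart LiveKit so use_external_ip re-detects the public IP

[Service]
Type=oneshot
# try-restart only restarts the unit if it is already running.
ExecStart=/usr/bin/systemctl try-restart livekit.service
EOF
}
```

After rendering into /etc/systemd/system/, `systemctl daemon-reload && systemctl enable --now livekit-restart.timer` would activate the schedule.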

lk-jwt-service env

Var                              Value
LIVEKIT_URL                      wss://matrix-rtc.helix9.org/livekit/sfu
LIVEKIT_KEY / LIVEKIT_SECRET     from vault (vault_livekit_api_*)
LIVEKIT_JWT_BIND                 :8080
LIVEKIT_FULL_ACCESS_HOMESERVERS  matrix.helix9.org
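
A quick way to catch a missing variable before starting the container is a small pre-flight check. The sketch below is not part of the role; `check_jwt_env` is a hypothetical helper that only verifies the variables listed in the table are non-empty in the current shell environment.

```shell
#!/bin/sh
# Sketch only: report which of the lk-jwt-service variables are unset or empty.
# Returns 0 when all are present, 1 otherwise.
check_jwt_env() {
  missing=0
  for name in LIVEKIT_URL LIVEKIT_KEY LIVEKIT_SECRET \
              LIVEKIT_JWT_BIND LIVEKIT_FULL_ACCESS_HOMESERVERS; do
    # POSIX-compatible indirect lookup of the variable named in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: source the env file (path depends on how the role deploys it), then run `check_jwt_env && echo ok`.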

Traefik configuration

roles/traefik/templates/services.yml.j2 + dynamic.yml.j2:

  • Routers (Host matrix-rtc.helix9.org, entrypoint websecure):
    • matrix-rtc-jwt: PathPrefix(/livekit/jwt) → service lk-jwt, middleware matrix-rtc-jwt-strip
    • matrix-rtc-sfu: PathPrefix(/livekit/sfu) → service livekit, middleware matrix-rtc-sfu-strip
  • StripPrefix middlewares remove /livekit/jwt and /livekit/sfu (upstreams expect root paths).
  • Services: lk-jwt → 10.69.70.31:8080, livekit → 10.69.70.31:7880.
  • TLS via existing letsencrypt resolver (Cloudflare DNS-01).
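
The effect of the two routers plus StripPrefix can be illustrated with a small path-mapping function. This is a simplified model for reasoning about what each upstream sees, not Traefik's matching logic: it only accepts the prefix itself or the prefix followed by `/`, and `route` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: print "<service> <upstream> <path seen by upstream>" for a
# request path on matrix-rtc.helix9.org, or fail if no router would match.
route() {
  path=$1
  case $path in
    /livekit/jwt|/livekit/jwt/*)
      rest=${path#/livekit/jwt}          # what StripPrefix leaves behind
      echo "lk-jwt 10.69.70.31:8080 ${rest:-/}" ;;
    /livekit/sfu|/livekit/sfu/*)
      rest=${path#/livekit/sfu}
      echo "livekit 10.69.70.31:7880 ${rest:-/}" ;;
    *)
      echo "no-match"
      return 1 ;;
  esac
}
```

For example, `route /livekit/jwt/healthz` prints `lk-jwt 10.69.70.31:8080 /healthz`, which is why the healthcheck URL under Manual operations carries the /livekit/jwt prefix while the service itself serves /healthz.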

Synapse changes

  1. .well-known/matrix/client (roles/synapse/templates/element-nginx.conf.j2) now advertises the focus:
    {"m.homeserver": {"base_url": "https://matrix.helix9.org"},
     "org.matrix.msc4143.rtc_foci": [
       {"type": "livekit", "livekit_service_url": "https://matrix-rtc.helix9.org/livekit/jwt"}]}
  2. homeserver.yaml (hand-edited, not templated — lives in /opt/matrix/synapse-data/):
    experimental_features:
      msc3266_enabled: true
      msc4222_enabled: true
    max_event_delay_duration: 24h
    rc_message: { per_second: 0.5, burst_count: 30 }
    rc_delayed_event_mgmt: { per_second: 1, burst_count: 20 }
    Then ssh synapse 'systemctl restart synapse'.

Firewall / NAT

Media reaches the LXC directly over the home line (pppoe0), not via the VPS. The rules live in three places:

1. vyos-fw — destination NAT (pppoe0 inbound):

set nat destination rule 200 inbound-interface name 'pppoe0'
set nat destination rule 200 protocol 'udp'
set nat destination rule 200 destination port '50000-60000'
set nat destination rule 200 translation address '10.69.70.31'
set nat destination rule 210 inbound-interface name 'pppoe0'
set nat destination rule 210 protocol 'tcp'
set nat destination rule 210 destination port '7881'
set nat destination rule 210 translation address '10.69.70.31'

2. vyos-fw — WAN-DMZ zone ruleset (filter sees the DNAT'd dest 10.69.70.31):

set firewall ipv4 name WAN-DMZ rule 200 action 'accept'
set firewall ipv4 name WAN-DMZ rule 200 destination address '10.69.70.31'
set firewall ipv4 name WAN-DMZ rule 200 destination port '50000-60000'
set firewall ipv4 name WAN-DMZ rule 200 protocol 'udp'
set firewall ipv4 name WAN-DMZ rule 210 action 'accept'
set firewall ipv4 name WAN-DMZ rule 210 destination address '10.69.70.31'
set firewall ipv4 name WAN-DMZ rule 210 destination port '7881'
set firewall ipv4 name WAN-DMZ rule 210 protocol 'tcp'

3. vyos-fw — SERVERS-SCAN rule 260 — Traefik → matrixrtc signaling (8080,7880/tcp), see Home Router.

Signaling (443) needs no new NAT — the VPS already forwards all :443 → Traefik, SNI-routed. Public DNS: matrix-rtc.helix9.org A → 152.53.173.192 (VPS), DNS-only in Cloudflare.
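
Each forwarded port needs a matching pair: a destination NAT rule and a WAN-DMZ accept rule for the translated address. The sketch below generates both halves from one set of parameters so they cannot drift apart. `vyos_forward` is a hypothetical helper, and reusing the same rule number for both rulesets simply mirrors the numbering used above.

```shell
#!/bin/sh
# Sketch only: emit the VyOS NAT + firewall commands for one forwarded port.
# Arguments: <rule number> <protocol> <port or range> <target address>
vyos_forward() {
  rule=$1
  proto=$2
  port=$3
  target=$4
  cat <<EOF
set nat destination rule $rule inbound-interface name 'pppoe0'
set nat destination rule $rule protocol '$proto'
set nat destination rule $rule destination port '$port'
set nat destination rule $rule translation address '$target'
set firewall ipv4 name WAN-DMZ rule $rule action 'accept'
set firewall ipv4 name WAN-DMZ rule $rule destination address '$target'
set firewall ipv4 name WAN-DMZ rule $rule destination port '$port'
set firewall ipv4 name WAN-DMZ rule $rule protocol '$proto'
EOF
}
```

`vyos_forward 200 udp 50000-60000 10.69.70.31` and `vyos_forward 210 tcp 7881 10.69.70.31` reproduce the rules listed in this section.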

Ansible

ansible-playbook playbooks/matrixrtc.yml

Secrets in inventory/group_vars/all/vault.yml: vault_livekit_api_key, vault_livekit_api_secret. Defaults in roles/matrixrtc/defaults/main.yml.

Manual operations

Task                   Command
Tail LiveKit           ssh matrixrtc 'journalctl -fu livekit'
Tail JWT service       ssh matrixrtc 'journalctl -fu lk-jwt-service'
JWT healthcheck        curl https://matrix-rtc.helix9.org/livekit/jwt/healthz
Verify foci            curl https://matrix.helix9.org/.well-known/matrix/client
Force IP re-detect     ssh matrixrtc 'systemctl restart livekit'
Check nightly timer    ssh matrixrtc 'systemctl status livekit-restart.timer' (want active (waiting) + next Trigger:)
Confirm advertised IP  ssh matrixrtc 'podman logs livekit 2>&1 | grep -i candidate | tail -1'

Troubleshooting

Call connects but no audio/video

UDP media not reaching the LXC, or LiveKit advertising the wrong IP. Confirm the pppoe0 DNAT + WAN-DMZ rules above, and check the candidate LiveKit hands out:

podman logs livekit 2>&1 | grep -i "candidate"
# want: [selected] udp4 host <home-line-IP>:5000x (matches pppoe0's current IP)

If the home line IP changed since LiveKit started, it'll advertise a stale IP → systemctl restart livekit.
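
The stale-IP check can be scripted along these lines. This is a sketch under assumptions: `first_ipv4` pulls the first IPv4 address out of whatever log line it is given (the exact LiveKit log format is not guaranteed), and how the current public IP is obtained is left open, since pppoe0 lives on vyos-fw rather than on the LXC.

```shell
#!/bin/sh
# Sketch only: helpers for comparing the advertised IP with the current one.

# Print the first IPv4 address found on stdin.
first_ipv4() {
  grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | head -n 1
}

# Succeed when both addresses are known and they differ.
ip_is_stale() {
  [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
}

# Possible usage on matrixrtc (the lookup URL is a placeholder, not a real service):
#   advertised=$(podman logs livekit 2>&1 | grep -i candidate | tail -1 | first_ipv4)
#   current=$(curl -fsS https://ip-lookup.example/)
#   ip_is_stale "$advertised" "$current" && systemctl restart livekit
```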

"Failed to join call" / token errors

lk-jwt-service can't validate the OpenID token. Check that it can reach the matrix.helix9.org federation API and that LIVEKIT_FULL_ACCESS_HOMESERVERS matches the server name. Logs: journalctl -u lk-jwt-service -n 50.
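
Reachability of the OpenID validation endpoint can be probed from the LXC. The sketch below only builds the URL for the Matrix federation OpenID userinfo endpoint; it assumes the federation API for matrix.helix9.org is served on port 443 of that hostname, which may not hold if delegation or port 8448 is in use. `openid_userinfo_url` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: build the federation OpenID userinfo URL for a server and token.
# Arguments: <server name> <OpenID access token>
openid_userinfo_url() {
  printf 'https://%s/_matrix/federation/v1/openid/userinfo?access_token=%s\n' "$1" "$2"
}

# Possible usage on matrixrtc, with a deliberately bogus token:
#   curl -sS -o /dev/null -w '%{http_code}\n' \
#     "$(openid_userinfo_url matrix.helix9.org invalid)"
```

Any HTTP status code back from the homeserver shows the path is open; a timeout or TLS error points at network or firewall trouble rather than at the token.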

Call button missing in Element

.well-known not advertising rtc_foci, or Synapse missing msc3266/msc4222. Re-check both Synapse changes above.
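
To check the first of these quickly, the advertised focus URL can be pulled out of the .well-known document. The sketch uses only grep/sed so it works without jq. It is a rough text match, not a JSON parser, and `foci_urls` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: print every livekit_service_url found in a
# .well-known/matrix/client document read from stdin.
foci_urls() {
  tr -d '\n' \
    | grep -o '"livekit_service_url"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed -e 's/^"livekit_service_url"[[:space:]]*:[[:space:]]*"//' -e 's/"$//'
}

# Possible usage:
#   curl -fsS https://matrix.helix9.org/.well-known/matrix/client | foci_urls
```

Empty output means the rtc_foci entry is missing (or the document changed shape); the expected value on this setup is https://matrix-rtc.helix9.org/livekit/jwt.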