LTE Failover — Teltonika RUT240
Secondary internet uplink via a Teltonika RUT240 LTE router, with automatic failover when the primary DSL link (pppoe0 over eth1 → Speedport in bridge mode) fails. Implemented on vyos-fw (VyOS 2026.03 circinus) using a custom systemd watchdog after both load-balancing wan and protocols failover route were found broken on this release.
Deployed 2026-05-11.
Table of Contents
- Overview
- Topology
- RUT240 Configuration
- Switch — VLAN 99 (Juniper EX)
- VyOS Configuration
- IPsec Caveat
- Health Check & Monitoring
- Verification
- Known Limitations
- Out of Scope / Future Work
Overview
Goal: Outbound LAN connectivity (browsing, updates, DNS forwarding to Quad9, NTP, alerting) survives a DSL outage without manual intervention.
Non-goal: Inbound services published via the VPS (Matrix, Mumble, Minecraft, OpenClaw, etc.) — these go through the IPsec tunnel anchored to the DSL public IP. The tunnel will be down during LTE-only operation (see IPsec Caveat).
Key design choices:
| Decision | Choice | Reason |
|---|---|---|
| Physical uplink | New VLAN 99 on existing eth0 trunk → Juniper EX access port → RUT240 LAN | All five NICs on vyos-fw already used; no recabling. |
| RUT240 mode | NAT router (its own LAN DHCP) | Simplest; consumer LTE SIM is CGNAT anyway, no benefit from bridge mode. |
| Failover engine | Custom systemd timer + bash watchdog managing kernel default route | Native VyOS load-balancing wan and protocols failover route both broken on circinus for this topology (see Known broken paths). |
| IPsec handling | Best-effort (drop during failover) | Avoids fragile DDNS / local-address any complexity; revisit later. |
Topology
Internet (DSL/PPPoE) Internet (LTE, CGNAT)
│ │
Speedport (bridge/modem mode) RUT240 (LTE→NAT, LAN 192.168.99.1/24)
│ │
│ eth1 (PPPoE → pppoe0) │ ge-0/0/6 (access, untagged VLAN 99)
│ ▼
│ ┌──────────────────────┐
│ │ Juniper EX core-sw │
│ │ VLAN 99 = LTE │
│ └──────────┬───────────┘
│ │ ge-0/0/0 trunk (VLAN 99 tagged)
┌──────┴──────────────────────────────────┴─────────────┐
│ vyos-fw │
│ WAN zone: pppoe0 + eth0.99 │
│ Watchdog: pings 1.1.1.1/9.9.9.9 bound to pppoe0; │
│ installs/removes LTE default via eth0.99 │
│ Source NAT: masquerade on both │
└────────────────────────────────────────────────────────┘
RUT240 Configuration
Hardware: Teltonika RUT240, firmware RUT2XX_R_00.01.11.2. Carrier: Telekom.de (4G).
Factory reset
Hold the reset button for ~10 seconds until all LEDs flash simultaneously, then release. Wait for reboot. Default LAN comes back up as 192.168.1.1/24 with DHCP — plug a laptop in and browse to http://192.168.1.1 (admin / admin01, will prompt for password change).
Setup wizard
| Section | Setting | Value |
|---|---|---|
| Mobile | APN | Auto (Telekom detected) |
| | PIN | none |
| | Dial number | *99# |
| | MTU | 1500 |
| | Service mode | Automatic |
| LAN | IP | 192.168.99.1/24 (change from default 192.168.1.1 here, or later in Network → LAN) |
| | DHCP server | enabled, pool start 100, limit 50, lease 12h |
| WiFi | both bands | disabled |
| RMS (Teltonika cloud) | | disabled |
| WAN failover (RUT240 internal) | | enabled |
Confirm LTE uplink works by plugging a laptop into RUT240 LAN1 and browsing.
Switch — VLAN 99 (Juniper EX)
On core-sw (Juniper EX), in configuration mode:
set vlans LTE vlan-id 99
set interfaces ge-0/0/6 description "Teltonika RUT240"
set interfaces ge-0/0/6 unit 0 family ethernet-switching interface-mode access
set interfaces ge-0/0/6 unit 0 family ethernet-switching vlan members LTE
set interfaces ge-0/0/0 description "TRUNK-VyOS-Router"
set interfaces ge-0/0/0 unit 0 family ethernet-switching vlan members LTE
ge-0/0/0 is the existing trunk to vyos-fw eth0; only the vlan members LTE line is new. ge-0/0/6 is the dedicated access port for the RUT240 LAN1 cable.
commit and verify with show vlans LTE — port ge-0/0/6.0 should be listed untagged, ge-0/0/0.0 tagged.
VyOS Configuration
All commands run in configure. Commit after each block, or batch and commit once at the end.
1. LTE VLAN sub-interface
set interfaces ethernet eth0 vif 99 description 'WAN-LTE (RUT240)'
set interfaces ethernet eth0 vif 99 address dhcp
set interfaces ethernet eth0 vif 99 dhcp-options no-default-route
no-default-route prevents the kernel from installing a second default via LTE; the watchdog owns that decision. Observed lease: 192.168.99.111 from the RUT240 DHCP pool.
2. Source NAT — masquerade on LTE
set nat source rule 110 outbound-interface name eth0.99
set nat source rule 110 source address '10.69.0.0/16'
set nat source rule 110 translation address masquerade
Existing rule 100 (pppoe0) stays untouched.
3. Firewall — add eth0.99 to WAN zone
set firewall zone WAN interface eth0.99
No new policies. WAN-LOCAL, WAN-DMZ, and outbound ALLOW-INTERNET apply identically to either uplink. The router itself remains protected: only established/related inbound from WAN.
4. DNS / NTP
No changes required.
- Router system DNS (1.1.1.1, 9.9.9.9) is reachable over either uplink.
- Technitium (10.69.20.53) forwarding to Quad9 (9.9.9.9:853 / 149.112.112.112:853, DoT) works on either path.
- NTP upstreams (time.cloudflare.com, time*.vyos.net) likewise.
Known broken paths
Two native VyOS mechanisms were tried and rejected before settling on the watchdog:
- load-balancing wan is broken on circinus for PPPoE. The op-mode show load-balancing wan [status] commands are gone. The vyos-wan-load-balance.service daemon does not detect pppoe0 going down: nexthop is required syntactically for pppoe0 even though PPP has no real next-hop, and nexthop 0.0.0.0 parses but the daemon never logs state changes during a real DSL outage. The fwmark/nftables rules stay pointed at the wlb_mangle_isp_pppoe0 chain and never flip to LTE. The feature is officially deprecated in VyOS.
- protocols failover route is broken for multi-default scenarios. The vyos-failover daemon's ICMP healthcheck does not bind to the specified interface (no SO_BINDTODEVICE); it follows the main routing table, so a check nominally "via eth0.99" actually egresses via whatever default is currently active. Working around it by aiming the check at the directly connected 192.168.99.1 makes the check permanently succeed, which permanently installs the LTE route. Worse: once the failover daemon installs its kernel route (proto failover), FRR sees a kernel-distance-0 default and refuses to install its distance-1 static default for pppoe0. Result: the pppoe0 default disappeared from the kernel and all traffic egressed via LTE, with SIM data bleed observed during testing.
Watchdog-based failover
Three files installed on vyos-fw.
/config/scripts/wan-watchdog.sh (chmod +x):
#!/bin/bash
set -u
PRIMARY_IFACE=pppoe0
BACKUP_GW=192.168.99.1
BACKUP_IFACE=eth0.99
BACKUP_METRIC=1
TARGETS=(1.1.1.1 9.9.9.9)
FAIL_THRESHOLD=3
RECOVER_THRESHOLD=3
STATE=/run/wan-watchdog.state
LOG_TAG=wan-watchdog
logger -t "$LOG_TAG" -p user.info "tick"
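# Probe each target through the primary interface; a single reply marks the round as OK.
ok=0
for t in "${TARGETS[@]}"; do
if ping -I "$PRIMARY_IFACE" -c 1 -W 2 "$t" >/dev/null 2>&1; then
ok=1
break
fi
done
# Load the consecutive fail/success counters written by the previous run (if any).
fail=0; succ=0
[[ -f $STATE ]] && source "$STATE"
if [[ $ok -eq 0 ]]; then
fail=$((fail+1)); succ=0
else
succ=$((succ+1)); fail=0
fi
# LTE counts as active only when the exact route this script installs is present.
# -F matches the string literally, so the dot in eth0.99 is not treated as a regex wildcard.
lte_active=0
ip route show default | grep -qF "dev $BACKUP_IFACE proto static metric $BACKUP_METRIC" && lte_active=1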
if [[ $fail -ge $FAIL_THRESHOLD && $lte_active -eq 0 ]]; then
ip route add default via "$BACKUP_GW" dev "$BACKUP_IFACE" proto static metric "$BACKUP_METRIC"
logger -t "$LOG_TAG" -p user.warning "DSL down ($fail fails) — LTE default installed"
elif [[ $succ -ge $RECOVER_THRESHOLD && $lte_active -eq 1 ]]; then
ip route del default via "$BACKUP_GW" dev "$BACKUP_IFACE" proto static metric "$BACKUP_METRIC"
logger -t "$LOG_TAG" -p user.warning "DSL recovered ($succ ok) — LTE default removed"
fi
printf 'fail=%d\nsucc=%d\n' "$fail" "$succ" > "$STATE"
/etc/systemd/system/wan-watchdog.service:
[Unit]
Description=WAN failover watchdog
After=network-online.target
[Service]
Type=oneshot
ExecStart=/config/scripts/wan-watchdog.sh
/etc/systemd/system/wan-watchdog.timer:
[Unit]
Description=Run WAN watchdog every 10s
[Timer]
OnBootSec=1min
OnUnitActiveSec=10s
AccuracySec=1s
[Install]
WantedBy=timers.target
Enable:
sudo systemctl daemon-reload
sudo systemctl enable --now wan-watchdog.timer
Mechanism. PPPoE auto-installs the default route at metric 20. Every 10 s the watchdog pings 1.1.1.1, then 9.9.9.9 (short-circuits on first success), with ping -I pppoe0 to force egress on the primary. After three consecutive failed rounds, it adds a kernel default via 192.168.99.1 dev eth0.99 metric 1, preempting pppoe0. After three successful rounds in the LTE-active state, it removes that route. Detection window ≈ 30 s, recovery window ≈ 30 s.
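The threshold logic described above can be exercised without touching any routes. The following is a minimal sketch, assuming the same thresholds as the script; decide_action and replay are hypothetical helper names used only for illustration and are not part of wan-watchdog.sh.

```bash
#!/bin/bash
# Dry-run sketch of the watchdog's decision step (no routes are changed).
# decide_action and replay are hypothetical names, not part of wan-watchdog.sh.
FAIL_THRESHOLD=3
RECOVER_THRESHOLD=3

# Args: consecutive fails, consecutive successes, lte_active (0 or 1).
# Prints "add", "del" or "none", mirroring the if/elif in the watchdog.
decide_action() {
    local fail=$1 succ=$2 lte_active=$3
    if (( fail >= FAIL_THRESHOLD && lte_active == 0 )); then
        echo add
    elif (( succ >= RECOVER_THRESHOLD && lte_active == 1 )); then
        echo del
    else
        echo none
    fi
}

# Replay a sequence of probe rounds (1 = reply seen, 0 = no reply)
# and print the counters and resulting action for each round.
replay() {
    local fail=0 succ=0 lte_active=0 ok action
    for ok in "$@"; do
        if (( ok == 0 )); then
            fail=$((fail+1)); succ=0
        else
            succ=$((succ+1)); fail=0
        fi
        action=$(decide_action "$fail" "$succ" "$lte_active")
        if [[ $action == add ]]; then lte_active=1; fi
        if [[ $action == del ]]; then lte_active=0; fi
        echo "ok=$ok fail=$fail succ=$succ action=$action"
    done
}

# One good round, four failed rounds, then three good rounds.
replay 1 0 0 0 0 1 1 1
```

In this replay the third consecutive failure (round 4) yields add and the third consecutive success (round 8) yields del; every other round yields none. As a rough cross-check of the ≈ 30 s figure, assuming rounds start about 10 s apart: the third failed round begins 20 to 30 s after the outage starts, and that round can spend up to 4 s in ping timeouts (2 targets × 2 s) before the route is added.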
Durability caveat. /etc/systemd/system/ is not VyOS-managed. The unit files survive reboots on the current image but are not guaranteed to survive image upgrades. See Out of Scope.
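One possible mitigation is to keep master copies of the two unit files under /config and re-install them at boot. The sketch below assumes a hypothetical /config/systemd directory (nothing in the current deployment creates it) and a hypothetical sync_units helper; calling it from /config/scripts/vyos-postconfig-bootup.script is an assumption about how it would be wired in, not something that has been tested here.

```bash
#!/bin/bash
# Sketch: restore the watchdog units from a /config-resident copy.
# UNIT_SRC is a hypothetical directory chosen for illustration.
UNIT_SRC=${UNIT_SRC:-/config/systemd}
UNIT_DST=${UNIT_DST:-/etc/systemd/system}

# Copy each unit that is missing from, or differs in, the destination.
# Prints the name of every file it copied, one per line.
sync_units() {
    local src=$1 dst=$2 f name
    for f in "$src"/wan-watchdog.service "$src"/wan-watchdog.timer; do
        [[ -f $f ]] || continue
        name=${f##*/}
        if ! cmp -s "$f" "$dst/$name"; then
            install -m 0644 "$f" "$dst/$name" && echo "$name"
        fi
    done
}

# Run only when executed directly and when the source directory exists,
# so sourcing this file has no side effects.
if [[ "${BASH_SOURCE[0]}" == "$0" && -d $UNIT_SRC ]]; then
    changed=$(sync_units "$UNIT_SRC" "$UNIT_DST")
    if [[ -n $changed ]]; then
        systemctl daemon-reload
    fi
    systemctl enable --now wan-watchdog.timer
fi
```

Because the copy is skipped when the files already match, running this on every boot should be harmless on an image where the units are still in place.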
IPsec Caveat
The site-to-site tunnel peer-vps is anchored to the DSL public IP (see IPsec docs and home-router IPsec section).
During an LTE-only window:
- Strongswan loses the bound local address → IKE SAs drop → the DPD restart action triggers reconnect attempts that will not succeed (wrong source IP, CGNAT, VPS has no DDNS update).
- Outbound LAN connectivity still works (handled entirely by the watchdog-managed kernel default, independent of IPsec).
- All VPN-dependent inbound paths break: Matrix federation, Mumble, Minecraft, OpenClaw gateway, the VPS Traefik split-horizon overlay. None return until DSL is restored.
This is accepted breakage for the first iteration. See Out of Scope for the path to make IPsec survive failover.
Health Check & Monitoring
| Command | Purpose |
|---|---|
| journalctl -t wan-watchdog -f | Live watchdog log (one tick per 10 s, plus state-change warnings) |
| systemctl list-timers wan-watchdog.timer | Confirm timer is armed and next-fire time |
| ip route show \| grep default | All defaults in the main table; LTE entry present only during failover |
| ip route get 1.1.1.1 | Which interface the kernel currently picks for outbound |
| show interfaces ethernet eth0 vif 99 brief | LTE uplink state + DHCP lease |
| monitor traffic interface eth0.99 | Live packet capture on LTE path |
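Since the watchdog logs a warning on every state change, the journal can also be summarized after the fact. This is a small sketch that counts those warnings; summarize_events is a hypothetical helper name, and the match strings are taken from the two logger messages in wan-watchdog.sh.

```bash
#!/bin/bash
# Sketch: count failover and recovery events in watchdog log lines read from stdin.
# summarize_events is a hypothetical helper, not part of the deployed script.
summarize_events() {
    awk '
        /LTE default installed/ { down++ }   # DSL declared down
        /LTE default removed/   { up++ }     # DSL declared recovered
        END { printf "failovers=%d recoveries=%d\n", down, up }
    '
}

# Example (on vyos-fw):
#   journalctl -t wan-watchdog --since "7 days ago" --no-pager | summarize_events
```

A failover count higher than the recovery count suggests the LTE default is still installed, which can be confirmed with ip route show.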
Optional Uptime Kuma probes:
- TCP/ICMP to 192.168.99.1 (RUT240 LAN gateway): proves the VyOS↔RUT240 path.
- HTTP probe to a known external endpoint via curl on vyos-fw bound to eth0.99: confirms LTE egress works even when DSL is up.
Verification
End-to-end test, performed on deployment day:
1. Baseline (DSL active):
   - From a TRUSTED LAN host: curl -s ifconfig.me → DSL public IP.
   - On vyos-fw: ip route show | grep default → single default dev pppoe0 at metric 20.
2. Simulate DSL outage by running, in order:
   - configure
   - set interfaces ethernet eth1 disable
   - commit
3. Wait ~30 s (three 10 s ticks of consecutive failure).
4. Confirm failover:
   - ip route show | grep default → new entry default via 192.168.99.1 dev eth0.99 proto static metric 1.
   - ip route get 1.1.1.1 → dev eth0.99.
   - From the LAN host: curl -s ifconfig.me → the LTE carrier's CGNAT egress IP (different from step 1).
   - journalctl -t wan-watchdog -n 20 shows the DSL down ... LTE default installed warning.
5. Revert DSL by running, in order:
   - delete interfaces ethernet eth1 disable
   - commit
6. After ~30 s, the watchdog logs DSL recovered ... LTE default removed; ip route get 1.1.1.1 returns to dev pppoe0.
7. Baselines for both paths: ping -c 50 1.1.1.1 and a quick speedtest-cli from a LAN host. Note LTE RTT and throughput; these set expectations for how degraded the user experience will be during failover.
If failover flaps → raise FAIL_THRESHOLD / RECOVER_THRESHOLD in the script, or add a third ping target.
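The "which interface does the kernel pick" checks in the steps above can be scripted. The sketch below parses the dev field from ip route get output; egress_dev and expect_egress are hypothetical helper names, and the parsing assumes the usual iproute2 output format ("... dev IFACE src ...").

```bash
#!/bin/bash
# Sketch: report the egress interface the kernel currently selects.
# egress_dev / expect_egress are hypothetical helpers for illustration.

# Read `ip route get` output on stdin and print the word following "dev".
egress_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i+1); exit } }'
}

# Compare the current egress for a target against the expected interface.
# Returns 0 on match, non-zero otherwise.
expect_egress() {
    local want=$1 target=${2:-1.1.1.1} got
    got=$(ip route get "$target" 2>/dev/null | egress_dev)
    echo "egress for $target: ${got:-none} (expected $want)"
    [[ $got == "$want" ]]
}

# Example (on vyos-fw):
#   expect_egress pppoe0    # baseline and after recovery
#   expect_egress eth0.99   # during failover
```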
Known Limitations
- Gray-failure blind spot. Detection covers interface-down and full ICMP loss only. A DSL line that is up and routes ICMP to 1.1.1.1 / 9.9.9.9 but is broken for DNS, TCP, or specific destinations will not trigger failover. Add an L7 healthcheck (HTTP probe bound to pppoe0) if this becomes a real failure mode.
- Long-lived TCP sessions reset at the moment of failover; there is no conntrack stickiness, so every flow re-NATs via the new egress.
- IPsec down during LTE-only operation (see IPsec Caveat).
- SIM data cap. A long DSL outage can burn the monthly quota fast. Set RUT240's data-warning threshold and pick a SIM plan with a sane cap.
- LTE latency (~40–80 ms vs ~10 ms DSL): VoIP / interactive SSH degrade noticeably but stay usable.
- No IPv6 over LTE. Current fd00:69:30::/64, fd00:69:40::/64 and any global v6 traffic break during failover. Most carriers offer v6 but RUT240 + CGNAT v6 is fiddly; deferred.
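For the gray-failure case, an HTTP probe bound to the primary interface could complement the ICMP check. The following is a minimal sketch, not part of the deployed watchdog: l7_probe is a hypothetical helper, PROBE_URL is a placeholder that would need to point at a real, reliable endpoint, and the 5 s timeout is an arbitrary choice.

```bash
#!/bin/bash
# Sketch: L7 health probe bound to a specific interface via curl.
# l7_probe, PROBE_URL and the timeout are assumptions for illustration.
CURL=${CURL:-curl}
PROBE_URL=${PROBE_URL:-https://probe.example.net/}   # placeholder endpoint

# Returns 0 if an HTTP request sent through the given interface succeeds
# (HTTP errors count as failure because of --fail).
l7_probe() {
    local iface=$1 url=${2:-$PROBE_URL}
    "$CURL" --interface "$iface" --max-time 5 --silent --fail \
        --output /dev/null "$url"
}

# Possible use inside the watchdog, after the ping loop:
#   if [[ $ok -eq 1 ]] && ! l7_probe "$PRIMARY_IFACE"; then ok=0; fi
```

Two things to keep in mind with this approach: --interface binds the HTTP connection only, so the hostname lookup may still leave via whatever path the system resolver uses, and the probe adds up to 5 s to a round, which stretches the detection window accordingly.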
Out of Scope / Future Work
- Move watchdog into a VyOS-managed location so it survives image upgrades, e.g. a /config-resident systemd unit installed via a post-boot hook, or a proper VyOS config-mode wrapper. The current /etc/systemd/system/ location is durable across reboots but not across image-rebuild upgrades.
- Replace watchdog with native VyOS once a future release fixes protocols failover route to bind the ICMP healthcheck to the specified interface (SO_BINDTODEVICE). At that point the script becomes redundant.
- IPsec survives failover. Change tunnel to local-address any + add DDNS records for both DSL and LTE public IPs (LTE is behind CGNAT, so it needs IKE NAT-T from home, which works because home is the initiator). Update peer-vps on vyos-edge to accept either source.
- VPS-side detection. Have vyos-edge notice when the tunnel is down for >N minutes and switch DNS for home.helix9.org / inbound published services to an "offline" sentinel.
- SMS alerts via RUT240 on link-up / link-down events.
- IPv6 over LTE once carrier and RUT240 firmware cooperate.
- Granular policy routing (e.g. force backups to always egress DSL, never LTE, to protect the SIM cap).