
LTE Failover — Teltonika RUT240

Secondary internet uplink via a Teltonika RUT240 LTE router, with automatic failover when the primary DSL link (pppoe0 over eth1 → Speedport in bridge mode) fails. Implemented on vyos-fw (VyOS 2026.03 circinus) using a custom systemd watchdog after both load-balancing wan and protocols failover route were found broken on this release.

Deployed 2026-05-11.


Table of Contents

  1. Overview
  2. Topology
  3. RUT240 Configuration
  4. Switch — VLAN 99 (Juniper EX)
  5. VyOS Configuration
  6. IPsec Caveat
  7. Health Check & Monitoring
  8. Verification
  9. Known Limitations
  10. Out of Scope / Future Work

Overview

Goal: Outbound LAN connectivity (browsing, updates, DNS forwarding to Quad9, NTP, alerting) survives a DSL outage without manual intervention.

Non-goal: Inbound services published via the VPS (Matrix, Mumble, Minecraft, OpenClaw, etc.) — these go through the IPsec tunnel anchored to the DSL public IP. The tunnel will be down during LTE-only operation (see IPsec Caveat).

Key design choices:

Decision | Choice | Reason
Physical uplink | New VLAN 99 on existing eth0 trunk → Juniper EX access port → RUT240 LAN | All five NICs on vyos-fw already used; no recabling.
RUT240 mode | NAT router (its own LAN DHCP) | Simplest; consumer LTE SIM is CGNAT anyway, no benefit from bridge mode.
Failover engine | Custom systemd timer + bash watchdog managing kernel default route | Native VyOS load-balancing wan and protocols failover route both broken on circinus for this topology (see Known broken paths).
IPsec handling | Best-effort (drop during failover) | Avoids fragile DDNS / local-address any complexity; revisit later.

Topology

Internet (DSL/PPPoE) Internet (LTE, CGNAT)
│ │
Speedport (bridge/modem mode) RUT240 (LTE→NAT, LAN 192.168.99.1/24)
│ │
│ eth1 (PPPoE → pppoe0) │ ge-0/0/6 (access, untagged VLAN 99)
│ ▼
│ ┌──────────────────────┐
│ │ Juniper EX core-sw │
│ │ VLAN 99 = LTE │
│ └──────────┬───────────┘
│ │ ge-0/0/0 trunk (VLAN 99 tagged)
┌──────┴──────────────────────────────────┴─────────────┐
│ vyos-fw │
│ WAN zone: pppoe0 + eth0.99 │
│ Watchdog: pings 1.1.1.1/9.9.9.9 bound to pppoe0; │
│ installs/removes LTE default via eth0.99 │
│ Source NAT: masquerade on both │
└────────────────────────────────────────────────────────┘

RUT240 Configuration

Hardware: Teltonika RUT240, firmware RUT2XX_R_00.01.11.2. Carrier: Telekom.de (4G).

Factory reset

Hold the reset button for ~10 seconds until all LEDs flash simultaneously, then release. Wait for reboot. Default LAN comes back up as 192.168.1.1/24 with DHCP — plug a laptop in and browse to http://192.168.1.1 (admin / admin01, will prompt for password change).

Setup wizard

Section | Setting | Value
Mobile | APN | Auto (Telekom detected)
Mobile | PIN | none
Mobile | Dial number | *99#
Mobile | MTU | 1500
Mobile | Service mode | Automatic
LAN | IP | 192.168.99.1/24 (change from default 192.168.1.1 here, or later in Network → LAN)
LAN | DHCP server | enabled, pool start 100, limit 50, lease 12h
WiFi | both bands | disabled
RMS (Teltonika cloud) | | disabled
WAN failover (RUT240 internal) | | enabled

Confirm LTE uplink works by plugging a laptop into RUT240 LAN1 and browsing.


Switch — VLAN 99 (Juniper EX)

On core-sw (Juniper EX), in configuration mode:

set vlans LTE vlan-id 99

set interfaces ge-0/0/6 description "Teltonika RUT240"
set interfaces ge-0/0/6 unit 0 family ethernet-switching interface-mode access
set interfaces ge-0/0/6 unit 0 family ethernet-switching vlan members LTE

set interfaces ge-0/0/0 description "TRUNK-VyOS-Router"
set interfaces ge-0/0/0 unit 0 family ethernet-switching vlan members LTE

ge-0/0/0 is the existing trunk to vyos-fw eth0; only the vlan members LTE line is new. ge-0/0/6 is the dedicated access port for the RUT240 LAN1 cable.

commit and verify with show vlans LTE — port ge-0/0/6.0 should be listed untagged, ge-0/0/0.0 tagged.


VyOS Configuration

All commands run in configure. Commit after each block, or batch and commit once at the end.

1. LTE VLAN sub-interface

set interfaces ethernet eth0 vif 99 description 'WAN-LTE (RUT240)'
set interfaces ethernet eth0 vif 99 address dhcp
set interfaces ethernet eth0 vif 99 dhcp-options no-default-route

no-default-route stops the DHCP client from installing a second default route via LTE; the watchdog owns that decision. Observed lease: 192.168.99.111 from the RUT240 DHCP pool.
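A quick way to confirm that no default route sits on eth0.99 while DSL is healthy is to filter the kernel's default routes by device. The helper below is a minimal sketch, not part of the deployed setup: the function name defaults_on_dev is made up for illustration, and it assumes the usual `ip route show default` text format.

```bash
#!/bin/bash
# Hypothetical helper: print every default route that leaves via the given device.
# Reads `ip route show default` style text on stdin.
defaults_on_dev() {
    local dev=$1
    awk -v dev="$dev" '
        $1 == "default" {
            for (i = 2; i < NF; i++)
                if ($i == "dev" && $(i + 1) == dev) { print; break }
        }'
}
```

Usage on vyos-fw: `ip route show default | defaults_on_dev eth0.99`. With DSL up, this should print nothing; during failover it should print only the route the watchdog installed.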

2. Source NAT — masquerade on LTE

set nat source rule 110 outbound-interface name eth0.99
set nat source rule 110 source address '10.69.0.0/16'
set nat source rule 110 translation address masquerade

Existing rule 100 (pppoe0) stays untouched.

3. Firewall — add eth0.99 to WAN zone

set firewall zone WAN interface eth0.99

No new policies. WAN-LOCAL, WAN-DMZ, and outbound ALLOW-INTERNET apply identically to either uplink. The router itself remains protected: only established/related inbound from WAN.

4. DNS / NTP

No changes required.

  • Router system DNS (1.1.1.1, 9.9.9.9) reaches over either uplink.
  • Technitium (10.69.20.53) forwarding to Quad9 9.9.9.9:853 / 149.112.112.112:853 (DoT) works on either path.
  • NTP upstreams (time.cloudflare.com, time*.vyos.net) likewise.

Known broken paths

Two native VyOS mechanisms were tried and rejected before settling on the watchdog:

  • load-balancing wan is broken on circinus for PPPoE. Op-mode show load-balancing wan [status] commands are gone. The vyos-wan-load-balance.service daemon does not detect pppoe0 going down — nexthop is required syntactically for pppoe0 even though PPP has no real next-hop. nexthop 0.0.0.0 parses but the daemon never logs state changes during a real DSL outage; the fwmark/nftables rules stay pointed at the wlb_mangle_isp_pppoe0 chain and never flip to LTE. The feature is officially deprecated in VyOS.
  • protocols failover route is broken for multi-default scenarios. The vyos-failover daemon's ICMP healthcheck does not bind to the specified interface (no SO_BINDTODEVICE); it follows the main routing table, so a check nominally "via eth0.99" actually egresses whatever default is currently active. Working around it by aiming the check at the directly-connected 192.168.99.1 makes the check permanently succeed, which permanently installs the LTE route. Worse: once the failover daemon installs its kernel route (proto failover), FRR sees a kernel-distance-0 default and refuses to install its distance-1 static default for pppoe0. Result: pppoe0 default disappeared from the kernel and all traffic egressed via LTE — SIM data bleed observed during testing.

Watchdog-based failover

Three files installed on vyos-fw.

/config/scripts/wan-watchdog.sh (chmod +x):

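#!/bin/bash
# WAN failover watchdog: probe the primary uplink, then add or remove the LTE default route.
set -u

PRIMARY_IFACE=pppoe0
BACKUP_GW=192.168.99.1
BACKUP_IFACE=eth0.99
BACKUP_METRIC=1          # lower than the PPPoE default (metric 20), so it wins
TARGETS=(1.1.1.1 9.9.9.9)
FAIL_THRESHOLD=3         # consecutive failed rounds before failing over
RECOVER_THRESHOLD=3      # consecutive good rounds before failing back
STATE=/run/wan-watchdog.state
LOG_TAG=wan-watchdog

logger -t "$LOG_TAG" -p user.info "tick"

# One round: succeed as soon as any target answers via the primary interface.
ok=0
for t in "${TARGETS[@]}"; do
    if ping -I "$PRIMARY_IFACE" -c 1 -W 2 "$t" >/dev/null 2>&1; then
        ok=1
        break
    fi
done

# Load the counters persisted by the previous run, if any.
fail=0; succ=0
[[ -f $STATE ]] && source "$STATE"

if [[ $ok -eq 0 ]]; then
    fail=$((fail+1)); succ=0
else
    succ=$((succ+1)); fail=0
fi

# LTE counts as active only if the exact route this script installs is present.
# -F matches literally (the dot in eth0.99 is not a wildcard); -w keeps "metric 1" from matching "metric 10".
lte_active=0
ip route show default | grep -qwF "dev $BACKUP_IFACE proto static metric $BACKUP_METRIC" && lte_active=1

if [[ $fail -ge $FAIL_THRESHOLD && $lte_active -eq 0 ]]; then
    ip route add default via "$BACKUP_GW" dev "$BACKUP_IFACE" proto static metric "$BACKUP_METRIC"
    logger -t "$LOG_TAG" -p user.warning "DSL down ($fail fails): LTE default installed"
elif [[ $succ -ge $RECOVER_THRESHOLD && $lte_active -eq 1 ]]; then
    ip route del default via "$BACKUP_GW" dev "$BACKUP_IFACE" proto static metric "$BACKUP_METRIC"
    logger -t "$LOG_TAG" -p user.warning "DSL recovered ($succ ok): LTE default removed"
fi

# Persist the counters for the next timer tick.
printf 'fail=%d\nsucc=%d\n' "$fail" "$succ" > "$STATE"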

/etc/systemd/system/wan-watchdog.service:

[Unit]
Description=WAN failover watchdog
After=network-online.target

[Service]
Type=oneshot
ExecStart=/config/scripts/wan-watchdog.sh

/etc/systemd/system/wan-watchdog.timer:

[Unit]
Description=Run WAN watchdog every 10s

[Timer]
OnBootSec=1min
OnUnitActiveSec=10s
AccuracySec=1s

[Install]
WantedBy=timers.target

Enable:

sudo systemctl daemon-reload
sudo systemctl enable --now wan-watchdog.timer

Mechanism. PPPoE auto-installs the default route at metric 20. Every 10 s the watchdog pings 1.1.1.1, then 9.9.9.9 (short-circuits on first success), with ping -I pppoe0 to force egress on the primary. After three consecutive failed rounds, it adds a kernel default via 192.168.99.1 dev eth0.99 metric 1, preempting pppoe0. After three successful rounds in the LTE-active state, it removes that route. Detection window ≈ 30 s, recovery window ≈ 30 s.
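The threshold logic can be explored offline before touching the live thresholds. The sketch below replays a string of probe results through the same fail/succ counters the watchdog uses. The function name simulate_watchdog and the input encoding (one character per tick, 1 = some ping succeeded, 0 = all failed) are illustrative assumptions, and the sketch tracks the LTE state in a variable instead of reading the routing table.

```bash
#!/bin/bash
# Hypothetical simulator for the watchdog's hysteresis.
# $1: probe results, one character per tick (1 = round succeeded, 0 = round failed)
# $2: fail threshold (default 3), $3: recover threshold (default 3)
# Prints "<tick> lte-on" or "<tick> lte-off" for every state change.
simulate_watchdog() {
    local results=$1 fail_threshold=${2:-3} recover_threshold=${3:-3}
    local fail=0 succ=0 lte_active=0 tick ok

    for (( tick = 1; tick <= ${#results}; tick++ )); do
        ok=${results:tick-1:1}
        if [[ $ok == 0 ]]; then
            fail=$((fail + 1)); succ=0
        else
            succ=$((succ + 1)); fail=0
        fi

        if (( fail >= fail_threshold && lte_active == 0 )); then
            lte_active=1
            echo "$tick lte-on"
        elif (( succ >= recover_threshold && lte_active == 1 )); then
            lte_active=0
            echo "$tick lte-off"
        fi
    done
}
```

For example, `simulate_watchdog 1100010001110` prints `5 lte-on` and `12 lte-off`: the single good round at tick 6 does not trigger failback, and a lone failed round such as in `1101011` produces no output at all. With 10 s ticks, a state change at the third consecutive round matches the roughly 30 s windows quoted above.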

Durability caveat. /etc/systemd/system/ is not VyOS-managed. The unit files survive reboots on the current image but are not guaranteed to survive image upgrades. See Out of Scope.
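One possible mitigation, sketched here under assumptions and not deployed: keep master copies of the unit files under /config and reinstall them at boot from /config/scripts/vyos-postconfig-bootup.script. The /config/systemd directory and the function name install_watchdog_units are hypothetical, and whether the post-config hook behaves as expected on this circinus build has not been verified.

```bash
#!/bin/bash
# Hypothetical helper: copy the watchdog unit files from a /config-resident
# directory ($1) into systemd's unit directory ($2).
# Returns non-zero if an expected file is missing or a copy fails.
install_watchdog_units() {
    local src=$1 dst=$2 unit
    for unit in wan-watchdog.service wan-watchdog.timer; do
        [[ -f $src/$unit ]] || { echo "missing $src/$unit" >&2; return 1; }
        install -m 0644 "$src/$unit" "$dst/$unit" || return 1
    done
}

# Intended call from the boot hook (paths are assumptions):
#   install_watchdog_units /config/systemd /etc/systemd/system \
#       && systemctl daemon-reload \
#       && systemctl enable --now wan-watchdog.timer
```

The systemctl calls are left as a comment so the function itself only touches the two directories it is given, which makes it safe to try against scratch directories first.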


IPsec Caveat

The site-to-site tunnel peer-vps is anchored to the DSL public IP (see IPsec docs and home-router IPsec section).

During an LTE-only window:

  • strongSwan loses the bound local address → IKE SAs drop → DPD restart action triggers reconnect attempts that will not succeed (wrong source IP, CGNAT, VPS has no DDNS update).
  • Outbound LAN connectivity still works (handled entirely by the watchdog-managed kernel default, independent of IPsec).
  • All VPN-dependent inbound paths break: Matrix federation, Mumble, Minecraft, OpenClaw gateway, the VPS Traefik split-horizon overlay — none return until DSL is restored.

This is accepted breakage for the first iteration. See Out of Scope for the path to make IPsec survive failover.


Health Check & Monitoring

Command | Purpose
journalctl -t wan-watchdog -f | Live watchdog log (one tick per 10 s, plus state-change warnings)
systemctl list-timers wan-watchdog.timer | Confirm timer is armed and next-fire time
ip route show | grep default | All defaults in the main table; LTE entry present only during failover
ip route get 1.1.1.1 | Which interface the kernel currently picks for outbound
show interfaces ethernet eth0 vif 99 brief | LTE uplink state + DHCP lease
monitor traffic interface eth0.99 | Live packet capture on LTE path
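For a one-line answer to "which uplink is carrying traffic right now", the output of ip route get can be reduced to a label. This is a minimal sketch: the function names egress_dev and uplink_label are invented for illustration, and the parsing assumes the usual `... dev <iface> ...` layout of iproute2 output.

```bash
#!/bin/bash
# Hypothetical helper: extract the egress interface from `ip route get <addr>`
# output supplied on stdin.
egress_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Hypothetical helper: map an interface name to the uplink it represents
# in this topology.
uplink_label() {
    case $1 in
        pppoe0)  echo "DSL" ;;
        eth0.99) echo "LTE" ;;
        *)       echo "unknown" ;;
    esac
}
```

Usage on vyos-fw: `uplink_label "$(ip route get 1.1.1.1 | egress_dev)"`.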

Optional Uptime Kuma probes:

  • TCP/ICMP to 192.168.99.1 (RUT240 LAN gateway) — proves VyOS↔RUT240 path.
  • HTTP probe to a known external endpoint via curl on vyos-fw bound to eth0.99 — confirms LTE egress works even when DSL is up.

Verification

End-to-end test, performed on deployment day:

  1. Baseline (DSL active):
    • From a TRUSTED LAN host: curl -s ifconfig.me → DSL public IP.
    • On vyos-fw: ip route show | grep default → single default dev pppoe0 at metric 20.
  2. Simulate DSL outage:
    configure
    set interfaces ethernet eth1 disable
    commit
  3. Wait ~30 s (three 10 s ticks of consecutive failure).
  4. Confirm failover:
    • ip route show | grep default → new entry default via 192.168.99.1 dev eth0.99 proto static metric 1.
    • ip route get 1.1.1.1 → dev eth0.99.
    • From the LAN host: curl -s ifconfig.me → the carrier's CGNAT egress IP (a public address different from step 1).
    • journalctl -t wan-watchdog -n 20 shows the DSL down ... LTE default installed warning.
  5. Revert DSL:
    delete interfaces ethernet eth1 disable
    commit
  6. After ~30 s, watchdog logs DSL recovered ... LTE default removed; ip route get 1.1.1.1 returns to dev pppoe0.
  7. Baselines for both paths: ping -c 50 1.1.1.1 and a quick speedtest-cli from a LAN host. Note LTE RTT and throughput — these set expectations for how degraded user experience will be during failover.

If failover flaps → raise FAIL_THRESHOLD / RECOVER_THRESHOLD in the script, or add a third ping target.


Known Limitations

  • Gray-failure blind spot. Detection covers interface-down and full ICMP loss only. A DSL line that is up and routes ICMP to 1.1.1.1/9.9.9.9 but is broken for DNS, TCP, or specific destinations will not trigger failover. Add an L7 healthcheck (HTTP probe bound to pppoe0) if this becomes a real failure mode.
  • Long-lived TCP sessions reset at the moment of failover; there is no conntrack stickiness — every flow re-NATs via the new egress.
  • IPsec down during LTE-only operation (see above).
  • SIM data cap. A long DSL outage can burn the monthly quota fast. Set RUT240's data-warning threshold and pick a SIM plan with a sane cap.
  • LTE latency (~40–80 ms vs ~10 ms DSL) — VoIP / interactive SSH degrade noticeably but stay usable.
  • No IPv6 over LTE. Current fd00:69:30::/64, fd00:69:40::/64 and any global v6 traffic break during failover. Most carriers offer v6 but RUT240 + CGNAT v6 is fiddly; deferred.
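The L7 healthcheck suggested for the gray-failure case could look roughly like the sketch below. It is not part of the deployed watchdog: the function name l7_probe is hypothetical, the probe URL is left to the caller, and binding curl to pppoe0 is assumed to work because that interface carries its own default route.

```bash
#!/bin/bash
# Hypothetical L7 probe: return 0 if an HTTP request forced out of the given
# interface gets a 2xx or 3xx status within the timeout, non-zero otherwise.
# $1: interface, $2: URL, $3: timeout in seconds (default 5)
l7_probe() {
    local iface=$1 url=$2 timeout=${3:-5} code
    code=$(curl --interface "$iface" --silent --output /dev/null \
                --max-time "$timeout" --write-out '%{http_code}' "$url") || return 1
    [[ $code =~ ^[23][0-9][0-9]$ ]]
}
```

In the watchdog, a round would then count as good only if both a ping and `l7_probe "$PRIMARY_IFACE" "$URL"` succeed, with URL pointing at any stable endpoint you trust. Each extra probe lengthens a failed round by up to its timeout, so the detection window grows accordingly.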

Out of Scope / Future Work

  • Move watchdog into a VyOS-managed location so it survives image upgrades, e.g. a /config-resident systemd unit installed via a post-boot hook, or a proper VyOS config-mode wrapper. Current /etc/systemd/system/ location is durable across reboots but not across image-rebuild upgrades.
  • Replace watchdog with native VyOS once a future release fixes protocols failover route to bind the ICMP healthcheck to the specified interface (SO_BINDTODEVICE). At that point the script becomes redundant.
  • IPsec survives failover. Change tunnel to local-address any + add DDNS records for both DSL and LTE public IPs (LTE behind CGNAT — needs IKE NAT-T from home, which works because home is the initiator). Update peer-vps on vyos-edge to accept either source.
  • VPS-side detection. Have vyos-edge notice when the tunnel is down for >N minutes and switch DNS for home.helix9.org / inbound published services to an "offline" sentinel.
  • SMS alerts via RUT240 on link-up / link-down events.
  • IPv6 over LTE once carrier and RUT240 firmware cooperate.
  • Granular policy routing (e.g. force backups to always egress DSL, never LTE, to protect the SIM cap).