Slow Download After Draytek 167 Migration — PPPoE Throughput on VyOS in Proxmox
After the Draytek 167 migration, download throughput dropped: ~26 MB/s vs. ~32 MB/s previously on the FritzBox (Telekom Super-Vectoring 250/40, line synced at 292/46 Mbit/s). Single-flow downloads from US servers capped near 22 MB/s. Sync, SNR margin (7.7 dB down), attenuation (3.8 dB), and CRC counters were all clean — the line itself was healthy. The bottleneck was the VyOS-on-Proxmox forwarding path, specifically the host's onboard e1000e NIC.
Symptoms
- DSL line trains correctly. Actual Rate 292121 kbps down / 46719 kbps up, profile 35b, SNR margin 7.7 dB / 10.3 dB, 0 CRC errors.
- PPPoE session up; public IP assigned; routing and NAT functional.
- wget from VyOS itself to a German Hetzner mirror caps near 180–200 Mbit/s instead of the expected ~250 Mbit/s.
- A LAN client behind VyOS sees even less (~26 Mbit/s in initial tests over a degraded path).
- No core inside the VyOS VM hits 100%; CPU usage looks idle.
- mpstat -P ALL 1 on the Proxmox host shows one core (CPU5 in this box) sustaining ~11–14% %soft while every other core sits at 0%; softirq is pinned to a single core.
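To make the last symptom easier to spot than by eyeballing the mpstat table, the output can be filtered for the core with the highest %soft. The following is a minimal sketch; the function name busiest_softirq_cpu is made up for illustration, and it assumes mpstat runs under LC_ALL=C so that decimals use a dot.

```shell
# Reads mpstat -P ALL output on stdin and prints the CPU with the highest %soft.
# Column positions come from the header row, so 12h and 24h timestamps both work.
busiest_softirq_cpu() {
  awk '
    BEGIN { max = -1 }
    # Header row: remember which columns hold the CPU id and %soft.
    /%soft/ {
      for (i = 1; i <= NF; i++) {
        if ($i == "CPU")   c = i
        if ($i == "%soft") s = i
      }
      next
    }
    # Per-CPU rows only: skip the "all" row and the trailing "Average:" summary.
    c && s && $1 != "Average:" && $c ~ /^[0-9]+$/ {
      if ($s + 0 > max) { max = $s + 0; cpu = $c }
    }
    END { if (cpu != "") printf "CPU%s %.2f\n", cpu, max }
  '
}

# Usage on the Proxmox host during a sustained transfer:
#   LC_ALL=C mpstat -P ALL 1 5 | busiest_softirq_cpu
```

If one core is reported with a %soft value far above the rest while the others stay near zero, that matches the single-core softirq pattern described above.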
Root Cause
The Proxmox host bridges WAN traffic through its onboard NIC:
ethtool -i nic0
driver: e1000e
e1000e (Intel I218/I219-class onboard NICs) is single-queue — ethtool -l nic0 returns Operation not supported because there is exactly one RX/TX queue and no RSS hash indirection table. Every packet on the WAN bridge is processed by one host CPU's softirq context. PPPoE adds per-packet overhead (encapsulation, no hardware offload on the pppoe0 interface), so a single TCP flow saturates that one core long before it saturates the 1 Gbit/s link.
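As a cross-check for the ethtool -l result, sysfs exposes one rx-N directory per receive queue under /sys/class/net/<interface>/queues. A small sketch that counts them follows; the helper name rx_queue_count and its optional second argument (an alternative sysfs root, useful for dry runs) are illustrative, not part of any tool.

```shell
# Counts the rx-N queue directories of a network interface.
# $1 = interface name; $2 = sysfs root (defaults to /sys).
rx_queue_count() (
  n=0
  for q in "${2:-/sys}/class/net/$1/queues"/rx-*; do
    # The glob stays unexpanded if nothing matches, so test each candidate.
    [ -d "$q" ] && n=$((n + 1))
  done
  echo "$n"
)

# Usage on the Proxmox host:
#   rx_queue_count nic0
```

A result of 1 for nic0 would be consistent with the single-queue behaviour described above.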
Compounding factors found on the same box:
- Default ring buffers RX 256 / TX 256, far below the 4096 maximum the driver allows.
- No multiqueue on the virtio NICs attached to the VyOS VM (net0/net2 had no queues= parameter).
- No RPS (Receive Packet Steering) configured on the host, so no software fan-out across cores either.
The FritzBox didn't expose this because it is a dedicated appliance with no virtualization layer and a router ASIC handling PPPoE in hardware.
Fix
Apply on the Proxmox host unless noted otherwise.
1. Bump physical NIC ring buffers
ethtool -G nic0 rx 4096 tx 4096
Verify with ethtool -g nic0. The pre-set maximums were 4096/4096 on this hardware.
2. Enable RPS to spread softirq across cores
The Proxmox host has 6 logical CPUs in this deployment. The CPU mask is hex; each bit is one core. 0x3e = 0b00111110 selects cores 1–5 and skips core 0 (which handles other host work).
echo 3e > /sys/class/net/nic0/queues/rx-0/rps_cpus
Pitfall: echo fe > .../rps_cpus returns Value too large for defined data type on a 6-core box. fe sets bits 6 and 7, which have no matching CPU; the mask must not set bits beyond the CPU count.
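The mask arithmetic can be scripted rather than worked out by hand. The sketch below derives the hex mask for a contiguous CPU range and checks that a mask sets no bit beyond the CPU count; both function names are hypothetical helpers, not existing tools.

```shell
# Prints the hex mask with bits first..last set (CPU numbering starts at 0).
# $1 = first CPU, $2 = last CPU
cpu_range_mask() {
  printf '%x\n' $(( (1 << ($2 + 1)) - (1 << $1) ))
}

# Succeeds only if the hex mask sets no bit at or above the CPU count.
# $1 = hex mask without 0x prefix, $2 = number of CPUs
mask_fits() {
  [ $(( 0x$1 >> $2 )) -eq 0 ]
}

# Usage on a host where CPU 0 should be left out:
#   mask=$(cpu_range_mask 1 $(( $(nproc) - 1 )))
#   mask_fits "$mask" "$(nproc)" && echo "$mask" > /sys/class/net/nic0/queues/rx-0/rps_cpus
```

With 6 CPUs, cpu_range_mask 1 5 yields 3e, and mask_fits fe 6 fails, which mirrors the error described in the pitfall.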
3. Enable RFS (flow-aware steering)
echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
echo 4096 > /sys/class/net/nic0/queues/rx-0/rps_flow_cnt
4. Raise softirq budget
sysctl -w net.core.netdev_budget=600
sysctl -w net.core.netdev_budget_usecs=8000
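One way to judge whether the budget was actually a limiting factor is /proc/net/softnet_stat: its second column counts dropped packets and its third column (time_squeeze) counts how often softirq processing stopped with work still pending because the budget or time slice ran out. All values are hexadecimal. The sketch below converts them to decimal; the function name softnet_summary is illustrative, and rows are labelled by position because they map to CPUs only when all CPUs are online.

```shell
# Reads softnet_stat-style rows on stdin (hex counters) and prints the
# dropped and time_squeeze counters in decimal, one line per row.
softnet_summary() (
  row=0
  while read -r _processed dropped squeezed _rest; do
    printf 'row%d dropped=%d time_squeeze=%d\n' "$row" "0x$dropped" "0x$squeezed"
    row=$((row + 1))
  done
)

# Usage on the Proxmox host, before and after a sustained transfer:
#   softnet_summary < /proc/net/softnet_stat
```

If time_squeeze keeps rising during a transfer, a larger budget is plausibly worthwhile; if it stays flat, the budget change is unlikely to be what helps.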
5. Multiqueue on the VyOS virtio NICs
For each WAN-side and LAN-side virtio NIC, set queues=N (match the VM core count, typically 4):
qm set 101 -net2 virtio=BC:24:11:D1:B1:14,bridge=vmbr0,queues=4
qm set 101 -net0 virtio=BC:24:11:CD:AF:6B,bridge=vmbr2,queues=4
Reboot the VM. Inside VyOS, verify:
ethtool -l eth1
# Combined: 4
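ethtool -l prints two sections, the pre-set maximums and the current hardware settings, and each has its own Combined line. The value that matters for this check is the one in the second section. A minimal sketch for extracting it follows; the function name current_combined is made up for illustration.

```shell
# Prints the current "Combined" channel count from ethtool -l output on stdin.
current_combined() {
  awk '
    /^Current hardware settings/ { cur = 1 }
    cur && $1 == "Combined:"     { print $2; exit }
  '
}

# Usage inside VyOS:
#   ethtool -l eth1 | current_combined
```

If the current value turns out lower than the maximum, ethtool -L eth1 combined 4 can raise it for the running session; whether that is needed depends on the guest kernel.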
Persisting the Changes
The ethtool, echo, and sysctl commands above are not persistent across host reboots. Persist them:
/etc/sysctl.d/99-net-tuning.conf:
net.core.rps_sock_flow_entries = 32768
net.core.netdev_budget = 600
net.core.netdev_budget_usecs = 8000
/etc/network/interfaces: the WAN-bridge slave (nic0) needs an auto line so the stanza activates at boot, and the post-up hooks should be indented (tab or 4 spaces) under iface. The echo redirects are wrapped in bash -c so that the redirection into /sys/... is performed by an explicit shell, independent of how the hook runner invokes the command:
auto nic0
iface nic0 inet manual
    post-up ethtool -G nic0 rx 4096 tx 4096 || true
    post-up bash -c 'echo 3e > /sys/class/net/nic0/queues/rx-0/rps_cpus'
    post-up bash -c 'echo 4096 > /sys/class/net/nic0/queues/rx-0/rps_flow_cnt'
Apply without a host reboot:
ifdown nic0 && ifup nic0
sysctl --system
The ifdown nic0 briefly flaps the WAN bridge; PPPoE will reconnect within a few seconds. If that's disruptive, defer activation until the next planned host reboot; the file edits alone are enough to make the change persistent.
Verify after applying:
ethtool -g nic0 | grep -A4 "Current hardware" # RX and TX should both show 4096
cat /sys/class/net/nic0/queues/rx-0/rps_cpus # should print: 3e
cat /sys/class/net/nic0/queues/rx-0/rps_flow_cnt
sysctl net.core.netdev_budget # 600
sysctl net.core.netdev_budget_usecs # 8000
sysctl net.core.rps_sock_flow_entries # 32768
The qm set ... queues=4 change is already persistent (lives in /etc/pve/qemu-server/101.conf).
Verification
From the VyOS shell, single-flow test against a topologically close server (avoid trans-Atlantic — long RTT caps single-flow TCP regardless of router):
wget -O /dev/null https://fsn1-speed.hetzner.com/10GB.bin
For an aggregate-throughput test, run four parallel streams:
for i in 1 2 3 4; do wget -q -O /dev/null https://ash-speed.hetzner.com/1GB.bin & done
wait
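Because the loop runs wget with -q, it prints no rate. The aggregate figure can be derived by timing the whole run and dividing the total bytes by the elapsed seconds. The sketch below does only the arithmetic; the function name mbit_per_s is hypothetical, and the byte count in the usage comment assumes each 1GB.bin is exactly 10^9 bytes, which should be checked against the server's Content-Length.

```shell
# Prints the aggregate rate in Mbit/s.
# $1 = total bytes transferred, $2 = elapsed seconds
mbit_per_s() {
  awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b * 8 / s / 1000000 }'
}

# Usage around the four-stream test:
#   start=$(date +%s)
#   ... run the four wget jobs, then wait ...
#   end=$(date +%s)
#   mbit_per_s 4000000000 $((end - start))   # assumes 4 x 10^9 bytes
```

For example, 4 × 10^9 bytes in 128 seconds works out to 250.0 Mbit/s.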
While the test runs, on the host:
mpstat -P ALL 1
Expectation after the fix: %soft is spread across cores 1–5 instead of pinned to one. Per-flow throughput recovers to within a few percent of the contracted rate.
Diagnostic Cheatsheet
| Question | Command (where) |
|---|---|
| Is the DSL line healthy? | Draytek WUI → Online Status → Physical Connection. Look for sync rate near attainable rate, SNR margin > 6 dB, 0 CRC. |
| Is the PPPoE session up and clean? | show interfaces pppoe pppoe0 (VyOS) — check for errors/dropped. |
| Is the bottleneck the router or the line? | wget from VyOS itself. If VyOS is slow, suspect the line, the contract, or the router. If VyOS is fast but LAN clients are slow, suspect the LAN or the forwarding path. |
| Is softirq saturating one host core? | mpstat -P ALL 1 on the Proxmox host during a sustained transfer. |
| Does the physical NIC support multiqueue? | ethtool -l nic0. Operation not supported ⇒ single queue. |
| Are virtio NICs multiqueue? | ethtool -l eth1 inside VyOS. Combined should equal the queues= value in qm config. |
Notes on Single-Flow TCP and Distance
A speedtest from a German connection to a US server (Hetzner Ashburn, RTT ~100 ms) is bandwidth-delay-product-limited: a single TCP flow with default window sizes tops out near 25 MB/s regardless of the local router. Always benchmark against the closest mirror (fsn1-speed.hetzner.com, nbg1-speed.hetzner.com) or use parallel streams when the goal is to characterise router throughput rather than wide-area TCP behaviour.
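The relationship behind this limit is throughput ≤ window / RTT. The following sketch works through it in both directions. The function names are illustrative, and the 2.5 MB window in the usage comment is back-calculated from the ~25 MB/s and ~100 ms figures above, not a measured value.

```shell
# Prints the window (bytes) needed to keep a given rate in flight.
# $1 = rate in Mbit/s, $2 = RTT in ms
bdp_bytes() {
  awk -v r="$1" -v t="$2" 'BEGIN { printf "%d\n", r * 1000000 / 8 * t / 1000 }'
}

# Prints the single-flow ceiling in MB/s for a given window.
# $1 = window in bytes, $2 = RTT in ms
window_limit_mbyte() {
  awk -v w="$1" -v t="$2" 'BEGIN { printf "%.1f\n", w / (t / 1000) / 1000000 }'
}

# Usage:
#   bdp_bytes 250 100               # window needed for 250 Mbit/s at 100 ms
#   window_limit_mbyte 2500000 100  # ceiling for an assumed 2.5 MB window
```

At 100 ms, filling 250 Mbit/s needs about 3.1 MB in flight, while a 2.5 MB effective window caps a single flow at 25 MB/s. Against a mirror a few milliseconds away, the same window allows far more than the line rate, which is why nearby servers are the better benchmark for the router.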
Long-Term Recommendation
e1000e is a known weak driver/NIC family for routing workloads. A multiqueue NIC (Intel i350-T2 is a cheap, well-supported choice) eliminates the single-softirq-core ceiling without needing RPS workarounds. Plan a future hardware change if sustained single-flow PPPoE throughput above ~250 Mbit/s is needed.