Logs — VictoriaLogs + Vector
Web UI: https://logs.home.helix9.org/select/vmui/ (LAN + Authentik forward-auth)
Host: logs / logs.home.helix9.org
IP: 10.69.20.79
VLAN: 20 (SERVERS)
VMID: 279
Overview
Central log aggregation for every managed LXC. A single Rocky Linux 10 LXC runs
VictoriaLogs (single Go binary). Every other LXC runs Vector as a
systemd service that tails journald and pushes to VictoriaLogs over the
Elasticsearch bulk API.
[LXC: app-a]──┐
[LXC: app-b]──┼─► Vector (journald) ──HTTP──► logs:9428 (VictoriaLogs) ──► VMUI / Grafana
[LXC: app-c]──┘                                   │
                                                  └── /var/lib/victorialogs (30d retention)
VictoriaLogs is also scraped by Prometheus for self-metrics and provisioned as
a second datasource in Grafana via the victoriametrics-logs-datasource
plugin.
Why VictoriaLogs
- Single Go binary, ~35 MB RAM at our scale (no JVM, no Elasticsearch).
- Per-token + columnar index → sub-second full-text queries on millions of rows.
- ~10-20x lighter than Loki for our workload; 30d of logs fits in megabytes thanks to columnar compression.
- Native Prometheus /metrics; pairs cleanly with the existing stack.
Vector is the shipper because it is a small static binary, has a built-in journald source, runs on every LXC regardless of distro, and its Elasticsearch sink works with VictoriaLogs' ES-compatible ingest API.
Infrastructure
LXC Container
| Setting | Value |
|---|---|
| Node | pve02 |
| VMID | 279 |
| IP | 10.69.20.79/24 |
| Gateway | 10.69.20.1 |
| CPU | 1 core |
| RAM | 1024 MB |
| Swap | 512 MB |
| Disk | 8 GB |
| Template | rockylinux-10-default |
| Unprivileged | yes |
Ansible
- Playbook: playbooks/logs.yml
- Roles:
  - roles/victorialogs/: server (binary + systemd unit)
  - roles/log_shipper/: Vector on every LXC
- Host vars: inventory/host_vars/logs/vars.yml
- Runtime: native binaries via systemd (no containers)
- Data dir: /var/lib/victorialogs
Published Ports
| Port | Service | Notes |
|---|---|---|
| 9428 | VictoriaLogs HTTP | Ingest API + VMUI + /metrics |
VictoriaLogs is also Vector's sink target on every LXC.
Ingest Path
Vector config (/etc/vector/vector.yaml, rendered from
roles/log_shipper/templates/vector.yaml.j2):
sources:
  journal:
    type: journald
    current_boot_only: true
transforms:
  enrich:
    type: remap
    inputs: [journal]
    source: |
      .host = "{{ inventory_hostname }}"
      unit = to_string!(._SYSTEMD_UNIT || "unknown")
      if match(unit, r'^[0-9a-f]{16,}\.(service|scope|mount)$') {
        unit = "transient"
      }
      .service = unit
      .vlan_group = "<group>" # servers / mgmt / dmz
sinks:
  victorialogs:
    type: elasticsearch
    inputs: [enrich] # a sink needs an input; it reads from the enrich transform
    endpoints: ["http://10.69.20.79:9428/insert/elasticsearch/"]
    api_version: v8
    mode: bulk
    healthcheck:
      enabled: false # VL does not implement /_cluster/health
    bulk:
      index: vector-logs
    query:
      _msg_field: "message"
      _time_field: "timestamp"
      _stream_fields: "host,service,vlan_group"
Stream fields define how VictoriaLogs groups entries into log streams. They are chosen so that each (host, service, vlan_group) triple is one stream, which keeps filtering fast and cardinality low.
The transient collapse hides systemd's hex-named scope units (e.g. session
scopes) which would otherwise bloat the service field.
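The collapse rule can be tried from a shell before the template is changed. The sketch below reimplements the same pattern with grep -E; the function name normalize_unit is hypothetical and is not part of the role.

```shell
# Hypothetical helper: mirrors the regex used in the enrich transform so a
# unit name can be tested by hand. Not part of roles/log_shipper.
normalize_unit() {
  unit="${1:-unknown}"
  # Same pattern as the VRL match(): 16+ hex characters plus a known suffix.
  if printf '%s\n' "$unit" | grep -Eq '^[0-9a-f]{16,}\.(service|scope|mount)$'; then
    echo "transient"
  else
    echo "$unit"
  fi
}
```

Calling normalize_unit with a name copied from the Stream fields panel shows whether the current pattern would collapse it.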
Query Language — LogsQL
VMUI accepts LogsQL. Quick examples:
| Query | Meaning |
|---|---|
| * | everything |
| _stream:{host="podman"} | one host |
| _stream:{service="sshd.service"} | sshd across the fleet |
| _stream:{vlan_group="dmz"} | every DMZ host |
| error OR fail | full-text across all streams |
| _stream:{host="metrics"} error _time:1h | combine: stream + text + time |
| _time:5m | last 5 minutes |
| _stream:{host="logs"} \| stats by (service) count() | grouped counts |
| _stream:{host=~"pve.*"} | regex on stream label |
| _msg:"connection refused" | exact phrase in the message body |
_time: accepts 5m, 1h, 2026-05-12T00:00:00Z, or [t1, t2] ranges.
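The same LogsQL expressions can be run from a terminal through VictoriaLogs' /select/logsql/query HTTP endpoint. The wrapper below is a minimal sketch: the names vl_query, VL_URL and CURL are hypothetical, and it assumes the caller can reach the server directly on the LAN rather than through Traefik.

```shell
# Assumption: direct LAN access to the server, bypassing Traefik.
VL_URL="${VL_URL:-http://10.69.20.79:9428}"

# vl_query is a hypothetical wrapper around /select/logsql/query.
# $1 = LogsQL expression, $2 = max rows to return (default 10).
# Setting CURL=echo prints the request instead of sending it.
vl_query() {
  ${CURL:-curl} -sS "$VL_URL/select/logsql/query" \
    --data-urlencode "query=$1" \
    --data-urlencode "limit=${2:-10}"
}
```

For example, vl_query '_stream:{service="sshd.service"} error _time:1h' 20 should print up to 20 matching entries as JSON lines.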
Web UI
https://logs.home.helix9.org/select/vmui/
- Query tab — LogsQL with stream-field auto-complete on the left.
- Overview tab — stream and field facets with row counts.
- Live tail — top-right toggle, websocket stream of new events matching the current query.
Auth: local-or-auth middleware in Traefik (LAN bypass + Authentik forward-auth for
external access).
Grafana Integration
A second datasource, VictoriaLogs, is provisioned in Grafana on the metrics
LXC, alongside Prometheus.
- Plugin: victoriametrics-logs-datasource (auto-installed via GF_INSTALL_PLUGINS env).
- Datasource URL: http://10.69.20.79:9428
- Provisioning file: roles/metrics/templates/grafana-datasource-logs.yml.j2
Grafana → Explore → datasource picker → VictoriaLogs. LogsQL syntax, same as VMUI.
Self-Monitoring
Prometheus scrapes 10.69.20.79:9428/metrics. Notable series:
| Series | Use |
|---|---|
| vl_rows_ingested_total | ingest rate |
| vl_storage_data_size_bytes | on-disk size |
| vl_streams_created_total | new streams (label cardinality) |
| vl_http_request_errors_total | ingest/query errors |
| up{job="victorialogs"} | liveness |
Worth adding alert rules later (none yet), e.g. up == 0 for 5 min, or a sudden
drop in the rate of vl_rows_ingested_total, which would indicate that shippers have stopped.
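Until alert rules exist, an ingest stall can be spotted by hand by sampling the counter twice from /metrics. This is a rough sketch, not a replacement for alerting; sum_metric and ingest_delta are hypothetical names, and the parser assumes plain "name{labels} value" lines without trailing timestamps.

```shell
# sum_metric (hypothetical): reads Prometheus text-format metrics on stdin
# and prints the sum of all samples of the metric named in $1.
sum_metric() {
  awk -v m="$1" '
    index($0, m) == 1 {
      # Accept "name{labels} value" and "name value", but skip longer
      # metric names that merely share the prefix.
      c = substr($0, length(m) + 1, 1)
      if (c == "{" || c == " ") total += $NF
    }
    END { printf "%.0f\n", total }
  '
}

# ingest_delta (hypothetical): rows ingested during a window of $1 seconds
# (default 60). A result of 0 on a busy fleet suggests shippers have stopped.
ingest_delta() {
  url="${VL_URL:-http://10.69.20.79:9428}/metrics"
  a=$(curl -s "$url" | sum_metric vl_rows_ingested_total)
  sleep "${1:-60}"
  b=$(curl -s "$url" | sum_metric vl_rows_ingested_total)
  echo $((b - a))
}
```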
Retention
-retentionPeriod=30d flag in the systemd unit. VictoriaLogs stores data in
per-day partitions and automatically drops partitions that fall outside the retention period.
To change: edit victorialogs_retention in roles/victorialogs/defaults/main.yml
(or override per-host in inventory/host_vars/logs/vars.yml) and redeploy:
ansible-playbook playbooks/logs.yml --tags victorialogs
At ~10 MB/day of raw input, the data compresses to well under 5 MB/day on disk, so 30 days of retention needs less than 150 MB. The 8 GB rootfs is overkill and could be shrunk if we want.
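The sizing arithmetic is simple enough to keep as a one-liner when the retention period changes. The helper name retention_disk_mb is hypothetical; the inputs are the on-disk MB/day figure and the retention in days, and the result is an upper bound in MB.

```shell
# Hypothetical sizing helper: on-disk MB per day times retention days.
# $1 = MB written to disk per day, $2 = retention period in days.
retention_disk_mb() {
  echo $(( $1 * $2 ))
}
```

With the figures above, retention_disk_mb 5 30 gives the bound for the current 30d setting; the result can be compared against ssh logs du -sh /var/lib/victorialogs.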
Operations
Check server
ssh logs systemctl status victorialogs
ssh logs curl -s localhost:9428/health # → OK
ssh logs du -sh /var/lib/victorialogs
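To test the ingest path end to end without Vector, a hand-written bulk request can be posted to the same Elasticsearch-compatible endpoint. This is a sketch under assumptions: bulk_payload and send_test_event are hypothetical names, the "manual-test" service value is made up, and the event will sit in storage as its own stream until retention drops it.

```shell
# bulk_payload (hypothetical): prints a two-line Elasticsearch bulk body,
# an action line followed by one document. Values are inserted as-is, so
# they must not contain double quotes or backslashes.
bulk_payload() {
  printf '{"create":{}}\n{"_msg":"%s","host":"%s","service":"%s"}\n' "$1" "$2" "$3"
}

# send_test_event (hypothetical): posts that body to the path Vector uses,
# with /_bulk appended. No _time field is sent, so VictoriaLogs falls back
# to the ingestion time.
send_test_event() {
  bulk_payload "${1:-smoke test}" "$(hostname)" "manual-test" |
    curl -sS -X POST -H 'Content-Type: application/json' \
      --data-binary @- \
      "http://10.69.20.79:9428/insert/elasticsearch/_bulk?_stream_fields=host,service"
}
```

After send_test_event, the entry should show up in VMUI under _stream:{service="manual-test"}.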
Check a shipper
ssh <host> systemctl status vector
ssh <host> journalctl -u vector -n 50
If the Vector log shows "Healthcheck disabled." and "Starting journalctl.", Vector is
running and shipping. ES sink errors mean either that VictoriaLogs is unreachable or that the
/_cluster/health healthcheck was re-enabled (it must stay off).
Restart shipper on one host
ssh <host> systemctl restart vector
Restart server
ssh logs systemctl restart victorialogs
Safe: Vector holds events in its in-memory buffer during the restart, retries, and resumes once the server is back.
Roll out a Vector config change
ansible-playbook playbooks/logs.yml
Renders new vector.yaml everywhere, restarts Vector on each host via handler.
Add a new field to the stream
Edit roles/log_shipper/templates/vector.yaml.j2 — extend the enrich
transform, then add the new field name to _stream_fields in the sink's
query: block. Re-run the playbook. Note: new fields only appear on new
data; old entries keep old shape.
Wipe storage
ssh logs systemctl stop victorialogs
ssh logs 'rm -rf /var/lib/victorialogs/*'   # quoted so the glob expands on logs, not locally
ssh logs systemctl start victorialogs
Bump VictoriaLogs version
Edit victorialogs_version in inventory/host_vars/logs/vars.yml (overrides
the role default). Re-run:
ansible-playbook playbooks/logs.yml --limit logs
The role's version check re-downloads only when needed, and the handler restarts
the service. Before bumping, confirm the release exists at
https://github.com/VictoriaMetrics/VictoriaLogs/releases (artifact pattern
victoria-logs-linux-amd64-vX.Y.Z.tar.gz).
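The release check can be scripted before editing victorialogs_version. The sketch below assumes the GitHub tag is "v" plus the version and that assets follow the artifact pattern above; both function names are hypothetical and the role itself may build the URL differently.

```shell
# vl_release_url (hypothetical): builds the download URL for a version such
# as X.Y.Z. Assumes the tag is "v<version>" and the asset name follows the
# victoria-logs-linux-amd64-vX.Y.Z.tar.gz pattern.
vl_release_url() {
  v="$1"
  echo "https://github.com/VictoriaMetrics/VictoriaLogs/releases/download/v${v}/victoria-logs-linux-amd64-v${v}.tar.gz"
}

# vl_release_exists (hypothetical): exit status 0 if the artifact resolves.
# Requests only the first byte and follows GitHub's redirect.
vl_release_exists() {
  curl -sfL -r 0-0 -o /dev/null "$(vl_release_url "$1")"
}
```

A non-zero exit status from vl_release_exists means the version or the artifact name is wrong, so the playbook run would fail at the download step.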
Bump Vector version
Edit vector_version in roles/log_shipper/defaults/main.yml. Re-run
playbooks/logs.yml. The role downloads from
packages.timber.io/vector/<ver>/... (musl static build).
Troubleshooting
VMUI returns 403
The request went through the local-or-auth chain and local-only rejected your
source IP. Confirm the IP is inside traefik_local_networks (10.69.0.0/16 or
192.168.178.0/24). A denied request gets a flat 403 with no redirect to
Authentik, because chained middlewares stop at the first failure.
Requests from outside the LAN are redirected through Authentik instead.
If DNS for logs.home.helix9.org doesn't yet resolve, the request never
reaches Traefik's logs router and you may hit a wildcard fallback. Run
playbooks/vyos_dns.yml to register the subdomain.
Vector logs Healthcheck failed. Unexpected status: 400 Bad Request
VictoriaLogs doesn't implement /_cluster/health, which Vector's ES sink
probes by default. Confirm healthcheck: { enabled: false } is set in
vector.yaml. Re-render via the playbook if missing.
VictoriaLogs log spam: unsupported path requested: /_cluster/health (server side)
Same root cause as above, observed from VictoriaLogs' point of view. Fix in the shipper, not the server.
Logs from one host missing
ssh <host> systemctl status vector
ssh <host> journalctl -u vector -n 100
Common causes:
- Vector not installed (host not in the lxc group? enable_logging: false?).
- Sink unreachable (firewall blocking <host> → 10.69.20.79:9428).
- Vector user not in the systemd-journal group → no journald read access.
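The three causes above can be checked in one pass on the affected host. This is a sketch only: has_group and check_shipper are hypothetical names, and it assumes the Vector service account is called "vector".

```shell
# has_group (hypothetical): succeeds if the space-separated group list in $1
# (for example the output of `id -nG`) contains the group named in $2.
has_group() {
  printf '%s\n' "$1" | tr ' ' '\n' | grep -qxF -- "$2"
}

# check_shipper (hypothetical): run on the host whose logs are missing.
# Assumes the service account is named "vector". Prints one line per problem.
check_shipper() {
  systemctl is-active --quiet vector || echo "vector service is not active"
  has_group "$(id -nG vector 2>/dev/null)" systemd-journal ||
    echo "vector user is not in systemd-journal"
  curl -s -o /dev/null --max-time 5 http://10.69.20.79:9428/health ||
    echo "cannot reach 10.69.20.79:9428"
}
```

No output from check_shipper means none of the three causes applies, and the Vector journal is the next place to look.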
Hex-soup unit names appearing again
Means new transient unit pattern not matched by the regex in the enrich
transform. Extend the pattern in vector.yaml.j2:
if match(unit, r'^[0-9a-f]{16,}\.(service|scope|mount)$') { ... }
…adding the new suffix (e.g. slice, timer) if needed.
Stream cardinality explodes
If vl_streams_created_total rate ramps up, something is putting a
high-cardinality value into a stream field (e.g. a PID or session ID into
service). Bring it back to a small enumerable set. Each unique
(host, service, vlan_group) combo is one stream.
Ansible SSH fails — "agent refused operation" or "Connection closed"
ssh-agent has a hardware key (FIDO/SK) loaded; the ssh client offers it first, the
agent refuses (touch required), and MaxAuthTries is exhausted before id_ed25519_ansible
is offered. The fix is baked into ansible.cfg:
ssh_args = ... -o IdentitiesOnly=yes -o IdentityAgent=none
If you still see it, confirm those flags are present.
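Confirming the flags can be done with a small check against the config file. The function name check_ssh_args is hypothetical, and the sketch assumes ssh_args is written on a single line in ansible.cfg.

```shell
# check_ssh_args (hypothetical): succeeds only if the ssh_args line of the
# given ansible.cfg (default: ./ansible.cfg) carries both options.
# Assumes ssh_args is written on a single line.
check_ssh_args() {
  cfg="${1:-ansible.cfg}"
  line=$(grep -E '^[[:space:]]*ssh_args[[:space:]]*=' "$cfg") || return 1
  case "$line" in
    *IdentitiesOnly=yes*) ;;
    *) return 1 ;;
  esac
  case "$line" in
    *IdentityAgent=none*) ;;
    *) return 1 ;;
  esac
}
```

Running check_ssh_args && echo ok from the repository root reports whether both options are still in place.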
Credentials
None — VictoriaLogs has no auth of its own. Access control is purely network
(LAN bypass) + Authentik forward-auth in Traefik for external traffic. The
ingest endpoint on 10.69.20.79:9428 is open to anything inside the VLAN
mesh, which is fine for now; VyOS limits cross-zone traffic.
If you ever need auth: drop vmauth in front, or use Traefik's basic-auth
middleware on a separate ingest hostname.
Known Issues / Caveats
Old transient unit names already stored
The hex-name collapse only applies to new ingest. Old events keep their
original service value until 30-day retention drops them. The
Stream fields panel in VMUI will keep showing those values until they age
out.
No alerting on logs yet
vmalert isn't wired up. Could be added later — Alertmanager is already
running in the metrics LXC, so adding vmalert pointing at VictoriaLogs and
forwarding to Alertmanager is a small lift.
Single-node, no HA
OSS VictoriaLogs is single-binary; no replication, no clustering. At our
volume and SLA that's fine. Back up /var/lib/victorialogs via Proxmox
Backup if log retention is precious.
Vector buffers in memory
data_dir: /var/lib/vector is set, but the ES sink uses an in-memory queue by
default. A long VictoriaLogs outage will drop events on Vector restart. For
larger setups, switch the sink to disk buffer (buffer.type: disk).
Healthcheck must stay off
Vector's ES sink calls GET /_cluster/health, which VictoriaLogs answers with 400.
Vector reports that as a failed healthcheck at startup, and refuses to start if it
runs with --require-healthy. We disable the healthcheck explicitly. Don't remove that line.