Continued adventures with LXD: Grafana, InfluxDB, and ZFS storage
I’ve got this idea that I want to try out: A Dockerless homeprod.
The emphasis here is on prod, because I will still be using Docker for testing out stuff and learning, but for my home “production” I am migrating from Docker to other solutions like LXD, for reasons I described, for example, in Deploying Nextcloud locally with LXD.
And speaking of that Canonical-owned container and VM orchestrator, this post is another entry on my LXD road.
Moving my monitoring stack to LXD
The service that I have been running the longest in my homelab journey has been my monitoring stack. Currently it consists of:
- Prometheus
- Node Exporter
- Smartctl Exporter
- Telegraf
- InfluxDB
- Grafana
There have also been other services in the past, like speedtest-exporter, but those were dropped because I was not interested in them enough.
The data flows through the stack along two paths. The first one is Node & Smartctl Exporters -> Prometheus -> Grafana, and the second one is Telegraf -> InfluxDB -> Grafana. I have it configured this way because (at least to my knowledge) Telegraf can fetch some metrics that Node Exporter cannot, like GPU or ZFS statistics. Eventually all metrics go into Grafana and are presented there.
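As a quick sanity check of that first path, both exporters and Prometheus can be poked over HTTP from the host. 9100 and 9090 are the default Node Exporter and Prometheus ports; 9633 is what the Smartctl Exporter commonly listens on, so adjust if yours differs:
# Exporters serve plain-text metrics over HTTP
curl -s http://localhost:9100/metrics | head -n 3
curl -s http://localhost:9633/metrics | head -n 3
# Prometheus lists the targets it scrapes via its HTTP API
curl -s http://localhost:9090/api/v1/targets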
Until now this stack has been running as a mix of technologies: Node Exporter on bare metal as a systemd service, and everything else in Docker, combined in a single Docker Compose file.
I decided to split it into a data collection layer and a data presentation layer: Prometheus, the Exporters, and Telegraf go to bare metal, while InfluxDB and Grafana go to an LXD container. The first part is still to be done; the second is already in production.
Installing Grafana and InfluxDB in a container turned out to be very simple. Both have very good documentation (Grafana) (InfluxDB), and installing them in an Ubuntu 24.04 environment meant adding a repo, installing them with apt, and starting them as systemd services.
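For Grafana that boils down to something like the following; this is a sketch from memory, so check the current docs for the exact key and repo lines before pasting:
# Add Grafana's signing key and apt repository
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
# Install and start Grafana as a systemd service
sudo apt update && sudo apt install grafana
sudo systemctl enable --now grafana-server
InfluxDB follows the same pattern with its own repository and the influxdb2 package.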
What was a positive surprise for me was that in order for Grafana in the container to reach Prometheus on the host, I just had to set the Prometheus URL to <<host_ip>>:9090 in Grafana's data sources. I suspected it might be more complicated, but I guess that is a result of the default LXD network being a bridge type. Networking in LXD is still in many ways a mystery to me.
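A quick way to confirm that the container can actually reach Prometheus on the host (the container name is a placeholder here, and curl needs to be installed inside the container):
# Hit Prometheus' health endpoint on the host from inside the container
lxc exec grafana -- curl -s http://<<host_ip>>:9090/-/healthy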
And similarly, to send metrics from Telegraf to InfluxDB, all I had to do was provide the container's IP in telegraf.conf:
[[outputs.influxdb_v2]]
urls = ["http://10.54.60.34:8086"]
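Before trusting the pipeline I like to check both ends: Telegraf can do a one-off dry run of its inputs, and InfluxDB 2.x exposes a health endpoint (the IP below is just my container's address):
# Gather metrics once and print them to stdout instead of sending them anywhere
telegraf --config /etc/telegraf/telegraf.conf --test
# Check that InfluxDB in the container is up and reachable from the host
curl -s http://10.54.60.34:8086/health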
Adding an LXD Dashboard to Grafana
All that digging in Grafana made me think that it would be nice to have a dashboard showing information about LXD itself. And that was very simple because LXD has a Prometheus integration available straight out of the box.
LXD has excellent documentation on how to set up the whole pipeline to send metrics and logs to Grafana:
Loki is something I have not touched before, maybe I’ll investigate it more to see if I can apply it as well to my other services.
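On the metrics side, the core of it (as I understand the docs) is exposing LXD's own metrics endpoint and letting Prometheus scrape it over TLS:
# Expose the LXD metrics endpoint; Prometheus scrapes /1.0/metrics on this port
lxc config set core.metrics_address ":8444"
# Prometheus authenticates with a metrics-only client certificate added to
# LXD's trust store; the certificate generation steps are in the docs above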
And voilà:
What is peculiar is that LXD believes I have 180GB of RAM. While I would love that to be the truth, it sadly isn’t. I’m guessing LXD multiplies the amount of RAM I have (~32GB) by the number of running containers (six).
Grafana showing I have 180GB of RAM
Moving my LXD storage pool
The final change I made was moving my LXD storage pool from my boot drive to one of my ZFS pools.
One reason was that the boot drive was running out of space, but I also wanted my LXD storage protected by ZFS features like copy-on-write and RAID1 (I know, I know, RAID is not backup, but it is better to have two disks than one).
A ZFS storage pool can be created in the LXD Web UI. I created one on the vms dataset inside my large-data pool. The process is described in the LXD docs: How to manage storage pools
My ZFS storage pool in LXD Web UI
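The same thing can be done from the CLI; roughly like this (the pool name is made up, and large-data/vms is just where I pointed mine):
# Create an LXD storage pool backed by an existing ZFS dataset
lxc storage create lxd-zfs zfs source=large-data/vms
lxc storage list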
The effect of creating an LXD pool on ZFS can be seen by running:
zfs list -t filesystem
List of datasets in ZFS created by LXD
LXD also allows migrating the storage of existing containers from one storage pool to another with just a few clicks, but the containers need to be stopped.
Storage Pool migration dialog in LXD Web UI
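The CLI equivalent is, as far as I can tell, a stop, a move to the target pool, and a start (the container and pool names are placeholders):
# Move a stopped container's storage to another pool
lxc stop grafana
lxc move grafana --storage lxd-zfs
lxc start grafana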
Overall I am amazed at how powerful LXD is and how many built-in tools it provides to handle both day-to-day and one-time tasks.
The incessant crunching
There is however one thing I did not foresee.
Both my ZFS pools are two-disk mirrors of HDDs, and usually they have seen very little activity apart from the occasional copying of data to and from them. But now that I have moved my containers to one of the pools, the disks see constant activity, writing logs and doing other continuous tasks. And those HDDs are of the louder variety, being semi-professional NAS grade (one is a WD Red Pro and the other a Seagate IronWolf). Which means there is now an endless stream of crunching coming from under my desk, and I do not like that.
I already have a solution planned for this, and hopefully I will write about it in my next blog post.
The bottom line
This has been another episode of my homelab journey. After a few relatively quiet months I'm back on track with tinkering and experimenting, and I have a lot of plans for my servers. I strongly believe that a homelab is most often in a temporary state, and there is always something to change. Then again, as most of my homelab is homeprod, I need to be more careful, plan changes better, focus more on backups and recovery, and basically behave more like a sysadmin with responsibilities. Move slowly and fix stuff :)
And that’s basically it, thanks for reading!
If you enjoyed this post, please consider helping me make new projects by supporting me on the following crowdfunding sites: