Decisions#
In 2024 I decided to finally set up a proper Kubernetes environment at home. I deployed k3s on a few VMs in my Proxmox cluster. After searching around for how to handle storage, the most common responses I got were:
- Just use NFS with the NFS CSI.
- Use Ceph/Rook.
- Use Longhorn by Rancher/SUSE.
I didn’t want to use NFS, as that would mean a single point of failure for my stateful applications. I didn’t want to use Ceph/Rook, as I’ve heard it really requires 10Gbps networking and enterprise SSDs, and I have neither. So I decided to go with Longhorn.
Longhorn Issues#
I deployed Longhorn using ArgoCD + Helm and it worked fairly well for a few months. Then pods started to complain that their filesystem was read-only. I could fix it by scaling the deployment down to 0 and then back up to 1, but the problem would come back fairly quickly.
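For reference, deploying it that way just means pointing an Argo CD Application at the Longhorn Helm chart. A minimal sketch (the chart version and sync options here are examples, not necessarily what I ran):

```yaml
# Sketch of an Argo CD Application for the Longhorn Helm chart.
# The target revision and sync options are examples.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: longhorn
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://charts.longhorn.io
    chart: longhorn
    targetRevision: 1.6.2        # example version
  destination:
    server: https://kubernetes.default.svc
    namespace: longhorn-system
  syncPolicy:
    automated: {}
    syncOptions:
      - CreateNamespace=true
```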
I started searching online for the issue and found this in the Longhorn docs. I had heard that Longhorn works best with 10Gbps+ networking, but that it would still work on 1Gbps. One thing on that page stuck out to me:
> The network bandwidth is not sufficient. Normally 1Gbps network will only able to serve 3 volumes if all of those volumes are running a high intensive workload.
Ah, that explains why I wasn’t having issues until I added more volumes. I had about 10 volumes at the peak, though 9 of the 10 saw almost no reads or writes.
Another interesting thing on that page was the “automatic recovery” it describes, which wasn’t working for me.
While I was searching for the issue I started looking at alternatives and found OpenEBS (specifically the Mayastor engine) and Piraeus. So I started to look into replacing Longhorn with OpenEBS.
OpenEBS Mayastor#
An interesting thing I found while looking into OpenEBS is that its “Mayastor” engine is fairly new and is supposed to be faster than Longhorn’s v1 engine. I think the key difference is that Mayastor uses NVMe-oF (nvmf) instead of iSCSI.
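That protocol choice shows up directly in the StorageClass parameters. Here’s a sketch of a Mayastor StorageClass based on the OpenEBS docs (the class name is mine, and the replica count is just an example):

```yaml
# Sketch of a Mayastor StorageClass; the provisioner and parameter keys come
# from the OpenEBS Mayastor docs, the name and values are examples.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mayastor-2-replica
provisioner: io.openebs.csi-mayastor
parameters:
  protocol: nvmf   # NVMe-oF instead of iSCSI
  repl: "2"        # number of synchronous replicas
  fsType: xfs
```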
Another benefit of OpenEBS is that it also offers local (non-replicated) storage. That works great for me because I use CloudNative-PG a lot in my cluster, and CloudNative-PG already replicates at the database level, so it doesn’t need replicated block storage underneath it.
So I started moving all the CloudNative-PG volumes to OpenEBS’s local storage option. Everything went smoothly.
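For anyone curious, pointing CloudNative-PG at local storage is just a matter of setting the storage class on the Cluster resource. A sketch, assuming the LocalPV HostPath flavor (openebs-hostpath is its default class name; the cluster name and size are placeholders):

```yaml
# Sketch of a CloudNative-PG cluster on OpenEBS local storage.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-example                    # placeholder name
spec:
  instances: 3                        # CNPG replicates at the database level
  storage:
    size: 10Gi                        # placeholder size
    storageClass: openebs-hostpath    # OpenEBS LocalPV HostPath default class
```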
Then I set up replicated storage with the Mayastor engine. One thing I did notice that could be an issue for others is that Mayastor requires a disk to be dedicated 100% to the DiskPool. This could be a problem if you only have a single disk. Luckily I’m running VMs in Proxmox, so I can easily create a second virtual disk and use that for the disk pool.
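Creating the pool itself is just a small custom resource per node. A sketch with placeholder names (the API version may differ between OpenEBS releases):

```yaml
# Sketch of a Mayastor DiskPool; pool name, node, and device are placeholders.
apiVersion: openebs.io/v1beta2
kind: DiskPool
metadata:
  name: pool-worker-1      # one pool per node
  namespace: openebs       # wherever the Mayastor components run
spec:
  node: worker-1           # the Kubernetes node that owns the disk
  disks: ["/dev/sdb"]      # the dedicated second virtual disk; consumed entirely
```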
After I got the replicated storage set up with 3 disk pools, one for each worker node, I decided to run an fio test. The fio test I ran is from the OpenEBS Mayastor website; it’s a 4k random read/write test.
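I won’t reproduce the full job file here, but pieced together from the parameters visible in the output below (job name benchtest, 4k blocks, random read/write, 60 second runtime), the benchmark pod looks roughly like this. The image, PVC name, and a few flags (size, iodepth) are my guesses:

```yaml
# Rough reconstruction of the fio benchmark pod; the image, PVC, size, and
# iodepth are placeholders/guesses, the rest matches the output below.
apiVersion: v1
kind: Pod
metadata:
  name: fio-benchtest
spec:
  restartPolicy: Never
  containers:
    - name: fio
      image: nixery.dev/shell/fio   # placeholder: any image with fio installed
      command:
        - fio
        - --name=benchtest
        - --filename=/volume/test
        - --size=1g                 # guess: anything the volume can hold
        - --direct=1
        - --rw=randrw
        - --ioengine=libaio
        - --bs=4k
        - --iodepth=16              # guess
        - --numjobs=1
        - --time_based
        - --runtime=60
      volumeMounts:
        - name: data
          mountPath: /volume
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: benchtest-pvc    # placeholder: PVC on the StorageClass under test
```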
Here is a snippet from the results; if you want more details, check out my wiki page here:

```
benchtest: (groupid=0, jobs=1): err= 0: pid=22: Sat Dec 21 03:54:57 2024
  read: IOPS=966, BW=3864KiB/s (3957kB/s)(226MiB/60006msec)
    slat (usec): min=3, max=728958, avg=364.19, stdev=5211.50
    clat (usec): min=9, max=2678.8k, avg=7410.38, stdev=43892.33
     lat (usec): min=281, max=2678.8k, avg=7774.57, stdev=44403.61
```
I ran the same fio test with the Longhorn storageClass and got the following results:

```
benchtest: (groupid=0, jobs=1): err= 0: pid=22: Sat Dec 21 04:05:29 2024
  read: IOPS=21, BW=85.1KiB/s (87.2kB/s)(5112KiB/60050msec)
    slat (usec): min=4, max=800017, avg=1135.57, stdev=23789.63
    clat (usec): min=136, max=5372.3k, avg=264699.64, stdev=536953.31
     lat (msec): min=4, max=5372, avg=265.84, stdev=537.19
```
And here is a table showing the data side by side:
| Metric | OpenEBS | Longhorn | Winner |
|---|---|---|---|
| CPU | 6 CPUs | 300m (0.3 CPUs) | Longhorn |
| Memory | 736 MiB | 1.32 GiB | OpenEBS |
| Read IOPS | 966 | 21 | OpenEBS |
| Read BW (KiB/s) | 3864 | 85.1 | OpenEBS |
| Write IOPS | 964 | 22 | OpenEBS |
| Write BW (KiB/s) | 3857 | 91.3 | OpenEBS |
| Avg Read Latency (msec) | 7.77457 | 265.84 | OpenEBS |
| Avg Write Latency (msec) | 8.76656 | 452.51 | OpenEBS |
Pretty incredible difference. I’m glad I switched to OpenEBS, and I’ll make sure to write an update post in the future if I see any issues with it.
OpenEBS vs Longhorn Continued#
Another thing I wanted to mention when comparing OpenEBS vs Longhorn is the resources required for each.
As I mentioned above, OpenEBS Mayastor requires a full drive for the disk pool. My Proxmox nodes have 2 drives, but they are in a RAID1 array for redundancy. If I ever decide to move off of Proxmox and run Kubernetes on bare metal, I will have to dedicate 1 drive to the OS and 1 drive to Mayastor.
The other thing regarding resources is that OpenEBS Mayastor uses a lot more CPU than Longhorn, but a bit less memory. Mayastor reserves 2 CPUs per IO engine pod, and its io-engine busy-polls the cores it’s given, so those CPUs sit at full utilization. Initially I had 3 IO engine pods, since I have 3 k8s worker nodes, but I scaled that down to 2 pods because I only plan to run a Mayastor replication factor of 2. So in total Mayastor is using about 4 CPUs, whereas Longhorn was only using about 300m (0.3 CPUs) total. Keep in mind that with Mayastor, if I wanted a replication factor of 3, I’d have to add 2 more CPUs, bringing the total to 6 CPUs, vs 0.3 CPUs for Longhorn.
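If you want to tune that, the io-engine CPU count and node placement are Helm values. A sketch against the openebs umbrella chart; treat the key names as assumptions, since they’ve moved around between chart versions:

```yaml
# Sketch of Helm values for the io-engine; key names are assumptions based on
# the Mayastor chart and may differ in your chart version.
mayastor:
  io_engine:
    cpuCount: "2"                    # cores reserved (and polled flat-out) per pod
    nodeSelector:
      openebs.io/engine: mayastor    # only labeled nodes run an io-engine pod
```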
On the other hand, OpenEBS is using less than 1 GiB of memory, whereas Longhorn was using 1.32 GiB. Luckily for me I have much more CPU than I need; it’s memory I’m more limited on.
OpenEBS also does not have any kind of UI, whereas Longhorn has a nice UI that shows you the status of your volumes, nodes, etc. Personally I prefer a CLI, which OpenEBS has, but I know some people prefer a UI.
Conclusion#
I think for a homelab environment OpenEBS Mayastor is the clear winner: it’s faster and uses less memory. Maybe in the future Longhorn will catch up with its v2 engine.
A few additional things I saw in the OpenEBS documentation concern me, but I haven’t fully looked into them yet:
- I’d like to scale my worker nodes down to 2, but I’m not sure if I can do that even though I’m using a replication factor of 2. More info in the docs.
- I remember seeing in the docs that an OpenEBS disk pool cannot be expanded, but I can’t find it in the docs now. If I run into that in the future, I think I can just create a new disk pool and drain the old one? I’ll report back if I run into that issue.