cross-posted from: https://eviltoast.org/post/14192399
I have a love/hate relationship with Proxmox Backup Server (PBS). It’s a groundbreaking product, cutting-edge technology, entirely free, and a complete pain in the ass. Because of all of the above, I’ve spent a good deal of time haunting the user forum, irritating the staff, and spreading unpleasant truths about PBS.
So, what’s amazing about Proxmox Backup Server? It’s the realized dream of complete dedupe. There are a couple of similar products out there, but by and large, whatever you think you know about backups only barely applies to PBS.
PBS should be deployed on SSD, because it’s really a huge database of file chunks. (Literally, they are called .chunks.) When a backup runs, PBS checks whether it already has a given .chunk, and if it does, that chunk is not copied again. Absolute dedupe, every time. A backup job consists of a whole lot of reading and calculating, but as little writing as possible. The actual backup itself, such as it exists, is a set of metadata that says which .chunks will be needed to rebuild the VM files. When a backup is deleted, only the metadata is deleted. If the .chunks that the metadata pointed at are not used by any other backup and continue to go untouched for a couple of days, they are purged by garbage collection.
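To make the chunk idea concrete, here’s a toy sketch in Python of content-addressed chunk storage. This is not PBS’s actual code; the class name and the fixed 4 MiB chunk size are simplifying assumptions. It just shows the shape of the idea: a chunk is written only if its digest isn’t already in the store, and the “backup” is nothing more than the list of digests.

```python
# Toy sketch of content-addressed chunk storage -- not PBS code; the class
# name and the fixed 4 MiB chunk size are simplifying assumptions.
import hashlib
from pathlib import Path

CHUNK_SIZE = 4 * 1024 * 1024

class ChunkStore:
    def __init__(self, root: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def add_chunk(self, data: bytes) -> str:
        """Write the chunk only if its digest isn't already in the store."""
        digest = hashlib.sha256(data).hexdigest()
        path = self.root / f"{digest}.chunk"
        if not path.exists():      # dedupe: identical data is never rewritten
            path.write_bytes(data)
        return digest

def backup_disk(store: ChunkStore, disk_image: str) -> list[str]:
    """The 'backup' itself is just the ordered list of chunk digests."""
    index = []
    with open(disk_image, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            index.append(store.add_chunk(chunk))
    return index
```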
Think about how this will work out. When you delete a backup, you are just deleting your ability to assemble those .chunks into a disk file; essentially nothing happens to the data on the back end for two days. If you immediately ran another backup, it’s *going* to run like an incremental, because all the data is still there. So the only thing it will write is a new metadata set (aka “backup”) and whatever delta has occurred on the VM since the last backup. This is completely different from any sort of backup data management you’ve done before.
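Here’s the same toy model extended with that delete-and-garbage-collect behavior. Again, these are invented names, and the grace period is just the “couple of days” mentioned above; the real PBS GC tracks chunk access times and runs in phases, but the effect is the same.

```python
# Same toy model, now with the delete + garbage-collect behavior: deleting a
# backup just drops its index; chunks are purged only once no surviving index
# references them AND they have sat untouched past the grace period.
# Invented names; grace period taken from the "couple of days" above.
import os
import time
from pathlib import Path

GRACE_PERIOD = 2 * 24 * 3600  # seconds

def garbage_collect(chunk_dir: str, surviving_indexes: list[list[str]]) -> None:
    referenced = {digest for index in surviving_indexes for digest in index}
    now = time.time()
    for chunk in Path(chunk_dir).glob("*.chunk"):
        if chunk.stem in referenced:
            os.utime(chunk)                     # mark: still in use, touch it
        elif now - chunk.stat().st_mtime > GRACE_PERIOD:
            chunk.unlink()                      # sweep: orphaned and old, purge
```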
When a backup runs in Proxmox and the backup target is a PBS server, the VM is NOT stunned for a snapshot. The ‘stun and snapshot’ tactic is present in almost every virtual backup system out there in order to get the VM running on a delta disk, leaving the main disk available to be scanned for backup. PBS simply does not do this.
What PBS does instead of a snapshot is hard to describe, but in pursuing their goal of avoiding a VM stun, they introduced a far worse flaw. The VM can hang on a write if the sector being written is also being backed up at that moment; the guest write is held in lockstep with the backup until that chunk of data finishes copying. Because of this issue, they introduced another feature that is essentially write caching, but they call it fleecing. (I’m told the name fleecing comes from the qemu standard.) Fleecing brought its own game-killer bugs to the table, only partially fixed in the latest version.
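Fleecing is basically copy-before-write to local scratch space. Here’s a conceptual sketch of the difference it makes, with invented names and nothing like QEMU’s actual implementation: instead of the guest write waiting for the slow upload of that block to finish, the old block is copied to a local fleecing image and the backup reads the preserved copy later.

```python
# Conceptual sketch of copy-before-write ("fleecing") -- invented names, not
# QEMU's implementation. Without it, a guest write to a block that is mid-
# backup has to wait for the slow upload; with it, the old block is stashed
# in fast local scratch space and the guest write proceeds immediately.

class FleecingSketch:
    def __init__(self, disk: dict[int, bytes]):
        self.disk = disk        # block number -> current data
        self.scratch = {}       # the local fleecing (copy-before-write) image
        self.backed_up = set()  # blocks the backup job has already read

    def guest_write(self, block: int, data: bytes) -> None:
        # Preserve the old contents locally instead of stalling the guest
        # until that block has been uploaded to the backup server.
        if block not in self.backed_up and block not in self.scratch:
            self.scratch[block] = self.disk[block]
        self.disk[block] = data  # the write completes right away

    def backup_read(self, block: int) -> bytes:
        # The backup job prefers the preserved copy if the guest overwrote
        # the block before it got here.
        data = self.scratch.pop(block, self.disk[block])
        self.backed_up.add(block)
        return data
```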
PBS does some really cool site-to-site sync tricks. You can establish a Remote relationship, and then a Sync Job. It’s a nerdy interface, and not at all user-friendly, but you can tell it exactly what to sync. They recently added Push-style jobs, which will feel more familiar than the original Pull jobs if you’re coming from common backup systems. When you start doing site-to-site sync and your VM backup populations get mixed, you immediately discover the need for Namespaces to segregate them, and that can be an intricate rabbit hole.
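Conceptually, a pull sync job looks something like this (placeholder objects and method names, not the PBS API): the local side asks the remote what snapshots exist, then fetches only the indexes and chunks it doesn’t already have, filing them under a namespace so the populations stay segregated.

```python
# Rough sketch of a pull-style sync job -- placeholder objects and method
# names, not the PBS API. The local side lists the remote's snapshots, then
# fetches only the indexes and chunks it is missing, filing them under a
# namespace so remote and local backup populations stay segregated.

def pull_sync(remote, local, namespace: str) -> None:
    for snapshot in remote.list_snapshots():
        if local.has_snapshot(namespace, snapshot):
            continue                            # already synced earlier
        index = remote.fetch_index(snapshot)    # the metadata / chunk list
        for digest in index:
            if not local.has_chunk(digest):     # dedupe works across sites too
                local.store_chunk(digest, remote.fetch_chunk(digest))
        local.store_index(namespace, snapshot, index)
```

A push job is the mirror image: the source side drives the same loop and writes into the remote datastore instead.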
I just noticed their site blurb says PBS does physical hosts, which isn’t exactly a lie, but pretty close. PBS is for backing up Proxmox guests.
If you run virtual machines professionally, you should know about Proxmox and have at least tried it. If you use Proxmox with any regularity, you should check out PBS. Here’s a shot of my homelab PBS. Note the dedupe factor.
___
Not sure where you’re getting the idea that it isn’t snapshotting the VM
```
INFO: include disk 'ide0' 'local-zfs:vm-102-disk-1' 32G
INFO: include disk 'efidisk0' 'local-zfs:vm-102-disk-0' 528K
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/102/2025-05-14T05:02:22Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '220d8288-1927-4f79-a354-bc0c288e8223'
INFO: resuming VM again
INFO: efidisk0: dirty-bitmap status: OK (drive clean)
INFO: ide0: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: using fast incremental mode (dirty-bitmap), 32.0 GiB dirty of 32.0 GiB total
INFO: 2% (736.0 MiB of 32.0 GiB) in 3s, read: 245.3 MiB/s, write: 21.3 MiB/s
INFO: 3% (1.0 GiB of 32.0 GiB) in 7s, read: 82.0 MiB/s, write: 10.0 MiB/s
```
I’ve restored complete docker stacks from backup and they are point in time consistent.
Also, you can totally back up hosts using PBS; there’s a client you install that lets it back up any Linux host. It won’t be point-in-time because the client can’t run snapshots the way you can with a VM. Works fine; I’ve restored my desktop with it, and it’s been in use for months since.
I believe the point is that it doesn’t do a snapshot stun or run out of a delta disk. It’s a different technology. I know it does use qm snapshot. The tech docs I’ve read don’t quite fill in the details for me either. Much of the point of the PBS architecture was to avoid that stun.
The proxmox-backup-client is not my bag. I’m glad it’s found a user. Most pros I know don’t touch it.
I used the client to try it out, it works (which actually surprised me) but I don’t really need it for the nuke and pave restores I usually do. I just restore dotfiles and hook the shares back up.
Not a pro currently, but I was a sysadmin for 20 years, and a VCP for about 10 of those. I’ve been as impressed with how well PBS works as anything I ever used in those days, even Veeam. I couldn’t say if it scales to the size of guest pools I worked with then; probably not.
Let me explain the ‘pro’ comment. I’m active on the official forum. Most of the ‘pros’ there don’t use proxmox-backup-client. People I respect do not use that tool, probably each for their own reasons.
My own issue with PBS support for physical servers is that full recovery is DIY. You have to format disks and stand up at least a temp OS in order to wipe and start over.
I think I’ve scaled out PBS pretty far. My biggest datastores are 14 TB; that’s about 85 VMs. I use a multi-layer PBS setup for Backup and Sync at 5 geographic locations. A couple of sites have a hardware PBS syncing to a virtual one, which in turn does the site-to-site sync. I’m using PBS site-to-site sync to support a colo migration.
PBS is a great tool in rapid development. During the time I’ve used it, they introduced fleecing and all its many issues. There are more changes coming. As admins, we are supporting a moving target.