Shipping a cluster failure-domain feature to LXD
A new lxc cluster failure-domain subcommand I contributed to LXD, replacing the "edit the member YAML by hand" workaround for managing per-member failure domains.
LXD 6.7 ships today, and one of the smaller things in the release notes is a new lxc cluster failure-domain subcommand. I worked on this one as my first contribution to canonical/lxd. It closes canonical/lxd#10842.
What is a failure domain?
A failure domain in LXD is a label on a cluster member that tells the scheduler which physical fault zone that member sits in: the same rack, the same host, the same network switch, and so on. When LXD has to place an instance and the member it would otherwise pick is unavailable, it prefers another member from the same failure domain over jumping to a different one. It’s the standard “don’t put both copies of the data on the same rack” concept, applied to scheduling. The piece that was missing was a CLI for it.
What was wrong before
Before this PR, the only way to read or set a member’s failure domain was:
lxc cluster edit <member>
That opens the full member YAML in your $EDITOR, including dozens of fields you don’t want to touch. You’d find the failure_domain: line, change it, save, and trust yourself not to have nudged anything else on the way out. There was no get, no set, no unset — and no way to script it without writing your own YAML rewrite.
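For context, a scripted version of that workaround might have looked roughly like the sketch below. It assumes `lxc cluster show` prints the member YAML with a top-level `failure_domain:` key and that `lxc cluster edit` accepts the edited YAML on stdin. Both helper names are hypothetical, not part of LXD.

```shell
#!/bin/sh
# Hypothetical helper: rewrite only the top-level failure_domain line of the
# member YAML read on stdin, passing every other line through unchanged.
rewrite_failure_domain() {
    # $1 = new failure domain name (assumed to contain no "/" or "&",
    # since it is spliced straight into the sed replacement)
    sed "s/^failure_domain:.*/failure_domain: $1/"
}

# Hypothetical wrapper around the old show -> rewrite -> edit round trip.
set_failure_domain_by_hand() {
    # $1 = member name, $2 = new failure domain
    lxc cluster show "$1" | rewrite_failure_domain "$2" | lxc cluster edit "$1"
}
```

Even in this small form, the script has to know the YAML layout and pushes the whole member object back, which is exactly the fragility described above.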
The change
canonical/lxd#17334 (merged for LXD 6.7) adds three verbs:
lxc cluster failure-domain get <member>
lxc cluster failure-domain set <member> <name>
lxc cluster failure-domain unset <member>
get prints the current failure domain (empty if none is set). set writes the new one. unset clears it. None of them touch anything else on the member object, so they’re safe to call from a cluster bring-up script:
for m in node-{a,b,c}; do
    lxc cluster failure-domain set "$m" "rack-1"
done
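The loop above puts every member in the same domain. A bring-up script more often maps each member to its own rack. Here is a minimal sketch of that, assuming a hypothetical list of whitespace-separated `member domain` pairs on stdin and the `get` and `set` verbs behaving as described above. The function name and the mapping format are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: read "member domain" pairs from stdin and apply each
# one with the new subcommand, skipping members that already match.
assign_failure_domains() {
    while read -r member domain; do
        # Skip blank lines and comment lines in the mapping.
        case "$member" in ''|'#'*) continue ;; esac
        # Redirect stdin so lxc cannot consume the rest of the mapping.
        current=$(lxc cluster failure-domain get "$member" < /dev/null)
        if [ "$current" = "$domain" ]; then
            echo "$member: already in $domain"
        else
            lxc cluster failure-domain set "$member" "$domain" < /dev/null
            echo "$member: $current -> $domain"
        fi
    done
}
```

It would be called as `assign_failure_domains < racks.txt`, where `racks.txt` holds lines such as `node-a rack-1`. Checking with `get` first keeps re-runs quiet and avoids needless writes.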
Behind the scenes the verbs use the existing PUT /1.0/cluster/members/<member> API — there’s no new server endpoint. The change is on the lxc client side: a new subcommand under lxc cluster, plus matching docs and integration tests so the three verbs stay covered.
Important Links
- The PR: canonical/lxd#17334 (merged for LXD 6.7), closes #10842.
- Today’s release: LXD 6.7 release notes · Discourse announcement · GitHub release.
- Docs: How to configure cluster failure domains.
- Repo: canonical/lxd.