
CRUSH Failure-Domain & Placement Helper

Host or rack? Enter your topology and protection scheme to check whether CRUSH can actually place your pool's data safely, plus the right min_size and the CLI to create the rule.

CRUSH Rules
Replicated needs: ≥ size distinct domains
EC needs: ≥ k+m distinct domains
Recommend: +1 spare domain for recovery
min_size (rep): size − 1
min_size (EC): k + 1
Below min_size: I/O halts (by design)
CRUSH mode: firstn (rep), indep (EC)
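
The rules above reduce to a few lines of arithmetic. The following is a minimal Python sketch of that arithmetic only; the function names and the dictionary layout are hypothetical and are not part of Ceph or of this helper's actual implementation.

```python
def replicated(size):
    """Domain counts and min_size for a replicated pool, per the rules above."""
    return {
        "scheme": f"Replicated size={size}",
        "minimum": size,          # one distinct domain per copy
        "recommended": size + 1,  # one spare domain for recovery
        "min_size": size - 1,     # this page's rule for replication
    }


def erasure_coded(k, m):
    """Domain counts and min_size for an EC pool with k data and m parity chunks."""
    return {
        "scheme": f"EC {k}+{m}",
        "minimum": k + m,          # one distinct domain per chunk
        "recommended": k + m + 1,  # one spare domain for recovery
        "min_size": k + 1,         # this page's rule for erasure coding
    }


for row in (replicated(2), replicated(3), replicated(4),
            erasure_coded(4, 2), erasure_coded(8, 3)):
    print(row["scheme"], row["minimum"], row["recommended"], row["min_size"])
```

Running it prints one line per scheme with the minimum domains, recommended domains, and min_size, which can be compared against the figures given elsewhere on this page.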

The CRUSH Failure-Domain Hierarchy

Ceph's CRUSH map is a tree: osd → host → chassis → rack → row → room → datacenter (root at the top). When you set a pool's failure domain to "host," CRUSH guarantees no two copies (or EC chunks) of the same PG land on the same host — but says nothing about whether they land in the same rack. Choosing a higher level in the hierarchy protects against a bigger blast radius (a whole rack losing power) at the cost of requiring more distinct domains at that level to satisfy your protection scheme.

Most single-rack or small clusters use host as the failure domain, since that's the most granular level above the OSD itself and matches the most common real failure mode (a server dying). Multi-rack deployments with redundant power/networking per rack can justify rack as the failure domain — but only if there are enough racks to satisfy the protection scheme.
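
The "enough domains at that level" test can be sketched as a small Python function. This is an illustration under simplifying assumptions: the function name, the verdict strings, and the topology dictionary are hypothetical, and the check only counts domains. It does not model uneven domain sizes, CRUSH weights, or device classes.

```python
def check_failure_domain(domain_counts, failure_domain, chunks):
    """Classify a topology for a pool that needs `chunks` distinct domains.

    domain_counts:  hypothetical mapping such as {"host": 5, "rack": 1}
    failure_domain: the CRUSH bucket type chosen for the pool's rule
    chunks:         replica count (size), or k + m for erasure coding
    """
    available = domain_counts.get(failure_domain, 0)
    if available < chunks:
        return "cannot place"         # fewer domains than copies/chunks
    if available == chunks:
        return "places, no spare"     # works, but nowhere to recover to
    return "places, with spare"       # at least one spare domain


# A single-rack cluster with five hosts and a size=3 replicated pool.
topology = {"host": 5, "rack": 1}
print(check_failure_domain(topology, "host", 3))  # places, with spare
print(check_failure_domain(topology, "rack", 3))  # cannot place
```

The second call shows the single-rack case from the text: with only one rack, a rack-level rule has nowhere to put the second and third copies.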

Minimum domains by protection scheme

Scheme              Minimum Domains   Recommended   min_size
Replicated size=2   2                 3             1
Replicated size=3   3                 4             2
Replicated size=4   4                 5             3
EC 4+2              6                 7             5
EC 8+3              11                12            9

"Minimum" is the bare floor CRUSH needs to place data at all. "Recommended" adds one spare domain so that, after losing a single domain, CRUSH has somewhere to re-create the missing copies or chunks. With only the minimum, affected PGs stay degraded until the failed domain comes back. See the Usable Capacity calculator for how this same +1 reserve logic affects usable space.


min_size — Why I/O Halts Instead of Risking Data

min_size is the minimum number of copies (replicated) or chunks (EC) that must be available for a PG to serve I/O at all. This helper uses size − 1 for replication and k + 1 for erasure coding. For size=2 and size=3 that matches Ceph's own default for replicated pools, size − size/2 with integer division; for size=4 the Ceph default works out to 2, so size − 1 = 3 is the stricter choice. If the available copies or chunks drop below min_size (say, two simultaneous host failures on a size=3 pool with min_size=2, leaving a single copy), Ceph stops serving I/O on the affected PGs entirely rather than accept writes that can't be reliably protected. This looks alarming, because the cluster appears to "freeze" for affected pools, but it is a safer failure mode than silently continuing without redundancy.
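
As a rough illustration of that threshold, the sketch below classifies a PG by how many of its copies or chunks are still available. The function name and the three labels are hypothetical simplifications, not the PG state strings Ceph itself reports.

```python
def pg_io_state(size, min_size, available):
    """Simplified view of whether a PG serves I/O.

    size:      total copies (replicated) or k + m chunks (EC)
    min_size:  threshold below which I/O stops
    available: copies/chunks currently reachable
    """
    if not 0 <= available <= size:
        raise ValueError("available must be between 0 and size")
    if available < min_size:
        return "io_halted"   # below min_size: I/O stops by design
    if available < size:
        return "degraded"    # still serving I/O with reduced redundancy
    return "healthy"


# size=3, min_size=2: one host down is survivable, two is not.
print(pg_io_state(3, 2, 2))  # degraded
print(pg_io_state(3, 2, 1))  # io_halted

# EC 4+2 with min_size = k + 1 = 5: two chunks lost leaves 4 available.
print(pg_io_state(6, 5, 4))  # io_halted
```

The last call shows why k + 1 matters for erasure coding: with exactly k chunks left the data is still reconstructible, but there is no margin for one more failure, so I/O stops.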


Frequently Asked Questions

Why not just always use rack as the failure domain for safety?

Because it requires more physical domains to satisfy the same protection scheme. A single-rack cluster literally cannot use rack as a meaningful failure domain — there's only one rack, so CRUSH has nowhere else to place additional copies/chunks. Match the failure domain level to how many of that unit you actually have, not aspirationally to the safest-sounding option.

What's the difference between the firstn and indep CRUSH modes?

firstn is used for replicated pools: if a chosen OSD becomes unavailable, CRUSH tries the next one in a deterministic sequence, and later entries shift up to fill the gap. That works fine because all replicas are interchangeable. indep is used for erasure-coded pools, where each chunk position (1st data chunk, 2nd parity chunk, etc.) is meaningful. indep replaces a failed position independently and leaves the other positions where they were. firstn would shift them, which changes which OSD is expected to hold which chunk and forces surviving chunks to move unnecessarily.
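
The positional difference can be shown with a toy model. This is not the real CRUSH algorithm, which selects OSDs by hashing with retries; here a fixed, ordered candidate list stands in for CRUSH's deterministic sequence, and both function bodies are simplified assumptions written only to show how positions behave.

```python
def firstn(candidates, failed, n):
    """Toy firstn: take the first n healthy candidates, so later entries shift up."""
    return [osd for osd in candidates if osd not in failed][:n]


def indep(candidates, failed, n):
    """Toy indep: keep every surviving position, replace only the failed ones."""
    result = list(candidates[:n])
    spares = [osd for osd in candidates[n:] if osd not in failed]
    for position, osd in enumerate(result):
        if osd in failed:
            # None models a position left empty when no replacement exists.
            result[position] = spares.pop(0) if spares else None
    return result


candidates = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4"]
failed = {"osd.1"}

print(firstn(candidates, failed, 3))  # ['osd.0', 'osd.2', 'osd.3']
print(indep(candidates, failed, 3))   # ['osd.0', 'osd.3', 'osd.2']
```

In the firstn result osd.2 moves from the third position to the second. In the indep result osd.2 stays in the third position and only the failed second position gets a new OSD.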

How do I verify my CRUSH map actually has the domains I think it does?

Run ceph osd crush tree --show-shadow to see the full hierarchy including the per-device-class shadow trees, or ceph osd tree for a simpler host/OSD view. If you're mixing device classes (hdd/ssd/nvme) on the same hosts, make sure your CRUSH rule specifies the device class; otherwise CRUSH may place PGs on the wrong tier.
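
To count domains rather than read the tree by eye, one option is to tally bucket types from the JSON form of the tree (for example, ceph osd tree -f json). The sketch below assumes that output has a top-level "nodes" list whose entries carry a "type" field. Check that against your own cluster's output before relying on it. The sample data is hand-written for illustration and does not come from a real cluster.

```python
import json
from collections import Counter


def count_bucket_types(tree_json):
    """Count CRUSH nodes per type from JSON text with an assumed 'nodes' list."""
    tree = json.loads(tree_json)
    return Counter(node["type"] for node in tree.get("nodes", []))


# Hand-written sample in the assumed shape: one root, two hosts, four OSDs.
sample = json.dumps({
    "nodes": [
        {"id": -1, "name": "default", "type": "root"},
        {"id": -2, "name": "node-a", "type": "host"},
        {"id": 0, "name": "osd.0", "type": "osd"},
        {"id": 1, "name": "osd.1", "type": "osd"},
        {"id": -3, "name": "node-b", "type": "host"},
        {"id": 2, "name": "osd.2", "type": "osd"},
        {"id": 3, "name": "osd.3", "type": "osd"},
    ]
})

counts = count_bucket_types(sample)
print(counts["host"], counts["rack"], counts["osd"])  # 2 0 4
```

A count of zero for a type, as with rack in this sample, means no bucket of that type exists in the tree, so it cannot serve as a failure domain.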

Can I change a pool's failure domain after creation?

Yes. The failure domain lives in the CRUSH rule, not the pool itself. Create a new CRUSH rule at the desired domain level and apply it with ceph osd pool set <pool> crush_rule <new-rule>. This triggers a full data rebalance as PGs move to satisfy the new placement rule, so treat it like any other major topology change: stage it and monitor recovery.
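
For a replicated pool, the two steps can be generated as command strings. The sketch below only builds text and runs nothing. The pool name, rule name, and helper function are hypothetical, and "default" is assumed to be the name of the CRUSH root. The command forms used are ceph osd crush rule create-replicated <name> <root> <failure-domain> [<class>] and the pool set command quoted above.

```python
import shlex


def replicated_rule_commands(pool, rule, failure_domain,
                             root="default", device_class=None):
    """Build (but do not run) the commands to create a rule and assign it."""
    create = ["ceph", "osd", "crush", "rule", "create-replicated",
              rule, root, failure_domain]
    if device_class:
        create.append(device_class)  # optional device class, e.g. hdd
    assign = ["ceph", "osd", "pool", "set", pool, "crush_rule", rule]
    return [shlex.join(create), shlex.join(assign)]


# Hypothetical names: pool "mypool" moves to a rack-level rule on hdd devices.
for command in replicated_rule_commands("mypool", "rep-rack", "rack",
                                        device_class="hdd"):
    print(command)
```

Review the printed commands before running them by hand. The second one is what starts the data movement.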