Ceph Usable Capacity Calculator

Raw, used-raw, and usable: the three numbers people confuse. Enter your hardware and protection scheme to see real usable capacity after nearfull headroom and recovery reserve, with replication and erasure coding compared side by side.

Raw vs Usable

Replication vs EC

Recovery Reserve

Nearfull Aware


Cluster Hardware

Host Count // total

Capacity/Host (TB) // raw

Protection

Replicated size=2

Replicated size=3

Erasure Coded

Nearfull Ratio // Ceph default 0.85

Recovery Reserve // hold back for re-replication

Reserve 1 host (recommended)

No reserve

Mixed Capacity Hosts? // largest host matters most

No — uniform hosts

Yes — sizes vary

Capacity Rules

Raw: hosts × capacity/host

Usable: (raw − reserve) × nearfull × efficiency

Replicated eff: 1 / size

EC eff: k / (k+m)

nearfull: 0.85 default, so you can't run at 100%

backfillfull: 0.90 default

full: 0.95 default, I/O blocks above this

Recovery reserve: ≥ 1 failure domain's raw capacity

Documentation

Health Checks (nearfull/full): OSD_NEARFULL, OSD_BACKFILLFULL, OSD_FULL thresholds
Monitoring a Cluster: ceph df, ceph osd df tree
EC Profile Planner: pick the right k+m before sizing capacity


Raw, Used-Raw, and Usable — Why These Numbers Get Confused

Raw capacity is the sum of every OSD's physical disk size — what the hardware invoice says you bought. Used-raw is how much of that raw capacity is actually consumed once protection overhead is applied — three replicas of a 1GB object consume 3GB of raw capacity. Usable capacity is what's left for new writes after backing out the nearfull safety ceiling and any recovery reserve — the number that actually matters for "how much can I store."

A cluster with 60TB raw and 3x replication doesn't give you 20TB usable — it gives you roughly 14TB once you subtract a recovery reserve and stop short of the nearfull ceiling. Forum threads on Ceph capacity planning are full of people who skipped one of these two deductions and then got paged when the cluster hit HEALTH_WARN nearfull weeks earlier than they expected.

The deductions, explained

Nearfull ratio (0.85 default)

Do not plan to fill OSDs to 100%: Ceph starts intervening well before that point. The default nearfull_ratio is 0.85, backfillfull_ratio is 0.90, and full_ratio is 0.95. Above full_ratio, the cluster blocks writes entirely so that OSDs do not run out of disk mid-write. Usable capacity calculations should target the nearfull ceiling, not 100%.

Recovery reserve

If you fill the cluster right up to the nearfull line with zero spare room, losing a host means Ceph has nowhere to re-replicate that host's data. The cluster cannot recover and simply stays degraded. Reserving at least one failure domain's worth of raw capacity keeps recovery possible.
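The effect of the reserve can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes that all used-raw data must end up on the surviving hosts once recovery completes. It uses six hosts of 10 TB as sample numbers.

```python
def utilization_after_host_loss(host_capacities_tb, used_raw_tb, failed_host_index):
    """Raw utilization of the surviving hosts after the failed host's data
    has been re-replicated onto them.

    Assumption: total used-raw is unchanged by recovery, because every
    lost replica is recreated somewhere on the remaining hosts.
    """
    surviving_raw = sum(
        cap for i, cap in enumerate(host_capacities_tb) if i != failed_host_index
    )
    return used_raw_tb / surviving_raw


hosts = [10.0] * 6  # six hosts, 10 TB raw each (sample values)

# No reserve: the whole 60 TB is filled to the 0.85 nearfull line.
no_reserve = utilization_after_host_loss(hosts, 60 * 0.85, 0)

# One host reserved: only (60 - 10) TB is filled to the nearfull line.
with_reserve = utilization_after_host_loss(hosts, (60 - 10) * 0.85, 0)

print(f"no reserve:   {no_reserve:.2f}")
print(f"with reserve: {with_reserve:.2f}")
```

Without a reserve the survivors would need more than 100% of their raw capacity, so recovery cannot complete. With one host reserved they land back at the nearfull line.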

Mixed-capacity hosts

If your hosts aren't uniform, the host you should reserve against is the largest one: losing it removes the most raw capacity and leaves the most data to re-replicate onto the remaining hosts. Sizing the reserve from the average host capacity under-reserves for this case.
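A minimal sketch of the difference between reserving the largest host and reserving an average host. The host sizes and the function name are made up for illustration.

```python
def recovery_reserve_tb(host_capacities_tb):
    """Reserve the raw capacity of the largest host (the worst single-host loss)."""
    return max(host_capacities_tb)


# Hypothetical mixed cluster: four 8 TB hosts, one 12 TB, one 16 TB (60 TB raw).
mixed_hosts = [8.0, 8.0, 8.0, 8.0, 12.0, 16.0]

reserve_largest = recovery_reserve_tb(mixed_hosts)
reserve_average = sum(mixed_hosts) / len(mixed_hosts)

# How much an average-based reserve falls short if the largest host fails.
shortfall = reserve_largest - reserve_average

print(f"largest-host reserve: {reserve_largest} TB")
print(f"average-host reserve: {reserve_average} TB")
print(f"shortfall if the largest host fails: {shortfall} TB")
```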

Protection efficiency

Replicated size=3 keeps 33.3% of raw as usable (1/size). A 4+2 erasure-coded pool keeps 66.7% (k/(k+m)), roughly double the usable capacity from the same raw hardware, at the cost of the CPU and latency tradeoffs of EC covered on the EC Profile Planner.
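The two efficiency formulas are simple enough to compare directly. This sketch uses exact fractions to avoid rounding noise. The function names are illustrative, not part of any Ceph tooling.

```python
from fractions import Fraction


def replicated_efficiency(size):
    """Fraction of raw capacity that holds unique data with `size` replicas."""
    return Fraction(1, size)


def ec_efficiency(k, m):
    """Fraction of raw capacity that holds data with k data and m coding chunks."""
    return Fraction(k, k + m)


baseline = replicated_efficiency(3)
profiles = {
    "replicated size=2": replicated_efficiency(2),
    "replicated size=3": baseline,
    "EC 4+2": ec_efficiency(4, 2),
}

for name, eff in profiles.items():
    ratio = float(eff / baseline)
    print(f"{name}: {float(eff):.1%} of raw, {ratio:.2f}x vs size=3")
```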

Worked Example

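Six hosts, 10TB raw each = 60TB raw. With replicated size=3: reserve one host (10TB), apply 0.85 nearfull, apply 1/3 efficiency, giving usable ≈ (60−10) × 0.85 × 0.333 ≈ 14.2 TB. The same 60TB raw with EC 4+2 instead: (60−10) × 0.85 × 0.667 ≈ 28.3 TB, roughly double, in exchange for the EC tradeoffs noted above. Note that with host as the failure domain, a 4+2 pool needs six hosts to place all of its chunks, so this six-host cluster could not restore full redundancy after losing a host until that host is replaced.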
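The worked example can be reproduced with a short function. This is a sketch of the usable-capacity rule for uniform hosts only. The function and parameter names are illustrative.

```python
def usable_capacity_tb(hosts, capacity_per_host_tb, efficiency,
                       nearfull_ratio=0.85, reserve_hosts=1):
    """Usable = (raw - reserve) x nearfull x efficiency, for uniform hosts."""
    raw = hosts * capacity_per_host_tb
    reserve = reserve_hosts * capacity_per_host_tb
    return (raw - reserve) * nearfull_ratio * efficiency


# Six hosts, 10 TB raw each, one host held back as recovery reserve.
replicated_3 = usable_capacity_tb(6, 10.0, efficiency=1 / 3)
ec_4_2 = usable_capacity_tb(6, 10.0, efficiency=4 / (4 + 2))

print(f"replicated size=3: {replicated_3:.1f} TB usable")
print(f"EC 4+2:            {ec_4_2:.1f} TB usable")
```

Setting reserve_hosts=0 shows the optimistic figure that results when the recovery reserve is skipped.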

Frequently Asked Questions

What's the difference between nearfull, backfillfull, and full?

nearfull_ratio (default 0.85) triggers a HEALTH_WARN so you have time to react. backfillfull_ratio (default 0.90) stops Ceph from backfilling more data onto an OSD that's getting close to full, to avoid making things worse during recovery. full_ratio (default 0.95) is the hard stop — OSDs above this reject writes entirely. Plan usable capacity around the nearfull line, not the full line.
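As a rough illustration of how the three thresholds stack, the sketch below classifies a single OSD's utilization against the default ratios. It is a planning aid only, not how Ceph evaluates health internally. The function name is hypothetical, and treating a value exactly on a threshold as having crossed it is an assumption.

```python
def osd_fullness_state(utilization, nearfull=0.85, backfillfull=0.90, full=0.95):
    """Classify an OSD utilization (0.0 to 1.0) against the default ratios.

    Assumption: a value equal to a threshold counts as having reached it.
    """
    if utilization >= full:
        return "full"          # writes are rejected
    if utilization >= backfillfull:
        return "backfillfull"  # no more backfill onto this OSD
    if utilization >= nearfull:
        return "nearfull"      # HEALTH_WARN, time to react
    return "ok"


for used in (0.50, 0.87, 0.92, 0.97):
    print(f"{used:.0%} used -> {osd_fullness_state(used)}")
```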

Why hold back an entire host's worth of capacity?

When a host fails, every PG that had a copy on that host needs to be re-replicated onto the surviving OSDs. If the cluster is already full right up to the nearfull line, there's no room to do that — the cluster sits degraded (or worse, starts hitting backfillfull and refuses to recover) until you add hardware. Reserving one failure domain's capacity in advance means recovery can actually happen.

How do I check actual usage against these numbers?

Run ceph df for a pool-level breakdown of raw used vs available, and ceph osd df tree to see per-OSD utilization and spot any individual OSD running hotter than the cluster average — the balancer module (see the PG Calculator page) helps even this out.

Does erasure coding really roughly double usable capacity vs size=3?

For a 4+2 profile, yes — 66.7% efficiency vs 33.3% for size=3 is almost exactly 2x. Other EC profiles vary: 8+3 is 72.7% efficiency (about 2.2x vs size=3), while 2+2 is 50% (1.5x vs size=3). Use the EC Profile Planner to compare the exact efficiency for the profile you're considering.