
IOPS (Input/Output Operations Per Second): Key Storage Performance Metric

  • April 14, 2026
  • 18 min read
StarWind Director of Product Management. Ivan is an expert in virtualization and storage architecture. With deep knowledge of software-defined storage and data protection, he provides technical leadership in solution design and product strategy. Ivan delivers high-authority insights into modernizing enterprise-scale IT infrastructure and optimizing virtualized ecosystems.

IOPS comes up in almost every storage conversation – from tuning a home lab to sizing database workloads or building enterprise arrays. It tells you how many read and write operations a storage device can handle per second. That number matters when you need to pick the right drives, avoid bottlenecks, or figure out if your current setup can keep up with your workload.

But here is the thing most vendor datasheets will not tell you: IOPS without latency context is a meaningless number. A storage array advertising 100,000 IOPS at 50 ms average latency will feel terrible in production. The same array constrained to 1 ms latency might only deliver 10,000 IOPS – and that is the number you actually care about.

IOPS meaning

 

Figure 1: Difference between IOPS, Throughput, and Latency

 

IOPS (Input/Output Operations Per Second) is a measurement of how many read and write operations a storage system can perform in one second. It’s one of the most commonly used metrics for assessing storage performance.

Every time an application reads a config file, writes a database row, or loads a VM disk block – that is an I/O operation.

The numbers vary wildly depending on the storage type. A 7200 RPM HDD does about 75-100 random IOPS. A SATA SSD handles tens of thousands. An NVMe drive can hit hundreds of thousands or even millions – at least on paper. But raw IOPS figures only tell part of the story, because the block size, queue depth, and latency all change the real-world result.

The formula itself is simple:

IOPS = Total I/O operations / Time (in seconds)

If a drive processes 500 operations in 2 seconds, that is 250 IOPS. But the useful part is not the formula – it is knowing what IOPS number your workload actually needs, and at what latency.
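As a minimal sketch, the formula can be written as a small Python helper. The function name iops is only an illustrative choice, and the figures are the ones from the example above.

```python
def iops(total_operations: int, seconds: float) -> float:
    """Return I/O operations per second for a measured interval."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return total_operations / seconds


# The example from the text: 500 operations completed in 2 seconds.
print(iops(500, 2))  # 250.0
```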

Sequential vs. random I/O

Storage workloads fall into two broad categories. Sequential I/O reads or writes data in order – large contiguous blocks, one after another. Backups, video streaming, and large file copies are sequential. Random I/O jumps around the disk hitting scattered locations – databases, virtual machines, and email servers generate mostly random I/O.

This distinction is important because sequential and random performance are very different numbers for the same device. An HDD might do 150+ MB/s sequential throughput but only 75 random IOPS. An NVMe drive might advertise 1 million IOPS for random 4K reads but the sequential throughput number – measured in GB/s – is a completely different metric. When you see IOPS figures, always check whether they refer to random or sequential workloads, and which block size was used.

How block size affects IOPS

IOPS figures are never absolute. They depend heavily on the I/O block size (also called transfer size) used during the operation. A storage device may be rated for 1 million IOPS at a 4K random read, but that same device will deliver far fewer IOPS when handling 128K or 1MB block sizes, because each individual operation carries more data and takes longer to complete.

As a general rule:

  • Small block sizes (4K–16K): maximize IOPS, typical of database workloads, virtual machine disk I/O, and transactional applications;
  • Large block sizes (128K–1MB): shift the bottleneck from IOPS to throughput (MB/s), typical of video streaming, backups, and large file transfers.

This is why no storage array suits every workload. A media-oriented array that is optimized for throughput and large blocks will perform poorly for general virtualization, and the reverse is also true: an array tuned for small random I/O is not automatically a good fit for streaming large files. Always match the block size used in benchmarks to the block size your actual workload uses.
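The relationship between block size, IOPS, and throughput is simple arithmetic: throughput equals IOPS multiplied by block size. The sketch below assumes, as a simplification, that a device is capped by a fixed bandwidth ceiling at larger block sizes; the function names and the 1 million IOPS figure are illustrative, not measurements.

```python
KIB = 1024


def throughput_mb_s(iops: float, block_size_bytes: int) -> float:
    """Throughput in MB/s (decimal megabytes) for a given IOPS and block size."""
    return iops * block_size_bytes / 1_000_000


def iops_at_bandwidth(bandwidth_mb_s: float, block_size_bytes: int) -> float:
    """Upper bound on IOPS when the device or link is capped at bandwidth_mb_s."""
    return bandwidth_mb_s * 1_000_000 / block_size_bytes


# 1 million IOPS at 4 KiB blocks is roughly 4 GB/s of data movement.
ceiling = throughput_mb_s(1_000_000, 4 * KIB)
print(f"{ceiling:.0f} MB/s")  # 4096 MB/s

# At the same bandwidth ceiling, larger blocks leave room for far fewer operations.
for size_kib in (4, 16, 128, 1024):
    print(size_kib, "KiB ->", round(iops_at_bandwidth(ceiling, size_kib * KIB)), "IOPS")
```

Under this assumption, the same device that reaches 1,000,000 IOPS at 4 KiB tops out at about 31,250 IOPS at 128 KiB, even though it moves the same amount of data per second.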

IOPS vs Throughput vs Latency

These three metrics describe different dimensions of storage performance, and you need all three to understand how a system will behave under your workload.

  • IOPS counts operations per second. It matters most for workloads with many small, random requests – think database queries, VDI boot storms, or email servers.
  • Throughput (MB/s or GB/s) measures data volume per second. It matters for workloads that move large blocks sequentially – backups, video editing, data warehouse scans.
  • Latency measures how long a single I/O request takes to complete, in milliseconds or microseconds.

Of the three, latency is often the most important for user-facing applications. A storage system can advertise sky-high IOPS, but if each operation takes 10 ms instead of 0.1 ms, your application will feel slow. Many real workloads are latency-bound: a single PHP request waiting on MySQL does not benefit from 100,000 available IOPS if each individual read still takes milliseconds to complete.

The metric to watch in production is p99.9 latency – the 99.9th percentile. Average latency hides problems. p99.9 exposes the worst outliers that cause your site to feel sluggish and requests to pile up.
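To see why the average hides problems, consider a made-up set of latency samples in which 0.2% of requests are slow. The sketch below uses a simple nearest-rank percentile; the helper name percentile and the sample values are assumptions for illustration only.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in ms: 9,980 fast requests and 20 slow outliers.
latencies_ms = [0.2] * 9980 + [50.0] * 20

print("mean :", round(statistics.mean(latencies_ms), 1), "ms")  # about 0.3 ms
print("p50  :", percentile(latencies_ms, 50), "ms")             # 0.2 ms
print("p99  :", percentile(latencies_ms, 99), "ms")             # 0.2 ms
print("p99.9:", percentile(latencies_ms, 99.9), "ms")           # 50.0 ms
```

The mean, the median, and even p99 all look healthy here, while p99.9 reveals that some requests wait 50 ms.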

IOPS by drive type: HDD vs. SSD vs. NVMe

The performance gap between drive types is enormous, and it comes down to how each one physically handles I/O requests.

HDDs use spinning platters and a mechanical arm that physically moves to read/write data. That mechanical movement is the bottleneck. A 7200 RPM drive manages about 75-100 random IOPS with 12-13 ms average latency. A 15,000 RPM drive gets to 150-200 IOPS at 7-8 ms. HDDs still make sense for archival storage and sequential workloads where throughput matters more than random IOPS.
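The HDD figures can be approximated from the mechanics. A rough model treats each random operation as one average seek plus half a platter rotation. The 8.5 ms seek time below is an assumed, typical value for a 7200 RPM drive rather than a figure from any datasheet.

```python
def hdd_random_iops(rpm: int, avg_seek_ms: float) -> float:
    """Rough random-IOPS estimate: one average seek plus half a rotation per operation."""
    rotational_latency_ms = 0.5 * 60_000 / rpm  # time for half a revolution, in ms
    service_time_ms = avg_seek_ms + rotational_latency_ms
    return 1000 / service_time_ms


# Assumed 8.5 ms average seek for a 7200 RPM drive (illustrative value).
print(round(hdd_random_iops(7200, 8.5)))  # 79
```

With these assumptions the service time comes out at about 12.7 ms per operation, in line with the range quoted above.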

SATA/SAS SSDs eliminate the mechanical parts entirely using flash memory. They handle tens of thousands to hundreds of thousands of IOPS with sub-millisecond latency. The limitation is the interface: SATA SSDs connect through AHCI, which supports a single command queue with 32 commands maximum. SAS SSDs improve on this with 256 commands per device.

NVMe SSDs bypass the legacy AHCI/SATA stack entirely and connect over PCIe. The NVMe protocol supports up to 65,535 queues with 65,535 commands each – that is what enables the massive parallelism behind headline IOPS numbers. For workloads that actually generate enough parallel I/O (large database clusters, heavy virtualization), NVMe is the clear winner. For a single-threaded application, the NVMe advantage over a good SATA SSD is much smaller than the spec sheets suggest.

RAID configurations add another variable. RAID 0 stripes data across drives for maximum IOPS but no redundancy. RAID 10 gives you both performance and redundancy. RAID 5 and 6 introduce write penalties from parity calculations – plan for it.
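One way to plan for the write penalty is the commonly cited rule of thumb that each front-end write costs 2 back-end I/Os on RAID 10, 4 on RAID 5, and 6 on RAID 6. The sketch below applies it to a hypothetical array; the drive count, per-drive IOPS, and write ratio are placeholders, and real controllers with write caching or full-stripe writes can do better.

```python
# Commonly cited back-end I/Os per front-end write (rule of thumb, not vendor data).
WRITE_PENALTY = {"raid0": 1, "raid10": 2, "raid5": 4, "raid6": 6}


def usable_iops(raw_iops: float, write_fraction: float, level: str) -> float:
    """Front-end IOPS an array can sustain given its raw back-end IOPS."""
    penalty = WRITE_PENALTY[level]
    read_fraction = 1.0 - write_fraction
    # Each front-end read costs 1 back-end I/O; each write costs `penalty`.
    return raw_iops / (read_fraction + write_fraction * penalty)


# Hypothetical array: 8 drives at 10,000 IOPS each, 30% writes.
raw = 8 * 10_000
for level in WRITE_PENALTY:
    print(level, round(usable_iops(raw, 0.3, level)))
```

For this example the estimate drops from 80,000 IOPS on RAID 0 to roughly 61,500 on RAID 10, 42,100 on RAID 5, and 32,000 on RAID 6.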

For distributed setups, NVMe-oF (NVMe over Fabrics) extends NVMe performance across the network, avoiding the latency penalty of traditional iSCSI or FC protocols.

Why queue depth is important for benchmarks

Queue depth is the number of I/O requests that are outstanding against a storage device at the same time, either queued or being processed. It has a direct impact on both IOPS numbers and latency – and it is the biggest reason vendor benchmarks often do not match real-world performance.

Most vendor IOPS figures come from tests at high queue depths – QD32, QD64, or even QD256. At those depths, NVMe drives saturate their parallel queues and hit peak IOPS. But most real applications – web servers, small databases, single-user workstations – operate at QD1 to QD4. At QD1, what matters is not parallelism capacity but how fast each individual operation completes. That is latency, not IOPS.
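The link between queue depth, latency, and IOPS follows from Little's Law: sustained IOPS is roughly the number of outstanding requests divided by the average time each one takes. The sketch below is a simplification with made-up latency values; in practice latency usually rises as queue depth grows.

```python
def iops_from_latency(queue_depth: int, avg_latency_ms: float) -> float:
    """Little's Law: throughput = outstanding requests / average time per request."""
    return queue_depth / (avg_latency_ms / 1000)


# At QD1, IOPS is simply the inverse of latency.
print(round(iops_from_latency(1, 0.1)))   # 10000

# At QD32 the headline number grows even if each request takes five times longer.
print(round(iops_from_latency(32, 0.5)))  # 64000
```

This is why a high-queue-depth IOPS figure says little about how quickly any single request completes.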

Here is a real-life example from routine VPS benchmarking: Provider “X” posted the highest IOPS at QD1, yet delivered the slowest p99.9 latency – 3.5x slower than competitors on reads and 9x slower on writes. The IOPS number looked great, but the user experience was terrible.

When evaluating storage, ask: at what queue depth were these IOPS measured? If the answer is QD32+ and your workload runs at QD1-4, those numbers are irrelevant to you. Run your own benchmarks at realistic queue depths. A good starting point:

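fio --name=randrw --rw=randrw --bs=4k --iodepth=1 --numjobs=1 --direct=1 --size=1G --runtime=60 --time_based

Run it from a directory on the storage you want to test. The --direct=1 flag bypasses the page cache so that you measure the device rather than RAM; --size, --runtime, and --time_based bound the test, and the values here are placeholders to adjust for your system. Then look at the p99.9 completion latency in the output – not the IOPS headline.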
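If you add --output-format=json to the fio command, the percentile can be pulled out with a script instead of read by eye. The sketch below assumes the JSON layout emitted by fio 3.x, where completion latency percentiles sit under clat_ns and are keyed by strings such as "99.900000"; the sample data is trimmed and made up, so check the field names against the output of your own fio version.

```python
import json

# Trimmed, made-up sample in the shape fio 3.x is assumed to emit with
# --output-format=json; real output contains many more fields.
SAMPLE = """
{"jobs": [{"jobname": "randrw",
  "read":  {"iops": 9500.0,
            "clat_ns": {"percentile": {"99.000000": 180000, "99.900000": 2400000}}},
  "write": {"iops": 9400.0,
            "clat_ns": {"percentile": {"99.000000": 210000, "99.900000": 5100000}}}}]}
"""


def p999_ms(fio_result: dict) -> dict:
    """Return p99.9 completion latency in ms per direction for the first job."""
    job = fio_result["jobs"][0]
    out = {}
    for direction in ("read", "write"):
        ns = job[direction]["clat_ns"]["percentile"]["99.900000"]
        out[direction] = ns / 1_000_000  # nanoseconds -> milliseconds
    return out


print(p999_ms(json.loads(SAMPLE)))  # {'read': 2.4, 'write': 5.1}
```

To use it on a real run, replace SAMPLE with the contents of the file written by fio, for example via json.load on an open file handle.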

How to improve IOPS

If your storage is not keeping up, here are the practical options, roughly ordered from most impactful to least:

Switch to faster media. If you are still on HDDs for random workloads, moving to SSDs is the single biggest improvement you can make. Even a mid-range read-intensive SSD delivers orders of magnitude more IOPS than a 7200 RPM drive. If you are already on SATA SSDs and hitting limits, NVMe is the next step.

Add drives and configure RAID. More drives in a stripe means more parallel I/O paths. RAID 10 is the best general-purpose choice for both performance and redundancy. RAID 5/6 works well with SSDs where the write penalty is less painful than with HDDs.

Use caching. RAM or SSD-based read/write caches accelerate the most frequently accessed data. This is especially effective when your hot dataset fits in cache but your total dataset does not fit on fast storage.
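How much a cache helps depends on the hit ratio. A simple weighted average gives a first estimate of the resulting read latency; the 0.1 ms and 10 ms figures below are assumed values for a cache hit and an HDD-backed miss, not measurements.

```python
def effective_latency_ms(hit_ratio: float, cache_ms: float, backend_ms: float) -> float:
    """Average read latency for a cache placed in front of slower storage."""
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * backend_ms


# Assumed latencies: 0.1 ms for a cache hit, 10 ms for an HDD-backed miss.
for hit_ratio in (0.5, 0.9, 0.99):
    print(hit_ratio, round(effective_latency_ms(hit_ratio, 0.1, 10.0), 3), "ms")
```

With these numbers the average falls from about 5 ms at a 50% hit ratio to about 0.2 ms at 99%, which is why the benefit is largest when the hot dataset fits in cache. Note that this is an average: every miss still pays the full back-end latency, so tail latency improves less than the mean.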

Implement storage tiering. If budget or capacity constraints mean you cannot put everything on flash, automated tiering moves hot data to SSDs and cold data to HDDs transparently. You get flash performance where it matters without paying for all-flash capacity.

Tune your workloads. Align partitions, choose the right filesystem block size, and reduce unnecessary I/O. For HDD arrays, defragmentation helps; for SSDs, it adds write wear without a meaningful performance benefit – skip it. Check that your applications are not generating excessive small writes when batch writes would work.

Scale out. Add more drives or nodes to distribute the workload. This is the most expensive option but sometimes the only one when you have hit the limits of a single system.

Benchmark and monitor regularly with tools like fio, Iometer, or vendor-specific utilities. Focus on latency percentiles (p99, p99.9) under your actual workload, not just peak IOPS under synthetic benchmarks.

What do StarWind and DataCore have to offer?

StarWind Virtual SAN (VSAN) uses NVMe-oF to provide low-latency shared storage for ROBO and edge deployments. For core datacenter infrastructure, DataCore SANsymphony includes automated storage tiering, which lets you mix flash and spinning disk in a single pool and automatically place data based on access frequency. For Kubernetes environments, DataCore Pulse8 provides high-performance container-native persistent volumes.


Conclusion

IOPS is a useful metric but a terrible one to optimize in isolation. A storage system with impressive IOPS at high queue depth might deliver poor latency at the queue depths your workload actually uses. When evaluating storage, test at realistic queue depths, look at p99.9 latency, match your benchmark block sizes to your actual workloads, and remember that throughput matters more than IOPS for sequential workloads. The spec sheet is a starting point, not a guarantee.
