Introduction

The new build of StarWind SAN & NAS has just been released, and it brings long-awaited support for the FC (Fibre Channel) ecosystem. StarWind SAN & NAS is designed to give new life to your existing hardware. Installed either on top of the hypervisor of your choice or on bare metal, it turns your server into a fast shared storage pool that can be accessed over iSCSI, SMB, or NFS and, in the new build, over FC as well. It uses the same SDS engine as StarWind vSAN, which means high performance, and adds new features such as ZFS support for building a highly resilient storage system on commodity hardware.

This was a great chance to test how fast StarWind SAN & NAS can go over FC. The folks at StorageReview kindly provided the testing infrastructure where we ran the benchmark. Thanks once again, StorageReview team!

Testing scope

We tested the performance of shared storage presented over FC to client nodes from a dedicated storage node full of NVMe drives, with StarWind SAN & NAS running on top. We decided to include only good old FCP (Fibre Channel Protocol) benchmark results in this article, since the NVMe-FC results were at the same level (and on certain patterns even lower than FCP). To combine the NVMe drives into a redundant storage array, we used MDRAID and GRAID and tested them separately. MDRAID is the Linux software RAID that ships as part of StarWind SAN & NAS. GRAID is an extremely fast NVMe/NVMe-oF RAID card designed to deliver the full potential of PCIe Gen4 systems.
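
RAID5, the level used for both arrays in this benchmark, gets its redundancy from XOR parity. The sketch below only illustrates that idea with made-up 4-byte chunks. It is not how MDRAID or GRAID implement parity, and the function names are hypothetical.

```python
from functools import reduce

def xor_parity(chunks):
    """Return the byte-wise XOR of equally sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

def rebuild_missing(surviving_chunks, parity):
    """Recover one lost chunk from the survivors plus the parity chunk."""
    return xor_parity(surviving_chunks + [parity])

# Seven hypothetical data chunks plus one parity chunk, as in an 8-drive stripe.
data = [bytes([i, i + 1, i + 2, i + 3]) for i in range(0, 70, 10)]
parity = xor_parity(data)

# Pretend one drive failed and rebuild its chunk from the rest.
lost_index = 3
survivors = data[:lost_index] + data[lost_index + 1:]
recovered = rebuild_missing(survivors, parity)
print(recovered == data[lost_index])
```

Every small write has to update parity as well as data, which is one reason RAID5 write results later in the article sit well below the read results.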

It is worth mentioning that, as of now, GRAID SupremeRAID is the only NVMe RAID card capable of delivering the highest possible SSD performance and removing performance bottlenecks altogether. What is the difference, you may wonder? Well, GRAID SupremeRAID SR-1010 is based on an NVIDIA A2000 GPU. In most respects that does not make the solution special, but when it comes to NVMe RAID bottlenecks, the GPU gives it a head start over many alternatives. In particular, SupremeRAID processes all the I/O operations directly, which frees up a great deal of CPU resources. Standard RAID cards are simply no match for the computing potential of a GPU. Even though GRAID is a software RAID solution, the NVIDIA GPU card is essential to many of the benefits it offers. Additionally, thanks to the specifics of the GRAID software architecture, data can flow directly from the CPU to the storage, bypassing the GRAID card.

Traditionally, NVIDIA cards serve various purposes. They are in demand for gaming, video acceleration, cryptocurrency mining, and professional workloads such as VDI. NVIDIA also produces GPUs for vehicles. And now NVIDIA hardware powers storage appliances. This is nothing less than a breakthrough in applying the computing potential of the GPU to a whole new field.

Testing bed

Here is the list of the hardware and software used in our benchmark.

Storage node:

Hardware
sw-sannas-01 Dell PowerEdge R750
CPU Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
Sockets 2
Cores/Threads 80/160
RAM 1024Gb
Storage 8x (NVMe) – PBlaze6 6926 12.8TB
GRAID SupremeRAID SR-1010
HBAs 4x Marvell® QLogic® 2772 Series Enhanced 32GFC Fibre Channel Adapters
Software
StarWind SAN&NAS
Version 1.0.2 (build 2175 – FC)

Client nodes:

Hardware
win-cli-{01..04} PowerEdge R740xd
CPU Intel® Xeon® Gold 6130 Processor 2.10GHz
Sockets 2
Cores/Threads 32/64
RAM 256Gb
HBAs 1x Marvell® QLogic® 2772 Series Enhanced 32GFC Fibre Channel Adapters
Software
OS Windows Server 2019 Standard Edition

Testbed architecture overview:

Testbed architecture; storage node (4x Marvell® QLogic® 2772 Series Enhanced 32GFC Fibre Channel Adapters) and client nodes (1x Marvell® QLogic® 2772 Series Enhanced 32GFC Fibre Channel Adapter per each one) carried out over 32GFC Fibre Channel fabric and connected using two Brocade G620 Fibre Channel Switches

The communication between the storage node and the client nodes was carried out over a 32GFC Fibre Channel fabric. The storage node had 4x Marvell® QLogic® 2772 Series Enhanced 32GFC Fibre Channel Adapters, while each client node had one. The storage and client nodes were connected through two Brocade G620 Fibre Channel switches to ensure resilience.

An interesting thing about the Marvell QLogic 2772 Fibre Channel adapters is that their ports are independently resourced, which gives an additional layer of resilience. The complete port-level isolation across the FC controller architecture prevents errors and firmware crashes from propagating across all ports. If you want to find out more about Marvell QLogic 2772 Fibre Channel adapters in terms of high availability and reliability, you can look it up here.


Storage connection diagram:

Storage connection diagram; dedicated storage node full of NVMe drives (MDRAID/GRAID) and StarWind SAN & NAS on top over FCP (Fibre Channel Protocol) to client nodes (Dell PowerEdge).

We combined the 8 NVMe drives on the storage node into a RAID5 array:

StarWind SAN & NAS interface screenshot with an example of a configuration with 8 NVMe drives on the storage node in RAID5 array

First, using MDRAID:

StarWind SAN & NAS interface screenshot with an example of a configuration with 8 NVMe drives on the storage node in RAID5 array using MDRAID

And then, with GRAID correspondingly:

StarWind SAN & NAS interface screenshot with an example of a configuration with 8 NVMe drives on the storage node in RAID5 array using GRAID

Once the RAID arrays were ready, we sliced them into 32 LUNs of 1TB each and distributed them 8 LUNs per client node. We did this because a single LUN has a performance limit, and we wanted to squeeze the maximum out of our storage.
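
As a rough illustration of this layout, the sketch below splits 32 LUN indices evenly across the four client nodes. The contiguous assignment and the `distribute_luns` helper are assumptions made for the example. The article does not state which specific LUNs went to which client.

```python
def distribute_luns(lun_count, clients):
    """Split LUN indices into equal contiguous groups, one group per client."""
    if lun_count % len(clients) != 0:
        raise ValueError("LUN count must divide evenly across clients")
    per_client = lun_count // len(clients)
    return {
        client: list(range(i * per_client, (i + 1) * per_client))
        for i, client in enumerate(clients)
    }

# Host names follow the win-cli-{01..04} pattern from the hardware list.
clients = [f"win-cli-{n:02d}" for n in range(1, 5)]
layout = distribute_luns(32, clients)

for client, luns in layout.items():
    print(client, luns)
```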

StarWind SAN & NAS interface screenshot with an example of a configuration with RAID arrays partitioned into 32 LUNs, 1TB each, then distributed by 8 LUNs per client node

Here is an example of 8 LUNs connected to one client node:

The Device Manager screenshot with an example of a configuration with 8 LUNs connected on one client node

Testing Methodology

The benchmark was run using the fio utility. fio is a cross-platform, industry-standard benchmark tool used to test both local and shared storage.

Testing patterns:

  • 4k random read;
  • 4k random write;
  • 4k random read/write 70/30;
  • 64K random read;
  • 64K random write;
  • 1M sequential read;
  • 1M sequential write.

Test duration:

  • Single test duration = 600 seconds;
  • Before starting the write benchmarks, the storage was first warmed up for 2 hours.
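
The article does not publish its fio job files, so the following is only a sketch of how the patterns and the 600-second duration above might map onto fio command-line options. The target path, `--direct=1`, `--group_reporting`, and the `libaio` I/O engine are assumptions. On the Windows client nodes the engine would more likely be `windowsaio`.

```python
import shlex

# Patterns from the list above, mapped to fio's --rw / --bs / --rwmixread options.
PATTERNS = {
    "4k random read": {"rw": "randread", "bs": "4k"},
    "4k random write": {"rw": "randwrite", "bs": "4k"},
    "4k random read/write 70/30": {"rw": "randrw", "bs": "4k", "rwmixread": "70"},
    "64k random read": {"rw": "randread", "bs": "64k"},
    "64k random write": {"rw": "randwrite", "bs": "64k"},
    "1M sequential read": {"rw": "read", "bs": "1M"},
    "1M sequential write": {"rw": "write", "bs": "1M"},
}

def fio_command(pattern, target, numjobs, iodepth, runtime_s=600, ioengine="libaio"):
    """Build one fio command line for a named pattern (options are assumptions)."""
    opts = PATTERNS[pattern]
    args = [
        "fio", f"--name={opts['rw']}-{opts['bs']}", f"--filename={target}",
        f"--rw={opts['rw']}", f"--bs={opts['bs']}",
        f"--numjobs={numjobs}", f"--iodepth={iodepth}",
        f"--runtime={runtime_s}", "--time_based",
        "--direct=1", f"--ioengine={ioengine}", "--group_reporting",
    ]
    if "rwmixread" in opts:
        args.append(f"--rwmixread={opts['rwmixread']}")
    return shlex.join(args)

# /dev/md0 is a hypothetical target device used only for illustration.
cmd = fio_command("4k random read/write 70/30", "/dev/md0", numjobs=16, iodepth=32)
print(cmd)
```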

Testing stages

  1. Testing single NVMe drive performance to get reference numbers;
  2. Testing MDRAID and GRAID RAID5 arrays performance locally;
  3. Running benchmark remotely from client nodes.

 

1. Testing single NVMe drive performance:

The spreadsheet with the values for NVMe drive speed according to the vendor

The values for NVMe drive speed according to the vendor

Talking about these NVMe SSDs, an interesting thing is that they support flexible power management from 10W to 35W, with a 25W power mode by default. Basically, Memblaze’s NVMe drives increase sequential write performance by consuming more power, which gives a flexible way to tune drive performance for a specific workload.

We obtained the optimal (speed/latency) performance of a single NVMe drive with the following Numjobs and IOdepth values:

1 NVMe PBlaze6 D6926 Series 12.8TB
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms)
4k random read 16 32 1514000 5914 0,337
4k random write 4 4 537000 2096 0,029
64k random read 4 8 103000 6467 0,308
64k random write 2 2 42000 2626 0,094
1M read 1 4 6576 6576 0,607
1M write 1 2 5393 5393 0,370
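
The columns in this table are tied together by two simple relations: throughput is IOPS times block size, and, by Little's law, average latency is roughly the number of outstanding I/Os divided by IOPS. The sketch below checks the table against both, assuming each job keeps its full queue depth outstanding. The helper names are made up for the example.

```python
# (pattern, block size in KiB, numjobs, iodepth, IOPS, reported MiB/s, reported latency in ms)
SINGLE_DRIVE = [
    ("4k random read", 4, 16, 32, 1_514_000, 5914, 0.337),
    ("4k random write", 4, 4, 4, 537_000, 2096, 0.029),
    ("64k random read", 64, 4, 8, 103_000, 6467, 0.308),
    ("64k random write", 64, 2, 2, 42_000, 2626, 0.094),
    ("1M read", 1024, 1, 4, 6576, 6576, 0.607),
    ("1M write", 1024, 1, 2, 5393, 5393, 0.370),
]

def throughput_mib_s(iops, block_kib):
    """Throughput implied by an IOPS figure and a block size."""
    return iops * block_kib / 1024

def expected_latency_ms(numjobs, iodepth, iops):
    """Little's law: outstanding I/Os = IOPS x latency."""
    return numjobs * iodepth / iops * 1000

for name, bs, jobs, depth, iops, mib_s, lat in SINGLE_DRIVE:
    print(f"{name}: {throughput_mib_s(iops, bs):.0f} MiB/s (reported {mib_s}), "
          f"{expected_latency_ms(jobs, depth, iops):.3f} ms (reported {lat})")
```

Small differences from the reported values are expected, since the IOPS figures in the table are rounded.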

Before running the actual tests, we have determined the time needed to warm up these NVMe drives to Steady State:

4k random write performance (IOPs) graph marking the time needed to warm up these NVMe drives to Steady State.

P.S. You can find more information about Performance States here

From the graph, it is clear that the NVMe drives should be warmed up for around 2 hours.
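
One way to turn "warmed up" into a repeatable check is to test whether recent IOPS samples have stopped drifting. The sketch below is loosely modeled on the kind of window criteria used in SSD performance test specifications. The window size, the thresholds, and the sample values are all assumptions for illustration, not the method used in this benchmark.

```python
def is_steady_state(samples, window=5, max_excursion=0.20, max_slope=0.10):
    """Check the last `window` samples against two stability criteria."""
    if len(samples) < window:
        return False
    recent = samples[-window:]
    mean = sum(recent) / window
    # Criterion 1: max-min spread stays within a fraction of the window average.
    if max(recent) - min(recent) > max_excursion * mean:
        return False
    # Criterion 2: the least-squares trend across the window is nearly flat.
    xs = range(window)
    x_mean = sum(xs) / window
    slope = (sum((x - x_mean) * (y - mean) for x, y in zip(xs, recent))
             / sum((x - x_mean) ** 2 for x in xs))
    return abs(slope) * (window - 1) <= max_slope * mean

# Hypothetical 4k random write IOPS samples taken at regular intervals.
declining = [900_000, 800_000, 700_000, 620_000, 560_000]
settled = [505_000, 498_000, 502_000, 500_000, 499_000]
print(is_steady_state(declining), is_steady_state(settled))
```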

2. Testing MD and GRAID RAID arrays performance locally:

Fewer words, more numbers. On to the MDRAID and GRAID local performance tests.

4k random read:

Table result:

MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage
4k random read 16 16 2670000 10430 0,095 7% 1281000 5004 0,198 3% 48% 48% 208% 46%
4k random read 16 32 3591000 14027 0,141 10% 2451000 9574 0,207 6% 68% 68% 147% 60%
4k random read 32 32 4049000 15816 0,250 20% 4474000 17477 0,227 10% 110% 110% 91% 50%
4k random read 32 64 4032000 15750 0,504 30% 7393000 28879 0,275 16% 183% 183% 55% 53%
4k random read 64 64 4061000 15863 0,998 40% 10800000 42188 0,377 25% 266% 266% 38% 63%
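
The Comparison columns in these tables appear to express the GRAID value as a percentage of the MDRAID value, so above 100% is better for IOPS and MiB\s, and below 100% is better for latency. A minimal sketch of that calculation, using the Numjobs 64 / IOdepth 64 row above (the `comparison` helper is hypothetical):

```python
def comparison(mdraid, graid):
    """Return GRAID as a whole-number percentage of MDRAID for each metric."""
    return {metric: round(graid[metric] / mdraid[metric] * 100) for metric in mdraid}

# 4k random read, Numjobs 64 / IOdepth 64, taken from the table above.
mdraid = {"iops": 4_061_000, "mib_s": 15_863, "latency_ms": 0.998}
graid = {"iops": 10_800_000, "mib_s": 42_188, "latency_ms": 0.377}

result = comparison(mdraid, graid)
print(result)
```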

Graphs:

The graph depicting the results of MD and GRAID RAID arrays performance locally: 4k random read (IOPs) The graph depicting the results of MD and GRAID RAID arrays performance locally: 4k random read (latency (ms)) The graph depicting the results of MD and GRAID RAID arrays performance locally: 4k random read (CPU usage (%))

4k random write:

Table result:

MDRAID5 GRAID5* Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage
4k random write 8 16 376000 1469 0,339 8% 260000 1016 0,491 1% 69% 69% 145% 13%
4k random write 16 16 478000 1867 0,535 17% 432000 1688 0,591 2% 90% 90% 110% 9%
4k random write 16 32 499000 1949 1,026 20% 647000 2527 0,790 2% 130% 130% 77% 10%
4k random write 32 32 504000 1969 2,022 21% 854000 3336 1,197 3% 169% 169% 59% 12%
4k random write 32 64 501000 1957 4,071 21% 975000 3809 2,100 3% 195% 195% 52% 12%

* – to get the maximum performance of 1.5M IOPs with GRAID SR-1010, you need a PCIe Gen4 x16 slot. Our server, however, had only PCIe Gen4 x8 slots.

Graphs:

The graph depicting the results of MD and GRAID RAID arrays performance locally: 4k random write (latency (ms))

The graph depicting the results of MD and GRAID RAID arrays performance locally: 4k random write (IOPs)

The graph depicting the results of MD and GRAID RAID arrays performance locally: 4k random write (CPU usage (%))

4k random read/write 70/30:

Table result:

MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage
4k random read/write 70/30 8 16 765000 2988 0,202 5% 429000 1676 0,344 1% 56% 56% 170% 31%
4k random read/write 70/30 16 16 1078000 4211 0,285 14% 776000 3031 0,382 2% 72% 72% 134% 14%
4k random read/write 70/30 16 32 1100000 4297 0,518 17% 1253000 4895 0,470 3% 114% 114% 91% 18%
4k random read/write 70/30 32 32 1147000 4480 0,960 30% 1944000 7594 0,608 5% 169% 169% 63% 15%
4k random read/write 70/30 32 64 1154000 4508 1,847 30% 2686000 10492 0,882 6% 233% 233% 48% 20%
4k random read/write 70/30 64 64 1193000 4660 5,298 49% 3140000 12266 1,529 8% 263% 263% 29% 15%

Graphs:

The graph depicting the results of MD and GRAID RAID arrays performance locally: 4k random read/write 70/30 (IOPs)

The graph depicting the results of MD and GRAID RAID arrays performance locally: 4k random read/write 70/30 (latency (ms))

The graph depicting the results of MD and GRAID RAID arrays performance locally: 4k random read/write 70/30 (CPU usage (%))

64k random read:

Table result:

      MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage
64k random read 8 8 186000 11625 0,343 5% 175000 10938 0,364 1% 94% 94% 106% 16%
64k random read 8 16 188000 11750 0,679 5% 292000 18250 0,438 2% 155% 155% 65% 30%
64k random read 16 16 196000 12250 1,309 10% 461000 28813 0,554 2% 235% 235% 42% 20%
64k random read 16 32 195000 12188 2,624 10% 646000 40375 0,792 3% 331% 331% 30% 30%
64k random read 32 32 195000 12188 5,242 20% 740000 46250 1,382 3% 379% 379% 26% 15%

Graphs:

The graph depicting the results of MD and GRAID RAID arrays performance locally: 64k random read (MiB\s) The graph depicting the results of MD and GRAID RAID arrays performance locally: 64k random read (latency (ms)) The graph depicting the results of MD and GRAID RAID arrays performance locally: 64k random read (CPU usage (%))

64k random write:

Table result:

      MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage
64k random write 8 8 92200 5763 0,693 7% 67400 4213 0,948 1% 73% 73% 137% 10%
64k random write 8 16 118000 7375 1,081 14% 104000 6500 1,229 1% 88% 88% 114% 10%
64k random write 16 16 117000 7313 2,179 16% 135000 8438 1,895 2% 115% 115% 87% 11%
64k random write 16 32 117000 7313 4,369 16% 146000 9125 3,496 2% 125% 125% 80% 13%

Graphs:

The graph depicting the results of MD and GRAID RAID arrays performance locally: 64k random write (MiB\s) The graph depicting the results of MD and GRAID RAID arrays performance locally: 64k random write (latency (ms)) The graph depicting the results of MD and GRAID RAID arrays performance locally: 64k random write (CPU usage (%))

1M read:

Table result:

      MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage
1M read 4 4 10000 10000 1,592 3% 18200 18200 0,880 0% 182% 182% 55% 12%
1M read 8 4 11000 11000 2,673 5% 28600 28600 1,120 1% 260% 260% 42% 10%
1M read 8 8 11900 11900 5,393 5% 39400 39400 1,623 1% 331% 331% 30% 10%
1M read 8 16 12100 12100 10,563 5% 44700 44700 2,865 1% 369% 369% 27% 12%
1M read 16 16 12100 12100 21,156 10% 47000 47000 5,442 1% 388% 388% 26% 6%

Graphs:

The graph depicting the results of MD and GRAID RAID arrays performance locally: 1M read (MiB\s) The graph depicting the results of MD and GRAID RAID arrays performance locally: 1M read (latency (ms)) The graph depicting the results of MD and GRAID RAID arrays performance locally: 1M read (CPU usage (%))

1M write:

Table result:

      MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage IOPs MiB\s Latency (ms) CPU usage
1M write 4 4 6938 6938 2,300 9% 5363 5363 2,981 1% 77% 77% 130% 9%
1M write 8 4 6730 6730 4,753 11% 8251 8251 3,876 1% 123% 123% 82% 12%
1M write 8 8 6782 6782 9,434 12% 10100 10100 6,312 2% 149% 149% 67% 17%
1M write 8 16 6780 6780 18,870 12% 11100 11100 11,530 2% 164% 164% 61% 17%
1M write 16 16 7071 7071 36,182 17% 11400 11400 22,490 3% 161% 161% 62% 15%

Graphs:

The graph depicting the results of MD and GRAID RAID arrays performance locally: 1M write (MiB\s) The graph depicting the results of MD and GRAID RAID arrays performance locally: 1M write (latency (ms)) The graph depicting the results of MD and GRAID RAID arrays performance locally: 1M write (CPU usage (%))

Results:

MDRAID shows decent performance at low Numjobs and IOdepth values, but as the workload increases, latency grows and performance stops scaling. GRAID, on the other hand, gives better results at high Numjobs and IOdepth values: on the 4k random read pattern, we got an incredible 10,8M IOPs with a latency of just 0,377 ms. That is basically the speed of 7 NVMe drives out of 8. On large-block reads (64k/1M), GRAID reaches a throughput of about 46250 and 47000 MiB/s respectively, while MDRAID hits a ceiling of around 12000 MiB/s.
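
The "7 drives out of 8" remark can be checked with simple arithmetic against the single-drive reference number. The sketch below does that. Note that the single-drive figure was measured at different Numjobs and IOdepth settings than the array figures, so this is an approximation rather than a strict comparison.

```python
SINGLE_DRIVE_4K_READ_IOPS = 1_514_000   # single PBlaze6 drive, from the reference table
GRAID_LOCAL_4K_READ_IOPS = 10_800_000   # GRAID RAID5, Numjobs 64 / IOdepth 64
MDRAID_LOCAL_4K_READ_IOPS = 4_061_000   # MDRAID RAID5, same settings

def drive_equivalents(array_iops, single_drive_iops):
    """Express array performance as a number of single-drive equivalents."""
    return array_iops / single_drive_iops

graid_equiv = drive_equivalents(GRAID_LOCAL_4K_READ_IOPS, SINGLE_DRIVE_4K_READ_IOPS)
mdraid_equiv = drive_equivalents(MDRAID_LOCAL_4K_READ_IOPS, SINGLE_DRIVE_4K_READ_IOPS)
print(f"GRAID: {graid_equiv:.1f} drives, MDRAID: {mdraid_equiv:.1f} drives")
```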

3. Running benchmark remotely from client nodes:

Having obtained such impressive local storage results, we were ready to give FCP a try and see whether it could deliver comparable performance to the client nodes.
In the results below, the Numjobs parameter is stated for all 32 LUNs.

4k random read:

Table result:

      MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms)
4k random read 16 16 1664285 6501 0,132 1067226 4169 0,230 64% 64% 174%
4k random read 32 16 3184359 12439 0,141 2104438 8221 0,233 66% 66% 165%
4k random read 64 16 3531393 13795 0,274 3687970 14406 0,264 104% 104% 96%
4k random read 128 16 3544646 13847 0,563 4563635 17827 0,430 129% 129% 76%
4k random read 16 32 1783060 6965 0,199 1772981 6926 0,261 99% 99% 131%
4k random read 32 32 3500411 13674 0,253 3475477 13576 0,268 99% 99% 106%
4k random read 64 32 3532084 13797 0,563 4459783 17421 0,436 126% 126% 77%
4k random read 128 32 3549901 13867 1,139 4578663 17886 0,873 129% 129% 77%

Graphs:

The graph depicting the results of MD and GRAID RAID arrays performance remotely from client nodes: 4k random read (IOPs) The graph depicting the results of MD and GRAID RAID arrays performance remotely from client nodes: 4k random read (latency (ms))

4k random write:

Table result:

      MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms)
4k random write 16 16 204612 799 1,241 304228 1188 0,833 149% 149% 67%
4k random write 32 16 238109 930 2,143 513988 2008 0,988 216% 216% 46%
4k random write 64 16 271069 1059 3,769 514719 2011 1,980 190% 190% 53%
4k random write 128 16 331108 1294 6,176 511970 2000 3,991 155% 155% 65%
4k random write 16 32 247398 966 2,059 307504 1201 1,657 124% 124% 80%
4k random write 32 32 285527 1115 3,578 512118 2001 1,992 179% 179% 56%
4k random write 64 32 341017 1332 5,996 491534 1920 4,157 144% 144% 69%
4k random write 128 32 385361 1506 10,617 498065 1946 8,212 129% 129% 77%

Graphs:

The graph depicting the results of MD and GRAID RAID arrays performance remotely from client nodes: 4k random write (IOPs) The graph depicting the results of MD and GRAID RAID arrays performance remotely from client nodes: 4k random write (latency (ms))

4k random read/write 70/30:

Table result:

      MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms)
4k random read/write 70/30 16 16 538622 2104 0,683 646787 2527 0,470 120% 120% 69%
4k random read/write 70/30 32 16 670407 2619 1,136 1109071 4332 0,554 165% 165% 49%
4k random read/write 70/30 64 16 805986 3149 1,955 1072219 4188 1,370 133% 133% 70%
4k random read/write 70/30 128 16 927080 3622 3,493 1089414 4256 2,912 118% 118% 83%
4k random read/write 70/30 16 32 700225 2735 1,065 644987 2520 1,133 92% 92% 106%
4k random read/write 70/30 32 32 817516 3194 1,928 1103024 4309 1,329 135% 135% 69%
4k random read/write 70/30 64 32 933090 3645 3,471 1098277 4290 2,888 118% 118% 83%
4k random read/write 70/30 128 32 997943 3899 6,616 1061938 4149 6,202 106% 106% 94%

Graphs:

The graph depicting the results of MD and GRAID RAID arrays performance remotely from client nodes: 4k random read/write 70/30 (IOPs) The graph depicting the results of MD and GRAID RAID arrays performance remotely from client nodes: 4k random read/write 70/30 (latency (ms))

64k random read:

Table result:

      MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms)
64k random read 8 8 192015 12001 0,326 149755 9360 0,420 78% 78% 129%
64k random read 8 16 193967 12123 0,652 260821 16302 0,483 134% 134% 74%
64k random read 8 32 194089 12131 1,311 397736 24859* 0,634 205% 205% 48%

* – throughput limitation of our FC adapters (3200MB\s * 8 ports = 25600MB\s).
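
The footnote's ceiling is simple multiplication, and it is easy to see how close the measured result gets to it. The sketch below takes the article's numbers at face value and treats the footnote's MB\s and the table's MiB\s as the same unit, which is a simplifying assumption. The helper names are hypothetical.

```python
PORT_RATE = 3200          # per-port figure quoted in the footnote
STORAGE_NODE_PORTS = 8    # port count quoted in the footnote

def fabric_ceiling(port_rate, ports):
    """Aggregate nominal throughput across all storage-node FC ports."""
    return port_rate * ports

def utilization(measured, ceiling):
    """Fraction of the nominal ceiling reached by a measured result."""
    return measured / ceiling

ceiling = fabric_ceiling(PORT_RATE, STORAGE_NODE_PORTS)
measured_64k_read = 24_859   # GRAID5 over FCP, 64k random read, from the table above
used = utilization(measured_64k_read, ceiling)
print(ceiling, f"{used:.1%}")
```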

Graphs:

The graph depicting the results of MD and GRAID RAID arrays performance remotely from client nodes: 64k random read (MiB\s) The graph depicting the results of MD and GRAID RAID arrays performance remotely from client nodes: 64k random read (latency (ms))

64k random write:

Table result:

      MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms)
64k random write 8 8 37343 2334 1,705 61839 3865 1,027 166% 166% 60%
64k random write 8 16 51048 3191 2,497 100093 6256 1,269 196% 196% 51%
64k random write 16 16 65517 4095 3,895 132669 8292 1,915 202% 202% 49%
64k random write 16 32 85255 5330 5,992 138609 8664 3,677 163% 163% 61%

Graphs:

The graph depicting the results of MD and GRAID RAID arrays performance remotely from client nodes: 64k random write (MiB\s) The graph depicting the results of MD and GRAID RAID arrays performance remotely from client nodes: 64k random write (latency (ms))

1M read:

Table result:

      MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms)
1M read 4 2 9690 9690 0,803 8542 8542 0,915 88% 88% 114%
1M read 4 4 10495 10495 1,503 14799 14799 1,059 141% 141% 70%
1M read 4 8 11018 11018 2,874 19841 19841 1,584 180% 180% 55%
1M read 4 16 11713 11713 5,442 25150 25150* 2,520 215% 215% 46%

* – throughput limitation of our FC adapters (3200MB\s * 8 ports = 25600MB\s).

Graphs:

The graph depicting the results of MD and GRAID RAID arrays performance remotely from client nodes: 1M read (MiB\s) The graph depicting the results of MD and GRAID RAID arrays performance remotely from client nodes: 1M read (latency (ms))

1M write:

Table result:

      MDRAID5 GRAID5 Comparison
Pattern Numjobs IOdepth IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms)
1M write 4 2 6028 6028 1,284 2991 2991 2,633 50% 50% 205%
1M write 4 4 7222 7222 2,167 4497 4497 3,509 62% 62% 162%
1M write 4 8 6992 6992 4,521 6748 6748 4,684 97% 97% 104%
1M write 4 16 6819 6819 9,310 8902 8902 7,125 131% 131% 77%
1M write 8 16 7144 7144 17,832 10493 10493 12,117 147% 147% 68%

Graphs:

The graph depicting the results of MD and GRAID RAID arrays performance remotely from client nodes: 1M write (MiB\s) The graph depicting the results of MD and GRAID RAID arrays performance remotely from client nodes: 1M write (latency (ms))

Comparing local and remote performance results:

In the tables below, we list the best result from each test in terms of the performance/latency ratio. The full benchmark results are provided above.

MDRAID:

  MDRAID5 – local MDRAID5 – FCP Comparison
Pattern IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms)
4k random read 4049000 15816 0,250 3531393 13795 0,274 87% 87% 110%
4k random write 478000 1867 0,535 341017 1332 5,996 71% 71% 1121%
4k random read/write 70/30 1078000 4211 0,285 927080 3622 3,493 86% 86% 1226%
64k random read 186000 11625 0,343 192015 12001 0,326 103% 103% 95%
64k random write 118000 7375 1,081 85255 5330 5,992 72% 72% 554%
1M read 11900 11900 5,393 11709 11709 5,442 98% 98% 101%
1M write 6938 6938 2,300 7221 7221 2,167 104% 104% 94%

GRAID:

  GRAID5 – local GRAID5 – FCP Comparison
Pattern IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms) IOPs MiB\s Latency (ms)
4k random read 10800000 42188 0,377 4563635 17827 0,430 42% 42% 114%
4k random write 975000 3809 2,100 514719 2011 1,980 53% 53% 94%
4k random read/write 70/30 3140000 12266 1,529 1109071 4332 0,554 35% 35% 36%
64k random read 740000 46250 1,382 397736 24859* 0,634 54% 54% 46%
64k random write 135000 8438 1,895 132669 8292 1,915 98% 98% 101%
1M read 47000 47000 5,442 25150 25150* 2,520 54% 54% 46%
1M write 11100 11100 11,530 10493 10493 12,117 95% 95% 105%

* – throughput limitation of our FC adapters (3200MB\s * 8 ports = 25600MB\s).
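
The IOPS comparison column for GRAID can be reproduced directly from the local and FCP figures, which makes it easy to see which patterns lose the most over the fabric. A minimal sketch with a hypothetical `retained_percent` helper:

```python
# (pattern, local IOPS, FCP IOPS) for GRAID5, copied from the table above.
GRAID = [
    ("4k random read", 10_800_000, 4_563_635),
    ("4k random write", 975_000, 514_719),
    ("4k random read/write 70/30", 3_140_000, 1_109_071),
    ("64k random read", 740_000, 397_736),
    ("64k random write", 135_000, 132_669),
    ("1M read", 47_000, 25_150),
    ("1M write", 11_100, 10_493),
]

def retained_percent(local, remote):
    """Share of local IOPS that is still available over FCP, in whole percent."""
    return round(remote / local * 100)

retained = [retained_percent(local, remote) for _, local, remote in GRAID]
for (pattern, _, _), pct in zip(GRAID, retained):
    print(f"{pattern}: {pct}% of local IOPS over FCP")
```

The large-block writes keep nearly all of their local performance, while the read patterns land at roughly half.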

Conclusions

Essentially, the most impressive shared storage performance came from a redundant GRAID storage array full of PBlaze6 6920 Series NVMe SSDs, with StarWind SAN & NAS on top, presented over Fibre Channel to client nodes through Marvell QLogic 2772 Fibre Channel adapters. As of now, GRAID is probably the only technology that lets software-defined shared storage reach this level of performance. Over FCP, the GRAID build delivered around 50% of the local RAID array performance at approximately the same latency as local storage. The only reason the 64k/1M large-block read results differed is the natural limit of running at or near the maximum bandwidth of a 32G Fibre Channel environment.

Locally, GRAID shows outstanding results at high Numjobs and IOdepth values: it reached a seemingly impossible 10,8M IOPs with a latency of just 0,377 ms on the 4k random read pattern. Also, since GRAID offloads I/O request processing to the GPU, CPU usage on the storage node is 2-10 times lower than with MDRAID, which leaves CPU resources free for other tasks. With MDRAID, we practically achieved over FCP the full performance that the RAID array could provide locally, but at the cost of significantly higher latency.

If you want to unleash the full GRAID performance potential, we would advise looking into NVMe-oF and RDMA, which will be added in subsequent StarWind SAN & NAS builds. You can find more about NVMe-oF and StarWind NVMe-oF initiator performance in one of the following articles.

