Best network setup for 3-node HA virtualisation. X540/X520?

Software-based VM-centric and flash-friendly VM storage + free version


JohnTrent
Posts: 6
Joined: Thu Jul 17, 2014 8:53 am

Wed Aug 27, 2014 3:01 pm

robnicholson wrote:Actually, I'm not a big fan of IOPS either, as using IOMeter I was able to get wildly varying values on our existing SAN just by modifying the block size.
As you should. Say, for example, your array's maximum throughput is 300 MB/s, i.e. 300 * 1024 * 1024 = 314,572,800 B/s. If you test with a 256 KB (262,144 B) block size you will see 1,200 IOPS, whereas with a 4 KB block size you will see 76,800 IOPS, and with a 0.5 KB block size you will see 614,400 IOPS. Granted, there is some overhead and it generally doesn't scale 100% linearly, and depending on your workload and current array you may want to test at different block sizes. For example, with SQL Server you will likely want to test at either 8 KB or 64 KB.
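(To make that arithmetic easy to reproduce, here is a minimal Python sketch of the same back-of-the-envelope calculation; the 300 MB/s figure is just the hypothetical array from the example, not a measured number.)

def theoretical_iops(throughput_mb_s, block_size_kb):
    # Upper bound: bytes per second divided by bytes per I/O (binary units, as above)
    return (throughput_mb_s * 1024 * 1024) / (block_size_kb * 1024)

# 300 MB/s array -> 1,200 IOPS @ 256 KB, 76,800 @ 4 KB, 614,400 @ 0.5 KB
for bs in (256, 4, 0.5):
    print(bs, "KB:", int(theoretical_iops(300, bs)), "IOPS")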

Generally speaking, most websites, tech blogs, SSD manufacturers and array manufacturers report IOPS at a 4 KB block size, which is what CrystalDiskMark uses.
robnicholson wrote: -----------------------------------------------------------------------
CrystalDiskMark 3.0.2 x64 (C) 2007-2013 hiyohiyo
Crystal Dew World : [ ... ]
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

Sequential Read : 280.706 MB/s
Sequential Write : 221.522 MB/s
Random Read 512KB : 179.140 MB/s
Random Write 512KB : 130.066 MB/s
Random Read 4KB (QD=1) : 18.603 MB/s [ 4541.8 IOPS]
Random Write 4KB (QD=1) : 12.230 MB/s [ 2985.8 IOPS]
Random Read 4KB (QD=32) : 231.564 MB/s [ 56534.1 IOPS]
Random Write 4KB (QD=32) : 105.039 MB/s [ 25644.2 IOPS]

Test : 1000 MB [E: 0.9% (0.1/10.0 GB)] (x5)
Date : 2014/08/22 14:55:02
OS : Windows Server 2012 Datacenter Edition (Full installation) [6.2 Build 9200] (x64)
Are these numbers from a StarWind cluster? If so, what are the specs, i.e. number of HDDs, type of HDD (e.g. 1.2 TB 10k 2.5" SAS), RAM cache size, SSD cache, flat or LSFS?
upgraders wrote:Well, it's still not clear to me what you would like me to test. To test the CSV (which would be a true test of the StarWind system) I would need to mount the CSV as a drive and run the benchmark against it. Is that what you want?

Here are the results running directly on the RAID 0 drive for all three nodes. Nodes 1 and 2 are currently sync partners. Node 3 is "dormant" at the moment; it is not yet being used as a sync partner (some issues I am waiting on with StarWind before connecting it), so it is basically under very low load.

Thanks
Jason
I totally agree that just reporting IOPS numbers isn't that helpful, but combined with the information about your specific setup it is most certainly interesting. Ideally the test would be run inside the CSV, as that is where StarWind does its magic and where you will see the benefits of the additional features such as the RAM cache etc.

I agree that a raw file copy or something like that would be better and more real-world. I think it would also be best if you then timed a copy of that copy, or a second copy of the first, as this will help show the benefits of StarWind as well.
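(If anyone wants to script that, a rough Python sketch of the "copy, then copy the copy" timing test might look like the following; the file paths and the T: drive letter are hypothetical placeholders, not part of anyone's actual setup.)

import shutil, time

src = r"T:\testdata.bin"     # hypothetical large test file on the mapped test volume
copy1 = r"T:\copy1.bin"
copy2 = r"T:\copy2.bin"

t0 = time.time()
shutil.copyfile(src, copy1)      # first copy: reads cold data
t1 = time.time()
shutil.copyfile(copy1, copy2)    # copy of the copy: should benefit from any RAM cache
t2 = time.time()

print(f"first copy:   {t1 - t0:.1f} s")
print(f"copy of copy: {t2 - t1:.1f} s")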
upgraders
Posts: 16
Joined: Mon Mar 24, 2014 12:22 pm

Wed Aug 27, 2014 3:17 pm

Well, to clarify my setup (which I *think* I did in a previous post), here are the specs that matter for disk speed:

Dell R710 - RAID card: PERC H700 with 1 GB NVRAM; 144 GB RAM

2 x 300 GB 10k SAS drives (RAID 1) for the OS
6 x 900 GB 10k SAS drives (RAID 0) for StarWind

Jason
robnicholson
Posts: 359
Joined: Thu Apr 14, 2011 3:12 pm

Wed Aug 27, 2014 4:08 pm

That's impressive throughput from the RAID-0 array - 6 x 900 GB 10k SAS, I think you said?

So how about if you test from a consumer of the SAN - something connected via iSCSI?

Cheers, Rob.
robnicholson
Posts: 359
Joined: Thu Apr 14, 2011 3:12 pm

Wed Aug 27, 2014 4:09 pm

PS. My tests were via a 4 x 1GbE MPIO iSCSI link.
robnicholson
Posts: 359
Joined: Thu Apr 14, 2011 3:12 pm

Wed Aug 27, 2014 4:18 pm

Just to check - the test size was reduced from 4000 MB to 1000 MB? Just so we're comparing apples with apples and not oranges ;-)

Here are my results from the StarWind v6 SAN, run directly on the SAN itself. Caveat: the server is currently in production, so it's busy doing its normal job - I would normally run this at the weekend or late in the evening when the extra load is reduced. Drive E: is 32 x 600 GB 15k SAS drives configured in RAID-10, connected via an LSI MegaRAID 9285CV-8e disk controller with 1 GB of cache RAM.

Cheers, Rob.

-----------------------------------------------------------------------
CrystalDiskMark 3.0.2 x64 (C) 2007-2013 hiyohiyo
Crystal Dew World : [ ... ]
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

Sequential Read : 791.498 MB/s
Sequential Write : 970.754 MB/s
Random Read 512KB : 89.504 MB/s
Random Write 512KB : 636.030 MB/s
Random Read 4KB (QD=1) : 1.063 MB/s [ 259.5 IOPS]
Random Write 4KB (QD=1) : 31.862 MB/s [ 7778.9 IOPS]
Random Read 4KB (QD=32) : 20.078 MB/s [ 4902.0 IOPS]
Random Write 4KB (QD=32) : 29.374 MB/s [ 7171.4 IOPS]

Test : 1000 MB [E: 76.7% (6855.3/8934.4 GB)] (x5)
Date : 2014/08/27 17:16:40
OS : Windows Server 2012 Server Standard Edition (full installation) [6.2 Build 9200] (x64)
Last edited by robnicholson on Wed Aug 27, 2014 4:26 pm, edited 1 time in total.
robnicholson
Posts: 359
Joined: Thu Apr 14, 2011 3:12 pm

Wed Aug 27, 2014 4:24 pm

JohnTrent wrote:Are these numbers from a StarWind cluster? If so, what are the specs, i.e. number of HDDs, type of HDD (e.g. 1.2 TB 10k 2.5" SAS), RAM cache size, SSD cache, flat or LSFS?
No - our current production v6 SAN is a single node (hence all my interest in HA going forward). Those tests were run on a Windows 2012 Hyper-V server where drive E: is mounted via a 4 x 1 GbE iSCSI MPIO link. AFAIK the MPIO network alone caps throughput at roughly 4 Gbit/s (about 400-450 MB/s after overhead), so even though the test I've just run shows the SAN itself might be able to sustain 791 MB/s sequential read, we'll never see that at the iSCSI initiator end - our network is the limiting factor.
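(The back-of-the-envelope sum for that cap is simple enough to script; the ~10% protocol overhead figure below is an assumption for illustration, not a measurement.)

links = 4
line_rate_gbit = 1.0                            # per 1 GbE link
raw_mb_s = links * line_rate_gbit * 1000 / 8    # 500 MB/s raw (decimal megabytes)
usable_mb_s = raw_mb_s * 0.9                    # assume ~10% lost to TCP/iSCSI overhead
print(f"raw: {raw_mb_s:.0f} MB/s, usable: ~{usable_mb_s:.0f} MB/s")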

Cost prevented us from implementing 10 GbE for the iSCSI network at the time, but in general performance is fine - the SAN disks spend most of their time with a queue length of <1.

Cheers, Rob.
upgraders
Posts: 16
Joined: Mon Mar 24, 2014 12:22 pm

Wed Aug 27, 2014 6:41 pm

How would you run this test on a StarWind cluster? I.e. you need a drive to test against, and the volumes are virtual on the cluster. I'd hate to assign a drive letter to the volume unless there is no other way.

Jason
robnicholson
Posts: 359
Joined: Thu Apr 14, 2011 3:12 pm

Thu Aug 28, 2014 8:56 am

Yeah, we'd have the same problem with our Hyper-V cluster. I mounted a test drive via iSCSI on one of the Hyper-V hosts, outside of the cluster, for testing.

Cheers, Rob.
upgraders
Posts: 16
Joined: Mon Mar 24, 2014 12:22 pm

Thu Aug 28, 2014 12:09 pm

OK, I ran the test through StarWind's CSV. Since there are not many sources explaining how to assign a drive letter to a CSV to test throughput and IOPS (worded deliberately so that others can find this post in a search engine), here are the exact steps and the results:


Syntax:
subst [drive1: [drive2:]Path]
subst drive1: /d

1. Open a NON-elevated command prompt (otherwise the drive letter will only be accessible from an admin command prompt and not from the Windows GUI).

2. Create a test folder so you pose no risk to the important files in the CSV, e.g. C:\ClusterStorage\Volume1\test

3. Map a drive letter to that folder (substituting the volume you want to connect to):

subst T: C:\ClusterStorage\Volume1\test

4. Run the benchmark against the new T: drive.

5. It is good practice to remove the drive letter (and the test folder) after you are done testing. To remove the mapping use:

subst T: /d
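(For repeat runs, the map/test/unmap cycle can be scripted; here is a rough Python sketch. The diskspd path and parameters are purely illustrative assumptions and not part of my setup - use whatever benchmark you prefer against the mapped drive.)

import subprocess

drive = "T:"
test_dir = r"C:\ClusterStorage\Volume1\test"   # test folder inside the CSV

# Map the drive letter (run non-elevated so the Windows GUI sees it too)
subprocess.run(["subst", drive, test_dir], check=True)
try:
    # Hypothetical benchmark call: 4 KB random I/O, 30% writes, 60 seconds
    subprocess.run([r"C:\Tools\diskspd.exe", "-b4K", "-c1G", "-d60",
                    "-o32", "-r", "-w30", rf"{drive}\testfile.dat"], check=True)
finally:
    # Always remove the mapping when finished
    subprocess.run(["subst", drive, "/d"], check=True)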


So here are my results. This is an active production cluster, so keep that in mind. Also, the tests were run separately, not at the same time.

Thanks
Jason



NODE 1 (running about 25 VMs; morning time, low load)

-----------------------------------------------------------------------
CrystalDiskMark 3.0.3 x64 (C) 2007-2013 hiyohiyo
Crystal Dew World : [ ... ]
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

Sequential Read : 376.238 MB/s
Sequential Write : 257.509 MB/s
Random Read 512KB : 188.668 MB/s
Random Write 512KB : 178.991 MB/s
Random Read 4KB (QD=1) : 3.285 MB/s [ 801.9 IOPS]
Random Write 4KB (QD=1) : 5.325 MB/s [ 1300.2 IOPS]
Random Read 4KB (QD=32) : 20.585 MB/s [ 5025.7 IOPS]
Random Write 4KB (QD=32) : 25.046 MB/s [ 6114.6 IOPS]

Test : 1000 MB [T: 36.9% (1510.1/4087.0 GB)] (x5)
Date : 2014/08/28 7:55:45
OS : Windows Server 2012 R2 Datacenter (Full installation) [6.3 Build 9600] (x64)



NODE 3: this is the inactive node. However, just because it is not actively running any VM roles does not mean it should be that much faster than the other nodes - the test is still taxing the iSCSI traffic, the sync channel and the actual CSV load.

-----------------------------------------------------------------------
CrystalDiskMark 3.0.3 x64 (C) 2007-2013 hiyohiyo
Crystal Dew World : [ ... ]
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

Sequential Read : 357.185 MB/s
Sequential Write : 198.162 MB/s
Random Read 512KB : 174.211 MB/s
Random Write 512KB : 154.381 MB/s
Random Read 4KB (QD=1) : 2.822 MB/s [ 689.0 IOPS]
Random Write 4KB (QD=1) : 3.838 MB/s [ 936.9 IOPS]
Random Read 4KB (QD=32) : 21.568 MB/s [ 5265.7 IOPS]
Random Write 4KB (QD=32) : 23.665 MB/s [ 5777.7 IOPS]

Test : 1000 MB [T: 37.0% (1510.2/4087.0 GB)] (x5)
Date : 2014/08/28 8:03:15
OS : Windows Server 2012 R2 Datacenter (Full installation) [6.3 Build 9600] (x64)


EDIT: I am adding the StarWind performance graph from Node 1 during that test. You can see the baseline load and the "jump" during the test.
Attachments
IOPS-starwind load.jpg
awedio
Posts: 89
Joined: Sat Sep 04, 2010 5:49 pm

Sun Aug 31, 2014 2:43 am

awedio wrote:
anton (staff) wrote:"Getting Started" is already re-published using V8 as a core and we're working on a Reference Design Guide. Good news: first one will be Dell R720/730 based :)
When should we expect the release of the Reference Design Guide?
Any ETA on the Ref Design Guide?
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am
Contact:

Thu Sep 04, 2014 4:00 pm

We should have it in approximately a month. Maybe you can share with us what you'd like to see in the doc as well?
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
Slingshotz
Posts: 26
Joined: Sat Apr 12, 2014 6:52 am

Fri Sep 19, 2014 6:14 pm

upgraders wrote:I don't know if this helps you, but I have almost exactly the same three-node server setup, which Anatoly from StarWind helped set up just last week, and I am using direct connect. I have two dual-port 10 GbE NICs (4 ports total per server), plus two 300 GB 10k SAS drives in RAID 1 for the OS and six 900 GB SAS drives in RAID 0 for StarWind (about 5 TB), on an H700 RAID card with 1 GB cache. I am also using a 12 TB RAID 5 Synology SAN/NAS for backups. My Dell R710s have four 1 GbE NICs, two of which connect Hyper-V to the external network and the other two carry iSCSI to the Synology. I'm running Windows 2012 R2.

Jason
Starwind Network Diagram (1).jpg
Do you have MPIO for the iSCSI with this design? I'm just trying to understand what your 10 GbE Intel iSCSI is for when you said that you have two 1 GbE ports on each server for iSCSI to the Synology. I have almost exactly the same configuration, but I'm having problems getting MPIO set up for the iSCSI. I designed my converged setup with two IPs per 10 GbE port so that sync and iSCSI share each physical port. Should I make it simpler by dedicating each 10 GbE NIC port to either iSCSI or sync?
upgraders
Posts: 16
Joined: Mon Mar 24, 2014 12:22 pm

Fri Sep 19, 2014 7:02 pm

You have to have MPIO turned on.

" Just trying to understand what your 10 GbE Intel iSCSI is for when you said that you have two 1 GbE ports on each server for iSCSI to the Synology." Did you look at my diagram a few posts back?

I suppose you could call it overkill, but I wanted all of my 1 GbE ports available for the NAS; the throughput for iSCSI is crazy fast using the 10 GbE cards. The Synology is for backup (I had a 2-channel iSCSI link from it to a dedicated 1 GbE switch, then connected to each node), so two of the four 1 GbE ports were for the Synology iSCSI and the other two were for the local network. I have since changed the Synology back to a NAS rather than using iSCSI, mainly because best practice says you should not use a CSV for backups (I use Hyper-V backups) and they were failing. They said that since I was using the Synology as a CSV they would not support the failed backups. All is good now though; I still get 100 MB/s+ transfers.

I would not cross traffic on the same NIC - that's me personally. The StarWind guys treat the sync channel as sacred. Not to say you could not do it, but you are creating a single point of failure too: if your NIC goes out or something happens, you lose both the sync AND the iSCSI connection.

If you only have one dual-port 10 GbE card per server, you want the fastest link dedicated to the sync channel. If you don't have another 10 GbE NIC (they are only about $350 on eBay for a good Intel-brand card), then I would look at a 4-port Broadcom (I actually have some on eBay right now, as that was what I was going to do before going 10 GbE for the iSCSI). That way each server has two iSCSI ports to help increase throughput (you can team them) - 2 GbE is better than 1 :) - and there is failover.


Hope this helps
Jason
Slingshotz
Posts: 26
Joined: Sat Apr 12, 2014 6:52 am

Fri Sep 19, 2014 8:52 pm

upgraders wrote:You have to have MPIO turned on.

" Just trying to understand what your 10 GbE Intel iSCSI is for when you said that you have two 1 GbE ports on each server for iSCSI to the Synology." Did you look at my diagram a few posts back?

I suppose you could call it overkill, but I wanted all of my 1 GbE ports available for the NAS; the throughput for iSCSI is crazy fast using the 10 GbE cards. The Synology is for backup (I had a 2-channel iSCSI link from it to a dedicated 1 GbE switch, then connected to each node), so two of the four 1 GbE ports were for the Synology iSCSI and the other two were for the local network. I have since changed the Synology back to a NAS rather than using iSCSI, mainly because best practice says you should not use a CSV for backups (I use Hyper-V backups) and they were failing. They said that since I was using the Synology as a CSV they would not support the failed backups. All is good now though; I still get 100 MB/s+ transfers.

I would not cross traffic on the same NIC - that's me personally. The StarWind guys treat the sync channel as sacred. Not to say you could not do it, but you are creating a single point of failure too: if your NIC goes out or something happens, you lose both the sync AND the iSCSI connection.

If you only have one dual-port 10 GbE card per server, you want the fastest link dedicated to the sync channel. If you don't have another 10 GbE NIC (they are only about $350 on eBay for a good Intel-brand card), then I would look at a 4-port Broadcom (I actually have some on eBay right now, as that was what I was going to do before going 10 GbE for the iSCSI). That way each server has two iSCSI ports to help increase throughput (you can team them) - 2 GbE is better than 1 :) - and there is failover.

Hope this helps
Jason
Yes, I do have MPIO turned on. I have the same setup as yours, with four 10 GbE ports (two dual-port 10 GbE NICs) on each server. The reason I have both sync and iSCSI on each 10 GbE link is exactly to avoid a single point of failure (I'm basically assigning two IP addresses per 10 GbE port). I might be missing something, but if your 10 GbE fibre on Node 1 goes down, you lose all your sync to Node 1 but the Intel iSCSI stays up, correct? In my setup, if the first 10 GbE NIC dies I lose both the sync and iSCSI on that NIC, but my second NIC also has both sync and iSCSI configured, so in theory nothing goes down.
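(Purely for illustration, the "two IPs per port" layout described above could be written down roughly like this; the subnets and addresses are made up and are not taken from my actual configuration or the attached diagram.)

# Hypothetical addressing: each 10 GbE port carries one sync subnet and one iSCSI subnet.
node1 = {
    "10GbE-A": {"sync": "172.16.10.1/24", "iscsi": "172.16.20.1/24"},
    "10GbE-B": {"sync": "172.16.11.1/24", "iscsi": "172.16.21.1/24"},
}
node2 = {
    "10GbE-A": {"sync": "172.16.10.2/24", "iscsi": "172.16.20.2/24"},
    "10GbE-B": {"sync": "172.16.11.2/24", "iscsi": "172.16.21.2/24"},
}
# Losing one port drops only that port's sync and iSCSI subnets; the surviving port
# still carries both traffic types, so neither sync nor iSCSI is fully lost.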

This is my configuration that I'm trying to get working:
Capture.JPG
upgraders
Posts: 16
Joined: Mon Mar 24, 2014 12:22 pm

Fri Sep 19, 2014 9:09 pm

You are crossing traffic, so two separate sync channels plus iSCSI... hmm, I don't know - I think I'd like to hear StarWind chime in. I would think you would not want to mix traffic, and I'm not sure what caveats that would cause. But yes, you are right: since you have two sync and two iSCSI paths, if one goes down in that configuration you would not lose connectivity - though I am not sure. Perhaps that's why you can't get MPIO working? If you used NIC teaming with one IP, so that failover happened automatically, that might be an option. My configuration was drawn up by StarWind a few years back and OK'ed by Anatoly, so I would think that if there was a better configuration he would have done it that way. I've been running with this setup for about a month now and I am very pleased.

Jason