Looking for some peer support (5.6 performance problem)

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

camealy
Posts: 77
Joined: Fri Sep 10, 2010 5:54 am

Wed Mar 16, 2011 5:03 pm

After working with Starwind for weeks I am being told it is my hardware/software that is causing all my performance issues and that Starwind is working fine. While that is most likely correct, I am in the typical situation where all the vendors are pointing at each other and nobody is willing to get to the bottom of it. Starwind is blaming Intel and Microsoft, Microsoft says it is Starwind's fault, and Intel and HP say they don't know whose fault it is, but it isn't theirs.

So here is the situation: I get great local disk performance and great NIC performance, but VMs running on the Starwind iSCSI targets perform like garbage. When I run the ATTO disk benchmark against the iSCSI targets at the small transfer sizes (512 bytes to 8K), I get as low as 25Kb a second.
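For scale, if that is kilobytes per second, 25 KB/s at a 512-byte transfer size is only around 50 I/Os per second, which points at per-request latency on the iSCSI path rather than raw disk or link bandwidth.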

Here is what I have tried.

-Swapped out the switches (same result)
-Swapped out the NICs (same result)
-Swapped out the drives (same result)
-Used a local SATA drive instead of the Array controller (same result)
-Used RAM disks (same result)
-Connected the Starwind HA partners to each other, i.e. mounted the .img from one Starwind host onto the other Starwind host and tested (same result)
-Crossed over the CX4 cables to each Starwind host to rule out any switches and performed the previous test (same result)
-Used a completely different set of NICs and switches, going 1Gig all around instead of 10Gig (same result)


I have a few other Starwind installations that have never performed amazingly well, but they are at least usable.

Here is one of the other installations with average hardware and 1Gb connections…. (ATTO run on the iSCSI target)
iSCSI target (other installation).png
Here is what I get on my current installation with much better hardware all around…. (ATTO run on an iSCSI target, RAM disk)
1Gig RAM Disk.png

Is there anyone out there who can benchmark their setup and let me know what they are getting?

Help would be GREATLY appreciated!

Kurt
kmax
Posts: 47
Joined: Thu Nov 04, 2010 3:37 pm

Wed Mar 16, 2011 5:38 pm

Your original thread inspired me to do some testing with Atto over the past few days.

I had terrible read performance with 512 bytes. That led me to this article:

http://sustainablesharepoint.wordpress. ... ith-iscsi/

As for the results... this is before changing (or rather adding, since it wasn't in the registry) the TcpAckFrequency registry entry:
beforeack.png
And this is after:
afterack.png
The default is a 200ms delay; the entry removes it so the ACK fires off right away. My understanding is that by default the stack waits until the frame fills up (jumbo) to send the acknowledgement, and only sends it anyway after 200ms.
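For anyone who wants to try it, the change the article describes boils down to a per-interface DWORD (a sketch only; {interface-GUID} is a placeholder for the key of the NIC carrying the iSCSI traffic, and a reboot is generally needed before it takes effect):

reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{interface-GUID}" /v TcpAckFrequency /t REG_DWORD /d 1 /f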

The odd thing is that if I set the queue depth lower than 3 in ATTO, the performance is OK.

Also, the results are against a RAM disk on a single Starwind node.
mkaishar
Posts: 32
Joined: Mon Mar 01, 2010 8:04 pm

Thu Mar 17, 2011 2:38 am

Starwind 5.6 server: Dell R710, 24GB RAM, MD1220 with 24 600GB SAS 10K in RAID-50, 4 Intel GigE dedicated to SAN
ESXi hosts: Dell R710, 64GB RAM, 2 Intel GigE for SAN, 2 vmkernels per NIC, jumbo frames, MPIO

Guest VM:
W2K8R2 Ent
C:\ is on the VM datastore
D:\ is an iSCSI target connected through the guest OS, with a 512MB WB cache set up on the volume in Starwind; 4 vNICs are dedicated to the SAN to support MPIO (although I currently only have 2 pNICs) on the ESXi host
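For anyone reproducing the MPIO part, the vmkernel-to-software-iSCSI port binding on ESXi 4.1 is done along these lines (a sketch; vmk1/vmk2 are placeholder vmkernel port names and vmhba33 a placeholder for the software iSCSI adapter):

esxcli swiscsi nic add -n vmk1 -d vmhba33
esxcli swiscsi nic add -n vmk2 -d vmhba33
esxcli swiscsi nic list -d vmhba33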

The first test is on the Starwind storage server; the tests were run on the MD1220 DATA volume
atto_test_md1220.jpg

Second test is the guest VM on D:\
atto_test_w2k8r2_iscsi.jpg
camealy
Posts: 77
Joined: Fri Sep 10, 2010 5:54 am

Thu Mar 17, 2011 3:13 pm

Thanks a ton, this seemed to do it! I will know more after I load up some VMs this weekend.
kmax wrote: Your original thread inspired me to do some testing with Atto over the past few days. I had terrible read performance with 512 bytes. [...]
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Fri Mar 18, 2011 8:19 am

Good. So waiting for your final feedback.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

nbarsotti
Posts: 38
Joined: Mon Nov 23, 2009 6:22 pm

Tue Mar 22, 2011 6:05 pm

I am running vSphere ESXi v4.1u1 against my Starwind v5.6 server. I am fighting very low IOPS and throughput across iSCSI when using small-sized IO. With large IO I get good throughput near the 1GB barrier. If I made the advised registry change on my Starwind server, would I also see an improvement?
camealy
Posts: 77
Joined: Fri Sep 10, 2010 5:54 am

Tue Mar 22, 2011 6:28 pm

It has made a huge difference for me. I don't know why it wouldn't be an advised default change for Hyper-V cluster users.
nbarsotti
Posts: 38
Joined: Mon Nov 23, 2009 6:22 pm

Tue Mar 22, 2011 6:33 pm

Thanks for the advice. I am a little confused. What type of clients were you connecting to your Starwind server? Were they Hyper-V, ESX, or a traditional Linux or Windows server? From reading the article and other MS KB pages, it looked like the setting should be changed on the Hyper-V server. Should it be done on both Starwind and Hyper-V, or just one or the other? Thank you.
camealy
Posts: 77
Joined: Fri Sep 10, 2010 5:54 am

Tue Mar 22, 2011 6:40 pm

I changed it on both my Hyper-V cluster hosts and my Starwind hosts. I think it has more to do with the Microsoft iSCSI initiator than Hyper-V.
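For anyone checking their own hosts, the value can be confirmed per interface with something like this ({interface-GUID} again being a placeholder for whichever interface key the entry was added under):

reg query "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{interface-GUID}" /v TcpAckFrequency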
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Tue Mar 22, 2011 9:04 pm

Would you be so kind as to post the final numbers? I mean the ones you got after the registry tweaks were applied. Thank you!
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

ggalley
Posts: 9
Joined: Wed Mar 30, 2011 5:21 am

Wed Mar 30, 2011 6:05 am

Hello, I work for a large company that is investigating rolling out Starwind to replace some of our SAN environment, but I cannot recommend Starwind 5.6 due to performance issues.
I have also spent several weeks trying to figure out why the performance is all over the place with Starwind 5.6.

This one change did help, but it still does not explain why I lose 50% of my speed on a RevoDrive X2 over iSCSI.

Internal RevoDrive performance

Over iSCSI prior to registry change

Over iSCSI after registry change


So I am now at least at 50% of the expected performance; I will be happy if I can get another 30% over iSCSI.

I have tried everything the original poster above has tried, and I am now down to just wanting Starwind to work over a 10Gb crossover cable.

Software: Starwind 5.6
OS: Windows 2008 R2 SP1
Hardware: Tyan motherboard with dual 8-core AMD, 48GB RAM, RevoDrive x2, dual Intel X520-T2 10Gig.
I have two identical machines, and I have set up iSCSI connections to a 30GB RAM disk and a 30GB virtual disk on the RevoDrive.

I have applied all the latest patches to the OS and Intel drivers.
I have removed the anti-virus and firewall.
I have run all the commands found here http://www.starwindsoftware.com/forums/ ... t2293.html
I changed the RSS Queues from 2 to 4.
Jumbo frames are enabled (see the ping check sketched after the ntttcp commands below).
I have verified the cable can transfer at 10Gig using the following commands.
Sender: ntttcps -m 1,1,10.x.x.x -l 1048576 -n 100000 -w -v -a 8
Receiver: ntttcpr -m 1,1,10.x.x.x -l 1048576 -rb 2097152 -n 1000000 -w -v -a 8
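As an extra sanity check that jumbo frames survive end to end (assuming a 9000-byte MTU, i.e. 8972 bytes of payload plus 28 bytes of IP/ICMP header), a don't-fragment ping can be used:

ping -f -l 8972 10.x.x.x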

I changed several of the Intel adapter settings to the following:
Flow Control: Disabled
Interrupt Moderation Rate: off
Receive Buffers:2048
Transmit Buffers:2048

After not seeing any performance gain, I set them back to the defaults:
Flow Control: Enabled
Interrupt Moderation Rate: Adaptive
Receive Buffers:512
Transmit Buffers:512

These suggestions were found in this article by Intel:
http://www.intel.com/support/network/sb/CS-025829.htm

I have been unable to push past 30% utilization of the network using several different tools:
ATTO Disk Benchmark
IOMeter
Bart's Test Stuff

Where should I look next to find my performance problem if I am not consuming all the NIC bandwidth, not using all the IOPS of the Revo/RAM disk, and not even coming close to using all my CPU or RAM?

Any Suggestions or help would be appreciated.

Garrett
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Wed Mar 30, 2011 9:29 am

A couple of questions if you don't mind :)

1) So you have TWO machines one acting as a target and another as initiator? No HA configured?

2) Do you use Write-Back Cache enabled for non-RAM device? Cache size?

3) Tests you've provided are for... virtual disk hosted by revodrive I guess?

4) Jumbo frame size? 4KB? 9KB?

5) Did you try to increase queue depth for your benchmark tools?

6) TCP performance is... "can transfer at 10gig". Number? :)

Thanks!
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

nbarsotti
Posts: 38
Joined: Mon Nov 23, 2009 6:22 pm

Wed Mar 30, 2011 1:49 pm

ggalley wrote: Hello, I work for a large company that is investigating rolling out Starwind to replace some of our SAN environment, but I cannot recommend Starwind 5.6 due to performance issues. [...]

After looking at your ATTO benchmarks, I am in a very similar situation to yours and I have also not found a solution. I am running a 14-SSD RAID-10 array and I see very similar numbers to ggalley's. I wish I had some advice, but I will be following this thread. To answer anton's questions:
1) I have 1 machine, no HA
2) I do not use Write-Back cache on my Starwind target; I use write-through. The cache is 4GB, with an expiry time of 5000ms
4) No jumbo frames, since my switch does not support them
6) Hard to test 10Gb iSCSI numbers, since the only other host with a 10Gb NIC is the ESXi host.
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Wed Mar 30, 2011 2:51 pm

1) OK, so PAIR of machines. One for target, one for initiator, 10 GbE link between them?

2) Could you PLEASE enable WB at least for experiments?

3) How do you run 10 GbE w/o Jumbo frames?!? What TCP numbers do you have?

4) I'm a little bit lost... It looks like you have a VERY different config compared to the topic starter.
nbarsotti wrote: After looking at your ATTO benchmarks, I am in a very similar situation to yours and I have also not found a solution. [...]
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

ggalley
Posts: 9
Joined: Wed Mar 30, 2011 5:21 am

Wed Mar 30, 2011 2:57 pm

1) So you have TWO machines one acting as a target and another as initiator? No HA configured?

A) Yes. The two machines are my HA nodes, but I have configured each of them to have a non-HA RAM disk and virtual disk as per your "Benchmarking Server Guide."

http://www.starwindsoftware.com/benchma ... vers-guide

These tests are my effort to test step 1 of that guide: "Following links should be checked: 1. Synchronization link between Starwind 1 and Starwind 2." "Results less than 80% of the link's saturation (even if only one) will not be suitable for HA implementation."

Starwind 1 RAM DISK <- 10 Gig Crossover -> Starwind 2 RAM DISK
Starwind 1 Revo->Image File Device <- 10 Gig Crossover -> Starwind 2 Revo->Image File Device

Hope that makes sense :)

2) Do you use Write-Back Cache enabled for non-RAM device? Cache size?

A) Yes. I left the cache size at 64 and the expiry period (ms) at 5000, as per your video on implementing Hyper-V and HA.
This should probably be adjusted now; do you have any white papers on caching when dealing with SSD devices? The RevoDrive x2 can support up to 120,000 IOPS.

3) Tests you've provided are for... virtual disk hosted by revodrive I guess?
A) This is correct. I thought I had labelled each image with a header, but maybe only I can see that information.

Following the benchmarking server guide, I created a new virtual disk on the RevoDrive x2.

The first benchmark is of the revodrive’s performance on the local machine.
The second image is the revo performance I was receiving prior to applying the TcpAckFrequency registry fix.
The third image is the revo performance after applying the TcpAckFrequency registry fix.

The TcpAckFrequency fix really should be part of iSCSI Jumbo Frames 101 :D

4) Jumbo frame size? 4KB? 9KB?

A) 9KB

5) Did you try to increase queue depth for your benchmark tools?

A) As per all the images, the queue depth is set to 4. What would you suggest I set it to?

6) TCP performance is... "can transfer at 10gig". Number?

A) It is a bit ugly below, but this is from the sender:

Thread  Realtime(s)  Throughput(KB/s)  Throughput(Mbit/s)  Avg Bytes per Completion
======  ===========  ================  ==================  ========================
0       95.985       1092437.360       8739.499            1048492.1
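For scale, 8739.5 Mbit/s is roughly 87% of the 10 GbE line rate (8739.5 / 10,000), so the raw link itself clears the 80% saturation bar quoted from the benchmarking guide above.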