Slowness with HA targets mounted


oxyi
Posts: 67
Joined: Tue Dec 14, 2010 8:30 pm

Sun Feb 19, 2012 9:38 am

Hi there,

I have CSV and CSVPartner. Whenever both are mounted at the same time for HA mode, my VM becomes sluggish, and a copy/paste within the CSV volume shows the speed dropping to around 9MB/sec.
If I then drop CSVPartner, everything is okay again: the VM is smooth and copy/paste within the CSV volume is much faster, around 120MB/sec. By copy/paste I mean copying and pasting within the CSV volume itself.
It doesn't matter whether I drop the connection to CSV or to CSVPartner; with one target mounted I see the speed increase, and with two targets it becomes slow. Why is that?

I am using MPIO, with 2 paths to each target.

Thank you
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Mon Feb 20, 2012 12:45 pm

Could you please clarify which StarWind edition and build you are using, and which NLB policy?

Thank you
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
oxyi
Posts: 67
Joined: Tue Dec 14, 2010 8:30 pm

Mon Feb 20, 2012 5:42 pm

Hello,

It's been happening since 5.6; I am now on the current build, 5.8.
I am not using NLB, I am doing MPIO with round robin.

HA failover still works; it's just that having the CSV and CSVPartner targets mounted concurrently causes slowness.
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Tue Feb 21, 2012 10:21 am

OK, could you please provide us with a detailed network diagram (Visio perhaps) showing all the IPs and bandwidths?
oxyi
Posts: 67
Joined: Tue Dec 14, 2010 8:30 pm

Thu Feb 23, 2012 5:51 pm

[Attachment: network.JPG, network diagram]
Here it is. Everything is on 1Gb links.
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Fri Feb 24, 2012 11:47 am

Actually, we strongly recommend keeping your synchronization channel data link separate from all other networks so that it does not get flooded with broadcasts, etc. That could be the reason.
I would also try to exclude the switch, since it can lag too.
oxyi
Posts: 67
Joined: Tue Dec 14, 2010 8:30 pm

Fri Feb 24, 2012 3:46 pm

The sync channel data link is in its own VLAN as well. There is a small possibility of some kind of lag within the switch, but I think it is highly unlikely.

If there were lag somewhere, I would have experienced it with either CSV connection on its own. But my problem only appears when I connect both CSV targets; that's when the speed drops.
I have no issue with a single connection to either iSCSI target, but as soon as I connect both CSV and CSVPartner, the speed drops from 120MB/sec to 30MB/sec.
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Mon Feb 27, 2012 9:19 am

There is still such a possibility (I'm talking about the switch), so we need to exclude it. I would also recommend taking a look at this topic on our forum:
http://www.starwindsoftware.com/forums/ ... t2293.html

Can you confirm that you have set up multiple NICs for the Sync data link via the StarWind Management Console?
oxyi
Posts: 67
Joined: Tue Dec 14, 2010 8:30 pm

Mon Feb 27, 2012 11:51 pm

Yes Anatoly. Remember, back last year I went through the speed test with you on each path.
I have already done all the speed tweaks as recommended.

I don't have a general speed issue; I only have it when both HA targets are mounted.
Aitor_Ibarra
Posts: 163
Joined: Wed Nov 05, 2008 1:22 pm
Location: London

Tue Feb 28, 2012 1:35 pm

I suggest you try directly connecting the sync channel, bypassing the switch completely, just to see if that makes any difference at all. You can then establish whether the switch is the problem or not.

Also, if you are doing a file copy with a CSV, make sure that the CSV is currently "owned" by the cluster node that you do the file copy from. Otherwise, the copy happens over SMB between the Hyper-V servers before going to StarWind. SMB is also used when direct cluster-to-SAN communication has to be suspended (e.g. backups) or when you put the CSV into maintenance mode. Ideally you should have a separate network for CSV SMB traffic too, otherwise it can steal bandwidth from iSCSI.

BTW, Microsoft's ideal for Hyper-V clusters using 1G is one 1G port for each of the following: management of the cluster node, live migration, CSV SMB, VMs. iSCSI would be on top of this.
oxyi
Posts: 67
Joined: Tue Dec 14, 2010 8:30 pm

Tue Feb 28, 2012 5:52 pm

Thanks Ibarra, but I am not sure I can do a direct connection for the sync channel, since they are not in the same room. I have tried two different switches, though, a Cisco 2960 and an HP ProCurve, and the result was the same.

Thank you for the info on the SMB copy. Before, I wasn't aware which cluster node the CSV was hosted on, but this time I went to the node that's hosting it, connected the CSVPartner, and did a copy to the same folder. The result was still the same: the speed is at 33MB/sec.

Yes, in my Hyper-V setup each port is dedicated to one purpose: one for management, one for live migration, one crossover for CSV, and 2 for iSCSI.

Is there anything else that could possibly have gone wrong? When I say slow, I don't just mean slow file copies; when both HA targets are connected, all my VMs become slow as well.
Aitor_Ibarra
Posts: 163
Joined: Wed Nov 05, 2008 1:22 pm
Location: London

Wed Feb 29, 2012 1:20 pm

Hi Oxyi,

That does sound weird. Do you get the same situation if the Partner target is mounted by itself? E.g. maybe the problem is with that box or communications with it, or maybe it is just when both primary & partner are connected? Also, do you get the same issue if you use the Failover MPIO policy instead of round robin? If you try failover, try switching which target is active to see if that makes a difference.

Are you using NIC teaming for the sync channel? In my experience, Broadcom teaming is pretty bad, Intel is OK, but in your case it will only give you failover, not double bandwidth. I *think* StarWind 5.8 has implemented its own teaming for sync now (which would work kind of like MPIO). I haven't implemented this, but you might want to give it a try. There is, in my opinion, little point in doing any kind of teaming for sync unless you have redundant switches for your sync, as without them the teaming just buys you NIC failover, not switch failover. In my case my StarWind boxes have one two-port 10G NIC each, and I'm using one port for sync and the other for initiators & heartbeat (I also have heartbeat going over another 1G connection). So I don't currently have the option of teaming sync.

I assume you've tested bandwidth over the sync channel? A good way to do this is with the StarWind RAM disk: if you can run that at wire speed over the sync channel, then you know the problem is not with your switches or teaming drivers.
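
For a sense of scale, here is a rough illustrative sketch in Python (not a StarWind tool) of what "wire speed" means on the 1Gb links in this thread, and how the copy speeds reported above compare to it. It assumes decimal units (1 MB = 1,000,000 bytes) and ignores Ethernet, TCP and iSCSI overhead, so real payload throughput will be somewhat lower than the nominal figure. The function names are hypothetical.

```python
# Nominal line rate of the links described in this thread ("everything is 1Gb").
LINK_GBIT_PER_S = 1.0


def wire_speed_mb_per_s(gbit_per_s: float) -> float:
    """Convert a nominal link rate in Gbit/s to decimal MB/s.

    Protocol overhead is ignored (assumption), so this is an upper bound.
    """
    return gbit_per_s * 1000 / 8


def percent_of_wire(observed_mb_per_s: float,
                    gbit_per_s: float = LINK_GBIT_PER_S) -> float:
    """Express an observed throughput as a percentage of the nominal link rate."""
    return 100.0 * observed_mb_per_s / wire_speed_mb_per_s(gbit_per_s)


if __name__ == "__main__":
    # Copy speeds quoted earlier in the thread, in MB/sec.
    observed = [
        ("one target mounted", 120),
        ("both targets mounted (first report)", 9),
        ("both targets mounted (later reports)", 30),
        ("both targets mounted, copy from owner node", 33),
    ]
    print(f"nominal 1Gb link: {wire_speed_mb_per_s(LINK_GBIT_PER_S):.0f} MB/sec")
    for label, mb in observed:
        print(f"{label}: {mb} MB/sec = {percent_of_wire(mb):.1f}% of nominal")
```

Run as a script, it prints each reported speed as a share of the nominal link rate. Roughly 120MB/sec is close to what a single 1Gb path can carry, while 9 to 33MB/sec is far below it, which is why a RAM disk test at wire speed over the sync channel would help rule the network in or out.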

cheers,

Aitor
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Fri Mar 02, 2012 2:41 pm

I would actually try the following:
Stop the StarWind service on Storage1 and run your copy test. Once you have some numbers, start the StarWind service on Storage1 again and wait until the sync is finished. Then repeat the same procedure on Storage2.

I think the problem may be in the hardware of one of the servers.

Also, the synchronization channel should be benchmarked with NTttcp or an analogue first, and with IOmeter afterwards (using a RAM disk is recommended for this).
oxyi
Posts: 67
Joined: Tue Dec 14, 2010 8:30 pm

Fri Mar 02, 2012 3:58 pm

Hello back. I actually typed a long reply, but the forum timed me out, and when it asked me to log in again I lost everything I had typed.

To keep it simple: when both primary and partner are connected, it's slow. With just one target, whether the primary or the partner, the speed is fast.
I did play with the MPIO policy; I went from round robin to least queue depth and back. My speed with MPIO is good, so I am not sure how that would be causing this problem.

Sync channel: yes, I have it teamed and benchmarked, and the NTttcp speed was decent, around 970Mb/s if I remember correctly. I would like to know what role the sync channel plays in my problem.
In my understanding, the sync channel is there to sync the two targets.

So right now, if I only have CSV mounted, does that mean it is not syncing data over to CSVPartner?

Also Anatoly, your test scenario is exactly what I am doing right now; am I missing something?
With only the primary CSV target mounted, everything is good. With only CSVPartner mounted, the speed is good. With both mounted, the speed goes to crap. Is your test going to validate something? If it were hardware within one of the servers, I would have a problem when mounting only one of the targets, wouldn't I? But I don't: mounted individually and separately, both targets work flawlessly. It's only when both targets are mounted at the same time, to get the HA effect, that I see the slowness.
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Sun Mar 04, 2012 7:53 pm

OK, you have tested the synchronization channel with NTttcp, but have you tested it with IOmeter? If yes, can you please share some results with us?

Also, have you tried changing the traffic priority on the HA device (you can do that by right-clicking the device and choosing the corresponding option in the drop-down)?