Replication never goes over 1Gbit


transparent
Posts: 11
Joined: Mon Jun 02, 2014 9:21 pm

Mon Jun 02, 2014 9:24 pm

We have 2 x Windows Server 2012 R2 hosts running StarWind v8 (although this issue existed with v6 as well). There is a direct connection between the two systems, using IPoIB for the sync channel.

When using iperf I can get 8.79Gbit/sec across the link. However, when StarWind is performing a full sync it gets to 1Gbit/sec (we see around 115MB/sec written) and never goes above that. It's almost as if it's hitting some sort of limit?
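For comparing the figures quoted above on one scale, here is a minimal conversion sketch. It assumes decimal units (1 MB = 10^6 bytes, 1 Gbit = 10^9 bits); the function names are made up for illustration and are not part of any StarWind or iperf tooling.

```python
def mbytes_to_gbit(mb_per_s: float) -> float:
    """Convert MB/s (decimal megabytes) to Gbit/s."""
    return mb_per_s * 8 / 1000


def gbit_to_mbytes(gbit_per_s: float) -> float:
    """Convert Gbit/s to MB/s (decimal megabytes)."""
    return gbit_per_s * 1000 / 8


if __name__ == "__main__":
    # Write rate observed during the full sync.
    print(f"115 MB/s    = {mbytes_to_gbit(115):.2f} Gbit/s")
    # iperf result across the same link.
    print(f"8.79 Gbit/s = {gbit_to_mbytes(8.79):.0f} MB/s")
```

On these assumptions the observed 115MB/sec is about 0.92Gbit/sec of payload, which is what a saturated 1Gbit link typically delivers, while the iperf figure corresponds to roughly 1100MB/sec.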

Are there settings/changes/tuning I need in order to get above 1Gbit/sec for a full sync across hosts?

Thanks,
Andrew
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Tue Jun 03, 2014 7:27 am

Did you play with the S/W sync priority setting? What latency do you get on your connections? Any chance we could also see NTttcp numbers?
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

transparent
Posts: 11
Joined: Mon Jun 02, 2014 9:21 pm

Tue Jun 03, 2014 9:07 pm

Yes, I've moved the slider all the way to the left (Faster Sync). The latency between hosts is <1ms, measured with a ping test over an extended period. Please see the attached screenshot of the ntttcp test (I was able to get 1130 MB/sec). We are syncing a flat image file and there is no load on the server during the sync. What strikes me as odd is that it hits 1Gbps with no problem and then stays pinned there. If it were dipping and rising, or sitting lower, I'd suspect the disks, but the fact that it always goes to exactly 1Gbps leads me to believe that something is throttling it.
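The reasoning above (a flat line at a round link rate suggests a cap, a fluctuating line suggests disk) can be sketched as a simple check on throughput samples. This is only an illustrative heuristic: the function name, the 3% variation limit, the 15% margin below line rate, and the sample values are all assumptions, not StarWind behaviour or measured data.

```python
from statistics import mean, pstdev


def looks_capped(samples_mb_s, link_rates_gbit=(1.0, 10.0),
                 cv_limit=0.03, margin=0.15):
    """Return the link rate (Gbit/s) the samples appear pinned to, else None.

    samples_mb_s: throughput samples in MB/s (decimal megabytes).
    cv_limit:     maximum coefficient of variation to count as "flat".
    margin:       how far below line rate the payload may sit (protocol overhead).
    """
    avg = mean(samples_mb_s)
    if avg <= 0:
        return None
    cv = pstdev(samples_mb_s) / avg  # relative spread of the samples
    if cv > cv_limit:
        return None  # throughput is fluctuating, so not a flat cap
    avg_gbit = avg * 8 / 1000
    for rate in link_rates_gbit:
        if rate * (1 - margin) <= avg_gbit <= rate:
            return rate
    return None


if __name__ == "__main__":
    flat = [114, 115, 115, 116, 115]   # hypothetical samples, pinned near 115 MB/s
    bursty = [90, 140, 75, 160, 110]   # hypothetical disk-bound pattern
    print(looks_capped(flat))    # 1.0
    print(looks_capped(bursty))  # None
```

A steady series around 115MB/sec is flagged as sitting just under a 1Gbit/sec line rate, while a series with the same average but large swings is not.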
Attachments
screenshot of ntttcp test between hosts
HV1-HV2_ntttcp_result.png (26.08 KiB) Viewed 7094 times
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Wed Jun 04, 2014 3:15 pm

~50% CPU usage is annoying... Also, IPoIB is not fast. OK, I'll ask the guys to do a remote session with you to see what's wrong with your config.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

Bohdan (staff)
Staff
Posts: 435
Joined: Wed May 23, 2007 12:58 pm

Wed Jun 04, 2014 3:43 pm

What about the underlying storage? I mean the location where the StarWind images are stored. Is it capable of showing better numbers than 1Gbit/s (~125MB/s) locally?
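One rough way to answer that question without extra tools is to time a large sequential write to the volume that holds the images. The sketch below is a crude stand-in for a proper benchmark (it ignores controller cache and queue depth); the function name, sizes, and target path are placeholders chosen for illustration.

```python
import os
import tempfile
import time


def sequential_write_mb_s(path, total_mb=256, block_kb=1024):
    """Write total_mb of random data to path in block_kb chunks; return MB/s."""
    block = os.urandom(block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    start = time.perf_counter()
    with open(path, "wb", buffering=0) as f:
        for _ in range(blocks):
            f.write(block)
        os.fsync(f.fileno())  # make sure the data reached the disk, not just the OS cache
    elapsed = time.perf_counter() - start
    return blocks * len(block) / elapsed / 1e6


if __name__ == "__main__":
    # Placeholder target: point this at the volume that stores the image files.
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "seqwrite.bin")
        print(f"{sequential_write_mb_s(target, total_mb=64):.0f} MB/s")
```

If the result on the image volume comes out well above 125MB/s, local sequential write speed is unlikely to be what holds the sync at 1Gbit/s.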
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Wed Jun 04, 2014 3:54 pm

That's weird. JFYI, in our test lab the sync channel hit 100% utilization of 10GbE interfaces.
Can I ask you to create a RAM-based HA device and run the speed test again?
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
transparent
Posts: 11
Joined: Mon Jun 02, 2014 9:21 pm

Fri Jun 06, 2014 7:18 pm

We have a 6-disk (SATA) RAID 10, and disk tests show it's capable of ~340MB/sec read and ~290MB/sec write (sequential; I'm guessing an image replication would be mostly sequential).
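Taking those disk figures together with the iperf result, a back-of-the-envelope ceiling for a full sync is the slowest stage in the chain. The sketch below assumes both nodes have the same array (so the source reads at ~340MB/sec and the target writes at ~290MB/sec) and uses decimal units; the function and stage names are illustrative only.

```python
def sync_ceiling_mb_s(source_read_mb_s, target_write_mb_s, link_gbit_s):
    """Return (limiting stage, MB/s) for a read -> link -> write pipeline."""
    stages = {
        "source disk read": source_read_mb_s,
        "sync link": link_gbit_s * 1000 / 8,  # Gbit/s -> MB/s
        "target disk write": target_write_mb_s,
    }
    limiting = min(stages, key=stages.get)
    return limiting, stages[limiting]


if __name__ == "__main__":
    stage, rate = sync_ceiling_mb_s(340, 290, 8.79)
    print(f"{stage}: {rate:.0f} MB/s ({rate * 8 / 1000:.2f} Gbit/s)")
```

Under these assumptions the ceiling is the target array's write speed, about 290MB/sec or 2.3Gbit/sec, which is still well above the ~115MB/sec actually observed.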

That's a good idea with the RAM drive to take disk subsystem out of the equation. I'll give that a try over the weekend and report back.

Andrew
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Fri Jun 06, 2014 9:26 pm

Not really... Sync channel traffic follows your write pattern (multiplexed because of multiple data paths). So if you have mostly random writes, your sync traffic is also going to be "pulsating".
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

transparent
Posts: 11
Joined: Mon Jun 02, 2014 9:21 pm

Tue Jun 10, 2014 1:36 am

Yes, I would assume that would be the case with live or 'real-time' I/O that is ongoing while the cluster is 'in-sync'. Is that also the case while doing a full sync with no client I/O active and the replication priority set to 'faster sync'? I would assume that in that case it starts at the beginning of the image file (I'm using flat images, not LSFS) and goes to the end sequentially?
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Fri Jun 13, 2014 12:55 pm

transparent wrote:That's a good idea with the RAM drive to take disk subsystem out of the equation. I'll give that a try over the weekend and report back.
Do you have any results to share with us?
transparent wrote:Is that the case though while doing a full sync with no client I/O active, and the replication priority set to 'faster sync'? I would assume in that case it would be starting at the beginning of the image file (I'm using flat images, not LSFS) and going to the end sequentially?
Do you mean synchronization as recovery after one of the nodes fails? If so, it'll be sequential writes on the recipient.
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
barrysmoke
Posts: 86
Joined: Tue Oct 15, 2013 5:11 pm

Sat Jun 21, 2014 9:23 pm

Having a similar problem with replication times, so I went to create a ram ha device, and replication manager is grayed out.
also, can't post a screenshot to show you...
Could not upload attachment to ./files/4009_6957aff394d20cba3134b92befdaf3af.
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Sun Jun 22, 2014 8:15 pm

Do you have some numbers / config to share? Please send what you wanted to attach to support@starwindsoftware.com and I'll ask web team to check forum attachments. Thanks!
barrysmoke wrote:Having a similar problem with replication times, so I went to create a ram ha device, and replication manager is grayed out.
also, can't post a screenshot to show you...
Could not upload attachment to ./files/4009_6957aff394d20cba3134b92befdaf3af.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

Nikolay (web team)
Posts: 5
Joined: Wed Jun 18, 2014 8:42 am

Mon Jun 23, 2014 7:31 am

barrysmoke wrote:Having a similar problem with replication times, so I went to create a ram ha device, and replication manager is grayed out.
also, can't post a screenshot to show you...
Could not upload attachment to ./files/4009_6957aff394d20cba3134b92befdaf3af.
Hi barrysmoke,
Please try to upload your file again.
Thanks
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Mon Jun 23, 2014 9:46 pm

...which means "there was an issue with forum and we've fixed that so attachments are working now" :)
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

barrysmoke
Posts: 86
Joined: Tue Oct 15, 2013 5:11 pm

Tue Jun 24, 2014 11:28 pm

Alex and Bohdan are helping troubleshoot the issues over the next couple of days.
I was able to determine that a 5TB thick device sync'd in 2 hours, and ran at 5.9Gbit/s on the 10GbE sync & heartbeat NIC.
Might be an issue with thin/LSFS.
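As a quick sanity check on those numbers, the average rate implied by "5TB in 2 hours" can be worked out directly. The sketch below shows both readings of "TB" (decimal 10^12 bytes and binary 2^40 bytes), since the post does not say which one applies; the function name is illustrative.

```python
def average_sync_rate(size_bytes, hours):
    """Return (MB/s, Gbit/s) for size_bytes transferred in the given hours."""
    bytes_per_s = size_bytes / (hours * 3600)
    return bytes_per_s / 1e6, bytes_per_s * 8 / 1e9


if __name__ == "__main__":
    for label, size in (("5 TB (decimal)", 5 * 10**12),
                        ("5 TiB (binary)", 5 * 2**40)):
        mb_s, gbit_s = average_sync_rate(size, 2)
        print(f"{label}: {mb_s:.0f} MB/s, {gbit_s:.2f} Gbit/s")
```

Either way the average lands at roughly 5.6 to 6.1Gbit/s (about 690 to 760MB/s), which fits a 10Gbit link running at a bit more than half of line rate.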

The screenshot was just of the HA option greyed out for the RAM disk. Alex said you can't put a RAM disk into an HA sync...