Startup/shutdown takes hours with 2x 2TB HA LSFS (v8.0.7929)

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

Groer
Posts: 3
Joined: Fri Jun 12, 2015 3:49 pm

Fri Jun 12, 2015 4:18 pm

We run a hyper-converged Virtual SAN (v8.0.7929) on 2 SuperMicro X9DRD-7LN4F nodes with Windows Server 2012 R2. Each node has 128 GB RAM, 5x 4 TB Seagate ST4000NM0023 drives, and 1x Areca 1882IX-24 controller.

For now, we use only 2 vSAN HA images with LSFS, each with 2 TB capacity and a 16 GB L1 cache.

After a clean shutdown and startup of the StarWind service on the 2nd node, creation of the devices takes several hours (!), and the re-synchronisation after that never finishes, even after 2 days. Today I tried to shut down the service again, and it has now been stuck in the "stopping" state for over 6 hours.

Is build 7929 the latest one? Maybe there is a newer beta version we could try? Thank you!
darklight
Posts: 185
Joined: Tue Jun 02, 2015 2:04 pm

Wed Jun 17, 2015 1:19 pm

Hi Groer.

I've noticed there was an update a couple of days ago. Just update to the latest build and try again. Some LSFS issues have definitely been fixed (for me, at least). Hopefully yours will be too.
epalombizio
Posts: 67
Joined: Wed Oct 27, 2010 3:40 pm

Thu Jun 18, 2015 7:13 pm

Seconding the previous post: I've noticed huge increases in service start/stop times when using L1 write-back (WB) cache.

I haven't seen the issue in newer builds, but I'm using WT (write-through) cache now instead of WB since WB added quite a bit of time to HA device synchronization. Since the majority of my latency is on the read side rather than the write side, there is no benefit for me in using WB.
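
The reason WB tends to inflate stop and sync times is that every write acknowledged from RAM still has to reach the disk before the service can stop or the HA partner can finish syncing. A toy Python sketch of that tradeoff (this is not StarWind's cache code; the latencies are made-up assumptions purely for illustration):

```python
# Toy write-back (WB) vs write-through (WT) comparison. NOT StarWind's cache
# implementation; the 0.05 ms RAM ack and 5 ms disk write are assumed numbers.

class ToyCache:
    def __init__(self, write_back: bool):
        self.write_back = write_back
        self.dirty_blocks = set()          # blocks acknowledged but not yet on disk

    def write(self, block: int) -> float:
        """Return simulated latency (ms) for one 64 KB write."""
        if self.write_back:
            self.dirty_blocks.add(block)   # ack from RAM; disk write deferred
            return 0.05
        self._write_to_disk(block)         # WT: ack only after the disk write
        return 5.0

    def flush(self) -> float:
        """Work that must finish before the service can stop or resync cleanly."""
        cost = 5.0 * len(self.dirty_blocks)
        for block in list(self.dirty_blocks):
            self._write_to_disk(block)
        self.dirty_blocks.clear()
        return cost

    def _write_to_disk(self, block: int) -> None:
        pass                               # stand-in for the real backing store


for mode, wb in (("write-back", True), ("write-through", False)):
    cache = ToyCache(write_back=wb)
    write_ms = sum(cache.write(b) for b in range(10_000))
    flush_ms = cache.flush()
    print(f"{mode:14} writes: {write_ms:8.0f} ms   flush before stop/sync: {flush_ms:8.0f} ms")
```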

Cheers,
E
darklight
Posts: 185
Joined: Tue Jun 02, 2015 2:04 pm

Mon Jun 22, 2015 1:14 pm

In my environment WB performs faster than WT. It's still a subjective impression, though, because I haven't tested it really thoroughly; I am really happy with the default configuration so far :D
epalombizio
Posts: 67
Joined: Wed Oct 27, 2010 3:40 pm

Mon Jun 22, 2015 1:17 pm

Darklight,
What version are you running, v6 or v8? WB might have been fixed in the newest build, but I see mostly read latency rather than write latency, so WT works in my scenario.

E
darklight
Posts: 185
Joined: Tue Jun 02, 2015 2:04 pm

Fri Jun 26, 2015 10:10 am

I am using the latest build (I always update when a new one arrives). Sometimes I also run into a problem where one of the nodes does not go down smoothly, but then I just wait until it comes back up and normalizes, and then proceed with updating the other one.
I haven't played much with WB vs WT cache, really. I left the default configuration (write-back) and am really happy with it so far.
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Tue Jul 07, 2015 11:13 am

Just curious, what's the end of the story? Did the OP manage to get the issue solved (with or without StarWind engineers)?
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

Groer
Posts: 3
Joined: Fri Jun 12, 2015 3:49 pm

Tue Jul 07, 2015 11:43 am

Hi Anton,

1) Build 8116 seems to handle LSFS volumes better, i.e. the sync seems to go faster, and garbage collection seems to work, too.

2) Areca has problems with SSDs and HDDs on the same controller. Once we removed the SSDs completely, performance of the HDD arrays improved.

With these 2 changes, shutdown takes about 2 mins, mounting all volumes ("creation") after a restart takes about 40 to 50 mins, and the resync afterwards takes about 30 to 40 mins. In total, a restart of both servers takes about 3 hours. That's certainly not superfast :) but at least it's consistent, and we are happy that the resync has finished after every restart since.
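
Back-of-envelope, the 3-hour figure lines up with those per-step times done one node after the other; a quick check in Python (midpoints of the ranges above and sequential restarts are my assumptions):

```python
# Rough per-node restart timeline (minutes), using midpoints of the ranges above.
shutdown = 2
mounting = 45   # "creation" of all LSFS devices after restart: 40-50 min
resync   = 35   # HA re-synchronisation afterwards: 30-40 min

per_node = shutdown + mounting + resync
total    = per_node * 2          # assumption: nodes are restarted one after the other
print(f"~{per_node} min per node, ~{total} min (~{total / 60:.1f} h) for both nodes")
```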
Vladislav (Staff)
Staff
Posts: 180
Joined: Fri Feb 27, 2015 4:31 pm

Fri Jul 10, 2015 11:40 am

Hello Groer,

And what about the performance of the RAID array compared to LSFS on top of it?
Groer
Posts: 3
Joined: Fri Jun 12, 2015 3:49 pm

Fri Jul 10, 2015 4:36 pm

Hi,

Performance varies, of course, but with StarWind RAM cache and LSFS it's comparable to the native Windows volume. Here are some numbers from MS SQLIO v1.5.SG with 32 threads writing for 30 secs to a 1 GB file using 64 KB random I/Os, multiple I/Os per thread with 1 outstanding:

Windows volume where LSFS resides: 1937 IOPS (121 MB/s)
LSFS on this volume: 3986 IOPS (249 MB/s)
Hyper-V VM on this LSFS device: 1840 IOPS (115 MB/s)
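
The MB/s figures follow directly from the IOPS at the 64 KB block size; a quick sanity check in Python (nothing StarWind-specific, just the IOPS-to-throughput arithmetic):

```python
# With 64 KB I/Os, throughput is simply IOPS x block size.
BLOCK_KB = 64

results = {
    "Windows volume where LSFS resides": 1937,
    "LSFS on this volume": 3986,
    "Hyper-V VM on this LSFS device": 1840,
}

for name, iops in results.items():
    mb_per_s = iops * BLOCK_KB / 1024   # KB -> MB, matching the figures quoted above
    print(f"{name}: {iops} IOPS ~ {mb_per_s:.0f} MB/s")
```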
bubu
Posts: 8
Joined: Mon Jun 08, 2015 12:49 pm

Mon Jul 20, 2015 1:47 pm

So, are you satisfied with this result? :)
Vladislav (Staff)
Staff
Posts: 180
Joined: Fri Feb 27, 2015 4:31 pm

Mon Jul 27, 2015 1:46 pm

Yes, Groer

Are you satisfied? :D

It looks ok from my perspective.