Poor read performance, good write performance

sjors
Posts: 17
Joined: Sun Jul 03, 2011 5:16 pm

Sun Jul 03, 2011 5:17 pm

Hi,

I have one Dell PowerEdge R200 server running Win2008 with StarWind 5.6, connected to a Dell PowerConnect 2724 gigabit switch using three NIC ports (one internal port and one HP NC382T dual-port NIC).
The clients are two Win2008R2 Hyper-V servers (Dell PE 2950). The storage LAN is segmented through VLANs.

I'm experiencing extremely poor read times on the mounted iSCSI volumes, but good write performance.

I'm using an MD1000 enclosure with 12x300GB SCSI drives (RAID-5).

When I run IOmeter on the SAN, I get results in the 11,000 IOPS range, but if I do the same on one of the clients on the mounted volume, I get fluctuations between 200 and 350 IOPS.

I have tried tweaking many settings (RSS, Chimney, ECN, etc.), but to no avail.

Is there help available for me on this?

Thanks a lot,

George
sjors
Posts: 17
Joined: Sun Jul 03, 2011 5:16 pm

Sun Jul 03, 2011 6:44 pm

*** UPDATE ***

In one of my attempts, I decided to remove all iSCSI connections and re-establish them using only one connection (no MPIO).
I ran some tests and now IOmeter reports around 4000 IOPS.

I'm starting to believe that MPIO is causing problems.

I'll keep researching this.
In the meantime, any tips, hints or other help is greatly appreciated.

George
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Sun Jul 03, 2011 7:58 pm

Do you use the Round Robin or the Fail Over MPIO policy?
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

camealy
Posts: 77
Joined: Fri Sep 10, 2010 5:54 am

Mon Jul 11, 2011 5:21 pm

Did this ever get resolved?

Thanks,

Kurt
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Mon Jul 11, 2011 10:36 pm

This particular user either gave up or finally got everything up and running; either way, we never heard from him again. Are you experiencing the same issues with the latest V5.7 RC?
camealy wrote: Did this ever get resolved?

Thanks,

Kurt
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

camealy
Posts: 77
Joined: Fri Sep 10, 2010 5:54 am

Mon Jul 11, 2011 11:01 pm

I'm just always interested in improving our lackluster HA performance.
Constantin (staff)

Tue Jul 12, 2011 12:35 pm

In this case disabling Delayed ACK can help you.

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{Adapter-id}]
TcpAckFrequency = 2 (Default=2, 1=Disables delayed ACK)
georgep
Posts: 38
Joined: Thu Mar 24, 2011 1:25 am

Wed Jul 27, 2011 3:28 am

I have the same problem with v5.7. Why do you say to set 2 instead of 1? 1 disables it, right?
@ziz (staff)
Posts: 57
Joined: Wed Aug 18, 2010 3:44 pm

Wed Jul 27, 2011 8:40 am

georgep wrote: I have the same problem with v5.7. Why do you say to set 2 instead of 1? 1 disables it, right?
Absolutely right, 1 disables it. This modification should improve read performance.
Aziz Keissi
Technical Engineer
StarWind Software
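
For anyone applying the change discussed above, here is a minimal sketch that generates a .reg file setting TcpAckFrequency to 1 for one interface. It only builds the file text; it does not touch the registry. The function name and the adapter GUID in the usage line are hypothetical placeholders, so substitute the GUID of your own iSCSI interface, and treat this as an illustration rather than an official StarWind procedure.

```python
import re

# Registry path quoted earlier in the thread; one subkey exists per interface.
BASE_KEY = (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services"
            r"\Tcpip\Parameters\Interfaces")

# A braced GUID such as {01234567-89AB-CDEF-0123-456789ABCDEF}.
GUID_RE = re.compile(
    r"^\{[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}\}$"
)


def build_reg_file(adapter_guid: str, ack_frequency: int = 1) -> str:
    """Return .reg file text that sets TcpAckFrequency for one interface.

    ack_frequency=1 is the value the thread says disables delayed ACK;
    2 is the default quoted in the thread.
    """
    if not GUID_RE.match(adapter_guid):
        raise ValueError(f"not a braced GUID: {adapter_guid!r}")
    if ack_frequency < 1:
        raise ValueError("ack_frequency must be at least 1")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{BASE_KEY}\\{adapter_guid}]",
        f'"TcpAckFrequency"=dword:{ack_frequency:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # Hypothetical GUID, for illustration only.
    print(build_reg_file("{01234567-89AB-CDEF-0123-456789ABCDEF}"))
```

Save the output to a .reg file, review it, and import it on each machine. A restart may be needed before the new value takes effect.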
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Wed Jul 27, 2011 8:45 am

Yes. Please apply it on both the initiator and target sides.
georgep wrote: I have the same problem with v5.7. Why do you say to set 2 instead of 1? 1 disables it, right?
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

sjors
Posts: 17
Joined: Sun Jul 03, 2011 5:16 pm

Wed Jul 27, 2011 3:31 pm

Hi All,

I did not give up, nor did I resolve the issue, for that matter.
The problem is still there, although at the time I managed to stabilize the system somewhat.
(Yes, the problems are surfacing more and more again.)

I kept troubleshooting the storage problems, which pointed to MPIO as one of the components causing them.
Even after installing a couple of hotfixes on the system, I still have performance issues.

One of the nodes is even out of service at this time, because it keeps disconnecting the SAN volumes.
I will start the troubleshooting tonight and continue throughout the following days, if necessary.

I will keep you guys posted on the progress.

Regards,
George
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Wed Jul 27, 2011 4:02 pm

Just a quick question: have you tested your network with a tool such as NTttcp or Iperf?
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
sjors
Posts: 17
Joined: Sun Jul 03, 2011 5:16 pm

Wed Jul 27, 2011 4:31 pm

I ran NTttcp during a support session, so it should still be installed.
If I remember correctly, we got satisfactory results with it, but I will run it again during the night.

As for Iperf, I have not tried it yet.
I can download it and check it out and post the results.

If there are specific tests or methods that I should follow, that info is most welcome.

Regards,
George
sjors
Posts: 17
Joined: Sun Jul 03, 2011 5:16 pm

Thu Jul 28, 2011 3:21 am

*** Update ***

Today, the one active node crashed badly!
The Cluster Resource failed, bringing the whole system down. I don't know yet what exactly happened.

Since the system was down anyway, I decided to do the Delayed ACK modifications, as indicated in the above post.
I also downloaded Iperf and ran some tests.
On single connections, I got about 300-330 Mbps on each interface.
On a 10 connection parallel test, it jumped to about 850-880 Mbps.

It seems that the network connections themselves are fine.

To recap, I have disabled the TCP Chimney Offload and RSS, as shown below.

TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State : disabled
Chimney Offload State : disabled
NetDMA State : enabled
Direct Cache Acess (DCA) : disabled
Receive Window Auto-Tuning Level : normal
Add-On Congestion Control Provider : ctcp
ECN Capability : disabled
RFC 1323 Timestamps : enabled
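
Since both the initiator and the target need matching settings, a listing like the one above (it is the format printed by `netsh int tcp show global`) can be parsed and compared between machines. The following is a rough sketch; the function names are my own, and the sample text is trimmed from this post.

```python
def parse_tcp_globals(text: str) -> dict[str, str]:
    """Parse 'Name : value' lines from a TCP global parameters listing."""
    settings = {}
    for line in text.splitlines():
        # Title and separator lines contain no " : " and are skipped.
        key, sep, value = line.partition(" : ")
        if sep and key.strip():
            settings[key.strip()] = value.strip()
    return settings


def diff_settings(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Return only the settings whose values differ between two hosts."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    node = parse_tcp_globals(
        "TCP Global Parameters\n"
        "----------------------------------------------\n"
        "Receive-Side Scaling State : disabled\n"
        "Chimney Offload State : disabled\n"
        "NetDMA State : enabled\n"
    )
    # Hypothetical second host with NetDMA turned off, for illustration.
    other = dict(node, **{"NetDMA State": "disabled"})
    print(diff_settings(node, other))
```

An empty result from `diff_settings` would mean the two hosts report identical global TCP settings.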


I'll let the system run for a day or two (if problems don't arise along the way) and report back.

As a matter of fact, I still need to iron out the kinks in the inactive node, so I can experiment on it.

George
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Thu Jul 28, 2011 2:02 pm

Those numbers are poor. You should get around 950-980 Mbps from a single 1 GbE link. Also, please make sure you're running the most recent version of StarWind before submitting crash reports. Thank you!
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software
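
To put the Iperf figures in context, here is a back-of-the-envelope sketch of the TCP payload ceiling on a gigabit link, based on standard Ethernet, IPv4, and TCP header sizes. It assumes full-size frames and ignores ACK traffic and retransmissions, so real results will be somewhat lower. The function names are my own.

```python
# Per-frame bytes on the wire beyond the MTU:
# preamble + SFD (8), Ethernet header (14), FCS (4), inter-frame gap (12).
ETH_OVERHEAD = 8 + 14 + 4 + 12
# IPv4 header (20) plus TCP header without options (20).
IP_TCP_HEADERS = 20 + 20


def max_tcp_goodput_mbps(link_mbps: float = 1000.0,
                         mtu: int = 1500,
                         tcp_options: int = 0) -> float:
    """Upper bound on TCP payload throughput for full-size frames."""
    payload = mtu - IP_TCP_HEADERS - tcp_options
    wire_bytes = mtu + ETH_OVERHEAD
    return link_mbps * payload / wire_bytes


def utilization(measured_mbps: float, ceiling_mbps: float) -> float:
    """Fraction of the theoretical ceiling that a measurement reached."""
    return measured_mbps / ceiling_mbps


if __name__ == "__main__":
    ceiling = max_tcp_goodput_mbps()
    print(f"standard MTU ceiling: {ceiling:.1f} Mbps")
    # RFC 1323 timestamps add a 12-byte TCP option to each segment.
    print(f"with timestamps: {max_tcp_goodput_mbps(tcp_options=12):.1f} Mbps")
    print(f"jumbo frames (MTU 9000): {max_tcp_goodput_mbps(mtu=9000):.1f} Mbps")
    # Figures reported in the thread: single stream and 10 parallel streams.
    for measured in (330, 880):
        print(f"{measured} Mbps is {utilization(measured, ceiling):.0%} of the ceiling")
```

With a standard 1500-byte MTU the ceiling works out to roughly 949 Mbps, so a single stream at 300-330 Mbps is using only about a third of the link.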
