VMware Cloud Community
freejak04
Contributor

Multihost with DroboElite disconnecting during high i/o

We purchased a DroboElite for our QA environment, which currently has three ESXi 4 hosts. Under heavy load, iSCSI performance degrades rapidly, to the point where the LUN usually gets disconnected from the host. Sometimes the host reconnects automatically; other times, a reboot of the DroboElite is required. I've been back and forth with Data Robotics for weeks troubleshooting the issue without any success. I've made the changes to the HB timeout settings in the 'hidden' console as suggested by DR, and I've also tried connecting through two different gigabit switches (Dell PowerConnect). Nothing has helped thus far.

Does anyone have experience with these units? Any suggested configuration changes I can make?

Thanks!

73 Replies
DeadBeef
Contributor

Argh, I just stumbled on this thread. I've been having the same trouble... Did they mention to me that I'm not the only one? Of course not... they just have me jump through hoop after hoop instead. No wonder they were stalling on sending me a replacement.

Drobo Elite with two NICs, each configured with a different IP address in the same subnet (level 1 tech said bad, level 3 tech says okay... the best practices PDF, page 8, shows an example with both IP addresses in the same subnet, go figure). MTU 1500 on both. Each cable goes to a different switch.
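
For anyone comparing notes on the same-subnet question, here is a small Python sketch for checking whether two iSCSI port addresses fall in the same subnet. The addresses and the /24 mask are made up for illustration; they are not the values from this setup.

```python
import ipaddress


def same_subnet(a: str, b: str) -> bool:
    """Return True if two interface addresses (CIDR notation) share a network."""
    return ipaddress.ip_interface(a).network == ipaddress.ip_interface(b).network


# Hypothetical port addresses, NOT taken from the setup described in this post.
port_1 = "192.168.10.21/24"
port_2 = "192.168.10.22/24"
port_3 = "192.168.20.21/24"

print(same_subnet(port_1, port_2))  # both ports in 192.168.10.0/24
print(same_subnet(port_1, port_3))  # ports in two different /24 networks
```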

Switches are daisy chained:

HP Procurve 2848

HP Procurve 2910al 48g

ESXi 4.0 on two Dell 2950's

Same issues... a P-to-V or clone is all it takes to cause the Drobo to cease its iSCSI functionality. Only thing that fixes it is a reboot of the Drobo. Note that all the testing I have been doing is with offline guest files... Simple file I/O stuff.

I even created a second volume, formatted it as NTFS and mounted it on a Windows box. After a failed VMware clone on the first Drobo volume, my mounted iSCSI NTFS volume became unresponsive as well.

This is more than a switch issue. I've got a cheap two drive QNAP setup in iSCSI mode and have NO issues with it and my ESXi hosts. I get considerably better performance out of it too.

StuartCUK
Contributor

I reported my issue to Drobo about 2 weeks ago

They've had me send in some diagnostic reports from the Drobo on 2 occasions, and they've had me verify the multipath config (surely not an issue when using cross-over cables!?), HBTokenTimout, MTU at both ends, and a few other bits and pieces.

They have confirmed to me that they cannot reproduce the problem in their lab with their DroboElites - but they still haven't confirmed whether my Drobo is faulty or whether there is some incompatibility between the Drobo and/or my network cards and/or my whitebox build of ESX, which uses network cards from the VMware HCL.

Thankfully, I'm in the UK and since I reported the problem to the vendor (salesman!) and the manufacturer before the unit hit 1 month old I'm within my rights to get a refund under the "not fit for purpose" clause. I intend to do that next Friday and go and get a competing device from a different manufacturer with a little more "enterprise" feel and more experience under their belt!!

How's everyone else doing?

golddiggie
Champion

Which device are you moving to?

Network Administrator

VMware VCP4

Consider awarding points for "helpful" and/or "correct" answers.

StuartCUK
Contributor

golddiggie
Champion

I've sent some questions to the manufacturer about the Thecus N8800SAS product... I'm looking to find out if I can install a dual-port Gb NIC into the open PCI slot it comes with (giving me four Gb ports on two controllers). While the device only supports up to 8 drives, it comes in at less than half of what that Iomega device is listing at (with 4TB). I'll probably end up using 10k rpm SAS drives if I get it, in a RAID 10 configuration. It appears that will give me the best IOPS/performance (from my own additional research). So for under $5500 (USD) I'll have an iSCSI device with eight 450GB 10k rpm SAS drives giving me about 1.8TB of usable space. Since the Iomega device only gives you SATA drive options, it will be at a lower performance level. Maybe if you fill it with SATA drives and use RAID 10, you'll be close to the performance I'll have with just eight SAS drives. Capacity isn't as large an issue for me, since I'm using under 1TB now for all my VMs (using thin provisioning on some of the drives; all C drives are thick now). Plus, I could always get a second device and stack it with the primary if needed. Or I could get something else for Tier 2 (or 3) storage, leaving the N8800SAS as Tier 1.
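
The 1.8TB figure follows from RAID 10 mirroring: every drive is paired with a mirror, so half of the raw capacity is usable. A minimal Python sketch of that arithmetic, assuming identical drives and decimal units (1TB = 1000GB); the function name is only illustrative, and real usable space will be a little lower after formatting overhead:

```python
def raid10_usable_gb(drive_count: int, drive_size_gb: float) -> float:
    """Usable capacity of a RAID 10 set built from identical drives.

    RAID 10 stripes across mirrored pairs, so only half of the drives
    contribute capacity. Filesystem/formatting overhead is ignored.
    """
    if drive_count < 4 or drive_count % 2 != 0:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    return (drive_count // 2) * drive_size_gb


# Eight 450GB drives, as in the configuration described above.
usable_gb = raid10_usable_gb(8, 450)
print(f"{usable_gb / 1000:.1f} TB usable")
```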

StuartCUK
Contributor

Did you get an answer on the dual port Gb NIC?

StuartCUK
Contributor

All, just in the process of transferring my VMs to an Iomega StorCenter - so far so good. Same NICs, same cables, same switches, same config. Yes I spent more money but.....

Drobo are being stubborn and want to spend more time troubleshooting. I think a month is enough?! Anyway, this will be my vendor's problem since I didn't buy direct.

Just came across this: http://www.devtrends.com/index.php/using-the-drobopro-with-vmware-esx-and-esxi/ The comments at the bottom are most telling.

I seriously believe VMware should revoke the Drobo Elite's (I cannot speak for other Drobo models) VMware certification. Just google "vmware drobo iscsi": nearly everything you come across will be negative and reporting the same problems we have had here.

freejak04
Contributor

Hi. I'm the original poster of this thread. It's amazing to see how many people have experienced the same problem we've seen here and gotten the same annoying dead-end responses from Data Robotics. Has anyone had any luck getting a refund from them? I really wanted to see the DroboElite work, so I kept it way past the return period while troubleshooting with DR for weeks. I've resorted to connecting it directly to a Linux box via iSCSI and then sharing the ext3 volumes via NFS with the VMware hosts. It seems to work OK like this but is extremely slow. $3,000 for this type of performance and support is really disappointing. I wonder if DR reads these boards and if they are doing anything about it.

StuartCUK
Contributor

Hi freejak

I've seen a couple of apparent DRI employees posting on this forum, if not this very thread. I've also had a few e-mails back and forth with 3rd level support and one of the DRI directors.

They are all very nice knowledgeable people but getting a manufacturer to admit there may be a fault is obviously very difficult, and like any company they are reluctant to give money back when they are unable to reproduce problems, but there are obviously "weird things" in my vmkernel logs and drobo diags that have been commented on by a variety of people at Drobo.

I understand that my particular issue may be down to an incompatibility between Intel NICs (on the VM HCL) and the Drobo (on the VM HCL). My point with Drobo has been that if your product doesn't behave with another product and they are both on the same software HCL, there ought to be a caveat on the HCL or in your own documentation.

DRI seem to have a very stable test environment but it is based on high end branded servers, Cisco Catalyst switches and I dare say a very very light load (as any test environment generally would be) - not real world and not a full regression tested environment.

All this has to be coupled with a very poor support offering and hardly any post sales support presence in Europe.

Iomega is of course well established and backed by EMC. 25% price premium but boy, does it fly!!

venom78
Contributor

freejak,

I bought my Drobo Elite from PCMall, and after numerous contacts with DR level 3 tech support, they agreed to refund us completely. I replaced mine with a VessRAID 1840i from Promise Technology. It cost me about $4000, but it didn't have any of the problems that the Drobo Elite had. I received an email from Drobo level 3 support saying that a new firmware is in beta and it should resolve all the problems that we have experienced. Since you still have your Drobo Elite, I think you can give it a try when the new firmware comes out.

aseniuk
Contributor

I purchased mine from an online reseller... don't want to say their name, as we are still in negotiations with them to return it. Drobo level 3 support was nothing but helpful with the return; they stepped in when our reseller wouldn't take it back. I wouldn't recommend Drobos for corporate environments, but mine at home is perfectly fine.

We decided to go off the deep end and purchased an EqualLogic PS4000V, but to be fair we did have 4 Drobo Elites. All in all I am happy with the PS4000V: solid as a rock and worth the investment. Seeing as we are storing all of our data on it, I figured we should go for something solid.

DeadBeef
Contributor

Update,

I have tried this beta (1.0.3) and it did indeed fix the lockup issue. I'm still in the process of benchmarking.

BradMDRI
Contributor

Hi All,

I would like to provide an update to this thread from Data Robotics. Several customers, including some on this thread, have reported this problem and we have been able to reproduce this bug in our engineering and QA labs. We have been working hard at resolving this issue and we do have a fix for this bug that is currently under test both internally and externally with several customers who have filed formal cases with our tech support organization. So far, the testing has been very positive. We are continuing to test this new version of our DroboElite firmware for this specific issue as well as running through our complete regression suite of tests, and we are planning to have a general release of this updated firmware within the next several weeks. I will keep you posted on our progress. I know one person on this thread has the new firmware beta release and testing is going well.

We know this has been a problem for several of you and we appreciate your patience while we resolve the root cause of this issue and get our new release ready for general distribution.

Brad Meyer

Product Manager - DroboElite

Rumple
Virtuoso

While I do not have a Drobo, I am very happy to see an official post in the forums concerning this issue.

It is nice to know that when things go wrong, the company is willing to step up, identify the issue and follow-up accordingly...

jwcMyEdu
Contributor

What do we have to do to get on the beta testing crew? My DroboElite sits unused - might as well help test the new firmware....

freejak04
Contributor

DeadBeef wrote:
"Update, I have tried this beta (1.0.3) and it did indeed fix the lockup issue. I'm still in the process of benchmarking."

Hi DB. Please let me know what your results are. I just installed and am in the process of copying a few VMs over for testing. Thanks.

StuartCUK
Contributor

Brad, Mr. Ponder from third level at DRI will confirm 1.0.3 did not resolve my disconnect issue, although I have not had any lock-ups since installing it.

joshrose26
Contributor

Same here with 1.0.3.

No more lockups, but mine still disconnects during High IO.

This is my company's first SAN, and my first experience with a SAN device.

I originally set the unit up with 2 gigabit switches, 2 VLANs, and 2 subnets, with a NIC from each host going into each subnet (and switch).

I tried to follow all guides in the white paper precisely.

Data Robotics has provided me with a lot of things to change, and I have made every adjustment they have suggested. In the meantime my company is wondering what the heck is going on, and I'm burning up nights and weekends trying to find an answer.

I know Data Robotics is a small company and I really wanted to make their product work, but I'm at wit's end.

A couple of people here mentioned Intel NICs; any information on that?
