VMware Cloud Community
bebman
Enthusiast

SRM and IBM SVC will not configure

I had a working environment that I was testing with SRM on the VC server. I was able to run test failovers many times without issue and then even ran a true failover without incident. Later, after the Storage Team had changed the direction of the replicated LUNs from Site B to Site A, I was able to do a failback, also without issue. Now, when I try to restage everything as it was before the failover, I am having all kinds of problems.

First, when I went back to the recovery plan I had before and ran a test failover from Site A to Site B, I would see the snapshots mount in SRM, but there were more snapshots than expected, and SRM could never find the VMs to bring them up on Site B. I then removed all recovery plans and protection groups and rebuilt them; that didn't make a difference. Next, I removed the arrays from the configuration and tried to re-add them, and that is where I am now: I can't get the arrays to add back in. I am able to enter the array manager information, and it finds the array and its ID, but when I click OK to move on, the task gets to 23% and then throws an error box that reads: "Invalid XML returned from SRA: Failed to get node ReturnCode."

I have attached my SRM log. I opened a ticket with VMware support, but they point to the SRA and tell me to call IBM. IBM is slow and methodical, and I am trying to get this resolved faster than that.
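For anyone hitting the same "Failed to get node ReturnCode" message: my reading of the wording (an assumption, not something from VMware documentation) is that SRM parses the SRA's output as XML and looks for a ReturnCode element, so the error would appear either when the SRA prints something that is not XML at all (for example a Java exception) or when the XML has no such element. The minimal Python sketch below illustrates that kind of check. The function name extract_return_code and the <Response> wrapper used in the examples are hypothetical, and the real SRA response layout (including any XML namespace) may differ.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def extract_return_code(sra_output: str) -> Optional[str]:
    """Return the text of the first ReturnCode element, or None when the
    output is not well-formed XML or contains no ReturnCode element.

    Only the element name comes from the SRM error message; the document
    layout is a hypothetical example. A namespaced element would need
    "{namespace}ReturnCode" instead of the plain tag used here.
    """
    try:
        root = ET.fromstring(sra_output)
    except ET.ParseError:
        # Not XML at all, e.g. a stack trace written to stdout.
        return None
    # Element.iter() also visits the root itself, so a bare <ReturnCode> matches.
    for node in root.iter("ReturnCode"):
        return (node.text or "").strip()
    return None


if __name__ == "__main__":
    print(extract_return_code("<Response><ReturnCode>0</ReturnCode></Response>"))
    print(extract_return_code("java.lang.NullPointerException"))
    print(extract_return_code("<Response><Status/></Response>"))
```

Returning None for both failure cases keeps the sketch short; a real diagnostic would probably want to tell "not XML" apart from "XML without ReturnCode".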

Here is my config:

VC 4.1 and SRM 4.1 on the same server (Windows 2008 SP2)

IBM SRA 1.20.10713

IBM SVC 2145

Anyone ever had any problems like this?

NOTE: If your problem or questions has been resolved, please mark this thread as answered and award points accordingly.
9 Replies
idle-jam
Immortal

I wish I could help, but in my recent experience, getting support from both the SRA vendor and VMware is painfully slow. In my case it was HP. I could not get much help here because SRM 4.1 was very new then, so I resorted to installing SRM 4.0 with a more stable SRA (the previous release). That was my experience, though.

Hoschi201110141
Enthusiast

I don't know IBM storage, but this is what I would check in your situation with my Hitachi system.

Because you were able to fail over to the recovery site and then, after reconfiguring the storage replication, fail back to Site A, I think the configuration in vSphere/SRM is OK.

But what about the storage system?

1. After the second failover/failback, have you reconfigured the storage replication again?

2. Ask the storage admin again and again: "What's different now?"

3. Check the entire SAN/zoning configuration.

4. Do you have access to the SAN command device (SAN layer)? This may be the source of your array manager problem.

5. Sometimes ESX seems to get a little confused about its datastore list. If you are in a test environment, try a rescan. That can also help with mysterious snap_ datastores.

Cl3gh0rn
Enthusiast

.

VSP, VTSP, VCP
mitchellm3
Enthusiast

What version of SVC code are you running?

mitchellm3
Enthusiast

The problem you are having seems oddly familiar; it looks like the "perl" issue. I'd check that your PATH statement is correct. I found this thread, and the person there had the same stoppage at 23%. I know you said it worked before, so this may not be the issue.

http://communities.vmware.com/thread/191626?start=15&tstart=240
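The PATH check suggested above can be sketched in a few lines of Python. This is a hypothetical helper (find_perl is not part of SRM or the SRA); it only reports which perl executable, if any, would be found on PATH. A Windows service runs with its own account and environment, so the result may differ from what the SRM service sees unless the script runs under the same account and environment.

```python
import os
import shutil
from typing import Optional


def find_perl(path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the perl executable that would be found on
    PATH (or on the given search path), or None if there is none."""
    # shutil.which honours PATHEXT on Windows, so perl.exe is matched as well.
    return shutil.which("perl", path=path)


if __name__ == "__main__":
    found = find_perl()
    if found:
        print("perl found at", found)
    else:
        print("perl not found; PATH entries searched:")
        for entry in os.environ.get("PATH", "").split(os.pathsep):
            print("  ", entry)
```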

bebman
Enthusiast

Thank you all for your insights. I think I have narrowed this down to an OS problem on the VC/SRM server. I had already talked to the storage team, and they were able to reset the replication from the primary disks to the auxiliary disks (R1s to R2s). I had also found the information on the Perl calls and the path, and that was set up correctly. I am going to try installing SRM on a separate server and see whether that yields different results. I will keep this thread updated as I go along.

bebman
Enthusiast

So here is what I have done and discovered so far. When I install SRM on a separate server and connect it all together, everything works fine. When I remove that separate server and reinstall SRM on the original server, the problems come back.

The problem starts with a directory that is created when the SRM service starts. The directory is named vmware-SYSTEM, and it is created wherever the system TEMP environment variable points. The default on Windows systems is %SystemRoot%\Temp, which for most people is C:\Windows\Temp. It seems the appropriate permissions are not being set on the vmware-SYSTEM folder when it is created. I used some ACL tools to reset all the permissions and inheritance on the C:\Windows\Temp directory, but it is still not working.

I am using W2K8 SP2, so UAC has been fighting me at points, but the separate server I installed SRM on was also W2K8 SP2, and there I had no problems: the permissions on vmware-SYSTEM were fine when it was created. I have also noticed that sometimes, when I think I have the permissions right and then stop and start SRM, instead of reusing the existing vmware-SYSTEM directory, it creates a different directory named vmware-SYSTEM-xxxxxxxxx, where the x's are a series of digits. The logs then show that SRM cannot write to this newer directory, and I am basically back where I started.
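To check the behaviour described above, a small script can list every folder under TEMP whose name starts with vmware-SYSTEM and test whether a scratch file can be created in each one. This is a minimal Python sketch with hypothetical function names (vmware_system_dirs, is_writable, report). It only tests write access for the account that runs it, and it reads that account's TEMP, so it mirrors what SRM sees only when run as the same account as the SRM service.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Tuple


def vmware_system_dirs(temp_root: str) -> List[Path]:
    """List folders under temp_root whose names start with vmware-SYSTEM,
    which covers vmware-SYSTEM and vmware-SYSTEM-<digits>."""
    return sorted(p for p in Path(temp_root).glob("vmware-SYSTEM*") if p.is_dir())


def is_writable(directory: Path) -> bool:
    """Create and delete a scratch file; True if the current account can."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:  # PermissionError and friends are OSError subclasses
        return False


def report(temp_root: str) -> List[Tuple[str, bool]]:
    """Pair each vmware-SYSTEM* folder name with its writability."""
    return [(p.name, is_writable(p)) for p in vmware_system_dirs(temp_root)]


if __name__ == "__main__":
    # Assumption: this reads the TEMP of the account running the script,
    # which is not necessarily the system TEMP used by the SRM service.
    root = os.environ.get("TEMP", tempfile.gettempdir())
    for name, ok in report(root):
        print(f"{name}: {'writable' if ok else 'NOT writable'}")
```

A newer vmware-SYSTEM-<digits> folder reported as NOT writable would match the symptom in the logs.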

Does anyone have any thoughts on this? I think my hair is turning gray.

bebman
Enthusiast

So here is what was discovered after having calls open with both VMware Support and IBM Support. It seems that the IBM SVC SRA was having an issue with how Windows was treating some of the Java calls made by the SRA. The IBM engineer I was working with referred to it as "Application Garbage", kind of like Windows temp files, but for an app. Judging by searches for error code 1224, this seems to be becoming a bigger issue with Windows 2008. The IBM engineer had to get me an updated JAR file to place in the SRA directory, and then everything worked fine. IBM is going to issue an eFix, and VMware is going to release a KB early next week. When I get their links, I will post them in this discussion.

bebman
Enthusiast

VMware FINALLY published the KB for the issue that I experienced.   Here it is: http://kb.vmware.com/kb/1033871
