Solved: What is the best DR solution when the DR is in a different city 1000 kms away over a 20 Mb link?

Dave_McD · ‎02-08-2011

Our current DR is not suitable so we want to utilise our server room in another city. I am concerned about the impact the distance between the cities and the limited link would have on any DR solution.

I was thinking of Fault Tolerance, but I am concerned about the traffic involved. I have the same concern with vMotion, if it is required during a DR event.

SRM is an option, but it would be expensive due to the requirement for another VC license.

Does anyone have any suggestions?

Cyberfed27 · ‎02-14-2011

I don't see how you plan to do this without some type of shared replicated storage.

You can't assume that your primary site will ONLY encounter a loss of the hosts while the storage subsystem mysteriously stays online.

Either you are going to spend BIG money on a reliable and rather robust infrastructure to maintain mirror replicas of your storage across locations, or you are going to have to decide on an acceptable delta that can exist between your primary and DR sites and (for lack of a better word) "ship" your images between locations on a regular schedule.

Like someone else said, management needs to decide the ROI for this project and determine acceptable risks from there.
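To put a rough number on the "ship your images on a schedule" option, a back-of-the-envelope calculation helps. The sketch below estimates how long a given amount of changed data would take to cross the link. The 20 Mb/s figure comes from the thread title; the 70% usable-throughput factor and the 50 GB example are purely illustrative assumptions, not measurements from anyone in this thread.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Hours needed to move data_gb (decimal gigabytes) over a link_mbps link.

    efficiency is an ASSUMED fraction of the link that is actually usable
    for replication after protocol overhead and other traffic.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    bits = data_gb * 8 * 1000**3                  # GB -> bits (decimal units)
    usable_bps = link_mbps * 1000**2 * efficiency  # Mb/s -> usable bits/s
    return bits / usable_bps / 3600


if __name__ == "__main__":
    # Hypothetical example: 50 GB of changed data per day over the 20 Mb link.
    print(f"{transfer_hours(50, 20):.1f} hours")
```

If the result is longer than the interval between shipments, the delta can never catch up, which is one way to sanity-check whether a proposed schedule is realistic for the link.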

mcowger · ‎02-08-2011

Not sure about the distance limitations for it, but have you considered a stretched P4000 (LeftHand) cluster?

--Matt VCDX #52 blog.cowger.us

ChrisDearden · ‎02-09-2011

Unless you are prepared to spend some serious cash on your WAN links, I think it's safe to say you can rule out FT, which requires a gigabit link between sites.

SRM requires a whole lot more than just an additional VC licence. Have you got a replicated SAN in place?

How many machines are you looking to protect? You may want to look at a replication-based solution like Veeam (assuming you already have a virtual infrastructure at your second site?). You could even tackle the issue at the per-machine level with something like Double-Take or Neverfail, if availability is your key metric.

Once you have a clear idea of your required RPO and RTO, and the budget to maintain those, you'll find it's a lot easier.

If this post has been useful , please consider awarding points. @chrisdearden http://jfvi.co.uk http://vsoup.net

bulletprooffool · ‎02-09-2011

Dave,

Do you have replicated storage to the remote site?

What I did in a previous role was simply to use storage replication to replicate all data (i.e. VMDKs etc.) between sites, and to create connections to the relevant datastores at each end (we had an ESX cluster at each end, with matching port groups and datastores).

We did not have a stretched VLAN, so the first hurdle was the migration of machines and what would happen to their IP addresses.

Our solution was to create all VMs with static MAC addresses and give them DHCP IP addresses. We then simply created a reservation for each VM in each datacentre's DHCP, so we always knew what IP would be allocated.
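As an illustration of the static-MAC-plus-reservation idea, here is a minimal sketch that generates one reservation per VM for each site from a single table. The post does not say which DHCP server was used, so the ISC dhcpd `host` syntax is an assumption, and the VM names, MACs and IP addresses are made up. The MAC check reflects the range VMware reserves for manually assigned addresses (00:50:56:00:00:00 to 00:50:56:3F:FF:FF).

```python
import re

# Range VMware reserves for manually assigned (static) MAC addresses.
STATIC_MAC = re.compile(r"^00:50:56:[0-3][0-9A-F](:[0-9A-F]{2}){2}$", re.IGNORECASE)

# Hypothetical inventory: one static MAC per VM, one reserved IP per site.
VMS = {
    "web01": {"mac": "00:50:56:00:10:01", "prod": "10.1.0.11", "dr": "10.2.0.11"},
    "db01": {"mac": "00:50:56:00:10:02", "prod": "10.1.0.12", "dr": "10.2.0.12"},
}


def reservation(vm_name: str, mac: str, ip: str) -> str:
    """Return one ISC dhcpd-style host entry (assumed DHCP server format)."""
    if not STATIC_MAC.match(mac):
        raise ValueError(f"{mac} is outside the VMware static MAC range")
    return f"host {vm_name} {{ hardware ethernet {mac.lower()}; fixed-address {ip}; }}"


def site_config(vms: dict, site: str) -> str:
    """Build the reservation block for one site ('prod' or 'dr')."""
    return "\n".join(
        reservation(name, vm["mac"], vm[site]) for name, vm in sorted(vms.items())
    )


if __name__ == "__main__":
    for site in ("prod", "dr"):
        print(f"# {site}")
        print(site_config(VMS, site))
```

Keeping both sites in one table means a new VM cannot be given a reservation at one site and forgotten at the other.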

Next we simply scripted the failover. In simple terms, the failover works as follows:

1. Unregister all VMs from source ESX hosts

2. Failover storage to remote site

3. Import VMs to ESX cluster on remote site

The key pieces of the puzzle were:

Logging the config of the source cluster to an XML file daily, so that we knew which port groups / resource pools etc. needed to be mapped
Duplication of all port groups / resource pools etc. whenever they were created
A daily script to verify that the clusters and ESX hosts at each end were consistent
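The daily consistency check could look something like the sketch below. It is not the original script; it assumes the inventory of each site (port groups, resource pools, datastores) has already been exported into plain sets of names, and it only reports the differences. How the inventories are collected is left out, and all names in the example are hypothetical.

```python
def inventory_diff(source: dict[str, set[str]], target: dict[str, set[str]]) -> dict:
    """Compare two site inventories and report objects that do not match.

    Each inventory maps an object kind (e.g. 'portgroups') to a set of names.
    Only kinds with at least one difference appear in the report.
    """
    report = {}
    for kind in sorted(source.keys() | target.keys()):
        src = source.get(kind, set())
        dst = target.get(kind, set())
        missing = sorted(src - dst)   # exists at source, not at DR
        extra = sorted(dst - src)     # exists at DR, not at source
        if missing or extra:
            report[kind] = {"missing_at_dr": missing, "extra_at_dr": extra}
    return report


if __name__ == "__main__":
    # Hypothetical inventories for the two sites.
    prod = {
        "portgroups": {"VM Network", "DMZ"},
        "resource_pools": {"Prod"},
        "datastores": {"ds01"},
    }
    dr = {
        "portgroups": {"VM Network"},
        "resource_pools": {"Prod"},
        "datastores": {"ds01", "scratch"},
    }
    for kind, diff in inventory_diff(prod, dr).items():
        print(kind, diff)
```

An empty report means the two ends match; anything listed under `missing_at_dr` is something that would have to be created before VMs depending on it could be imported at the remote site.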

In our case, it was NetApp storage - but anything will work.

I call it DR on a shoestring.

One day I might dig all the scripts etc up and publish a full plan online for anyone hoping to DR on a budget.

Incidentally

RPO was about 15 minutes (max lag on storage migrations)

RTO was about 15 minutes, so all in, pretty effective

We actually used the process to fail over between DCs several times (controlled)

Good luck

One day I will virtualise myself . . .

Dave_McD · ‎02-09-2011

Thanks for your input guys.

The IP address is not an issue as we are able to get a switch in the DR location on the same subnet as the production location, which means we will be able to maintain the same IP address at both locations.

Currently I am investigating vReplicator but I could look at Veeam as well.

We do not have replicated storage between sites, but that is definitely something I can look at.

Thanks for the ideas, I can start to get my teeth into it now.

DSTAVERT · ‎02-09-2011

Work on your plan locally. Have your DR a few feet away and work out the kinks. See what bandwidth any solution or piece of the solution you could implement will take. Spend 90% of your efforts on the core business processes that will get you back up and functional. Document till your fingers hurt.
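When testing locally, one way to "see what bandwidth it will take" is to record how much data changes per replication interval and convert that into the link speed needed to keep up. The sketch below does only that arithmetic; the sample values and the 15-minute interval are hypothetical, and the 20 Mb/s limit is the link from the thread title.

```python
def required_mbps(changed_mb: float, interval_minutes: float) -> float:
    """Mbit/s needed to ship changed_mb (decimal megabytes) within one interval."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be > 0")
    return changed_mb * 8 / (interval_minutes * 60)


def summarize(samples_mb: list[float], interval_minutes: float, link_mbps: float) -> dict:
    """Average and peak bandwidth implied by measured change samples."""
    rates = [required_mbps(s, interval_minutes) for s in samples_mb]
    average = sum(rates) / len(rates)
    peak = max(rates)
    return {
        "average_mbps": average,
        "peak_mbps": peak,
        "fits_on_average": average <= link_mbps,
        "fits_at_peak": peak <= link_mbps,
    }


if __name__ == "__main__":
    # Hypothetical measurements: MB changed in four consecutive 15-minute windows.
    print(summarize([300, 450, 3000, 600], interval_minutes=15, link_mbps=20))
```

A workload that fits on average but not at peak will fall behind during busy periods and catch up later, so the effective RPO is set by the peaks rather than the average.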

-- David -- VMware Communities Moderator

bulletprooffool · ‎02-10-2011

The IP addressing will make it much easier.

All you really need to do then is find a way to replicate your storage to the remote site . . . and you'll be good.

You could consider using a VM appliance to present your storage (one that has the ability to manage replication), though your live data will of course suffer a minor performance overhead as a sacrifice for this.

If you don't have the budget for replicated storage, look at the appliance marketplace:

http://www.vmware.com/appliances/directory/cat/8163

What is the best DR solution when the DR is in a different city 1000 kms away over a 20 Mb link?