VMware Cloud Community
H_a_r_m_o_n_t
Contributor

Power management with SAN.

I have two ESX 3.5 hosts in an HA cluster, with the VMs stored on a SAN (two NetApp filers in an active/active HA cluster). There is also a Win2k3 management host running VC, which does not depend on the SAN. The ESX hosts and the Win2k3 host have PowerChute Network Shutdown installed. The NetApp can also shut itself down when it sees that the APC UPS has X minutes of battery runtime left.

How do I combine all this? How do I gracefully shut down a SAN-based ESX configuration in the event of a power failure, and then power it back on when power returns?

The main questions are:

1. How do I ensure that the ESX hosts are shut down before the storage (in the event of power loss)?

2. How do I ensure that the storage is up before the ESX hosts try to use it (when power returns)?

Could somebody describe a scenario for this?

8 Replies
Texiwill
Leadership

Hello,

Your UPS should come with software to help with this. If not, you need to use one of the third-party UPS tools available (unfortunately, UPSD for Linux is the only one I know of, and it will not run within ESX).

APC PowerChute, for example, can control exactly what you are describing for power-off. You can run it from within your ESX Service Console or on your Windows box. It can call a script, which you will have to write, to gracefully shut down the servers.

As for power-on, you can set the order in which systems boot on the UPS, i.e. power port 1 is enabled first, followed by a delay, then power port 2, etc.

I know this is all possible with APC equipment, and I am fairly sure about Tripp Lite equipment, but I am not sure about any other equipment at the moment. I would check with your vendor to be sure you have the proper daemons available for your UPS.


Best regards,

Edward L. Haletky

VMware Communities User Moderator

====

Author of the book 'VMWare ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education.

CIO Virtualization Blog: http://www.cio.com/blog/index/topic/168354

As well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization

--
Edward L. Haletky
vExpert XIV: 2009-2023,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill
H_a_r_m_o_n_t
Contributor

I have all these things in place: an APC Smart-UPS with a Network Management Card, and PowerChute Network Shutdown on the two ESX servers and on the Win2k3 management server. Besides this, the NetApp, as I said, also supports APC.

Now, could you explain the shutdown scenario in detail?

I suppose that the NetApp APC support and the PCNS instances on the ESX servers are useless in this scenario. Is that true? Should I write a custom script to shut down all datacenter hardware from the management host?

Concerning power-on: I don't have APC InfraStruXure hardware, so I can't switch on Power Port 1 first, then Power Port 2, etc. Should I use Wake on LAN in this case?

Texiwill
Leadership

Hello,

I have all these things in place: an APC Smart-UPS with a Network Management Card, and PowerChute Network Shutdown on the two ESX servers and on the Win2k3 management server. Besides this, the NetApp, as I said, also supports APC.

I suppose that the NetApp APC support and the PCNS instances on the ESX servers are useless in this scenario. Is that true? Should I write a custom script to shut down all datacenter hardware from the management host?

In this case you may wish to contact APC, as they may already have something, but it looks to me like you need to do the following:

  • shut down the VMs

  • shut down ESX

  • shut down the backup server (if using VCB)

  • shut down the NetApp

Concerning power-on: I don't have APC InfraStruXure hardware, so I can't switch on Power Port 1 first, then Power Port 2, etc. Should I use Wake on LAN in this case?

Yes, it sounds like Wake on LAN may be your best option for this. Or invest in a PDU (APC9212 or some such) that will give you the control you need.
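For reference, a Wake on LAN "magic packet" is simple enough to send from a script on the management host. The following is a minimal Python sketch, not a tested part of any APC or VMware tooling: the function names, the MAC address, and the broadcast address are illustrative assumptions, and the target NIC and BIOS must have Wake on LAN enabled for it to have any effect.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a magic packet: 6 bytes of 0xFF, then the MAC repeated 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (port 9 is a common convention)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


if __name__ == "__main__":
    # Hypothetical MAC address of an ESX host's NIC.
    send_wol("00:11:22:33:44:55")
```

Note that the packet only powers the machine on. It says nothing about whether the storage is ready, so the ordering still has to be handled by whatever script sends it.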


H_a_r_m_o_n_t
Contributor

In this case you may wish to contact APC, as they may already have something, but it looks to me like you need to do the following:

  • shut down the VMs

  • shut down ESX

  • shut down the backup server (if using VCB)

  • shut down the NetApp

I have contacted APC, but they can't provide a solution for such a complex (in their opinion) scenario. I think they aren't as familiar with VMware infrastructure as you are.

1. What API should I use to implement this?

2. Where should I run this script from? The management host?

3. I suppose you forgot the last step: "shut down the management host with the APC UPS".

Also:

1. Before shutting down the NetApp I have to ensure that the ESX hosts are shut down. How do I do this? I can ping the ESX hosts, but if they are not reachable over the LAN it doesn't mean that they are shut down. They may still be running their shutdown sequence.

2. The same goes for the NetApp filers. I can't shut down the management host with the UPS if I'm not sure that the NetApps have completed their shutdown sequence.

Yes, it sounds like Wake on LAN may be your best option for this. Or invest in a PDU (APC9212 or some such) that will give you the control you need.

The same concerns apply as for shutdown. How do I ensure that the storage is up and running? A ping alone is not enough. If I can ping the NetApp, it doesn't mean that all necessary processes are up; it only means that the NICs are initialized.
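One step beyond ping is to test whether the service port itself accepts connections. The Python sketch below is only an illustration of that idea: the host name `filer1` and the port numbers (445 for CIFS, 3260 for iSCSI) are assumptions that depend on which protocol the filer actually serves, and an open port still does not prove that every volume or LUN is online.

```python
import socket


def tcp_port_open(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


if __name__ == "__main__":
    # Hypothetical filer name; pick the port of the protocol you rely on.
    for name, port in (("CIFS", 445), ("iSCSI", 3260)):
        print(name, "reachable:", tcp_port_open("filer1", port))
```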

H_a_r_m_o_n_t
Contributor

BTW, the AP9212 is discontinued; the AP7920 has replaced it. But we can't invest $509.00 in each AP7920 just to resolve this issue.

Texiwill
Leadership

Hello,

BTW, the AP9212 is discontinued; the AP7920 has replaced it. But we can't invest $509.00 in each AP7920 just to resolve this issue.

You need to pick one host that will be your 'power manager'. This host should also have access to your NetApp in some way. Set the power manager to boot automatically when power comes back, then have a startup script start everything up. I would write a script that does the following:

Power-off:

1) Using the VMware RCLI and saved sessions (or some other method that does not require a username/password to be entered or stored in a file somewhere), shut down all VMs.

2) For each VM, use the VMware RCLI to verify that it is powered off; if not, force a power-off instead of being nice about it.

3) For each host, use the VMware RCLI/SDK to shut down the ESX host, then wait a predetermined amount of time (you need to time how long it takes to shut down a host).

4) Verify that the ESX host is powered off using ping or some other mechanism. If you had an AP7920, you would power off the port.

5) Power off the NetApp. Since you have a mounted share from the NetApp, you can tell it is gone when the share goes away.

6) Power off VC if it is not a VM.

7) Power off the power manager node.

Power-on:

1) Power on the NetApp and wait for the mount/share/etc. to come back from the NetApp. Once this happens you know the NetApp is up.

2) Power on VC/License Manager if it is not a VM.

3) Power on each ESX server.

4) Power on the VMs in the appropriate order.
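The two lists above share one pattern: start an action, verify it, and only then move on to the next tier. A rough Python sketch of that pattern follows. It deliberately contains no real RCLI, NetApp, or APC calls; `Step`, `run_sequence`, and the callables passed in are hypothetical placeholders you would fill in with whatever commands your environment provides.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]   # starts the shutdown or power-on (placeholder)
    is_done: Callable[[], bool]  # verification check (placeholder)
    timeout_s: float             # how long to wait for verification


def run_sequence(steps: List[Step], poll_s: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> List[str]:
    """Run steps strictly in order; return the names of steps that timed out."""
    timed_out = []
    for step in steps:
        step.action()
        deadline = clock() + step.timeout_s
        while not step.is_done():
            if clock() >= deadline:
                # Record the failure and move on: on battery power,
                # stalling forever is worse than continuing.
                timed_out.append(step.name)
                break
            sleep(poll_s)
    return timed_out


if __name__ == "__main__":
    # Demo with dummy steps that complete immediately.
    demo = [Step(n, lambda: None, lambda: True, 60.0)
            for n in ("VMs", "ESX hosts", "NetApp", "VC", "power manager")]
    print("timed out:", run_sequence(demo))
```

Continuing after a timeout is a design choice made for this sketch; a stricter script could stop and escalate (for example, cut power at a switched PDU) before touching the storage.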


H_a_r_m_o_n_t
Contributor

Power-off:

2) For each VM, use the VMware RCLI to verify that it is powered off; if not, force a power-off instead of being nice about it.

Good to hear that there is a way to ensure that all VMs are shut down before the ESX shutdown starts. Knowing this, I can measure the pure ESX shutdown time, which I hope is constant. So I don't need to worry about the constantly changing number of VMs, which influences the ESX shutdown time.

3) For each host, use the VMware RCLI/SDK to shut down the ESX host, then wait a predetermined amount of time (you need to time how long it takes to shut down a host).

4) Verify that the ESX host is powered off using ping or some other mechanism. If you had an AP7920, you would power off the port.

Please elaborate on these. Why do you introduce the 4th step? It handles situations where something strange happens and the ESX host doesn't shut down within the measured time period. Am I right? What should be done in that case?

5) Power off the NetApp. Since you have a mounted share from the NetApp, you can tell it is gone when the share goes away.

That is arguable. When the share goes away it only means that the CIFS service is down, not that the storage has completed its shutdown sequence. Maybe I should act here the same way as for ESX and just measure the NetApp shutdown time?

BTW, what tool can I use to check that the share has gone away? And how do I implement this? By looping and polling the share until it is gone?

7) Power off the power manager node.

What about shutting down the APC UPS? Is there any way to do this from the RCLI?

Power-on:

1) Power on the NetApp and wait for the mount/share/etc. to come back from the NetApp. Once this happens you know the NetApp is up.

This makes much more sense than the 5th step in the power-off case. But the implementation question remains open, as it does for the 5th step.

Texiwill
Leadership

Hello,

4) Verify that the ESX host is powered off using ping or some other mechanism. If you had an AP7920, you would power off the port.

Please elaborate on these. Why do you introduce the 4th step? It handles situations where something strange happens and the ESX host doesn't shut down within the measured time period. Am I right? What should be done in that case?

If processes are in a defunct state, the shutdown will not complete in your predetermined time; it could hang or wait for a timeout, which could take quite a long time. In that case you have to pull the power, so to speak.
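As a sketch of that "ask nicely, then pull the power" logic, something like the following Python could work. All three callables are hypothetical placeholders: `request_shutdown` and `is_down` stand for whatever RCLI/SDK call and check you use, and `force_off` stands for a hard power-off (for example, a switched PDU port, if you have one).

```python
import time
from typing import Callable


def shutdown_with_fallback(request_shutdown: Callable[[], None],
                           is_down: Callable[[], bool],
                           force_off: Callable[[], None],
                           timeout_s: float,
                           poll_s: float = 5.0,
                           sleep: Callable[[float], None] = time.sleep,
                           clock: Callable[[], float] = time.monotonic) -> str:
    """Request a graceful shutdown; force power-off if it exceeds timeout_s.

    Returns "graceful" or "forced" so the caller can log what happened.
    """
    request_shutdown()
    deadline = clock() + timeout_s
    while clock() < deadline:
        if is_down():
            return "graceful"
        sleep(poll_s)
    if is_down():  # one last check before resorting to force
        return "graceful"
    force_off()
    return "forced"


if __name__ == "__main__":
    # Demo with dummy callables: the "host" never goes down, so it is forced.
    outcome = shutdown_with_fallback(lambda: None, lambda: False,
                                     lambda: print("cutting power"),
                                     timeout_s=1.0, poll_s=0.5)
    print(outcome)
```

The timeout would be the measured host shutdown time plus some margin; without a remotely switchable power source, `force_off` may have to be reduced to raising an alert.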

5) Power off the NetApp. Since you have a mounted share from the NetApp, you can tell it is gone when the share goes away.

That is arguable. When the share goes away it only means that the CIFS service is down, not that the storage has completed its shutdown sequence. Maybe I should act here the same way as for ESX and just measure the NetApp shutdown time?

You are correct, but once the share has gone away the data is safe, as everything should be synced. I would also wait a predetermined time after CIFS goes down. However, since APC does integrate with NetApp, you may want to go through that instead. You will have to initiate a proper shutdown sequence on the NetApp somehow.

BTW, what tool can I use to check that the share has gone away? And how do I implement this? By looping and polling the share until it is gone?

Well, I would not constantly poll the share, but something like 'net use' may be sufficient. If you look for a file on the share, you have to consider the timeout that could occur if the share is gone.
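To illustrate the timeout concern, here is one possible Python sketch that checks a path on the share without letting the check itself hang the script. It is an assumption-laden example: the UNC path is made up, and running the probe in a daemon thread is just one way to bound the wait, since a blocked file-system call cannot be cancelled from Python.

```python
import os
import threading
from typing import Callable


def path_reachable(path: str, timeout_s: float = 5.0,
                   exists: Callable[[str], bool] = os.path.exists) -> bool:
    """Return True only if the path is confirmed to exist within timeout_s.

    A missing path, an error, or a probe that is still blocked after
    timeout_s all count as "not reachable".
    """
    result = []

    def probe() -> None:
        try:
            result.append(exists(path))
        except OSError:
            result.append(False)

    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout_s)  # stop waiting after timeout_s; thread may linger
    return bool(result) and result[0]


if __name__ == "__main__":
    # Hypothetical share path on the filer.
    print(path_reachable(r"\\filer1\vol1\marker.txt", timeout_s=10.0))
```

Calling this a few times with a pause in between, rather than in a tight loop, matches the advice above about not polling constantly.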

7) Power off the power manager node.

What about shutting down the APC UPS? Is there any way to do this from the RCLI?

No, the RCLI is just for ESX and VMs, not NetApp or the UPS.

