VMware Cloud Community
pirx2020
Enthusiast

flapping Mellanox interface on 2 servers after update

Hi,

I performed an image-based update on some ESXi hosts. First only ESXi + Vendor AddOns (all went well); later I added the HPE SPP Firmware and Driver AddOn. After the FW update, the first two ESXi 7.0.3 hosts have a "flapping" vmnic0, but I don't see any errors.

The servers have 2 dual-port adapters; each adapter has one port connected at 25G to a Cisco ACI switch.

Both Mellanox adapters were updated to FW 26.34.1002, but only one has a flapping port. I downgraded the FW to the previous version (26.33.1048), with no change. I rebooted and power-cycled the server, disabled the adapter in the PCI settings of the DL380 server, and installed the latest nmlx5 driver.
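
For reference, this is roughly how I verify what driver and firmware ESXi actually reports after the downgrade (standard esxcli plus the MFT tools already on the host; vmnic0 and the nmlx grep pattern are just what applies in my case, and mlxfwmanager may or may not be part of your MFT bundle):

esxcli network nic get -n vmnic0              # driver name/version and firmware version as ESXi sees them
esxcli software vib list | grep nmlx          # installed Mellanox driver VIBs
/opt/mellanox/bin/mlxfwmanager --query        # firmware the adapters themselves report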

I'm a bit lost here. If I open a case with HPE, I know I won't get anything useful in the next 8 weeks for a problem like this (read: nothing is completely broken, SW issue...). VMware support will point to HPE. I also contacted my network team, but since this happened on two servers that were just updated, I guess it's my problem.

On one server I see a Status Opcode 14 error with mlxlink, on the other everything is green.

 

[screenshot: pirx2020_0-1675966692668.png]

/opt/mellanox/bin/mlxlink -d mt4127_pciconf0

Operational Info
----------------
State                           : Physical LinkUp
Physical state                  : ETH_AN_FSM_ENABLE
Speed                           : N/A
Width                           : N/A
FEC                             : N/A
Loopback Mode                   : No Loopback
Auto Negotiation                : ON

Supported Info
--------------
Enabled Link Speed (Ext.)       : 0x00000052 (25G,10G,1G)
Supported Cable Speed (Ext.)    : 0x00000052 (25G,10G,1G)

Troubleshooting Info
--------------------
Status Opcode                   : 14
Group Opcode                    : PHY FW
Recommendation                  : Remote faults detected

Tool Information
----------------
Firmware Version                : 26.33.1048
amBER Version                   : 2.08
MFT Version                     : mft 4.22.1.11
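
Since the recommendation points at remote faults, my next step is to look at the physical layer counters and the negotiated FEC on that port. A rough sketch of what I plan to run; the --show_counters and --show_fec flags are taken from the mlxlink help and may differ between MFT versions:

/opt/mellanox/bin/mlxlink -d mt4127_pciconf0 --show_counters   # physical layer counters: symbol errors, raw BER, link down events
/opt/mellanox/bin/mlxlink -d mt4127_pciconf0 --show_fec        # FEC capabilities and what is actually active on the 25G link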


2023-02-09T17:32:04.992Z: [netCorrelator] 637041667us: [vob.net.vmnic.linkstate.down] vmnic vmnic0 linkstate down
2023-02-09T17:32:10.340Z: [netCorrelator] 642390335us: [vob.net.vmnic.linkstate.up] vmnic vmnic0 linkstate up
2023-02-09T17:32:20.794Z: [netCorrelator] 652843127us: [vob.net.vmnic.linkstate.down] vmnic vmnic0 linkstate down
2023-02-09T17:32:24.092Z: [netCorrelator] 656141310us: [vob.net.vmnic.linkstate.up] vmnic vmnic0 linkstate up
2023-02-09T17:32:36.645Z: [netCorrelator] 668694640us: [vob.net.vmnic.linkstate.down] vmnic vmnic0 linkstate down
2023-02-09T17:32:38.393Z: [netCorrelator] 670442752us: [vob.net.vmnic.linkstate.up] vmnic vmnic0 linkstate up
2023-02-09T17:32:52.447Z: [netCorrelator] 684496398us: [vob.net.vmnic.linkstate.down] vmnic vmnic0 linkstate down
2023-02-09T17:33:11.047Z: [netCorrelator] 703096379us: [vob.net.vmnic.linkstate.up] vmnic vmnic0 linkstate up
2023-02-09T17:33:24.051Z: [netCorrelator] 716099474us: [vob.net.vmnic.linkstate.down] vmnic vmnic0 linkstate down
2023-02-09T17:33:26.849Z: [netCorrelator] 718897606us: [vob.net.vmnic.linkstate.up] vmnic vmnic0 linkstate up
2023-02-09T17:33:39.903Z: [netCorrelator] 731951198us: [vob.net.vmnic.linkstate.down] vmnic vmnic0 linkstate down
2023-02-09T17:33:58.453Z: [netCorrelator] 750501201us: [vob.net.vmnic.linkstate.up] vmnic vmnic0 linkstate up
2023-02-09T17:34:11.506Z: [netCorrelator] 763554272us: [vob.net.vmnic.linkstate.down] vmnic vmnic0 linkstate down
2023-02-09T17:34:13.254Z: [netCorrelator] 765302263us: [vob.net.vmnic.linkstate.up] vmnic vmnic0 linkstate up
2023-02-09T17:34:27.260Z: [netCorrelator] 779307581us: [vob.net.vmnic.linkstate.down] vmnic vmnic0 linkstate down
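
To get a feel for how often the port actually flaps, I pull the link-state events straight out of vobd.log on the host (busybox tools only, a quick-and-dirty one-liner):

grep 'vmnic0 linkstate' /var/log/vobd.log | tail -n 20                            # most recent flap events
grep 'vmnic0 linkstate' /var/log/vobd.log | awk '{print $NF}' | sort | uniq -c    # count up vs. down events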


[root@sdev2837:~] esxcli network nic stats get -n vmnic0
NIC statistics for vmnic0
   Packets received: 0
   Packets sent: 178
   Bytes received: 0
   Bytes sent: 12112
   Receive packets dropped: 0
   Transmit packets dropped: 0
   Multicast packets received: 0
   Broadcast packets received: 0
   Multicast packets sent: 20
   Broadcast packets sent: 158
   Total receive errors: 0
   Receive length errors: 0
   Receive over errors: 0
   Receive CRC errors: 0
   Receive frame errors: 0
   Receive FIFO errors: 0
   Receive missed errors: 0
   Total transmit errors: 0
   Transmit aborted errors: 0
   Transmit carrier errors: 0
   Transmit FIFO errors: 0
   Transmit heartbeat errors: 0
   Transmit window errors: 0
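
Rx stays at zero even during the phases where the link reports up, so I watch the counters in a loop to see whether anything is ever received between flaps (plain busybox sh, nothing fancy):

while true; do
    date
    esxcli network nic stats get -n vmnic0 | grep 'Packets received'   # should increment if any frames arrive
    sleep 5
done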


1 Reply
maksym007
Expert

Please correct me if I am wrong: your Mellanox card with 25Gb/s has SFP modules, correct?

Check from the console side (iLO/iDRAC) whether everything looks OK there.

Is it a remote location, or do you still have the possibility to check the cables and connections in the server room?

It might be that the SFP module is worn out already. I don't know why, but it's exactly after patching that they always start making trouble.
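
If the MFT tools are already on the host, you could also compare the module readings of the flapping port with the healthy one, roughly like this (the --show_module flag and the second device name are my assumption, adjust them to whatever your host actually lists):

/opt/mellanox/bin/mlxlink -d mt4127_pciconf0 --show_module   # vendor, part number, temperature, rx/tx power of the flapping port's module
/opt/mellanox/bin/mlxlink -d mt4127_pciconf1 --show_module   # same for the second adapter, for comparison

If the rx power on the flapping port is far off from the other one, that points to the module or the cable rather than the NIC.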
