74 Responses

  1. Mark Hodges

    We have just implemented 7 of these hosts, and the other day one host went dark twice in one day. The host completely lost all connectivity on both NICs at the same time (which is unfortunate when you run all the networking and storage across them).

    I really hope the HP 930 firmware plus these latest drivers resolve the issue.
  2. Chris Brennan

    Hi, we had the exact same isolation issue with HP NC375T PCI Express cards in two different servers over the weekend. Broadcom cards were unaffected. Driver versions of the NC375T cards are below:

    # ethtool -i vmnic8
    driver: nx_nic
    version: 4.0.550.1-1vmw
    firmware-version: 4.0.534
    bus-info: 0000:0e:00.0
  3. Mark Hodges

    Our big problem is that on our hosts we are putting ALL traffic across them, so when it drops… well, everything hits the dirt, including the VMs.
  4. VMLOST

    Chris,
    I know it's a little late in responding to your post, however, I am having an issue (maybe) similar to what is happening above. I have a DL380 G7 in place with two 4-port nx_nic cards. Now, when trying to configure jumbo frames, I keep getting the following error: "VMkernel failed to set the MTU value 9000 on the uplink vmnic10". I can create a vSwitch and port group and set it to 9000, but I can't get the nx_nics to attach. I have Broadcom onboard NICs and they work perfectly. I have also followed the post above and it still hasn't helped. Any ideas?
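
    For context, the usual ESX(i) 4.x jumbo frame sequence is roughly the following (a sketch only; the vSwitch name, port group name, and IP address are placeholders, not taken from this setup):

    esxcfg-vswitch -m 9000 vSwitch1                                  # raise the vSwitch MTU
    esxcfg-vswitch -A iSCSI vSwitch1                                 # add a port group to it
    esxcfg-vmknic -a -i 192.168.1.10 -n 255.255.255.0 -m 9000 iSCSI  # vmkernel NIC at MTU 9000
    esxcfg-nics -l                                                   # verify the MTU per uplink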

    thanks in advance

  5. Vaughn

    I just called HP tech support, and they have a known issue with the NC375T not being able to enable jumbo frames with ESXi 4.x.

    HP's solution is to downgrade the firmware to 4.0.544 as a workaround.

    I have not tested this!
  6. Oliver

    I lost the four LOMs in a DL580 G7 last week, even with the current firmware and driver in ESXi 4.1. This whole nx_nic thing seems like a disaster.
  7. VMdude

    Hi,

    I too am having the same issue with configuring jumbo frames (I want one NC375T NIC for iSCSI). I have updated to the latest firmware as of Nov 22, 2011, which is 4.0.579.
    HP advised downgrading to 4.0.544, but there is an advisory regarding the unresponsiveness/stability issue that states the issue is resolved in firmware 4.0.556 and later.

    So it seems to be a catch-22: either have stability and no jumbo frames, or jumbo frames and no stability.

    What a mess. I am waiting for more updates from HP. Probably time to return these cards, I think.
  8. Mark Hodges

    We completely replaced all our HP dual-port NICs with Intel X520-SR1 (E10G41BFSR) single-port cards at a cost of $12,000 (since we only ever used a single port anyhow), and since that time we have not had a single incident or issue with our ESX farm dropping.

    I'd call that money well spent, because I can sleep at night now.
  9. Quiet Before the Storm: HP DISCOVER in Vienna « Wahl Network

    […] I’ve used HP from a server capacity, as evident from some of my posts complaining about the nx_nic issue, and am familiar with the BladeSystem, Matrix (MOE), CloudSystem, and Cloud Maps concepts (of which […]

  10. Ken

    We have 8 DL580 G7s.
    Each host has the NC375i and NC375T (2 x 4 ports).

    When we started, we experienced the host isolation issue.
    Firmware at the time was 4.0.554 on both, with the default driver.

    We upgraded to the 4.0.579 firmware using Windows 2008 R2 installed on a hard drive and the online firmware updater. We also loaded the 4.0.594 driver.
    After this change the host isolation issue did not occur, but we still had cards stop passing traffic despite showing as up. In this condition no VLANs were shown in the network card configuration, but the cards showed 1000/Full. This was a huge problem, as VMs were not moved when they needed to be.
    At this point we were using only Link Status as the network failover detection method.

    Sooo… we switched to Beacon Probing for everything. This allowed us to identify network outages and have VMs moved to functioning network adapters. We also noticed that with firmware 4.0.579 we no longer had both cards fail at the same time.

    We also upgraded to the 4.0.598 driver using the vihostupdate.pl command:
    vihostupdate.pl --server <host> --install --bundle C:\Downloads\offline-bundle\602\ntx-nx_nic-4.0.602-164009-offline_bundle-509624.zip
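
    (A sketch of confirming what actually loaded after the reboot; <host> is a placeholder and the usual vSphere CLI connection options are assumed:)

    vihostupdate.pl --server <host> --username root --query   # from the vSphere CLI box: list installed bulletins
    ethtool -i vmnic0                                         # on the host console: driver/firmware now in use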

    After loading the 4.0.598 driver (still using firmware 4.0.579) we had better overall stability, although we would still get notices from vCenter like this:
    [VMware vCenter - Alarm Network uplink redundancy lost] Lost uplink redundancy on virtual switch "vSwitch0". Physical NIC vmnic0 is down. Affected portgroups: "vMotion", "Management Network".

    Sometimes the outage was so brief the alerts would not even show up in vCenter. VM connectivity seemed pretty good, although we did have several occasions where a host had slow network access and vMotioning the VMs to another host resolved the issue.

    Yesterday we began loading the 4.0.602 nx_nic driver from this package:
    "VMware ESX/ESXi 4.x Driver CD for QLogic NIC Driver for NetXen HP NC522SFP (P3) Ethernet Devices"
    It loaded successfully on all of our hosts, although it has not been long enough yet to know if this finally fixes the remaining up/down alerts we experience.
    With the 4.0.598 driver those alerts did not start for several weeks, but then occurred every couple of days or more per host.

    We are strongly leaning toward replacing all 8 ports per host with Intel cards if this driver does not resolve the issue.

    This has been a very trying process, and I loathe the person who ordered the hardware. They thought "having the same card will lead to less chance of problems due to driver conflicts"… too bad it also leads to a single point of failure when the driver is the conflict.

    I hope this helps others, as we have invested far more time than we should have to, considering we are using hardware on the HCL from name-brand vendors.

  11. Mark Hodges

    We are now going on 2+ months without a single issue since I put in the X520 single-port cards. Those HP cards are now all sitting in a drawer.

    Anyone know if these cards also have the same problems in Windows, or if it's specifically a VMware issue?

    Personally, I suspect that having the dual ports is causing the cards to overheat (since one of the HP advisories recommended turning up cooling and not putting cards into specific slots), and the X520s seem to run much cooler…
    1. Daniel

      Do you have a link to this advisory? I have fewer issues with these in Windows machines; perhaps it lets them run hotter than VMware does.
  12. Ken

    Well, we have had 3 alerts since we upgraded to 4.0.602.
    This driver did not resolve the issue.
    Still using 4.0.579 firmware (the latest available that I know of).

    The NC375 cards are quad (4) ports per card.
    I have seen mentions that this affects Windows as well, so it would seem the firmware may have an issue too.
  13. Simon Ling

    We have the NC375T 4-port running on an Ubuntu server, firmware 4.0.550, and I can't set the speed to gigabit without the whole card dropping off the network until I restart the networking. ethtool reports that it has changed the settings, but afterwards there is no activity; even the connection light on the switch dies.
    1. Simon Ling

      Please ignore my post, as I have just discovered that the problem was in fact that the switch is not actually gigabit, although that wasn't stated in its specifications, only in the blurb! An important life lesson for me there.
  14. Christian

    DL380s with the NC522SFP 10G using 4.0.602 drivers. Still seeing random disconnects and dropped network traffic.

    Anyone have thoughts on the QLE8242?
  15. Brian

    My shop has suffered from this issue a few times already…
    This is a serious problem with those QLogic cards!
    Below is the message log just captured from one of the DL585 G7s.
    I have both the 375i (onboard) and 2 x 375T add-on cards;
    this firmware issue affects all cards in the server.

    Jan 26 14:29:30 cimslp: Found 46 profiles in namespace interop
    Jan 26 14:52:38 vmkernel: 10:10:02:53.215 cpu19:4519)nx_nic[vmnic6]: Device is DOWN. Fail count[8]
    Jan 26 14:52:38 vmkernel: 10:10:02:53.215 cpu19:4519)nx_nic[vmnic6]: Firmware hang detected. Severity code=0 Peg number=0 Error code=0 Return address=0
    Jan 26 14:52:38 vmkernel: 10:10:02:53.365 cpu19:4519)IDT: 1565: 0x99
    Jan 26 14:52:38 vmkernel: 10:10:02:53.365 cpu19:4519)IDT: 1634:
    Jan 26 14:52:39 vmkernel: 10:10:02:53.367 cpu19:4519)IDT: 1565: 0xa1
    Jan 26 14:52:39 vmkernel: 10:10:02:53.367 cpu19:4519)IDT: 1634:
    Jan 26 14:52:39 vmkernel: 10:10:02:53.368 cpu19:4519)IDT: 1565: 0xa9
    Jan 26 14:52:39 vmkernel: 10:10:02:53.368 cpu19:4519)IDT: 1634:
    Jan 26 14:52:39 vmkernel: 10:10:02:53.369 cpu19:4519)IDT: 1565: 0xb1
    Jan 26 14:52:39 vmkernel: 10:10:02:53.369 cpu19:4519)IDT: 1634:
    Jan 26 14:52:39 vmkernel: 10:10:02:53.370 cpu19:4519)nx_nic[vmnic6]: Load stored FW
    Jan 26 14:52:44 vmkernel: 10:10:02:58.549 cpu19:4519)nx_nic: Loading firmware from file , version = 4.0.579
    Jan 26 14:52:44 vmkernel: 10:10:02:58.783 cpu19:4519)VMK_PCI: 746: device 000:071:00.0 capType 16 capIndex 192
    Jan 26 14:52:44 vmkernel: 10:10:02:58.783 cpu19:4519)nx_nic: Gen2 strapping detected
    Jan 26 14:52:46 vobd: Jan 26 14:52:46.182: 900180598394us: [vob.net.pg.uplink.transition.down] Uplink: vmnic6 is down. Affected portgroup: Management Network. 1 uplinks up. Failed criteria: 130.
    Jan 26 14:52:46 vobd: Jan 26 14:52:46.182: 900180598509us: [vob.net.vmnic.linkstate.down] vmnic vmnic6 linkstate down.
    Jan 26 14:52:46 vmkernel: 10:10:03:00.316 cpu21:4519)VMK_PCI: 746: device 000:071:00.0 capType 16 capIndex 192
    Jan 26 14:52:46 vmkernel: 10:10:03:00.354 cpu21:4519)nx_nic NetXen NX3031 Quad Port Gigabit Server Adapter Board S/N TI19BK0822 Chip id 0x1
    Jan 26 14:52:46 vmkernel: 10:10:03:00.400 cpu21:4519)IDT: 1036: 0x99 exclusive (entropy source), flags 0x10
    Jan 26 14:52:46 vmkernel: 10:10:03:00.400 cpu21:4519)VMK_VECTOR: 143: Added handler for vector 153, flags 0x10
    Jan 26 14:52:46 vmkernel: 10:10:03:00.400 cpu21:4519)IDT: 1133: 0x99 for vmkernel
    Jan 26 14:52:46 vmkernel: 10:10:03:00.401 cpu21:4519)VMK_VECTOR: 231: vector 153 enabled
    Jan 26 14:52:46 vmkernel: 10:10:03:00.478 cpu21:4519)nx_nic NetXen NX3031 Quad Port Gigabit Server Adapter Board S/N TI19BK0822 Chip id 0x1
    Jan 26 14:52:46 vmkernel: 10:10:03:00.595 cpu21:4519)IDT: 1036: 0xa1 exclusive (entropy source), flags 0x10
    Jan 26 14:52:46 vmkernel: 10:10:03:00.595 cpu21:4519)VMK_VECTOR: 143: Added handler for vector 161, flags 0x10
    Jan 26 14:52:46 vmkernel: 10:10:03:00.595 cpu21:4519)IDT: 1133: 0xa1 for vmkernel
    Jan 26 14:52:46 vmkernel: 10:10:03:00.595 cpu21:4519)VMK_VECTOR: 231: vector 161 enabled
    Jan 26 14:52:46 vmkernel: 10:10:03:00.673 cpu21:4519)nx_nic NetXen NX3031 Quad Port Gigabit Server Adapter Board S/N TI19BK0822 Chip id 0x1
    Jan 26 14:52:46 vmkernel: 10:10:03:00.772 cpu21:4519)IDT: 1036: 0xa9 exclusive (entropy source), flags 0x10
    Jan 26 14:52:46 vmkernel: 10:10:03:00.772 cpu21:4519)VMK_VECTOR: 143: Added handler for vector 169, flags 0x10
    Jan 26 14:52:46 vmkernel: 10:10:03:00.772 cpu21:4519)IDT: 1133: 0xa9 for vmkernel
    Jan 26 14:52:46 vmkernel: 10:10:03:00.772 cpu21:4519)VMK_VECTOR: 231: vector 169 enabled
    Jan 26 14:52:46 vmkernel: 10:10:03:00.832 cpu21:4519)nx_nic NetXen NX3031 Quad Port Gigabit Server Adapter Board S/N TI19BK0822 Chip id 0x1
    Jan 26 14:52:46 vmkernel: 10:10:03:00.878 cpu21:4519)IDT: 1036: 0xb1 exclusive (entropy source), flags 0x10
    Jan 26 14:52:46 vmkernel: 10:10:03:00.878 cpu21:4519)VMK_VECTOR: 143: Added handler for vector 177, flags 0x10
    Jan 26 14:52:46 vmkernel: 10:10:03:00.878 cpu21:4519)IDT: 1133: 0xb1 for vmkernel
    Jan 26 14:52:46 vmkernel: 10:10:03:00.878 cpu21:4519)VMK_VECTOR: 231: vector 177 enabled
    Jan 26 14:52:46 vobd: Jan 26 14:52:46.683: 900181099377us: [vob.net.vmnic.linkstate.up] vmnic vmnic6 linkstate up.
    Jan 26 14:52:47 vobd: Jan 26 14:52:47.684: 900179885774us: [esx.clear.net.vmnic.linkstate.up] Physical NIC vmnic6 linkstate is up.
    Jan 26 14:52:48 vobd: Jan 26 14:52:48.685: 900180886863us: [esx.problem.net.redundancy.lost] Lost uplink redundancy on virtual switch "vSwitch0". Physical NIC vmnic6 is down. Affected port groups: "Management Network".
    Jan 26 14:52:50 vmkernel: 10:10:03:04.481 cpu7:4103)nx_nic[vmnic6]: NIC Link is up
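
    A quick way to tally these events per NIC from the logs (a sketch; it assumes the classic ESX 4.x log location and the nx_nic message format shown above):

    grep -h "Firmware hang detected" /var/log/vmkernel* \
      | sed -n 's/.*nx_nic\[\(vmnic[0-9]*\)\].*/\1/p' \
      | sort | uniq -c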

  16. Brian

    Actually, I have two 523SFP 10Gb cards on that server too, with the latest firmware (4.8.2). I am afraid that QLogic's firmware still has problems as well.
  17. Lutz

    Hi, just for clarification: the NC375T / NC375i and NC522SFP 10G use NetXen-based chipsets, which are causing the issues. But the NC523SFP should use a different, non-NetXen QLogic chip, or am I wrong here?

    We are looking for 10 GbE NICs, and HP doesn't seem to have 10 GbE Intel-branded adapters. Our current plan was to use the NC523SFP (other options are the NC550SFP or NC552SFP with Emulex chips).

    Has anyone tried to get the cards replaced by HP (they are unfortunately used as onboard NICs on the DL370 G6 we currently use)?

    For statistics: we are using the NC375i on Windows Server 2008 R2 with DataCore on top and see occasional iSCSI disconnects of about 5 seconds roughly once a week (firmware 4.0.534); however, we see no packet errors in the statistics, so I had suspected the problem was elsewhere.
  18. Oliver Antwerpen

    Hi,
    You are right. The NC523 uses ql_nic, not nx_nic. I have several boxes with NC522/NC375 cards, all causing issues in ESX 4 and 5 with different firmwares and drivers. I also have several NC550/NC552 Emulex cards causing no problems. HP only swapped the 522s for new 522s, but that obviously did not help. We are currently planning to rip out the NC522s and replace them with NC550s.
  19. Sharif
  20. Greg

    We have these jokes of a card in our cluster as well. We have upgraded the firmware and drivers and are still having this issue. We are swapping all the cards out for Intel NICs, since HP will only replace the cards, which we've already done. Dell is starting to look really good, given HP's clear lack of vision and a support organization that has gone down the tubes in the past three years.
  21. afokkema

    Same symptoms and troubles here with those NetXen adapters (NC375T). I am working with HP on this case. When I have news, I will update it here.

    P.S. We are running the latest firmware and drivers:

    # ethtool -i vmnic4
    driver: nx_nic
    version: 4.0.602
    firmware-version: 4.0.579
    bus-info: 0000:09:00.0

    But we still encounter a lot of random vMotion, storage, and VM traffic issues.
  22. Brian

    I am already tired of playing with HP, and I got them to replace all the cards with Intels (365T).
    By the way, the 523SFP does not seem to have the problem on the latest firmware.
  23. lazyllama

    We’ve been having the same problem with the NC375T cards.
    Every now and then one will cause the driver to log “Firmware hang detected.” and reset itself.
    We’re already running the recommended firmware and driver so those do not fix the issue (as of 6th March 2012).
    I’ve got an open case with HP to get a fix.

    # ethtool -i vmnic7
    driver: nx_nic
    version: 4.0.602
    firmware-version: 4.0.579
    bus-info: 0000:0b:00.3
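
    To check every NetXen port on a host in one pass, something like this works from the console (a sketch; it assumes esxcfg-nics -l prints the driver name next to each vmnic):

    for n in $(esxcfg-nics -l | awk '/nx_nic/ {print $1}'); do
      echo "== $n"; ethtool -i "$n"
    done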

    1. arjunbalachandra

      Hey, did you get a fix for this one?

      I have a client running into the same problems. I wanted to be certain it is a hardware-level issue.

      I have verified through the logs that the message is the same with regard to the firmware puking for no reason.

      Regards

      Arjun
      VMware Inc.
  24. predragc

    Guys, can somebody help me with the steps to update the firmware on the NC375i and NC375T cards installed in a DL980? The operating system is ESXi 4.1 and the driver is the latest (4.0.602), but how can I safely update the firmware on both cards?

    Thanks in advance.
  25. ollfried

    Just download the latest SPP (2012-01) from http://www.hp.com/go/foundation and boot from that DVD.
  26. KD

    I have several DL585 G7s (NC375i) running vSphere 4.1 that have experienced this issue over the last few months, and this entry has been a great help. So I wanted to share that HP support contacted me to say that they are replacing the SPI boards for my 585s with a newer rev board. The criteria are machines that have the "firmware hang detected" message in the vmkernel logs and are running the 4.0.602 driver and 4.0.579 firmware. I was told there is a backlog on the boards and to expect delivery in 3-4 weeks.
    1. ollfried

      Can you share some information, maybe a case number? I have open cases, too…
  27. cdunn

    I just wanted to add that we have 10 DL380 G7s with two NC523SFP cards in each. We have two links from each server plugged into two Nexus switches. We are getting a single random dropped link on each server periodically. I'm at driver version 4.0.727 and firmware version 4.8.22. I just want to make people aware if they decide to buy this card. It could still be a configuration, switch, card, or SFP cable problem; it's just odd that it happens on all of them at different times.
    1. Mark Hodges

      We were using disparate switches, cables, and NICs, and we were dropping completely.
      After replacing the NICs (and using the Intel single-ports) we have not dropped once… since September.
      1. ollfried

        You replaced the cards with the same type?

      2. Mark Hodges

        Nope, replaced them with the Intel X520 single-ports, I believe. After that, no more problems. Unfortunately, that was $11k worth of hardware replacements I had to do, and we are afraid to use the HPs for anything else.
        Our main problem with the HPs was pause-frame flooding, which would basically kill all traffic on the network and wouldn't fail over to the other card.

  28. 42

    We have had "firmware hang" problems with all kinds of QLogic cards for the last 6 months: the DL580 G7 onboard NC375i, the quad-port NC375T, and the 10GbE NC523SFP. Now we are supposed to run a QLogic debugging tool in the background on each ESXi server that takes a core dump in case of a hang… Not very promising.
  29. afokkema

    We also had HP replace the NetXen adapters with Intel 365T adapters. No problems since the replacement.
  30. 42

    We're now getting new QLogic cards from HP, including a new SPI board with the onboard NIC ports. I don't know what I should think about that. But we don't get cards with Intel chips.
    1. ollfried

      We are also getting *newly selected* QLogic 10Gb cards. I wonder what will happen then…
  31. HP NC375i Netzwerkadapter: Resetting the device because the device is not responding « layer9.

    […] also interesting in this context is the article Identifying and Resolving NetXen nx_nic (Qlogic) NIC Failures by Chris Wahl. […]

  32. cdunn

    HP is sending us new cards also, but it looks like it will be the same card. I opened a ticket with VMware and they are leaning towards driver/firmware. Meanwhile, I'm going to do some more testing to see if only certain configurations are causing this in our environment. There was a discussion on the VMware Communities about disabling the onboard NICs and having only the 10Gb ports be seen by ESXi; I tried that and had the same issue. I also separated the traffic across 4 10Gb links instead of just 2, thinking it could be MTU related, and had the same issue. I'm going to go to an explicit failover order and remove IP hash to see if that helps. So far I can trigger it every 24 hours or so if I do vMotion between two hosts twice an hour.
    1. Mark Hodges

      Right now we have 2 single-port Intel 520 10G NICs and 2 onboard 1G NICs active, and we have not had the problem.
      With the 522 cards you are not able to disable one of the ports on the dual-port card, so you can't have any of the 1G NICs active (since 4 x 10G means you cannot have any 1G cards).

      We did try going with a single dual-port 522 (so we were running 2 x 10G and 2 x 1G) and we still fell over with the 522s…
  33. Jonesy

    cdunn,

    It sounds like you have the exact same setup that we have. The randomness of it all was what drove me crazy! We replaced the 523 cards with dual-port X520s back in February and have not had any issues since then.

    42,
    Did you ever get anywhere with the firmware dumps?

    If it turns out that HP is replacing "faulty" NICs, I might have to call and get the ones sitting in a box replaced with ones that work.
  34. cdunn

    Just to keep people updated: I've removed IP hash and the port channel. The vSphere 4.1 cluster I'm testing on has had no issues so far (almost 48 hours), BUT I did make the change on our Exchange ESXi cluster and had a host fail with the "Firmware Hang Detected" message, so it doesn't look like the config changes are working. I'm going to try what others are doing and replace 13 unboxed NC523SFPs with NC552SFPs. I'm also going to buy an Intel card to test with, since you guys are having good luck with them. On the ESXi hosts for our Exchange servers I didn't disable the 4 1Gb NICs, so I'm going to try that on those servers while I'm waiting for the other cards. I noticed the NC523SFP card HP sent me to replace my other NC523SFP had Rev: 0C on the sticker. The cards in the servers that are failing are the same Rev 0C, so I'm not sure it's going to make a difference changing it. On the VMware thread I'm on, other users mentioned having Rev 0B. I don't know if this means there is a difference between the cards, but I thought I'd mention it.
  35. nate

    Just a note: the HP NC523SFP has similar issues. I have a bunch of servers, each with two of the cards, and the cards would regularly fail. There was urgent firmware released late last year, along with a BIOS change to increase the cooling, but it doesn't help (it didn't do anything for me). HP and QLogic are trying to keep it quiet because there are a lot of cards to replace. It took me almost two months to get replacement cards. There apparently was a manufacturing issue with the original sets of cards, and there is a new hardware revision that fixes it. For HP, the spare part # is 593715-001 (what L2 told me when I said a big package had arrived with that number on it; my servers are remote, so I didn't know what was in the box). You can't get this part # if you just call HP and tell them to replace your NIC; you have to go through escalation and such. My new NICs shipped from somewhere deep in the innards of HP directly to me; they didn't go through the field team for our 4-hour on-site support.

    I had them replace the NICs in the servers every other day, and at least so far that seems to have resolved the issues. The L2/L3 support folks at HP claim to be confident that this hardware revision resolves the outstanding hardware issues on the 523SFP. I was impressed with how well the HP techs cabled up my systems. I mean, I took great pains to label everything, but there are 11 cables on each of my DL385 G7s, and at least for the 10GbE and FC (boot from SAN), if you plug them into the wrong ports, bad things happen. But I verified each and every 10GbE port was correct after they replaced the NICs, and they were right every time. Same goes for FC (with boot from SAN, if the FC cables are swapped the card won't see the LUN and won't boot; not hard to fix, just go into the BIOS, re-scan, and change the LUN, but still annoying).

    It's a widespread issue. On two different occasions I had full-on network card failure: the cards would not pass traffic. I could get them working again for a few hours by rebooting, but then they would die again. VMware would sit there for hours resetting the card over and over. 3 cards in two servers in less than a week, in both cases, and despite having 4-hour on-site support there were no spare parts in the area. In one case I had to wait 5 days for new cards (and these were not the new cards; these were just replacement old cards until the new ones arrived) because of miscommunication inside HP.

    If you see messages like this on boot up, replace your card too (if it wasn't obvious) 🙂

    NIC boot code starting cmd peg times out! status 0xffffffff
    ql_update_adapt_cfg failed to init nic rc fffd for pci dev c00
    SetupOSCD failed to init nic rc fffd for pci dev c00
    Abort QLogic NIC boot process

    That first message I saw on one system several months ago; I didn't know what it was for, so I ran diagnostics at the time but didn't get anything from them.

    Fortunately we designed our systems with two cards each and split our distributed virtual switches across them (active/passive, no load balancing), so when one card completely failed, the other took over pretty quickly. Also, all of our storage is good ol' fibre channel, so that never skipped a beat. I also put all of our service consoles on a separate redundant 1GbE network. I planned for the worst when I built it; I honestly didn't think I'd ever need it.

    This is a well-known issue inside HP, so you shouldn't have trouble getting on the list for new cards if you can provide the log files showing the issue you're having. They may also have you provide a firmware dump (which can be painful, because the firmware dump rendered my NICs inoperable until I rebooted). Also, to get a firmware dump you have to wait until the problem recurs; you can't just trigger it on demand. For me it took about 3 days until the NICs failed again on that particular server. I was expecting the backup NICs to take over, but it seems the firmware dump process did something to them, or to the drivers; the backup NICs didn't work either and all VMs lost connectivity. Fortunately I was still able to manage the server over the 1GbE network.

    I feel for those folks that are relying on IP storage and have just a single NIC, since these problems cause both ports on the NIC to fail simultaneously.

    Hopefully this can help some of you out there. I haven't posted on this topic on my own blog yet.

    There is another known issue on the NC522SFP (I believe) that I was made aware of recently: apparently certain passive copper cables can trigger this NIC to flap its network ports (not the same behavior as the 523 issue). This is a known issue in the firmware (no fix yet) and is hit or miss depending on what card you have (different cards with the same model # can behave differently). The only sure workaround if you have this issue is to use optical cable instead of passive copper. Or you may get lucky by changing the passive copper cable you're using, as it doesn't happen on all cables (again, I was told it was hit or miss; they haven't found a sure workaround other than optical).

  36. nate

    Also, you can't order new systems with the new cards. My HP VAR has been trying for the same two months to get new NICs for other servers that we bought last year (we are not using them yet; they are being shipped to Europe soon), and so far has not had much luck.

    I'm sure that given the configuration is the same and it's for the same customer (me), it's just a matter of time, but it's still taking a while.
  37. 42

    @Jonesy: after uploading the firmware dumps to HP (about 8 of them), HP decided to send us new NC375T cards and new SPI boards with the NC375i. I'm still waiting for the cards to arrive. The NC523SFP 10GbE cards were replaced some months ago with NC550 Emulex cards.
    1. Jonesy

      Thanks for the update! Did HP actually find anything with all those firmware dumps, or are they just throwing different hardware at the problem?
      1. 42

        I don't know what HP or QLogic did with the dumps. Shortly after we uploaded them, we received word from HP that QLogic had confirmed the problem and that we'll get cards with a new revision of the board/chip. I don't know why the "Firmware hang" message wasn't enough, because the cards simply didn't work. We had 2 HA failures because the ESXi host was not reachable for > 30 seconds.

      2. 42

        I received the new NC375T cards today. The revision number is 0G. We have at least one old card that already has Rev. 0G; the rest are 0F. So maybe we are lucky and 0G really fixes the problem.

  38. cdunn

    NC523SFP issues: I've been testing with X520-DA2, NC552SFP, and NC550SFP cards and have had zero issues with any of them. I left a host in each of my vSphere clusters with the NC523SFP cards, and those would fail every few days while the other cards stayed up. I made zero configuration changes, just so I could be sure it was the card/firmware/driver. I've been testing this for the last couple of weeks. We are going to purchase NC552SFP cards as our solution to this issue. The 20+ NC523SFP cards will be reused in our Windows servers. There is another guy over on the VMware forums doing similar testing, and he's had similar results.
    1. Mark Hodges

      So the 522 and 523 cards have been failing in the VMware environments, but they work properly without failure on Windows servers?
      I would really like to use these 14 x 522 cards if I were confident they wouldn't fall over on us…
      Waiting on our HP rep to investigate replacing them…
      1. cdunn

        We are testing it with our NetBackup solution, which is on a Windows platform. I hate to have the NC523SFP cards just sit around not being used, so I'm hopeful I can. Based on the comments, it sounds like that might not be the case; I was hoping the issue was more driver-related. We are still testing, so I don't know if they will work for us in Windows or not.

  39. Ken

    I would be careful about using these cards in your Windows servers; I have seen references on the Internet that this can affect Windows as well. People with large SQL servers were reporting the issue.

    We replaced ours with Intel cards about 3 months ago and have not seen a single network-related issue since. I was not interested in being a beta tester for QLogic/HP.
    1. cdunn

      Ken, you are right. After further testing we are having issues with the Windows servers also. I hate to have the NC523SFP cards lying around and not be able to use them, but I just don't trust them.
  40. lazyllama

    We have had two replacement (rev. G) NC375T cards sent to us by HP, and they seem to have resolved the issue.

    We had 4 pre-rev. G cards which were causing us problems. We swapped all of the faulty cards out for NC364T cards we had in stock as soon as we were able, as loss of network connectivity on ESX hosts using Ethernet-connected storage was causing outages and corruption.

    Unfortunately, HP will only replace the other two cards if we put them back into servers and reproduce the error with associated logs, etc. I'm not prepared to put customer services at very real, known risk in order to prove each card is faulty, so we will have to take the hit.

    Extremely unimpressed by that latter response from HP, and quite glad that we're switching away from them in the near future.
  41. Jim

    I'm running into this problem despite updating the firmware on the cards. Presumably I need to install the VMware driver *as well as* the firmware update? When the card "crashes" I can see the card's firmware version (4.0.579) in the vmkernel log, but I am running an older driver (4.0.550.1-1vmw)…
  42. Arthur

    We have been experiencing issues with all QLogic NICs except the NC522SFP. Sure, they had issues early on, but they have been very stable since the last firmware release in late 2011.
    Unfortunately, the NC375Ts and NC523SFPs were so unstable they have been banished to the lab and replaced with NC365Ts and NC552SFPs respectively.
    Now to the NC375i; these have been simply horrible. HP refused to accept there were issues with them early on; QLogic then released a firmware and driver update to address the precise issue that HP said didn't exist. Since the last firmware and driver update the issue still persists. HP have not been very supportive this time around either, but are at least talking about it.
    I hear there is a new SPI riser board available which uses updated hardware; apparently this resolves the issue with these NICs pausing. They only seem to do this when pushed very hard. I see utilisation of up to 600Mb/s, then the NICs just stop transmitting and receiving. All hell breaks loose, alarms start ringing, and the phone calls start.

    I'm over it. I really need a solution which does not involve me throwing money at HP to replace something we have already purchased that doesn't do the job it's designed to do.

    Oddly enough, Dell are aware of these issues and have offered assistance. They are looking to get a foot in the door of the data centre, so to speak. They might get it yet.

    Put simply, Dell are offering to be part of the solution; HP seem content to be part of the problem.
  43. Arthur

    Just a word for HP, if you are reading:
    Everything can suffer problems; what defines the better party is the method they use to resolve those problems.
  44. 42

    After all our NC375i and NC375T cards were replaced with cards of a new hardware revision, we had no firmware hang problems for 5 weeks… until today. One NC375i card had a firmware hang this morning. So it seems the new revision improves the situation but does not solve it 100%.
  45. Arthur

    There is a new firmware/driver package available. I've no idea if this resolves anything, and I have no confidence in QLogic's statements about it resolving anything. Hey, the last 2 firmware/driver packages resolved the issue… obviously NOT.
    http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c02964542
    This package is for the NC375i, NC375T, and NC522SFP NICs.

    After applying the package on a dev server I now see:
    driver: nx_nic
    version: 5.0.619
    firmware-version: 4.0.588

    There was a catch to this.
    After applying the package via VMware Update Manager, the server rebooted as expected.
    When the server was back up, vCenter could not connect to it.
    Further investigation showed the NC375i NICs were not initialised or connected, even though they were before applying the package.
    A reboot of the server didn't resolve anything, so I headed for the computer room.
    In this case I needed to power the hardware down, and I had to disconnect the iLO session first, as it was connected from my PC at another site that was not accessible to me from within the computer room. So I truly pulled the power on the server.
    I powered the server up again and iLO'ed to the server from my laptop within the computer room.
    To my surprise, the NC375i NICs were all up and working.
    It appears that in at least some situations it may be necessary to power the server off after applying this firmware/driver package.
    Dodgy stuff, HP, or more to the point, QLogic.

    I hope this rather arduous process isn't necessary on all servers, but there's my experience so far.
    Just to repeat:
    Dodgy stuff, HP, or more to the point, QLogic.
    1. Oliver Antwerpen

      This is also true for Emulex CNA firmware upgrades, but a Power -> Cold Boot via iLO does the job. You do not need physical access to pull the plugs.
      Also, the driver/firmware you are referring to is three months old and did not fix the problems.
  46. Tim P

    Not sure how old this forum is, but we have been experiencing issues with the NC522SFP / DL580 G7 servers for a year and a half. We finally have a resolution, so I hope this helps everybody here. It turns out that there is a manufacturing issue with the QLogic chipset, which of course affects the NC522SFP 10GbE cards as well as the onboard NICs on a DL580 G7 server. If you run the following on an ESX host, you can determine whether the server is affected:

    cat /var/log/vmkernel* | grep -i -e "firmware hang" -e "device is DOWN"

    What you have to do is open a case with HP and get the SPI board or the NC522 card replaced. HP replaced every single one of our cards and server SPI boards, and we have lots of them: (20) DL580 G7s running ESX and (20) NC522 cards. What a nightmare this has been, but I am glad they have a solution. As a note, I tried every driver, firmware version, etc., and nothing helped, which is not surprising since it is a hardware manufacturing issue. Knock on wood, all is good thus far. And yes, it doesn't look good on the resume when you have hosts going down and people pointing the finger at you. By the way, the problem has lasted since early 2011 and we are now in July of 2012. Good luck!
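
    If you have many hosts, the same check can be run across all of them in one pass (a sketch; the host names are placeholders and key-based SSH access to each host is assumed):

    for h in esx01 esx02 esx03; do
      echo "== $h"
      ssh root@"$h" 'grep -ic -e "firmware hang" -e "device is DOWN" /var/log/vmkernel*'
    done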

    1. Yucheng Liu

      Hi Tim, do you have the HP case number so we can reference it? We had the same firmware hang issue with our server. Thanks much!
  47. Erik Briggs

    9/24/2012: I had been having the issue about once every 6 months (DL585 G7) until about 2 weeks ago; it has happened 3 times since then. Over this past weekend, I updated to the firmware dated 9/4/12 (4.0.588), which says it specifically addresses this issue. I also updated to the latest drivers. Within less than a day, the server had already hung the NIC before anyone even hit the system (<1 hour of use).

    I opened a case today, and they are shipping me a replacement SPI board. The new firmware/drivers do NOTHING to help this issue, as it is a hardware problem.
  48. Mark Hodges

    Finally, 12 months later and after a discussion about moving to IBM, we have replacement 522SFP NICs; the rep states there is a known issue with the earlier revision cards.
    The old revision cards were 0B and the new revision is 0H. I put 2 cards into a Windows server and never had a drop, even though I hammered them with 1.3TB of data transfer.
    Will put 2 into a new ESX 5 box at some point and see if they survive…
  49. Peter C

    We've been chasing down this problem for nearly two months and narrowed in on the NC375T network adapter as well. Most vDS connections have a redundant connection to another model of NIC, so we moved the problem uplink to standby. No combination of drivers and firmware fixes the problem. HP just sent out a replacement card which seems the same (though I did not check the hardware rev; next time I open the server, I guess), although the problem only occurs under load, so time will tell. In the meantime we're replacing one of the NC375T adapters per server with an Intel-based card instead. Painful.
  50. NetXen HP NC522SFP Network Flooding | Le cloud de Piermick
  51. Recommended BIOS Settings on HP ProLiant DL580 G7 for VMware vSphere | my repository synology
  52. sunny

    We have been dealing with this problem for approximately 6 months now, and after numerous firmware and driver updates we are at a crossroads. HP was willing to ship me another 375T card with revision 0J. Can anyone confirm that a later revision fixes the actual issue and you don't see any more disconnects?
