Issue

A few HP Gen8 and Gen9 systems are crashing due to an NMI. The NMIs are logged as Unrecoverable System Errors, something like this:

An Unrecoverable System Error has occurred (Error code 0x0000002D, 0x00000000)

The first 32-bit error code can be decoded using this

Workaround:

# echo "blacklist hpwdt" >> /etc/modprobe.d/blacklist-hp.conf
# update-initramfs -k all -u
# update-grub
# reboot

Andy Whitcroft (apw) wrote on 2015-03-17: #3
Put together a generic solution which blacklists all

And blacklisting hpwdt.ko will solve the kernel panic.
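When collecting reports from several machines, it can help to pull the two 32-bit words out of the Unrecoverable System Error message automatically. The following is only a minimal sketch: the function name `extract_error_codes` and the regular expression are illustrative assumptions, not part of any HP tool, and the sketch does not attempt to decode what the codes mean.

```python
import re

# Assumption: the message has the "(Error code 0x........, 0x........)" shape
# quoted in the issue description. Other firmware versions may word it differently.
ERROR_CODE_RE = re.compile(
    r"Error code\s+(0x[0-9A-Fa-f]{8}),\s*(0x[0-9A-Fa-f]{8})", re.IGNORECASE
)


def extract_error_codes(message):
    """Return the two 32-bit codes as integers, or None if the pattern is absent."""
    match = ERROR_CODE_RE.search(message)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


sample = "An Unrecoverable System Error has occurred (Error code 0x0000002D, 0x00000000)"
codes = extract_error_codes(sample)
print([f"0x{c:08X}" for c in codes])  # prints ['0x0000002D', '0x00000000']
```

Returning plain integers keeps the codes easy to compare or group across many log entries.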
Unfortunately these have not been kept in sync with the kernel leading to the module loading.

This is actually not a resolution for this particular case, but a bug (from

The kernel panic I see only happens while the VM is starting and CPU load skyrockets.

At this point I had enough conclusive data to pass my findings to the hardware vendor for full collaboration on the problem.

This bug affects 3 people

Affects: linux (Ubuntu). Status: Fix Released. Importance: High. Assigned to: Andy Whitcroft. Milestone: ubuntu-15.03.
Affects: Precise. Status: Fix Released. Importance: High. Assigned to: Andy Whitcroft.

https://community.hpe.com/t5/ProLiant-Servers-ML-DL-SL/An-Unrecoverable-System-Error-has-occurred-Error-code-0x0000002E/td-p/4318701
Watchdog-mux service is using this:

Main PID: 1439 (watchdog-mux)
CGroup: /system.slice/watchdog-mux.service
└─1439 /usr/sbin/watchdog-mux
Oct 21 09:25:10 pmx72 watchdog-mux: Watchdog driver 'HP iLO2+ HW Watchdog Timer', version 0

We have a DL 360 G6 (latest BIOS patches) and a DL380 G( running in this lab. These are the versions we are running. We can start VMs and also migrate them, but as soon as you activate HA for any VM we receive a kernel panic in the hpwdt.ko module.
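To check on many nodes whether watchdog-mux has picked up the HP iLO hardware watchdog, the driver name can be read out of the log line shown above. This is a rough sketch only: `parse_watchdog_driver` and `is_hp_ilo_watchdog` are hypothetical helper names, and the pattern assumes the log line keeps exactly the wording quoted in this thread.

```python
import re

# Assumption: watchdog-mux logs "Watchdog driver '<identity>', version <n>"
# as in the journal line quoted in the post above.
DRIVER_RE = re.compile(r"Watchdog driver '([^']+)', version (\w+)")


def parse_watchdog_driver(log_line):
    """Return (driver_identity, version) from a watchdog-mux log line, or None."""
    match = DRIVER_RE.search(log_line)
    if match is None:
        return None
    return match.group(1), match.group(2)


def is_hp_ilo_watchdog(identity):
    """Heuristic: identities starting with 'HP iLO' are treated as the iLO watchdog."""
    return identity.startswith("HP iLO")


line = ("Oct 21 09:25:10 pmx72 watchdog-mux: "
        "Watchdog driver 'HP iLO2+ HW Watchdog Timer', version 0")
identity, version = parse_watchdog_driver(line)
print(identity, version, is_hp_ilo_watchdog(identity))
```

A node where the identity is not the HP iLO timer is presumably using a different watchdog driver, which is what one would expect after blacklisting hpwdt.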
I tested this on HP ProLiant servers: iLO + watchdog on Linux produces a kernel panic when you use HA on Proxmox.

We are an HP shop so I have plenty of brand new boxed 380 shells sitting in the warehouse I can test with.

OA Forward Progress Log

Have you tried checking the system on a minimum configuration?
I've got an HP DL320e Gen8 v2 and your solution works for me.

Other distros run the watchdog timers just fine.

Unfortunately we do not have the trace due to HP's damned iLO :-( but I will give more info when I have caught it.

Checking in a ProLiant server:

[email protected]:~# dmidecode | grep -i proliant
Product Name: ProLiant DL360e Gen8
Family: ProLiant
[email protected]:~# uname -a
Linux hertz 3.16.0-31-generic #41-Ubuntu SMP Tue Feb 10 15:24:04 UTC
They continued investigating the issue.

VE 4.0 Kernel Panic on HP Proliant servers
Discussion in 'Proxmox VE: Installation and configuration' started by mensinck, Oct 19, 2015.
So we engaged the hardware vendor, who determined that this error indicated an error on the PCI bus.

https://access.redhat.com/solutions/1309033

Without the module the server reboots.

Of course we shall recommend the HW watchdog interface for 2 node cluster setups, for example, when we can't rely on quorum policies and fencing mechanisms are not available (like external

This occurs only on the HP server.
After replacing the shell the issue still persisted.

tags: added: verification-done
removed: verification-needed-precise verification-needed-trusty verification-needed-utopic

Launchpad Janitor (janitor) wrote on 2015-04-08: #13
This bug was fixed in the package linux - 3.13.0-49.81
---------------
linux (3.13.0-49.81)

I would think this issue is for Canonical to investigate. I find it hard to believe this could be a hardware issue if there are so many of us seeing the issue.
The IML log is on the System Status page of the iLO web interface.

If you go back to the 4.1 or 3.9 kernel on the HP does the issue go away?
#13 adamb, Nov 11, 2015

pipomambo (New Member, joined Nov 11, 2015)

early_idt_handlers+0x120/0x120 [ 5494.343686] [
hpwdt_pretimeout+0x8d/0xbc [hpwdt] [
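The trace fragments above name `hpwdt_pretimeout` inside the `hpwdt` module. When a longer console capture is available, a small script can list which kernel modules appear in the call trace. This is a sketch under stated assumptions: `parse_frames` and `modules_in_trace` are illustrative names, and the pattern only covers frames written as `symbol+0xOFFSET/0xSIZE [module]`, the form seen in this thread.

```python
import re

# Assumption: frames look like "symbol+0x8d/0xbc [module]"; the "[module]" part
# is optional because built-in kernel symbols carry no module tag.
FRAME_RE = re.compile(
    r"([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)(?: \[(\w+)\])?"
)


def parse_frames(text):
    """Return one dict per stack frame found in the trace text."""
    return [
        {
            "symbol": m.group(1),
            "offset": int(m.group(2), 16),
            "size": int(m.group(3), 16),
            "module": m.group(4),  # None for built-in symbols
        }
        for m in FRAME_RE.finditer(text)
    ]


def modules_in_trace(text):
    """Return the sorted set of module names that appear in the trace."""
    return sorted({f["module"] for f in parse_frames(text) if f["module"]})


trace = ("early_idt_handlers+0x120/0x120 [ 5494.343686] [ "
         "hpwdt_pretimeout+0x8d/0xbc [hpwdt] [")
print(modules_in_trace(trace))  # prints ['hpwdt']
```

Bracketed timestamps such as `[ 5494.343686]` are not mistaken for module names, because the pattern requires a word character immediately after the opening bracket.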
This issue is not a Proxmox VE one.

Start here: http://h20000.www2.hp.com/bizsupport/TechSupport/DriverDownload.jsp?prodNameId=3279717&lang=en&cc=us&taskId=135&prodClassId=-1&prodTypeId=15351&prodSeriesId=397646
Service Guide: http://bizsupport1.austin.hp.com/bc/docs/support/SupportManual/c00218061/c00218061.pdf
Run diags to isolate where the ASR is coming from.

http://crearesiteweb.net/an-unrecoverable/an-unrecoverable-system-error-has-occurred-error-code-0x0000002d-0x00000000.html

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
It seems that if corosync wants to use them, which is why it would open /dev/watchdog, then there's either a corosync bug or something in the configuration that isn't right.

Run offline diagnostics via the SmartStart CD. Offline diagnostics will take more than an hour, depending on the loops and the number of hard drives. This will check all the hardware components.
Next we looked at the manufacturer's mechanism used to log errors and found this piece of information - An Unrecoverable System Error has occurred (Error code 0x0000002D, 0x00000000) Note - each

We Acted.

NachoTech Blog: Tech tidbits that have crunch!

Rafael David Tinoco (inaddy) wrote on 2015-04-07: #12
Checked /lib/modprobe.d/blacklist_linux_* on Precise, Trusty, Utopic and Vivid and all of them contain hpwdt being blacklisted.
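The same check can be scripted so it is easy to repeat after a kernel update. The sketch below is an assumption-laden illustration, not the procedure used in the bug report: `blacklists_module` and `files_blacklisting` are made-up helper names, only `*.conf` files are scanned, and only plain `blacklist <module>` lines are recognised.

```python
from pathlib import Path


def blacklists_module(conf_text, module):
    """Return True if modprobe.d-style text contains a 'blacklist <module>' line."""
    wanted = module.replace("-", "_")  # modprobe treats '-' and '_' alike
    for raw in conf_text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        fields = stripped.split()
        if (len(fields) == 2 and fields[0] == "blacklist"
                and fields[1].replace("-", "_") == wanted):
            return True
    return False


def files_blacklisting(module, dirs=("/etc/modprobe.d", "/lib/modprobe.d")):
    """Return paths of *.conf files in the given directories that blacklist module."""
    hits = []
    for directory in dirs:
        for path in sorted(Path(directory).glob("*.conf")):
            try:
                text = path.read_text(errors="replace")
            except OSError:
                continue  # unreadable file: ignore in this sketch
            if blacklists_module(text, module):
                hits.append(str(path))
    return hits


if __name__ == "__main__":
    print(files_blacklisting("hpwdt"))
```

An empty list on an affected HP host would suggest that neither the distribution's blacklist files nor a local workaround file is in place.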
You are right, I already found this also.

iLO Event Log

[ 5492.505988] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.0-123.9.2.el7.x86_64 #1
[ 5492.605615] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/02/2014
[ 5492.692636] ffffffffa03ae2d8 17844fa82b224426 ffff880fffa06de0

My issue is resolved on the older kernels.
#15 adamb, Nov 11, 2015

[email protected] (Member, joined Nov 12, 2015)
Hello everybody!