The only error I found was the one above in the Onboard Administrator --> IML Log System error ---> An Unrecoverable System Error (NMI) has occurred (System error code 0x0000002B, 0x00000000) ASR
If it is less than 1.72, the controller will not notify about the firmware update requirement... My last resort is to copy everything to a new server and reinstall this one, which I’d like to skip. Thanks, Edgar Santos
12/27/13--12:59: Re: Performance
http://crearesiteweb.net/an-unrecoverable/an-unrecoverable-system-error-has-occurred-error-code-0x0000002e-0x00000000.html
Maybe there is an iLO timeout configuration somewhere in iLO? (#18, aderumier, Nov 16, 2015)
I also found
Please test the kernel and update this bug with the results.
Can I use two different types of hard drives (SCSI and SAS) in the HP ProLiant DL380 G5 and DL165 G6 servers? Whether it will cause
This occurs only on the HP server.
At that point the server was crashing almost every 15 minutes. This is why I asked for the PCI riser cage.
https://bugs.launchpad.net/bugs/1417580 Title: HP Proliant Servers should use proper cmdline to avoid kernel panics
Since we have no debug symbols for the kernel (I did not find any package for this), I could not use kdump to capture the panic.
Only this HBA: 81Q: QLogic PCI to Fibre Channel Host Adapter for HPAK344A: Host Device Name vmhba3, BIOS version 3.13, FCODE version N/A, EFI version 6.23, Flash FW version 5.09.00. Is there any resolution?
P420 is overheating. 3.
Ilo Watchdog Nmi
First of all, I have checked the thread on the HP forum.
repair_env_string+0x5c/0x5c [ 5494.262390] [
Brad Figg (brad-figg) on 2015-03-18:
Changed in linux (Ubuntu Utopic): status: In Progress → Fix Committed
Changed in linux (Ubuntu Trusty): status: In Progress → Fix Committed
Changed in linux (Ubuntu
https://bugs.launchpad.net/bugs/1432837
So you should have a write cache for the backup logical drive (parity generation and extra writes for RAID 5). For RAID 0, a write/read cache for SSDs does not
An Unrecoverable System Error (nmi) Has Occurred (system Error Code 0x0000002b 0x00000000)
I can't even update any of the firmware yet, as it's not finding the particular part number on the Intelligent Provisioning system (I had to manually update both servers' firmware,
An Unrecoverable System Error Has Occurred Error Code 0x0000002d 0x00000000
This seems to be a kernel/driver/firmware/platform issue that prevented the watchdog NMI from being reported in customer-friendly terms.
Maybe they are related but they sound a bit different. http://crearesiteweb.net/an-unrecoverable/an-unrecoverable-system-error-has-occurred-error-code.html
This probably falls on HP first.
An Unrecoverable System Error (nmi) Has Occurred (service Information: 0x7fbce8f6, 0x00000000)
This is something I'll pursue. Still worth trying the older 4.1 or 3.9 kernels.
We have a problem installing Windows 2012 on a disk larger than 5 TB. http://crearesiteweb.net/an-unrecoverable/an-unrecoverable-system-error-has-occurred-error-code-0x0000002d-0x00000000.html
This is great, but the error messages logged are not very user friendly.
Furthermore, the HP System Management Homepage does not show any errors or warnings.
Uncorrectable Pci Express Error
I have taken it out now, and so far there has been no further issue. I have to monitor it for a while to see if that actually stopped it.
atomic_notifier_call_chain+0x1a/0x20 [
NMIs will be logged as Unrecoverable System Errors, something like this: An Unrecoverable System Error has occurred (Error code 0x0000002D, 0x00000000) The first 32-bit error code can be decoded using this
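As a small illustration of working with these log lines, the Python sketch below pulls the two 32-bit codes out of an IML message so they can be compared across servers. The function name `extract_error_codes` and the regular expression are assumptions made for this example, based only on the two message variants quoted in this thread. It extracts the codes but does not decode what they mean.

```python
import re

# Hypothetical pattern covering the two variants quoted in this thread:
#   "... (System error code 0x0000002B, 0x00000000)"
#   "... (Error code 0x0000002D, 0x00000000)"
# The closing parenthesis is optional because some pasted logs are cut off.
IML_NMI_PATTERN = re.compile(
    r"Unrecoverable System Error.*?\((?:System )?[Ee]rror code\s+"
    r"(0x[0-9A-Fa-f]{8}),\s*(0x[0-9A-Fa-f]{8})\)?"
)


def extract_error_codes(message):
    """Return (first_code, second_code) as integers, or None if no match."""
    match = IML_NMI_PATTERN.search(message)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)
```

Calling `extract_error_codes` on each IML line and grouping by the first code is one way to see whether several servers are reporting the same event.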
HP is one of the most active members in the ACPI specification group, and several features for their servers, available through their firmware, are heavily ACPI dependent.
As described in /etc/modprobe.d/blacklist-watchdog.conf:
"""
# Watchdog drivers should not be loaded automatically, but only if a
# watchdog daemon is installed.
"""
We should blacklist module "hpwdt" by default for
Ser Olmy 06-02-2014, 06:33 AM
Uncorrectable Pci Express Error Dl380p Gen8
It was a mess!
aperson 08-12-2014 09:13
The IML log is on the System Status page of the iLO web interface.
Anyone can find instructions on how to run it here: https://github.com/inaddy/notifymydog
Small Example:
inaddy@host:~$ wget https://raw.githubusercontent.com/inaddy/notifymydog/master/notifymydog.c
inaddy@host:~/notifymydog$ gcc -Wall -D_DEBUG=0 -D_SYSLOG=1 notifymydog.c -o notifymydog
inaddy@host:~/notifymydog$ sudo ./notifymydog &
inaddy@host:~$ sudo tail
Edward Bustos (edward-bustos) wrote on 2015-03-18: #5 Per Dan Zink (HP FW/BIOS): I agree with Linda.
Per kernel team comments (on kernel-team mailing list): """ We have been seeing random crashes from various HP systems; this has been tracked to loading of the hpwdt watchdog modules.
Data will automatically be written to drive array. Caution POST Message 03/13/2013 16:43 03/13/2013 16:43 1 POST Error: 1719 - A controller failure event occurred prior to this power-up. Cache module could have
In the iLO log you will probably find an NMI exception with an error code ending in 2B.
notify_die+0x2e/0x30 [
And if you look at the QuickSpecs of the Smart Array E200 controller ( http://h18004.www1.hp.com/products/quickspecs/productbulletin.html#!spectype=worldwide&type=html&docid=12460 ), you'll find that exact part number among the list of supported disks. Yes, the
Launchpad Janitor (janitor) wrote on 2015-03-24: #7 This bug was fixed in the package linux - 3.19.0-10.10
---------------
linux (3.19.0-10.10) vivid; urgency=low
[ Andy Whitcroft ]
* [Packaging] control -- make
Data will automatically be written to drive array. Caution POST Message 03/13/2013 16:43 03/13/2013 16:43 1 POST Error: 1719 - A controller failure event occurred prior to this power-up. Critical PCI Bus 03/13/2013
If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
There are 2 logical drives: drive C: (OS, system, and programs) and drive F: (LOB programs, users' shared data). Although there are 5 PCs in the office there are only
start_secondary+0x2ac/0x2ef
IML (Integrated Management Log) logs are as follows:
An Unrecoverable System Error (NMI) has occurred (System error code 0x00000000, 0x00000000)
An Unrecoverable System Error (NMI) has occurred (System error code 0x00000000, 0x00000000)
They even use the same firmware updates. I seem to recall that someone in another thread mentioned a similar situation: when new HDD models become available, the QuickSpecs of the
Deactivating the module, and so falling back to the softdog, would help. I find it hard to believe this could be a hardware issue if so many of us are seeing it. I ran the HP diagnostic tools and everything seems normal. Just to provide feedback on the cmdline and its explanations.
With the module hpwdt loaded, a kernel panic happens randomly.
I have to check the PSP version on 100 servers right now.
12/27/13--07:16: Re: ML370G6 & predictive failure warnings
Thanks for the suggestions.
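For anyone who wants to check many servers for the hpwdt situation discussed above, here is a minimal Python sketch. It reports whether hpwdt currently appears in /proc/modules and whether any file under /etc/modprobe.d contains a `blacklist hpwdt` line. All function names are hypothetical and chosen for this example. It only reads state and changes nothing, and it assumes the usual /proc/modules layout, where the module name is the first field.

```python
import glob


def loaded_modules(proc_modules_text):
    """Return the module names listed in /proc/modules-style text."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def blacklisted_modules(modprobe_conf_text):
    """Return module names named by 'blacklist <module>' lines."""
    names = set()
    for line in modprobe_conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = stripped.split()
        if len(parts) == 2 and parts[0] == "blacklist":
            names.add(parts[1])
    return names


def describe_hpwdt(proc_modules_text, modprobe_conf_texts):
    """Summarise whether hpwdt is loaded and whether it is blacklisted."""
    return {
        "loaded": "hpwdt" in loaded_modules(proc_modules_text),
        "blacklisted": any(
            "hpwdt" in blacklisted_modules(text) for text in modprobe_conf_texts
        ),
    }


def read_host_state():
    """Read the real files on a Linux host and return the summary."""
    with open("/proc/modules") as handle:
        proc_text = handle.read()
    conf_texts = []
    for path in sorted(glob.glob("/etc/modprobe.d/*.conf")):
        with open(path) as handle:
            conf_texts.append(handle.read())
    return describe_hpwdt(proc_text, conf_texts)
```

Running `print(read_host_state())` on a host gives a quick answer. Keep in mind that a blacklist entry only stops automatic loading, so the module can still be loaded explicitly.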