Mellanox Technologies MT27500 [ConnectX-3 Flash Recovery] - How to recover a broken firmware update
I recently had the fun task of recovering a Mellanox 10 GbE network card that had been left dead by a failed firmware upgrade. It was quite difficult to recover due to the lack of hardware information, but with some educated guesses I managed to write the correct firmware back on.
I’ll try to run through the issue and the fix. Apologies in advance: the step-by-step lacks the exact outputs, as I’m writing this after the fix.
The problem: The server was not showing me the expected network interface during installation. I knew the card was installed but upon attempting to PXE boot, it just tried the other two on-board adapters and then moved on.
The next step for me was to boot a live CD and run lspci -nn, which output:
07:00.0 Memory controller: Mellanox Technologies MT27500 [ConnectX-3 Flash Recovery]
Ok that’s something at least, it seemed to be stuck in a recovery mode - all hope is not lost!
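If you want to check several machines for the same symptom, the lspci check can be scripted. This is only a minimal sketch: find_recovery_devices is a helper name I made up, and the second sample line is an invented healthy NIC included for contrast.

```shell
#!/bin/sh
# Sketch: print Mellanox devices that enumerate in flash recovery mode.
# find_recovery_devices is a hypothetical helper; it reads lspci-style
# lines on stdin so it also works against saved output.
find_recovery_devices() {
    grep -i 'Mellanox' | grep -i 'Flash Recovery'
}

# Sample input: the line quoted above plus an invented on-board NIC.
sample='07:00.0 Memory controller: Mellanox Technologies MT27500 [ConnectX-3 Flash Recovery]
02:00.0 Ethernet controller: Example Vendor Gigabit NIC'

# Prints only the Mellanox line that is in recovery mode.
printf '%s\n' "$sample" | find_recovery_devices

# On a live system (not run here): lspci -nn | find_recovery_devices
```

The function exits non-zero when nothing matches, so it can be used directly in an if statement.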
Next step for me was to download ‘mlxup’ from (http://www.mellanox.com/page/mlxup_firmware_tool) which was a standalone binary and could be run on the live cd…
I don’t have the output from that command, but I believe it listed the ‘Device Type’; everything else was blank, and ‘status’ showed an error that I can’t recall.
At this point I needed to write a new firmware (It can’t get any worse or more broken than it is at present)
The first step was to install Fedora on one of the disks so that I could install the Mellanox Firmware Tools (http://www.mellanox.com/page/management_tools). With Fedora and the firmware tools installed, it was go time!
Here is a summary of the commands I ran to write the new firmware:
# Create the /dev/ devices for the firmware tools to use
mst start

# Output the status
mst status
# My device was listed as /dev/mst/mt502_pciconf0 with no associated pci_cr0 device

# Next I needed to download the correct firmware. This is where you may come
# unstuck, as I wasn't 100% sure of the Product 'Line', let alone the 'OPN' or
# 'PSID'! Luckily for me I had an identical working server that I could
# reference. On that server I ran 'mlxup', which gave me the details I
# required (at an educated guess).

# Download the firmware
wget http://www.mellanox.com/downloads/firmware/fw-ConnectX3-rel-2_40_7000-MCX311A-XCA_Ax-FlexBoot-3.4.746.bin.zip

# Unzip it
unzip fw-ConnectX3-rel-2_40_7000-MCX311A-XCA_Ax-FlexBoot-3.4.746.bin.zip

# Try and burn the firmware
flint -d /dev/mst/mt502_pciconf0 -i fw-ConnectX3-rel-2_40_7000-MCX311A-XCA_Ax-FlexBoot-3.4.746.bin burn

# Card is broken, therefore disable safety checks (not sure why I used --use_fw)
flint -nofs --use_fw -d /dev/mst/mt502_pciconf0 -i fw-ConnectX3-rel-2_40_7000-MCX311A-XCA_Ax-FlexBoot-3.4.746.bin burn

# It warned that it would be updated using blank MAC addresses, so I generated a MAC address
flint -nofs --mac f45214810ae0 --use_fw -d /dev/mst/mt502_pciconf0 -i fw-ConnectX3-rel-2_40_7000-MCX311A-XCA_Ax-FlexBoot-3.4.746.bin burn

# It warned me about something else, so I used the -use_image_ps flag
flint -nofs -use_image_ps --mac f45214810ae0 --use_fw -d /dev/mst/mt502_pciconf0 -i fw-ConnectX3-rel-2_40_7000-MCX311A-XCA_Ax-FlexBoot-3.4.746.bin burn

# Hooray success! But I noticed that the guid and system guid were blank.
# I'm not sure what impact that would have, but I generated some anyway...
flint -nofs -use_image_ps --guid beeff45214810ae0 --mac f45214810ae0 --use_fw -d /dev/mst/mt502_pciconf0 -i fw-ConnectX3-rel-2_40_7000-MCX311A-XCA_Ax-FlexBoot-3.4.746.bin burn

# Yay double success!
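The GUID in the final command is simply the generated MAC with a four-digit hex prefix ("beef") in front of it. If you need to do this for more than one card, a small helper keeps the lengths right. This is a sketch only: make_guid is a hypothetical name, and the "MAC plus prefix" scheme is just what was done above, not an official Mellanox convention.

```shell
#!/bin/sh
# Sketch: build a 16-hex-digit GUID from a 12-hex-digit MAC plus a
# 4-hex-digit prefix. make_guid is a hypothetical helper name.
make_guid() {
    mac=$1
    prefix=${2:-beef}

    # Reject empty values and anything that is not purely hexadecimal.
    case $mac in
        ''|*[!0-9A-Fa-f]*) echo "invalid MAC: $mac" >&2; return 1 ;;
    esac
    case $prefix in
        ''|*[!0-9A-Fa-f]*) echo "invalid prefix: $prefix" >&2; return 1 ;;
    esac

    # A MAC is 48 bits (12 hex digits); the GUID here is 64 bits (16).
    [ "${#mac}" -eq 12 ] || { echo "MAC must be 12 hex digits" >&2; return 1; }
    [ "${#prefix}" -eq 4 ] || { echo "prefix must be 4 hex digits" >&2; return 1; }

    printf '%s%s\n' "$prefix" "$mac"
}

make_guid f45214810ae0   # prints beeff45214810ae0
```

The result can then be passed to flint's --guid option exactly as in the last command above.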
So I rebooted the server and, lo and behold, it tried to PXE boot from the card! Booting into the Fedora installation showed my recovered network interface with its generated MAC address. I ran DHCP on that interface and BOOM! Interwebs!!!!
Hopefully this helps someone!
Filter Ansible logs on CentOS 7
CentOS 7 and Red Hat 7 seem to love shoving logs into /var/log/messages, and if you run Ansible in-house it generates a lot of noise there. If you also run Splunk or ELK, then getting your logs ‘just right’ is important to O.C.D.-type levels!
To send your ansible generated logs to /var/log/ansible.log perform the following:
Create the file /etc/rsyslog.d/ansible.conf
if ( $programname contains "ansible" ) then /var/log/ansible.log
& stop
And if you are creating log files, you will want to manage those log files too!
Create the file /etc/logrotate.d/ansible
/var/log/ansible.log {
    notifempty
    weekly
    rotate 4
    missingok
    compress
}
And restart / reload rsyslog
service rsyslog restart
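To confirm the filter is doing its job, you can send a tagged test message with logger and check where it lands. A rough sketch, assuming the two file paths used above; log_has_marker and check_ansible_routing are hypothetical helper names, and the check needs enough privileges to read both log files.

```shell
#!/bin/sh
# Sketch: verify that messages from programs with "ansible" in their
# name reach /var/log/ansible.log and no longer reach /var/log/messages.

# Hypothetical helper: succeed if file $2 contains the fixed string $1.
log_has_marker() {
    grep -qF -- "$1" "$2"
}

# Hypothetical helper: send a uniquely marked message and check routing.
check_ansible_routing() {
    marker="ansible-routing-test-$$"
    # The -t tag becomes rsyslog's $programname ("ansible-test").
    logger -t ansible-test "$marker"
    sleep 2   # give rsyslog a moment to write the message out
    if log_has_marker "$marker" /var/log/ansible.log &&
       ! log_has_marker "$marker" /var/log/messages; then
        echo "OK: message routed to /var/log/ansible.log only"
    else
        echo "FAIL: message not routed as expected" >&2
        return 1
    fi
}

# Run as root on the configured host (not run here):
#   check_ansible_routing
```

The logrotate side can be dry-run with logrotate -d /etc/logrotate.d/ansible, which reports what would be done without rotating anything.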
Using OpenSCAP to scan and harden your servers
Determine which profile you want to use by running oscap info /usr/share/xml/scap/ssg/content/ssg-rhel7-ds.xml, then replace the --profile value in the commands below as required.
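The oscap info output is fairly verbose, so it can help to pull out just the profile IDs. A minimal sketch: list_profiles is a hypothetical helper, and the sample text is a trimmed, illustrative stand-in for real output (the exact layout varies between OpenSCAP versions, which is why the helper matches on the ID prefix rather than on line position).

```shell
#!/bin/sh
# Sketch: extract SSG profile IDs from `oscap info` output on stdin.
# list_profiles is a hypothetical helper name.
list_profiles() {
    grep -o 'xccdf_org\.ssgproject\.content_profile_[A-Za-z0-9_.-]*' | sort -u
}

# Illustrative sample only; real output contains many more lines.
sample='Profiles:
    xccdf_org.ssgproject.content_profile_standard
    xccdf_org.ssgproject.content_profile_CS2'

printf '%s\n' "$sample" | list_profiles

# On a real host (not run here):
#   oscap info /usr/share/xml/scap/ssg/content/ssg-rhel7-ds.xml | list_profiles
```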
Perform a scan:
oscap xccdf eval --report report.html --profile xccdf_org.ssgproject.content_profile_CS2 /usr/share/xml/scap/ssg/content/ssg-rhel7-ds.xml
Apply a remediation:
oscap xccdf eval --remediate --report report.html --profile xccdf_org.ssgproject.content_profile_CS2 /usr/share/xml/scap/ssg/content/ssg-rhel7-ds.xml
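If you run these scans from cron or a wrapper script, note that per the oscap man page an evaluation exits with 0 on success, 1 on error, and 2 when the scan completed but the system is not compliant. A small sketch for turning that into a readable message; describe_oscap_status is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: translate the exit status of `oscap xccdf eval` into a message.
# describe_oscap_status is a hypothetical helper name.
describe_oscap_status() {
    case $1 in
        0) echo "compliant: the evaluation finished and the system passed" ;;
        2) echo "non-compliant: the evaluation finished but the system did not pass" ;;
        *) echo "error: oscap did not complete the evaluation (exit status $1)" ;;
    esac
}

# Typical use after a scan (not run here):
#   oscap xccdf eval --report report.html \
#       --profile xccdf_org.ssgproject.content_profile_CS2 \
#       /usr/share/xml/scap/ssg/content/ssg-rhel7-ds.xml
#   describe_oscap_status $?
```

This matters mostly because a status of 2 is not a failure of the tool itself, so a wrapper that treats any non-zero status as a crash will raise false alarms.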
Key_Events_On_Hosts - Splunk EventType
Once upon a time Splunk had an “EventType” named “Key_Events_On_Hosts”, then one day it disappeared. It used to power some reports that I use, so I had to go and find it in a backup. Here is the EventType in all its naked glory.
( sourcetype="WinEventLog*" OR sourcetype="XmlWinEventLog*" (Type="*Error*" OR Type="*Fail*" OR EventCode=1074 OR EventCode=19 OR EventCode=20 OR EventCode=21 OR Eventcode=1001) ) OR ( sourcetype="WindowsUpdateLog" (status="installed" OR status="failure" OR status="restart required") )
Configure Centrify Express with Apache’s mod_auth_kerb
I was lucky enough to spend the morning trying to get mod_auth_kerb working with our existing installation of Centrify without creating any additional SPNs.
It was actually very straightforward, except for one missing component of the secret sauce that’s not documented in many places.
Basically, to get it to work, perform the following on Red Hat 6 (and CentOS 6):
yum install httpd
yum install mod_auth_kerb
vim /etc/httpd/conf.d/auth_kerb.conf

#
# The mod_auth_kerb module implements Kerberos authentication over
# HTTP, following the "Negotiate" protocol.
#
LoadModule auth_kerb_module modules/mod_auth_kerb.so

#
# Sample configuration: Kerberos authentication must only be
# used over SSL to prevent replay attacks. The keytab file
# configured must be readable only by the "apache" user, and
# must contain service keys for "HTTP/www.example.com", where
# "www.example.com" is the FQDN of this server.
#
<Location /private>
  SSLRequireSSL
  AuthType Kerberos
  AuthName "Kerberos Login"
  KrbMethodNegotiate On
  KrbMethodK5Passwd On
  KrbAuthRealms YOURDOMAIN.COM
  Krb5KeyTab /etc/krb5.keytab
  # KrbServiceName is the Centrify secret sauce
  KrbServiceName http
  require valid-user
</Location>

chown root:apache /etc/krb5.keytab
chmod 640 /etc/krb5.keytab
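If it still doesn't work, it is worth checking which service names your keytab actually holds, since mod_auth_kerb looks for a key under the name given by KrbServiceName (HTTP by default, as far as I know). A rough sketch that summarises klist -k output; keytab_service_names is a hypothetical helper, and the sample listing, host name and realm are invented for illustration.

```shell
#!/bin/sh
# Sketch: list the distinct service names (the part before "/") found in
# `klist -k` output on stdin. keytab_service_names is a hypothetical helper.
keytab_service_names() {
    # Entry lines start with a numeric KVNO followed by the principal.
    awk '$1 ~ /^[0-9]+$/ && index($2, "/") > 0 {
             split($2, parts, "/")
             print parts[1]
         }' | sort -u
}

# Invented sample in the usual `klist -k` layout.
sample='Keytab name: FILE:/etc/krb5.keytab
KVNO Principal
---- ------------------------------------------------
   2 host/web01.example.com@EXAMPLE.COM
   2 http/web01.example.com@EXAMPLE.COM'

printf '%s\n' "$sample" | keytab_service_names

# On the web server (not run here):
#   klist -k /etc/krb5.keytab | keytab_service_names
```

If the list shows a lowercase http entry but no HTTP one, that would explain why the KrbServiceName http line is needed. Once Apache is restarted, curl --negotiate -u : against the protected URL (with a valid ticket from kinit) is a quick way to test the login end to end.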
And that’s it. Hopefully “KrbServiceName http” was the secret sauce you needed!