IT Dribble

Mutterings, inconsistent tips, rants and randomness

Mellanox Technologies MT27500 [ConnectX-3 Flash Recovery] - How to recover a broken firmware update

I recently had the fun task of recovering a Mellanox 10 GbE network card that had been left dead by a failed firmware upgrade. It was quite difficult to recover because of the lack of hardware information, but with some educated guesses I managed to write the correct firmware back onto it.

I’ll try to run through the issue and the fix. Sorry in advance: the step by step lacks the exact outputs, as I’m writing this post after the fix.

The problem: The server was not showing me the expected network interface during installation. I knew the card was installed but upon attempting to PXE boot, it just tried the other two on-board adapters and then moved on.

Next step for me was to boot a live cd and run: lspci -nn which output:
07:00.0 Memory controller: Mellanox Technologies MT27500 [ConnectX-3 Flash Recovery]

Ok that’s something at least, it seemed to be stuck in a recovery mode - all hope is not lost!
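If you have more than one box to check, the same test can be scripted. This is only a rough sketch: the helper name in_flash_recovery is made up for illustration, and it assumes the lspci description contains both "Mellanox" and "Flash Recovery" on one line, as it did on my card.

```shell
#!/bin/sh
# Succeeds when lspci-style text on stdin lists a Mellanox device that
# reports itself as being in "Flash Recovery" mode.
# in_flash_recovery is a hypothetical helper name, not a Mellanox tool.
in_flash_recovery() {
    grep -q 'Mellanox.*Flash Recovery'
}

# Only probe real hardware when lspci is actually installed.
if command -v lspci >/dev/null 2>&1; then
    if lspci -nn | in_flash_recovery; then
        echo "Mellanox card is in flash recovery mode"
    else
        echo "No Mellanox card in flash recovery mode found"
    fi
fi
```

A healthy card should show up as an Ethernet or network controller instead, so the check fails on it.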

Next step for me was to download ‘mlxup’ from (http://www.mellanox.com/page/mlxup_firmware_tool) which was a standalone binary and could be run on the live cd…

The output from that command I don’t have, but I believe it may have had the ‘Device Type’ listed, everything else was blank and ’status’ had an error that I can’t recall.

At this point I needed to write a new firmware (It can’t get any worse or more broken than it is at present)

First step was to install Fedora on one of the disks so I could install the Mellanox Firmware Tools (http://www.mellanox.com/page/management_tools). With Fedora and the firmware tools installed, it was go time!

Next is the summary of commands performed to write new firmware:

# Create the /dev/ devices for the firmware tools to use 
mst start

# Output the status 
mst status
# My device was listed as /dev/mst/mt502_pciconf0 with no associated pci_cr0 device

# Next I needed to download the correct firmware. This is where you may come unstuck, as I wasn't 100% sure of the product 'Line', let alone the 'OPN' or 'PSID'! Luckily for me I had an identical working server that I could reference. On that server I ran 'mlxup', which gave me the details I required (at an educated guess).

# Download the firmware
wget http://www.mellanox.com/downloads/firmware/fw-ConnectX3-rel-2_40_7000-MCX311A-XCA_Ax-FlexBoot-3.4.746.bin.zip

# Unzip it
unzip fw-ConnectX3-rel-2_40_7000-MCX311A-XCA_Ax-FlexBoot-3.4.746.bin.zip

# Try and burn the firmware
flint -d /dev/mst/mt502_pciconf0 -i fw-ConnectX3-rel-2_40_7000-MCX311A-XCA_Ax-FlexBoot-3.4.746.bin burn

# Card is broken, so burn in non-failsafe mode with -nofs (not sure why I used --use_fw)
flint -nofs --use_fw -d /dev/mst/mt502_pciconf0 -i fw-ConnectX3-rel-2_40_7000-MCX311A-XCA_Ax-FlexBoot-3.4.746.bin burn 

# It warned that it would be updated using blank MAC addresses, so I generated a MAC address
flint -nofs --mac f45214810ae0 --use_fw -d /dev/mst/mt502_pciconf0 -i fw-ConnectX3-rel-2_40_7000-MCX311A-XCA_Ax-FlexBoot-3.4.746.bin burn 

# It warned me about something else, so I used the -use_image_ps flag
flint -nofs -use_image_ps --mac f45214810ae0 --use_fw -d /dev/mst/mt502_pciconf0 -i fw-ConnectX3-rel-2_40_7000-MCX311A-XCA_Ax-FlexBoot-3.4.746.bin burn 

# Hooray success! But I noticed that the guid and system guid are blank - I'm not sure what impact that would have but I generated some anyway...
flint -nofs -use_image_ps --guid beeff45214810ae0 --mac f45214810ae0 --use_fw -d /dev/mst/mt502_pciconf0 -i fw-ConnectX3-rel-2_40_7000-MCX311A-XCA_Ax-FlexBoot-3.4.746.bin burn

# Yay double success!
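
The GUID I passed is just the MAC with "beef" stuck on the front. If you want to build the two values the same way and catch typos before handing them to flint, something like the following might do. It is a minimal sketch: is_hex and guid_from_mac are hypothetical helper names, and the "beef" prefix is my own arbitrary choice, not a Mellanox rule.

```shell
#!/bin/sh
# is_hex succeeds when $1 is exactly $2 hexadecimal digits.
is_hex() {
    printf '%s' "$1" | grep -Eq "^[0-9A-Fa-f]{$2}\$"
}

# Build a 16-digit GUID by putting a 4-digit prefix in front of a
# 12-digit MAC. The "beef" default mirrors the value used above and is
# an arbitrary choice, not a vendor convention.
guid_from_mac() {
    mac=$1
    prefix=${2:-beef}
    is_hex "$mac" 12 || { echo "bad MAC: $mac" >&2; return 1; }
    is_hex "$prefix" 4 || { echo "bad prefix: $prefix" >&2; return 1; }
    printf '%s%s\n' "$prefix" "$mac"
}

guid_from_mac f45214810ae0   # prints beeff45214810ae0
```

The result can then be passed to the --guid option, with the same MAC going to --mac.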

So I rebooted the server and lo and behold it tried to PXE boot from the card! Booting into the Fedora installation showed my recovered network interface with its generated MAC address. DHCP on that interface and BOOM! Interwebs!!!!

Hopefully this helps someone!

Filter Ansible logs on CentOS 7

CentOS 7 and Red Hat 7 seem to love shoving logs into /var/log/messages, and if you run Ansible in-house it generates a lot of noise. If you also run Splunk or ELK, then getting your logs ‘just right’ is important to O.C.D. type levels!

To send your ansible generated logs to /var/log/ansible.log perform the following:

Create the file /etc/rsyslog.d/ansible.conf

if ( $programname contains "ansible" ) then /var/log/ansible.log
& stop

And if you are creating log files, you will want to manage those log files too!
Create the file /etc/logrotate.d/ansible

/var/log/ansible.log {
        notifempty
        weekly
        rotate 4
        missingok
        compress
}

And restart / reload rsyslog

service rsyslog restart
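
To check that the filter is doing its job, you can count Ansible lines in each file before and after a playbook run. The sketch below assumes the traditional syslog line format, where the program tag is the fifth whitespace-separated field; count_ansible_lines is a made-up helper name.

```shell
#!/bin/sh
# Count lines in a traditional-format syslog file whose program tag
# (5th field, e.g. "ansible-command[123]:") contains "ansible".
# count_ansible_lines is a hypothetical helper name.
count_ansible_lines() {
    awk '$5 ~ /ansible/ { n++ } END { print n + 0 }' "$1"
}

# Example usage (run as root on the target host):
#   rsyslogd -N1                               # syntax-check the rsyslog config
#   logger -t ansible-test "routing check"     # emit a test message
#   count_ansible_lines /var/log/ansible.log   # should go up
#   count_ansible_lines /var/log/messages      # should stay put
```

If the count in /var/log/messages keeps climbing, the rule is probably not being loaded, and rsyslogd -N1 is a good first place to look.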

Using OpenSCAP to scan and harden your servers

Determine which profile you want to use by running: oscap info /usr/share/xml/scap/ssg/content/ssg-rhel7-ds.xml and then replace the --profile value in the commands below as required.
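
The oscap info output is fairly wordy, so pulling out just the profile IDs can help. A minimal sketch, assuming the IDs follow the xccdf_org.ssgproject.content_profile_ naming used by the SSG content (the exact layout of the oscap info output varies between versions, which is why this only greps for the ID itself); list_profiles is a hypothetical helper name.

```shell
#!/bin/sh
# Print each unique SSG profile ID found in `oscap info` output on stdin.
# list_profiles is a hypothetical helper name.
list_profiles() {
    grep -o 'xccdf_org\.ssgproject\.content_profile_[A-Za-z0-9_.-]*' | sort -u
}

DS=/usr/share/xml/scap/ssg/content/ssg-rhel7-ds.xml

# Only query the data stream when oscap and the content are installed.
if command -v oscap >/dev/null 2>&1 && [ -r "$DS" ]; then
    oscap info "$DS" | list_profiles
fi
```

Any ID it prints can be dropped straight into the --profile option.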

Perform a scan:

  oscap xccdf eval --report report.html \
      --profile xccdf_org.ssgproject.content_profile_CS2 \
       /usr/share/xml/scap/ssg/content/ssg-rhel7-ds.xml

Apply a remediation:

  oscap xccdf eval --remediate --report report.html \
      --profile xccdf_org.ssgproject.content_profile_CS2 \
       /usr/share/xml/scap/ssg/content/ssg-rhel7-ds.xml

Configure Centrify Express with Apache’s mod_auth_kerb

I was lucky enough to spend the morning trying to get mod_auth_kerb working with our existing installation of Centrify without creating any additional SPNs.

It was actually very straightforward, except for one missing piece of secret sauce that’s not documented in many places.
Basically, to get it to work, perform the following on Red Hat 6 (and CentOS 6):

yum install httpd
yum install mod_auth_kerb

vim /etc/httpd/conf.d/auth_kerb.conf

#
# The mod_auth_kerb module implements Kerberos authentication over
# HTTP, following the "Negotiate" protocol.
#

LoadModule auth_kerb_module modules/mod_auth_kerb.so

#
# Sample configuration: Kerberos authentication must only be
# used over SSL to prevent replay attacks.  The keytab file
# configured must be readable only by the "apache" user, and
# must contain service keys for "HTTP/www.example.com", where
# "www.example.com" is the FQDN of this server.
#

<Location /private>
  SSLRequireSSL
  AuthType Kerberos
  AuthName "Kerberos Login"
  KrbMethodNegotiate On
  KrbMethodK5Passwd On
  KrbAuthRealms YOURDOMAIN.COM
  Krb5KeyTab /etc/krb5.keytab
# KrbServiceName is the Centrify secret sauce
  KrbServiceName http
  require valid-user
</Location>

chown root:apache /etc/krb5.keytab
chmod 640 /etc/krb5.keytab

And that’s it. Hopefully “KrbServiceName http” was the secret sauce you needed!
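
If it still doesn’t work, it may be worth checking which service principals are actually in the keytab. The sketch below assumes the Centrify-managed keytab holds a lowercase http/ principal (which would explain why KrbServiceName http is needed) and that klist -k prints the principal in the second column; has_service_key is a made-up helper name.

```shell
#!/bin/sh
# Succeeds when `klist -k` output on stdin lists a principal that starts
# with "<service>/". The match is case-sensitive on purpose.
# has_service_key is a hypothetical helper name.
has_service_key() {
    awk -v want="$1/" 'index($2, want) == 1 { found = 1 } END { exit !found }'
}

KEYTAB=/etc/krb5.keytab

# Only inspect the keytab when klist exists and the file is readable.
if command -v klist >/dev/null 2>&1 && [ -r "$KEYTAB" ]; then
    if klist -k "$KEYTAB" | has_service_key http; then
        echo "keytab has an http/ service key"
    else
        echo "no http/ service key found in $KEYTAB"
    fi
fi
```

If only an uppercase HTTP/ principal is present, the KrbServiceName value would presumably need to match that instead.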

Configure Apache logging to use source IP address behind load balancer

In this day and age load balancers are pretty common, whether it’s a service like Incapsula, Amazon ELB or anything else that proxies web traffic. One of the annoyances of proxies is that the source address gets rewritten, and if you want to know where your visitors are coming from then having the original source IP address helps.
Most proxies insert headers into the requests they pass on, and the header we would like to take advantage of is “X-Forwarded-For”.

Now you could rewrite the Apache configuration to log the X-Forwarded-For header in place of the connecting server’s (the load balancer’s) IP address. The benefit of this is that Splunk will be happy with the builtin extractions. The downside is that you should never trust headers, as they can be manipulated client side and forged.

Here are some examples from the vhost configuration file:

<VirtualHost *:443>
        ServerAdmin webmaster@localhost
        ServerName www.cammckenzie.com
        ErrorLog logs/ssl_error_log
        LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""

The above example is from a default configuration

Now if you wanted to replace the load balancer’s IP address with the X-Forwarded-For IP addresses you “could” use:

<VirtualHost *:443>
        ServerAdmin webmaster@localhost
        ServerName www.cammckenzie.com
        ErrorLog logs/ssl_error_log
        LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""

The above example dropped the load balancer’s IP address altogether and replaced it with the X-Forwarded-For address. This example is ultimately bad for two reasons: it drops the true connecting server altogether, and the Splunk inbuilt extractions will break if there is more than one address in the X-Forwarded-For list.

Perhaps what is a better idea is:

<VirtualHost *:443>
        ServerAdmin webmaster@localhost
        ServerName www.cammckenzie.com
        ErrorLog logs/ssl_error_log
        LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" srcip=\"%{X-Forwarded-For}i\""

This example is perhaps the best solution because you keep the original information available and intact, you haven’t broken the Splunk extractions and you have helped your Splunk install (a little bit) by creating a key value pair for “srcip”.
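
If you ever need the forwarded address outside of Splunk, the srcip key value pair is easy to pull out on the command line too. A rough sketch: srcip_of and first_hop are hypothetical helper names, and treating the first address in the list as the real client only makes sense if you trust every proxy in the chain.

```shell
#!/bin/sh
# Print the value of the srcip="..." pair from each access-log line on stdin.
# srcip_of is a hypothetical helper name.
srcip_of() {
    sed -n 's/.*srcip="\([^"]*\)".*/\1/p'
}

# Print the first address of a comma-separated X-Forwarded-For list.
# This is the original client only if every proxy in the chain is trusted.
first_hop() {
    cut -d, -f1 | tr -d ' '
}

# Example usage against the access log from the vhost above:
#   srcip_of < logs/access_log | first_hop | sort | uniq -c | sort -rn | head
```

The usage line gives a quick top-talkers list without touching the load balancer address in the first field.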

And just to show off and get really complicated, you could perform a bit of filtering. If you know that all genuine requests will carry X-Forwarded-For, and you are tired of your load balancer health checks spamming your logs, you could do something like the following:

<VirtualHost *:443>
        ServerAdmin webmaster@localhost
        ServerName www.cammckenzie.com
        ErrorLog logs/ssl_error_log
        LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" ELBCheck
        LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" srcip=\"%{X-Forwarded-For}i\"" XForwarderFor
        SetEnvIf X-Forwarded-For "^.*\..*\..*\..*" forwarded
        CustomLog logs/access_log XForwarderFor env=forwarded
        CustomLog logs/elb_health_check_access_log ELBCheck env=!forwarded

Now this example basically puts the Amazon elastic load balancer’s health checks in one file and genuine requests in another!
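
To see which header values would end up in which file, you can try the pattern with grep. This sketch assumes the SetEnvIf pattern is meant to have its dots escaped (^.*\..*\..*\..*), so that it matches anything containing at least three literal dots; looks_forwarded is a made-up helper name, and grep -E is only standing in for Apache’s own regex engine here.

```shell
#!/bin/sh
# Succeeds when $1 contains at least three literal dots, which is what the
# escaped pattern requires. looks_forwarded is a hypothetical helper name.
looks_forwarded() {
    printf '%s\n' "$1" | grep -Eq '^.*\..*\..*\..*'
}

# Try a few sample header values, including an empty (missing) header.
for value in "203.0.113.7" "203.0.113.7, 10.0.0.1" "" "unknown"; do
    if looks_forwarded "$value"; then
        echo "forwarded:     '$value'"
    else
        echo "not forwarded: '$value'"
    fi
done
```

The pattern is deliberately loose: any string with three dots in it counts, so it separates "header present" from "header absent" rather than validating an IP address.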