Mellanox Technologies MT27500 [ConnectX-3 Flash Recovery] - How to recover a broken firmware update

I recently had the fun task of recovering a Mellanox 10 GbE network card that had been left dead by a failed firmware upgrade some time in the past. It was quite difficult to recover due to the lack of hardware information, but with some educated guesses I managed to write the correct firmware back onto it.

I’ll try to run through the issue and the fix. Sorry in advance: the step-by-step will be lacking the exact outputs, as I’m writing this post after the fix.

The problem: The server was not showing me the expected network interface during installation. I knew the card was installed but upon attempting to PXE boot, it just tried the other two on-board adapters and then moved on.

Next step for me was to boot a live cd and run: lspci -nn which output:
07:00.0 Memory controller: Mellanox Technologies MT27500 [ConnectX-3 Flash Recovery]
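If you want to check for this state from a script, something like the following might do. It is only a sketch: the function name in_flash_recovery is my own, and it simply looks for the "Flash Recovery" text that lspci printed for my card, so other cards or lspci versions may word it differently.

```shell
#!/bin/sh
# Succeeds if the lspci text on stdin contains a Mellanox device whose
# description mentions "Flash Recovery" (the wording my card showed).
# in_flash_recovery is a made-up helper name, not part of any Mellanox tool.
in_flash_recovery() {
    grep -i 'Mellanox' | grep -qi 'Flash Recovery'
}
```

On the affected host it could be used as: lspci -nn -d 15b3: | in_flash_recovery && echo "card is in recovery mode" (15b3 is the Mellanox PCI vendor ID).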

Ok that’s something at least, it seemed to be stuck in a recovery mode - all hope is not lost!

Next step for me was to download ‘mlxup’ from (http://www.mellanox.com/page/mlxup_firmware_tool) which was a standalone binary and could be run on the live cd…

I don’t have the output from that command, but I believe it listed the ‘Device Type’; everything else was blank, and ‘Status’ showed an error that I can’t recall.

At this point I needed to write a new firmware (It can’t get any worse or more broken than it is at present)

First step was to install Fedora on one of the disks so I could install the Mellanox Firmware Tools (http://www.mellanox.com/page/management_tools). With Fedora and the firmware tools now installed, it was go time!

Here is a summary of the commands I ran to write the new firmware:

# Create the /dev/ devices for the firmware tools to use 
mst start

# Output the status 
mst status
# My device was listed as /dev/mst/mt502_pciconf0 with no associated pci_cr0 device

# Next I needed to download the correct firmware. This is where you may come unstuck, as I wasn't 100% sure of the product line, let alone the 'OPN' or 'PSID'! Luckily for me I had an identical working server that I could reference. On that server I ran 'mlxup', which gave me the details I required (at an educated guess, anyway).
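
If you have a working twin of the card, pulling the PSID out of the tool output can be scripted. This is a rough sketch, assuming the output contains a line of the form "PSID:" followed by the value, which is how I remember it looking; psid_from_query is a name I made up for illustration.

```shell
#!/bin/sh
# Print the value from the first "PSID:" line found on stdin.
# Assumption: the field is printed as "PSID:" then whitespace then the value.
# psid_from_query is a hypothetical helper, not a Mellanox command.
psid_from_query() {
    awk '$1 == "PSID:" { print $2; exit }'
}
```

On the working server you would pipe the query output into it, for example the output of 'mlxup' or of 'flint -d <device> query', where <device> is whatever 'mst status' lists on that machine.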

# Download the firmware
wget http://www.mellanox.com/downloads/firmware/fw-ConnectX3-rel-2_40_7000-MCX311A-XCA_Ax-FlexBoot-3.4.746.bin.zip

# Unzip it
unzip fw-ConnectX3-rel-2_40_7000-MCX311A-XCA_Ax-FlexBoot-3.4.746.bin.zip

# Try and burn the firmware
flint -d /dev/mst/mt502_pciconf0 -i fw-ConnectX3-rel-2_40_7000-MCX311A-XCA_Ax-FlexBoot-3.4.746.bin burn

# Card is broken, so burn in non-failsafe mode with -nofs, which skips the usual safety checks (not sure why I used --use_fw)
flint -nofs --use_fw -d /dev/mst/mt502_pciconf0 -i fw-ConnectX3-rel-2_40_7000-MCX311A-XCA_Ax-FlexBoot-3.4.746.bin burn 

# It warned that the card would be updated with blank MAC addresses, so I generated a MAC address
flint -nofs --mac f45214810ae0 --use_fw -d /dev/mst/mt502_pciconf0 -i fw-ConnectX3-rel-2_40_7000-MCX311A-XCA_Ax-FlexBoot-3.4.746.bin burn 

# It warned me about something else, so I used the -use_image_ps flag
flint -nofs -use_image_ps --mac f45214810ae0 --use_fw -d /dev/mst/mt502_pciconf0 -i fw-ConnectX3-rel-2_40_7000-MCX311A-XCA_Ax-FlexBoot-3.4.746.bin burn 

# Hooray, success! But I noticed that the GUID and system GUID were blank. I'm not sure what impact that would have, but I generated some anyway...
flint -nofs -use_image_ps --guid beeff45214810ae0 --mac f45214810ae0 --use_fw -d /dev/mst/mt502_pciconf0 -i fw-ConnectX3-rel-2_40_7000-MCX311A-XCA_Ax-FlexBoot-3.4.746.bin burn

# Yay double success!
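
For what it's worth, the MAC and GUID I passed to flint could have been produced with a couple of small helpers like the ones below. This is a sketch of the idea rather than exactly what I did: random_mac and guid_from_mac are names invented for this example, the OUI is whatever prefix you choose to pass in, and the GUID is just the MAC with four hex digits stuck on the front, as with the "beef" above. If the sticker on your card still shows the original MAC, using that is probably the safer choice.

```shell
#!/bin/sh
# Build a 12-hex-digit MAC from a caller-supplied 6-digit OUI plus three
# random bytes read from /dev/urandom. (Hypothetical helper.)
random_mac() {
    oui=$1
    tail=$(od -An -N3 -tx1 /dev/urandom | tr -d ' \n')
    printf '%s%s\n' "$oui" "$tail"
}

# Build a 16-hex-digit GUID by prefixing a 12-digit MAC with 4 hex digits.
# Usage: guid_from_mac <mac> <4-digit-prefix>  (Hypothetical helper.)
guid_from_mac() {
    printf '%s%s\n' "$2" "$1"
}
```

For example, guid_from_mac f45214810ae0 beef prints beeff45214810ae0, which is the value I fed to --guid.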

So I rebooted the server and lo and behold it tried to PXE boot from the card! Booting into the Fedora installation showed my recovered network interface with its generated MAC address. DHCP that interface and BOOM! Interwebs!!!!

Hopefully this helps someone!