Configure Apache logging to use source IP address behind load balancer

Load balancers are pretty common these days, whether it’s a service like Incapsula, Amazon ELB or anything else that proxies web traffic. One of the annoyances of proxies is that the source address gets rewritten, so your web server sees the proxy’s IP address rather than the visitor’s. If you want to know where your visitors are coming from, you need the original source IP address.
Most proxies insert request headers as they forward traffic, and the header we would like to take advantage of is “X-Forwarded-For”.

Now you could rewrite the Apache log format to use the X-Forwarded-For header in place of the connecting host’s (the load balancer’s) IP address. The benefit is that Splunk will be happy with the built-in extractions. The downside is that you should never trust headers, as they can be manipulated and forged on the client side.

Here are some examples from the vhost configuration file:

<VirtualHost *:443>
        ServerAdmin webmaster@localhost
        ServerName www.cammckenzie.com
        ErrorLog logs/ssl_error_log
        LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""

The above example is from a default configuration.

Now if you wanted to replace the load balancer’s IP address with the X-Forwarded-For IP addresses you “could” use:

<VirtualHost *:443>
        ServerAdmin webmaster@localhost
        ServerName www.cammckenzie.com
        ErrorLog logs/ssl_error_log
        LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""

The above example drops the load balancer’s IP address altogether and replaces it with the X-Forwarded-For value. This is ultimately a bad idea for two reasons: you lose the address of the host that actually connected to Apache, and the Splunk built-in extractions will break if there is more than one address in the X-Forwarded-For list (each proxy along the way appends to it, separated by commas).

Perhaps a better idea is:

<VirtualHost *:443>
        ServerAdmin webmaster@localhost
        ServerName www.cammckenzie.com
        ErrorLog logs/ssl_error_log
        LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" srcip=\"%{X-Forwarded-For}i\""

This example is perhaps the best solution: you keep the original information available and intact, you haven’t broken the Splunk extractions, and you have helped your Splunk install (a little bit) by creating a key-value pair for “srcip”.
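
One thing to watch: a LogFormat with no nickname only sets the default format for later TransferLog directives. A minimal sketch of wiring the format above to a log file might look like the following. The nickname combined_srcip and the path logs/ssl_access_log are made up for illustration, so substitute whatever your install uses.

```apache
# Hypothetical nickname "combined_srcip"; any unused name will do.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" srcip=\"%{X-Forwarded-For}i\"" combined_srcip
# Assumed log path; point this at your existing access log.
CustomLog logs/ssl_access_log combined_srcip
```

Naming the format also means the same vhost can write other logs in a different format without the two interfering.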

And just to show off and get really complicated, you could perform a bit of filtering. If you know that all genuine requests will carry X-Forwarded-For and you are tired of your load balancer health checks spamming your logs, you could do something like the following:

<VirtualHost *:443>
        ServerAdmin webmaster@localhost
        ServerName www.cammckenzie.com
        ErrorLog logs/ssl_error_log
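        LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" ELBCheck
        LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" srcip=\"%{X-Forwarded-For}i\"" XForwarderFor
        # Set "forwarded" when X-Forwarded-For holds something that looks like a dotted address
        SetEnvIf X-Forwarded-For "^.*\..*\..*\..*" forwarded
        # Forwarded requests go to the main log, everything else to the health check log
        CustomLog logs/access_log XForwarderFor env=forwarded
        CustomLog logs/elb_health_check_access_log ELBCheck env=!forwarded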

Now this example puts the Amazon Elastic Load Balancer health checks in one file and genuine requests in another!
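
The SetEnvIf test above is deliberately loose. If you want something a little stricter, a sketch like the one below anchors on a dotted quad at the start of the header. It assumes your visitors reach the load balancer over IPv4 only; an IPv6 address in X-Forwarded-For would not match and the request would land in the health check log.

```apache
# Assumption: clients connect over IPv4 only.
# Matches a header that begins with a dotted quad, whether it holds one address or a comma-separated list.
SetEnvIf X-Forwarded-For "^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" forwarded
```

This only checks the shape of the value, not whether it is genuine, so the earlier warning about forged headers still applies.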