Unless otherwise noted, articles © 2005-2008 Doug Spencer, SecurityBulletins.com. Linking to articles is welcomed. Articles on this site are general information and are NOT GUARANTEED to work for your specific needs. I offer paid professional consulting services and will be happy to develop custom solutions for your specific needs. View the consulting page for more information.


Using web server logs to track exploits and fraud

From SecurityBulletins.com


Written by Doug Spencer (c) 2006 - Written November 10, 2006

On several occasions, I've been called upon to dig through a web server access log to find the IP address responsible for a fraudulent transaction, sometimes out of millions of transactions. This article presents some techniques that can be used to track exploits and fraudulent transactions on a web site. The examples shown are from the UNIX and Linux environments. Other operating systems can use the same concepts, but the specifics will be different.

On a busy site, it helps greatly to configure your servers to allow easier correlation of events. This means using the Network Time Protocol (NTP) to set the clocks on all your systems from an accurate time source. Computer clocks aren't very accurate and will drift over time, so NTP is critical. Accurate time synchronization across the web server, application server, and database makes tracking issues much easier.
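NTP setup varies by platform, but as a minimal sketch, an ntp.conf along these lines keeps a machine synced (the pool hostnames are placeholders for whatever time servers your site actually uses):

```
# /etc/ntp.conf -- minimal example; server names are illustrative
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
driftfile /var/lib/ntp/ntp.drift
```

Running "ntpq -p" afterward shows whether the daemon has selected a peer and how far the local clock is offset from it.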

If your application stores the remote IP address for each transaction, it is even easier. In that case, you can grep for the IP address in your logs and do some checking to verify the address wasn't spoofed. On many of my tasks at government agencies, the applications did not store the IP address in the database due to Privacy Act concerns.

Basically, during this process you will be filtering out normal transactions. When you are done sifting out the normal transactions, you will likely be left with a fairly small set of transactions to investigate.

The first filter can be by the approximate time of the transaction. You can eliminate the majority of legitimate transactions by filtering the time and date. For the purposes of this article, we'll say we know there was a fraudulent transaction that came from a web access on the 9th of November 2006 at around 1:30PM local time. Using that information, we filter for transactions as follows:

grep 09.Nov.2006:13:..:.. access.log > subset.txt

We get a subset list of transactions that happened from 13:00-13:59. This gives you a starting point to keep eliminating legitimate transactions and focus on suspicious transactions. Also, you can more easily get subsets from this smaller set of records. For instance, if you have a NetScaler, Cisco CSM, BIG-IP, or other load balancer, you can easily eliminate the watchdog requests it makes to verify your site is up.
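For example, if the balancer's health checks always fetch a dedicated monitor URL, a grep -v removes them in one pass. The monitor URL, user-agent, and log lines below are made up for illustration; check your own balancer's monitor configuration for the real values:

```shell
# Build a small stand-in subset file (placeholder addresses and paths):
cat > subset.txt <<'EOF'
10.0.0.5 - - [09/Nov/2006:13:02:11 -0800] "GET /healthcheck.html HTTP/1.0" 200 12 "-" "BigIP-Monitor"
203.0.113.7 - - [09/Nov/2006:13:05:43 -0800] "GET /phpbb/index.php HTTP/1.0" 200 4120 "-" "Mozilla/4.0"
EOF
# Drop every line that is a load-balancer health check:
grep -v healthcheck.html subset.txt > subset_nolb.txt
cat subset_nolb.txt
```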

You can also further break down the time intervals. If you just wanted any transactions from 13:20-13:40, you could do the following:

grep -E '09.Nov.2006:13:(2.|3.|40):..' subset.txt > subset1320_1340.txt

Next, look for unique characteristics of any particular transaction. For instance, you might try the following to get the number of times a particular IP address accessed the site. You will want to look for users with unusual usage patterns. For instance, some automated bots will bypass most of the functions of a site and go straight to the page or pages they want to manipulate. Those might show up with fewer than the normal number of accesses. Other exploits may show up as an excessive number of requests. A lot of this depends on the particular site.

awk '{print $1}' access.log | sort | uniq -c | sort -n

The first column in the results of the command shown above will be the number of times that IP address accessed the site, the second will be the IP address itself.
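Building on those counts, once an address stands out you can pull its complete activity from the log. A self-contained sketch with placeholder addresses and log lines:

```shell
# Stand-in access log (addresses and paths are illustrative):
cat > access.log <<'EOF'
192.0.2.44 - - [09/Nov/2006:13:39:22 -0800] "GET /phpbb/profile.php?mode=register HTTP/1.0" 200 277 "-" "Mozilla/4.0"
198.51.100.9 - - [09/Nov/2006:13:40:01 -0800] "GET /phpbb/index.php HTTP/1.0" 200 4120 "-" "Mozilla/4.0"
198.51.100.9 - - [09/Nov/2006:13:40:02 -0800] "GET /phpbb/style.css HTTP/1.0" 200 900 "-" "Mozilla/4.0"
EOF
# Requests per address, least active first:
awk '{print $1}' access.log | sort | uniq -c | sort -n
# Full activity of one suspicious address (anchored so 192.0.2.440 wouldn't match):
grep '^192\.0\.2\.44 ' access.log
```

Anchoring the pattern with ^ and a trailing space avoids false matches against longer addresses that merely start with the same digits.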

Note that some larger ISPs have multiple proxies that spread out web requests, so just because a request looks strange doesn't necessarily make it so. Insight and thought are required in the analysis of any security issue.

Now that we have a small set of addresses that we want to investigate, we can begin looking at the specifics. Look for what was requested from the site. If your site uses GET requests, you will generally see the information passed to the site in the logs. A POST request will usually only show the web page where the data was sent.
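Since GET parameters land in the access log, one quick way to surface them is to keep only the GET requests that carry a query string (POST bodies, by contrast, never appear in the log). A minimal sketch using made-up log lines:

```shell
# Stand-in subset file (addresses and paths are illustrative):
cat > subset.txt <<'EOF'
203.0.113.7 - - [09/Nov/2006:13:12:02 -0800] "GET /phpbb/profile.php?mode=register&agreed=true HTTP/1.0" 200 277 "-" "Mozilla/4.0"
198.51.100.9 - - [09/Nov/2006:13:14:55 -0800] "POST /phpbb/posting.php HTTP/1.0" 200 1650 "-" "Mozilla/4.0"
EOF
# Keep only GET requests whose URL contains a "?" (i.e., a query string):
grep '"GET [^"]*?' subset.txt > get_params.txt
cat get_params.txt
```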

In our example case, I was looking for someone with unusual access to a phpBB site. I eliminated the entries that were obviously unrelated and got the list down to three IP addresses with unusual access patterns. Two of those had requests for many previous pages. One IP address showed the following:

xxx.xxx.xx.xx - - [09/Nov/2006:13:39:22 -0800] "GET /phpbb/profile.php?mode=register&agreed=true HTTP/1.0" 301 277 
"-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
xxx.xxx.xx.xx - - [09/Nov/2006:14:15:26 -0800] "GET /cgi-bin/mt/mt-comments.cgi?entry_id=984 HTTP/1.0" 404 22550 
"http://debt-consolidation.zwitech.com/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; N_o_k_i_a)"

There was no other access to the site from that IP address. The spam URL in the referrer and the narrowly targeted access were obvious indications that the IP was not a legitimate user of the site. Even without those indications, normal users averaged 4-6 requests per page on this site, since each page included some graphics, each of which produces its own log entry.

Note that just because an IP address shows up in the logs doesn't mean you have caught any particular person. That requires more investigative work to track down a specific person. When searching for the address responsible for an exploit or fraudulent transaction, you should take special care to preserve the evidence in its original state. This means archiving logs to a place where they won't be modified, using the UNIX "script" command to record the steps you took while investigating, and NOT deleting any information until it is safely backed up in a way that shows a chain of custody. Preserving your evidence and its integrity is critical if you wish to recover damages in a legal setting.
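One simple way to archive a log with a verifiable fingerprint is to copy it, mark the copy read-only, and record a cryptographic checksum at collection time. The paths and the stand-in log below are illustrative; real evidence should go to separate, write-protected storage:

```shell
# Stand-in log file so the example is self-contained:
printf '%s\n' '203.0.113.7 - - [09/Nov/2006:13:39:22 -0800] "GET / HTTP/1.0" 200 100 "-" "Mozilla/4.0"' > access.log
# Archive a dated, read-only copy:
mkdir -p evidence
cp -p access.log evidence/access.log.20061109
chmod 444 evidence/access.log.20061109
# Record a checksum so later tampering is detectable:
sha256sum evidence/access.log.20061109 > evidence/checksums.txt
# Anyone can later verify the copy against the recorded checksum:
sha256sum -c evidence/checksums.txt
```

The checksum file itself should be stored (or printed and signed) separately from the logs it covers, so that both can't be altered together.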
