This is the second part of a series of articles focused on creating tuple.co, which is a web security data gathering and analysis experiment conducted by the author. The first part covers the introduction and .htaccess setup on Apache, the second part covers scripting, and the last part covers analysis. If you would like to find out more about this, please visit tuple.co and fill in the inquiry form, or comment on this blog.
Script Setup
In the last article, we covered the objectives (security, data gathering, and analysis) and a simple .htaccess setup for the Apache server. Some readers have asked directly about other web servers; this series covers Apache only. In this article we cover the scripting used to gather information about users that generate suspicious HTTP error codes, and to block those users from the site to prevent a repeat incident. For Tuple.co, we enabled server-side includes and chose Perl as the CGI scripting language, because the author has used it on occasion and because script snippets and support are widely available on the web. Below is the first script we cover; it is also the first script that every invalid HTTP request encounters, based on the .htaccess directive covered in the previous article.
#!/usr/bin/perl -wT
use CGI qw(:standard);
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
use Geo::IPfree;
use strict;
use Socket;
use FileHandle;
use File::Copy;
use Fcntl qw(:flock SEEK_END);
use DBI;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Headers;
use Text::CSV_XS;
use XML::Simple;
my($remip) =$ENV{REMOTE_ADDR};
my($remport) = $ENV{REMOTE_PORT};
my($ua) = $ENV{HTTP_USER_AGENT};
my($compname) = $ENV{REMOTE_HOST};
my($username) = $ENV{REMOTE_USER};
my($refer) = $ENV{HTTP_REFERER};
my($reqmethod) = $ENV{REQUEST_METHOD};
my($requri) = $ENV{REQUEST_URI};
my(@remiparr);
This part is a straightforward declaration: it tells Perl which modules to use and declares the variables used throughout the script. Notice how much information you can already gather on users simply by reading the CGI environment variables, as in my($remip) = $ENV{REMOTE_ADDR}. These assignments give quick access to some basic user information that we can use to gather even more data (geolocation, provider, and other information) and analyze later. Note that we are not gathering this information on all visitors to the web site, only on users that generated an error code when accessing the web service or web application. The next part of the script handles search robot requests for resources that may no longer exist (e.g., deleted pages); it prints friendly HTML for the robots and warning HTML for users that may have generated an error code on purpose. We will skip that section and move on to the code that updates the .htaccess file and blocks the user from the site by IP address.
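Several of these variables (REMOTE_HOST, REMOTE_USER, and HTTP_REFERER in particular) are often not set by the server, which leads to "uninitialized value" warnings under -w once they are interpolated into a string. The following is a minimal sketch of one way to collect them with defaults. The request_info() helper and its short key names are hypothetical and are not part of the Tuple.co script.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the CGI variables the script reads into one hash,
# substituting an empty string for anything the server did not set.
sub request_info {
    my ($env) = @_;
    my %map = (
        ip      => 'REMOTE_ADDR',
        port    => 'REMOTE_PORT',
        agent   => 'HTTP_USER_AGENT',
        host    => 'REMOTE_HOST',
        user    => 'REMOTE_USER',
        referer => 'HTTP_REFERER',
        method  => 'REQUEST_METHOD',
        uri     => 'REQUEST_URI',
    );
    my %info;
    for my $key (keys %map) {
        my $value = $env->{ $map{$key} };
        $info{$key} = defined $value ? $value : '';
    }
    return \%info;
}

# Usage: pass a reference to %ENV when running under CGI.
my $info = request_info(\%ENV);
print "$_=$info->{$_}\n" for sort keys %$info;
```

Passing the environment in as an argument, rather than reading %ENV inside the helper, makes it easy to try out with a hand-built hash.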
Blocking
The ‘main’ function in the script is the writeHTML() function as commented in the code, and toward the bottom of that function, there are calls to some other functions:
...
compare_ip();
block_ip();
get_blocked();
...
These functions are listed below, and their names describe their purpose. compare_ip() compares the current user’s IP address with the addresses already listed in the .htaccess file, so we do not write the same address to the file multiple times; as written, it treats two addresses as a match when their first three octets agree. block_ip() writes the IP address to the .htaccess file as a ‘deny from’ directive so the user cannot repeat the activity that created the error. get_blocked() retrieves the list of currently blocked IP addresses from the .htaccess file, which Tuple.co prints on the front page of the web site. This last function is optional and may be useful for other operations.
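Before the full listing, the matching rule in compare_ip() can be isolated into a few small functions. This is only a sketch of the same comparison: the helper names octets(), denied_ip(), and same_prefix() are hypothetical, and the sample lines stand in for a real .htaccess file.

```perl
use strict;
use warnings;

# Split a dotted-quad IPv4 address into four octets (empty list if malformed).
sub octets {
    my ($ip) = @_;
    return $ip =~ m/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/;
}

# Return the address from a "deny from a.b.c.d" line, or undef for other lines.
sub denied_ip {
    my ($line) = @_;
    return $line =~ m/^deny from\s+(\S+)/i ? $1 : undef;
}

# True (1) when both addresses are well formed and share their first three octets.
sub same_prefix {
    my ($ip_a, $ip_b) = @_;
    my @a = octets($ip_a);
    my @b = octets($ip_b);
    return 0 unless @a == 4 && @b == 4;
    return ($a[0] == $b[0] && $a[1] == $b[1] && $a[2] == $b[2]) ? 1 : 0;
}

# Sample lines standing in for .htaccess contents (hypothetical data).
my @htaccess = ("Order allow,deny\n", "deny from 203.0.113.9\n");
for my $line (@htaccess) {
    my $ip = denied_ip($line);
    next unless defined $ip;
    print "already covered by $ip\n" if same_prefix('203.0.113.200', $ip);
}
```

Because only three octets are compared, one blocked address suppresses further writes for its 256-address neighbourhood, while the ‘deny from’ rule itself still names a single address.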
############## See if already blocked, then write IP detail ####
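sub compare_ip(){
    # Split the visitor's address into its four octets.
    @remiparr = $remip =~ m/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/;
    open(my($htfile), "../.htaccess") or die "Cannot open .htaccess - $!\n"; #Read
    lock($htfile);
    seek($htfile, 0, 0); # read from the top, wherever lock() left the handle
    while(<$htfile>){
        if(/^deny from/i){
            # "deny from " is 10 characters; the address follows it
            my($ts) = substr($_, 10);
            #print "$ts";
            my @tsarr = $ts =~ m/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/;
            next unless @tsarr == 4; # not a plain IPv4 address
            #print "$remiparr[0], $tsarr[0],$remiparr[1],$tsarr[1],$remiparr[2],$tsarr[2]\n";
            # Match on the first three octets only.
            if(($remiparr[0] eq $tsarr[0]) && ($remiparr[1] eq $tsarr[1]) && ($remiparr[2] eq $tsarr[2])){
                unlock($htfile);
                close($htfile);
                print "returned on match.";
                return 0;
            }
        }
    }
    unlock($htfile);
    close($htfile);
    # Not blocked yet: look up and store the detail for this address.
    writeipdetail();
    return 0;
}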
#################################################
##################### Block IP #####################
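sub block_ip(){
    my(@updated); #the updated file contents
    my($done) = 0;
    open(my($htfile), "../.htaccess") or die "Cannot open .htaccess - $!\n"; #Read
    lock($htfile);
    seek($htfile, 0, 0); # read from the top, wherever lock() left the handle
    while(<$htfile>){
        push @updated, $_;
        # Insert the new rule directly after the first existing 'deny from' line.
        if(/^deny from/i && !$done){
            $done++;
            push @updated, "deny from $remip\n";
        }
    }
    unlock($htfile);
    close($htfile);
    # Reuse $htfile rather than declaring it a second time in the same scope.
    open($htfile, ">../.htaccess") or die "Cannot write .htaccess - $!\n"; # write-over
    lock($htfile);
    print $htfile @updated;
    unlock($htfile);
    close($htfile);
}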
#################################################
################ List blocked IP addresses ##############
sub get_blocked(){
    open(my($htfile), "../.htaccess") or die "Cannot open .htaccess - $!\n"; #Read
    lock($htfile);
    seek($htfile, 0, 0); # read from the top, wherever lock() left the handle
    while(<$htfile>){
        if(/^deny from/i){
            # Everything after "deny from " (10 characters), newline included.
            my($ts) = substr($_, 10);
            print "$ts";
        }
    }
    print "and on, and on, and....";
    unlock($htfile);
    close($htfile);
}
#################################################
########### Lock/Unlock Filehandle (htaccess) #############
sub lock {
    my($fh) = @_;
    flock($fh, LOCK_EX) or die "Cannot lock file - $!\n";
    # Rewind so that readers start at the top of the file. Seeking to
    # SEEK_END suits appending, but it leaves nothing for a reader to read.
    seek($fh, 0, 0) or die "Cannot seek - $!\n";
}
sub unlock {
    my ($fh) = @_;
    flock($fh, LOCK_UN) or die "Cannot unlock file - $!\n";
}
################################################
The lock and unlock functions above are called to lock and unlock the .htaccess file whenever the script reads it or writes blocked IP addresses to it. Because lock is also the name of a Perl built-in (a weak keyword), Perl calls this version only if the sub is defined before the calls to it. The commented print statements throughout the code above were used for debugging and have been left in for re-use.
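One thing to be aware of in block_ip() is that the file is unlocked and closed between the read and the write, and that opening it with ">" truncates it before the second lock is taken. A minimal sketch of an alternative follows, assuming the file can be opened read-write ("+<") so that a single exclusive lock covers both steps. The add_deny() helper is hypothetical, and unlike block_ip() it appends the rule when the file has no ‘deny from’ line yet.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);
use File::Temp qw(tempfile);

# Hypothetical helper: add "deny from $ip" to the file at $path while holding
# one exclusive lock across both the read and the write.
sub add_deny {
    my ($path, $ip) = @_;
    my $rule = "deny from $ip\n";

    open(my $fh, '+<', $path) or die "Cannot open $path - $!\n";
    flock($fh, LOCK_EX)       or die "Cannot lock $path - $!\n";

    my @lines = <$fh>;
    # Make sure the last line is newline-terminated before anything follows it.
    $lines[-1] .= "\n" if @lines && $lines[-1] !~ /\n\z/;

    # Nothing to do if the rule is already present.
    if (grep { $_ eq $rule } @lines) {
        close($fh);
        return 0;
    }

    # Insert after the first existing deny line, or append if there is none.
    my (@updated, $done);
    for my $line (@lines) {
        push @updated, $line;
        if (!$done && $line =~ /^deny from/i) {
            push @updated, $rule;
            $done = 1;
        }
    }
    push @updated, $rule unless $done;

    # Rewrite the file in place; the lock is still held.
    seek($fh, 0, SEEK_SET) or die "Cannot seek - $!\n";
    truncate($fh, 0)       or die "Cannot truncate - $!\n";
    print {$fh} @updated;
    close($fh) or die "Cannot close $path - $!\n";   # closing releases the lock
    return 1;
}

# Usage on a throwaway file rather than a live .htaccess.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "deny from 198.51.100.4\n";
close($tmp);
add_deny($path, '203.0.113.9');
```

The return value reports whether the file changed, which would also let a caller skip the separate duplicate check.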
If you are going to use this code in a production environment, please make sure that you understand the security and implementation implications of enabling Perl and Perl modules there. This article is not about Perl security, and each environment is different, so know your environment and perform the necessary due diligence before deploying. As mentioned in the previous article, storing and retrieving blocked addresses in the .htaccess file is not the most efficient way to filter and block users. On a larger, busier web server, blocking and filtering should be done through database look-ups and writes instead, because a database implementation would be more secure and less susceptible to a denial-of-service (DoS) attack than a file handle on a text file. In the next section we show how to gather and store data about our users; a similar database implementation can be used to filter and block IP addresses, or you can simply reuse the database that holds the gathered data. That is the likely next step in developing this script, but we will not cover it in this article.
Gathering
The following Perl code handles the look-up services, data gathering, and data storage. Tuple.co uses two look-up services: the MaxMind geolocation service and whoisxmlapi.com. The two sources overlap, but they complement each other: MaxMind provides accurate geolocation data, and whoisxmlapi.com provides detailed provider information through WHOIS. One function implemented on Tuple.co but not covered in this article automatically sends a warning e-mail to the ISP and other contacts responsible for hosting an IP address from which numerous errors originate. As the code below shows, although we do not write the same IP address to .htaccess more than once, we do count the number of times an address produces an error code.
############# Write to REMOTEINFO table #########
sub dbwrite(){
    ######## Geo Stuff Again #####
    my $geo = Geo::IPfree->new;
    # use memory to speed things up
    $geo->Faster;
    my($code1, $name1) = $geo->LookUp( $remip );
    ###################################
    my($sec,$min,$hour,$mday,$mon,$year) = localtime(time);
    # No trailing newline: the value is stored exactly as formatted.
    my $tstamp = sprintf "%4d-%02d-%02d %02d:%02d:%02d",$year+1900,$mon+1,$mday,$hour,$min,$sec;
    my $dbname = "YOUR DB NAME";
    my $dbserver = "localhost";
    my $dbusr = "YOUR DB USER";
    my $dbpswd = 'YOUR DB USER PW';
    my $dbh = DBI->connect("DBI:mysql:$dbname:$dbserver",$dbusr,$dbpswd) or die "Error: $DBI::errstr\n";
    # Placeholders (?) keep request data such as the user agent, referer, and
    # URI, all of which the client controls, out of the SQL text.
    my $sql = "INSERT INTO REMOTEINFO
    (IPADDRESS,CNTRY,PORT,REMSYSTEM,REMBROWSER,NOATTEMPTS,FIRSTATTEMPT,
    LASTATTEMPT,REFER,REQMETHOD,REQURI) VALUES
    (?,?,?,?,?,1,?,?,?,?,?) ON DUPLICATE KEY UPDATE LASTATTEMPT=?,
    NOATTEMPTS=NOATTEMPTS+1";
    my $sth = $dbh->prepare($sql) or die "Can't prepare $sql: " . $dbh->errstr . "\n";
    # Execute once; also calling \$dbh->do(\$sql) would count every error twice.
    $sth->execute($remip,$name1,$remport,$ua,$ua,$tstamp,$tstamp,
        $refer,$reqmethod,$requri,$tstamp)
        or die "can't execute the query: " . $sth->errstr;
    $dbh->disconnect;
    #print "write done."; #For debugging purposes
}
###########################################
The dbwrite() function looks up some simple IP address information using the Geo::IPfree Perl module, which my provider would only install upon request. We also requested the regular Geo::IP module, but that was denied due to security issues, which was ironic in our situation: we were trying to secure a site but could not use a tool to do so. The function takes a time stamp, then stores the IP address, country, port, remote system type, number of attempts, and other information (the abbreviated capitals in the code) in a database table named REMOTEINFO. We use ‘ON DUPLICATE KEY UPDATE’ in the SQL INSERT to avoid writing redundant rows to the table and to track the number of errors a particular IP address generates; this relies on the table having a primary key or unique index, presumably on IPADDRESS. Please note that dbwrite() is also called from writeHTML(), as described above.
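The bookkeeping that ‘ON DUPLICATE KEY UPDATE’ performs can be modelled in plain Perl, which may make the counting easier to follow. This is an in-memory sketch only, not the database code: the %remoteinfo hash stands in for the REMOTEINFO table keyed on the IP address, and sql_timestamp() and record_attempt() are hypothetical helper names.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Format an epoch time as "YYYY-MM-DD HH:MM:SS" in local time.
sub sql_timestamp {
    my ($epoch) = @_;
    return strftime('%Y-%m-%d %H:%M:%S', localtime($epoch));
}

# In-memory stand-in for the REMOTEINFO table, keyed by IP address.
my %remoteinfo;

# First sighting inserts a row; later sightings only touch LASTATTEMPT and
# the counter, mirroring the INSERT ... ON DUPLICATE KEY UPDATE statement.
sub record_attempt {
    my ($ip, $epoch) = @_;
    my $stamp = sql_timestamp($epoch);
    if (exists $remoteinfo{$ip}) {
        $remoteinfo{$ip}{LASTATTEMPT} = $stamp;
        $remoteinfo{$ip}{NOATTEMPTS}++;
    }
    else {
        $remoteinfo{$ip} = {
            FIRSTATTEMPT => $stamp,
            LASTATTEMPT  => $stamp,
            NOATTEMPTS   => 1,
        };
    }
    return $remoteinfo{$ip};
}

# Usage: three errors from the same address leave one row with a count of 3.
my $row;
$row = record_attempt('203.0.113.9', time) for 1 .. 3;
print "attempts: $row->{NOATTEMPTS}\n";
```

FIRSTATTEMPT is set once and never touched again, which is what makes the first and last time stamps useful as a pair in later analysis.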
The writeipdetail() function below is called from compare_ip(), and only when the IP address is not already listed in .htaccess: these are paid services, and we do not want to look up details on the same IP address multiple times. MaxMind offers either a web lookup service or a local database that is kept up to date and read through the Geo::IP module; we implemented the web service, as outlined in the code below. This code gathers more detail than dbwrite() and writes it to the IPDETAIL table. Toward the end of the code we check that the IP address is associated with a domain, then call whoislookup().
#################################################
#### Get IP detail from MaxMind and write to IPDETAIL table #####
sub writeipdetail(){
    #print "start detail.";
    ## get detail ip db info here....
    # replace this value with license key
    my $license_key = "YOUR_LICENSE_KEY";
    my $csv = Text::CSV_XS->new( { binary => 1 } );
    my $ua = LWP::UserAgent->new( timeout => 5 );
    my $h = HTTP::Headers->new;
    $h->content_type('application/x-www-form-urlencoded');
    my $request = HTTP::Request->new( 'POST', 'http://geoip2.maxmind.com/e',$h,"l=$license_key&i=$remip" );
    my $res = $ua->request($request);
    my $content = $res->content;
    #print "content = $content\n";
    $csv->parse($content);
    my (
        $countrycode, $countryname, $regioncode, $regionname,
        $city, $lat, $lon, $metrocode,
        $areacode, $timezone, $continent, $postalcode,
        $isp, $org, $domain, $asnum,
        $netspeed, $usertype, $accuracyradius, $countryconf,
        $cityconf, $regionconf, $postalconf, $err
    ) = $csv->fields;
    #print "$countrycode.";
    ###########################################
    ## write detail ip db info here....
    my $dbname = "YOUR DB NAME";
    my $dbserver = "localhost";
    my $dbusr = "YOUR DB USER";
    my $dbpswd = 'YOUR DB USER PW';
    my $dbh = DBI->connect("DBI:mysql:$dbname:$dbserver",$dbusr,$dbpswd) or die "Error: $DBI::errstr\n";
    # One placeholder per column (24 response fields plus the IP address), so
    # that the values never become part of the SQL text.
    my $marks = join(',', ('?') x 25);
    my $sql = "INSERT INTO IPDETAIL
    (countrycode,countryname,regioncode,regionname,city,lat,lon,
    metrocode,areacode,timezone,continent,postalcode,isp,org,domain,
    asnum,netspeed,usertype,accuracyradius,countryconf,cityconf,
    regionconf,postalconf,err,ipaddress) VALUES ($marks)
    ON DUPLICATE KEY UPDATE ipaddress=ipaddress";
    my $sth = $dbh->prepare($sql) or die "Can't prepare $sql: " . $dbh->errstr . "\n";
    # Execute once; also calling \$dbh->do(\$sql) would run the INSERT twice.
    $sth->execute(
        $countrycode, $countryname, $regioncode, $regionname,
        $city, $lat, $lon, $metrocode,
        $areacode, $timezone, $continent, $postalcode,
        $isp, $org, $domain, $asnum,
        $netspeed, $usertype, $accuracyradius, $countryconf,
        $cityconf, $regionconf, $postalconf, $err, $remip
    ) or die "can't execute the query: " . $sth->errstr;
    $dbh->disconnect;
    #print "second write done."; #For debugging purposes
    # Only pay for a WHOIS lookup when there is a domain we have not stored yet.
    if (defined $domain && $domain ne "" && lookupdomain($domain) eq "false"){
        whoislookup($domain);
    }
}
#################################################
############ WHOIS Lookup through whoisxmlapi.com
################################################
# Usage
# whoislookup("tuple.co");
# Requires a domain name string argument!!
################################################
sub whoislookup {   # no empty prototype: the function takes one argument
    my $dom = $_[0];
    # Create a user agent object
    my $ua = LWP::UserAgent->new;
    $ua->agent("Tuple.co/0.1 ");
    # Create a request; the URL is built as one string with no line break in it
    my $req = HTTP::Request->new(POST => "http://www.whoisxmlapi.com/whoisserver/WhoisService?"
        . "domainName=$dom&username=YOURUSERNAME&password=YOURPW");
    $req->content_type('application/x-www-form-urlencoded');
    # Body carried over from the LWP documentation example; the lookup
    # parameters themselves travel in the URL above.
    $req->content('query=libwww-perl&mode=dist');
    # Pass request to the user agent and get a response back
    my $res = $ua->request($req);
    # Check the outcome of the response
    if ($res->is_success) {
        #print $res->content;
        storelookup($res->content);
    }
    else {
        print "whois lookup failed.";
        print $res->status_line, "\n";
    }
}
#################################################
####### Store whois lookup info in WHOISINFO table
sub storelookup {   # takes the XML response body as its only argument
my $WhoisRecord = XMLin($_[0]);
my $domainname = $WhoisRecord->{domainName};
my $nameserver = $WhoisRecord->{nameServers}->{hostNames}->{Address}->[0];
my $namesrvrip = $WhoisRecord->{nameServers}->{ips}->{Address}->[0];
my $regcountry = $WhoisRecord->{registrant}->{country};
my $regcity = $WhoisRecord->{registrant}->{city};
my $regunparse = $WhoisRecord->{registrant}->{unparsable};
my $regname = $WhoisRecord->{registrant}->{name};
...
}
The whoislookup() function pulls the extra WHOIS information available on the domain associated with the IP address, and gives us the one piece of information that MaxMind doesn’t provide: the contact information for the domain. The information we have gathered on the user (or IP address, script, etc.) that generated the HTTP error now includes the basic request data available from the original connection, the information MaxMind holds on the IP address, and the information held on the associated domain name in the major WHOIS databases. That is a lot of stored information, and it should provide a wealth of insight in the later analysis.
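One detail worth guarding against in storelookup() is that XML::Simple, unless the ForceArray option is set, returns a plain scalar when an element occurs once and an array reference when it repeats, so an expression ending in ->{Address}->[0] fails for a domain with a single name server. The sketch below shows one way to read such values defensively. The dig() and first_of() helpers are hypothetical, and the sample records are hand-built stand-ins whose shape is an assumption, not real whoisxmlapi.com output.

```perl
use strict;
use warnings;

# Walk a nested hash by key, returning undef as soon as a level is missing.
sub dig {
    my ($data, @path) = @_;
    for my $key (@path) {
        return undef unless ref($data) eq 'HASH' && exists $data->{$key};
        $data = $data->{$key};
    }
    return $data;
}

# Return the first element of an array reference, or the value itself when
# the parser folded a single element into a plain scalar.
sub first_of {
    my ($node) = @_;
    return ref($node) eq 'ARRAY' ? $node->[0] : $node;
}

# Hand-built records standing in for parsed XML (shapes are assumptions).
my $two_servers = { nameServers => { hostNames => {
    Address => [ 'ns1.example.com', 'ns2.example.com' ] } } };
my $one_server  = { nameServers => { hostNames => {
    Address => 'ns1.example.com' } } };

for my $record ($two_servers, $one_server) {
    my $ns = first_of(dig($record, qw(nameServers hostNames Address)));
    print defined $ns ? "$ns\n" : "(none)\n";
}
```

A missing branch, such as a record with no registrant section, comes back as undef rather than stopping the script, and DBI stores undef as NULL when it is bound to a placeholder.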
In the next article we will cover the data and analysis of that data for our experiment domain and present the results here. If you have questions, suggestions, or general comments about the code above, please comment in the section below this article. The script covered in this article is available for download to those who subscribe here.