Apache ModSecurity Whitelist Generator Script

Submitted by Sam Hobbs on

This script has been superseded by a commandline utility. Please visit this page for more information.

ModSecurity is a Web Application Firewall for Apache. It can monitor all of the traffic seen by your web server, including request headers and GET and POST data, and block dodgy requests. ModSecurity itself is actually just a rule engine; the clever part is in the rules you pass to it. Many people use the Open Web Application Security Project's (OWASP) Core Rule Set (CRS), an open source set of rules that ModSecurity can use to sift the wheat from the chaff and foil some common types of attack. The CRS was written by studying known vulnerabilities and writing rules that would not only have prevented those attacks, but similar attacks too. Together, ModSecurity and the CRS provide good all-round protection for your web server.

Some types of attack that ModSecurity & the OWASP CRS can help to protect against are:

  • SQL injection
  • Denial of Service
  • Cross-Site Scripting
  • HTTP anomalies (violations of HTTP protocol)
  • Automation detection (stops bots and scanners)
  • Comment spam

And yes, you can run this on your Raspberry Pi, although you might find it slows the server down noticeably.

This post assumes you have ModSecurity up and running in DetectionOnly mode. If you haven't installed ModSecurity and enabled the OWASP CRS yet, I'd highly recommend this guide on LinuxQuestions.org, which will work for Debian-like systems (I've tested it on both Raspbian and Ubuntu).
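If you are not sure which mode ModSecurity is currently in, something like the following can tell you. This is only a minimal sketch: modsec_engine_mode is a made-up helper name, and the default path /etc/modsecurity/modsecurity.conf is an assumption about where a Debian-like system keeps the main configuration, so pass your own path if yours differs.

```shell
#!/bin/bash
# Sketch: print the SecRuleEngine value set in a ModSecurity config file.
# modsec_engine_mode is a hypothetical helper; the default path is an assumption.
modsec_engine_mode() {
  local conf="${1:-/etc/modsecurity/modsecurity.conf}"
  # Commented-out lines are ignored because their first field starts with "#".
  # If the directive appears more than once, this sketch reports the last one.
  awk '$1 == "SecRuleEngine" { mode = $2 } END { if (mode != "") print mode }' "$conf"
}
```

Calling modsec_engine_mode with no arguments should print DetectionOnly if you are ready to follow the rest of this post. It only looks at a single file, so a SecRuleEngine directive inside a VirtualHost would not be seen.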

The Script

Every web application firewall like this will have false positives. To help prevent disruption to your site, ModSecurity comes with a "DetectionOnly" mode (SecRuleEngine DetectionOnly), where it will process each request as if it were turned on and write the results to your log files without actually blocking anything. This provides you with useful information about the false positives you need to deal with before turning ModSecurity on for real. Trust me on this one: don't install ModSecurity and set SecRuleEngine On straight away, because you will almost certainly break stuff. False positives are common, and they will keep happening until you write a whitelist to change ModSecurity's behaviour.

Writing a whitelist file manually is torturous drudgery. Don't put yourself through the pain: I wasted a few hours trying to write one for WordPress before I decided it was futile and looked for a better option. So, here is the result of my frustration: a BASH script that will automatically generate a whitelist file for you. The script should work for any web app (content management systems like WordPress & Drupal, webmail apps like Squirrelmail and Roundcube, and other apps like OwnCloud).

It works by assuming that any requests that come from a trusted IP address are legitimate: any rules that were triggered were false positives and will go in our whitelist for that location. The script reads through your Apache error log files for statements like this:

[Wed May 07 19:13:54.925435 2014] [:error] [pid 2423] [client 192.168.1.1] ModSecurity: Warning. Match of "eq 1" against "&ARGS:CSRF_TOKEN" required. [file "/etc/modsecurity/modsecurity_crs_43_csrf_protection.conf"] [line "31"] [id "981143"] [msg "CSRF Attack Detected - Missing CSRF Token."] [hostname "www.samhobbs.co.uk"] [uri "/comment/2140/approve"] [unique_id "U2p34n8AAQEAAAl3cyIAAAAM"]
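Only three pieces of an entry like this matter for whitelisting: the client address, the rule ID and the URI. As a rough sketch of how they can be pulled out (parse_modsec_line is a hypothetical helper, not something the script below uses, and it assumes IPv4 clients and the field order shown in the entry above):

```shell
#!/bin/bash
# Sketch: reduce ModSecurity error-log entries (read on stdin) to "client rule_id uri".
# parse_modsec_line is a hypothetical helper, not part of the generator script.
# Assumes IPv4 clients and the field order [client ...] ... [id "..."] ... [uri "..."].
parse_modsec_line() {
  sed -n -E 's/.*\[client ([^]:]+)(:[0-9]+)?\].*\[id "([0-9]+)"\].*\[uri "([^"]*)"\].*/\1 \3 \4/p'
}

# Feed it the log entry shown above
parse_modsec_line <<'EOF'
[Wed May 07 19:13:54.925435 2014] [:error] [pid 2423] [client 192.168.1.1] ModSecurity: Warning. Match of "eq 1" against "&ARGS:CSRF_TOKEN" required. [file "/etc/modsecurity/modsecurity_crs_43_csrf_protection.conf"] [line "31"] [id "981143"] [msg "CSRF Attack Detected - Missing CSRF Token."] [hostname "www.samhobbs.co.uk"] [uri "/comment/2140/approve"] [unique_id "U2p34n8AAQEAAAl3cyIAAAAM"]
EOF
# prints: 192.168.1.1 981143 /comment/2140/approve
```

Lines that do not contain all three fields are silently dropped, which is convenient for skipping the non-ModSecurity noise in an error log.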

The script then removes the false positive by outputting a whitelist file, which consists of a series of LocationMatch statements. Each LocationMatch statement contains rules that we don't want to be processed at that URL, like so:

<LocationMatch "^/comment/[0-9]+/approve$">
SecRuleRemoveById 981143
</LocationMatch>
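To make the shape of that output concrete, here is a minimal sketch of a function that prints one such block from a pattern and a list of rule IDs. The name emit_location_match is made up for illustration; the real script builds its blocks with echo and sed rather than a helper like this.

```shell
#!/bin/bash
# Sketch: print a LocationMatch block that disables the given rule IDs at one location.
# emit_location_match is a hypothetical helper name, not part of the generator script.
emit_location_match() {
  local pattern="$1"
  shift
  # No rule IDs given: print nothing rather than an empty block
  [ "$#" -gt 0 ] || return 0
  printf '<LocationMatch "%s">\n' "$pattern"
  # printf repeats its format once per remaining argument
  printf 'SecRuleRemoveById %s\n' "$@"
  printf '</LocationMatch>\n'
}

# Reproduces the example block shown above
emit_location_match '^/comment/[0-9]+/approve$' 981143
```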

This whitelist file will then be included in the relevant VirtualHost file with a statement like this:

# Include personalised whitelist file for ModSecurity
Include /etc/modsecurity/whitelists/samhobbs.co.uk.conf

Neat, right? Here's the script:

#! /bin/bash
#
# This is the ModSecurity CRS Whitelist Generator Script version 16/03/2014
# Not affiliated with the Modsecurity project or the CRS
# Script by Sam Hobbs https://samhobbs.co.uk
# Licence: public domain
# Please let me know how you get on so that I can make improvements: leave a
# comment on my blog or email me at: sam at samhobbs dot co dot uk

# Every installation of Apache with Modsecurity will probably have false positives.
# This script aims to make writing a whitelist file quicker than doing it manually.
# Required input is as many error logs as you have available from the relevant virtualhost,
# with modsecurity set to detect only and all of the CRS enabled

#====================================== CHANGELOG =========================================#

# Script version 
VERSION="27/05/2014"

# 16/03/2014 changed default <LocationMatch> regex in whitelist file to .* from *
# 12/05/2014 	changed defaults for drupal
#		egrep used instead of grep to provide better regex
#		now checks to see if locationmatch would be empty before writing it

#===================================== USER INPUT =========================================#

# This script works by assuming that all traffic from a friendly IP address is legitimate
# and that the resulting errors are false positives.

# Define one or more friendly IP addresses, separating each IP with a space. If you are
# hosting at home, a good choice is your router's LAN IP address, since your server sees
# all traffic from your LAN as originating here.
# You might also like to add IP addresses of users you know weren't abusing the site, for
# example people who left legitimate comments. Wordpress tells you the IP address used to
# post each comment on the comment moderation GUI.
FRIENDLY_IP="192.168.1.1"

# Define a list of special locations. The matching process uses regex. A LocationMatch
# statement will be created for each one of these locations, which will be populated with
# rule IDs for all the locations that match that regex. Leave a space between each location
# that you enter.
# There is no need to start each location with "^"; the script adds the character for you.
# If you don't end the location with a "$" then the script will automatically add ".*"
# to the end of the location in the LocationMatch statement so that it will match
# all files beginning with that path, i.e.
# <LocationMatch "^/wordpress/wp-admin/.*"> matches /wordpress/wp-admin/wp-login.php, but
# <LocationMatch "^/wordpress/wp-admin/$"> does not.

SPECIAL_LOCATIONS="\
/authorize.php \
/admin/config$ \
/admin/config/content/mollom \
/admin/config/content/syntaxhighlighter \
/admin/config/people \
/admin/config/search \
/admin/config/system/actions \
/admin/content \
/admin/reports \
/admin/structure/menu \
/admin/structure/types \
/admin/modules \
/admin/appearance/settings \
/admin/people/permissions \
/comment/reply \
/comment/[0-9]+$ \
/comment/[0-9]+/edit \
/comment/[0-9]+/approve \
/file/ajax/field_image/und/0/ \
/index.php$ \
/node/[0-9]+/delete$ \
/node/[0-9]+/edit$ \
/node/add/article \
/sites/default/files/css/ \
/sites/default/files/js \
/token/tree \
/user/[0-9]+/edit$ \
.*\.(png|jpg|JPG|gif|ico)$"

# Define directory holding the Apache error log files to be processed
LOG_DIR=~/errors-new

# Define directory for output files:
OUTPUT_DIR=~/modsec-whitelist-samhobbs-new-1 

#===================================== KNOWN BUGS =========================================#

# SecRuleRemoveById 981143 doesn't work because the rule has its own whitelist built in
# see:   /usr/share/modsecurity-crs/optional_rules/modsecurity_crs_43_csrf_protection.conf

#==================================== HOUSEKEEPING ========================================#

# Create the output directory if it doesn't exist:
if [ ! -d "$OUTPUT_DIR" ]
then
  mkdir -p "$OUTPUT_DIR"
  echo "Output directory has been created at $OUTPUT_DIR"
  echo ""
fi

# Delete the previous output files if they exist
if [ -f "$OUTPUT_DIR/whitelist" ]
then
  echo -n "Old output files detected, deleting..."
  rm "$OUTPUT_DIR"/*
  echo "...done"
  echo ""
fi

# Some files:
COMBINED_LOG=$OUTPUT_DIR/combined_log
PROBLEM_LOCATIONS=$OUTPUT_DIR/problem_locations
ROOT_LOG=$OUTPUT_DIR/root_log
PROBLEM_LOCATIONS_REMAINING=$OUTPUT_DIR/problem_locations_remaining
ROOT_IDS=$OUTPUT_DIR/root_ids
GROUPED_LOCATIONS=$OUTPUT_DIR/grouped_locations
WHITELIST_FILE=$OUTPUT_DIR/whitelist

# Rules tripped on the root domain will be added as generic exceptions for the whole virtualhost:
echo "Rules tripped on the root domain will be added as generic exceptions for the whole virtualhost"
echo ""
echo "In addition to this, you have selected the following special locations to be grouped into whitelist statements"
for variable in $SPECIAL_LOCATIONS; do
echo "> $variable";
done
echo ""

# Perform work in temporary files
TEMPFILE1=$(mktemp)
TEMPFILE2=$(mktemp)
TEMPFILE3=$(mktemp)
TEMPFILE4=$(mktemp)

#==================================== LOG PROCESSING ======================================#

# Read all files in the log file directory and combine them into one long file:
echo "Processing error logs..."
for f in "$LOG_DIR"/*; do
  if [[ "$f" =~ \.gz$ ]]; then
    echo "> reading $f"
    zcat "$f" >> $TEMPFILE1
  elif [[ "$f" =~ \.log ]]; then
    echo "> reading $f"
    cat "$f" >> $TEMPFILE1
  else
    echo "File $f not recognised as a log file, skipping"
  fi
done
echo ""

echo "Converting logs into a useful format:"
# Remove any log entries from the file that are not generated by ModSecurity
echo -n "> Removing entries that were not generated by ModSecurity..."
grep ModSecurity $TEMPFILE1 > $TEMPFILE2
echo "...done."
#echo ""

# Remove log entries from traffic that is not from friendly IP addresses
echo -n "> Removing errors that were not from friendly IP addresses..."
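for ip in $FRIENDLY_IP; do
  # Match the whole address (dots escaped, optional :port) so that 192.168.1.1
  # does not also match 192.168.1.10 or 192.168.1.100
  grep -E "\[client ${ip//./\\.}(:[0-9]+)?\]" $TEMPFILE2 >> $TEMPFILE4
done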
echo "...done."
#echo ""

# Write combined log file:
echo -n "> Generating combined log file..."
cp $TEMPFILE4 $COMBINED_LOG
echo "...done ($COMBINED_LOG)."
#echo ""

# Filter out rules that match the root regex:
echo -n "> Separating entries that match the root location"
cat $COMBINED_LOG | grep uri.\"/\" > $ROOT_LOG
echo "...done ($ROOT_LOG)."
echo ""


# Now generate a list of locations with problems:
echo -n "Generating a list of all problem locations"
awk ' { print $(NF-2) }' $COMBINED_LOG | cut -d '"' -f 2 | sort | uniq > $PROBLEM_LOCATIONS
echo "...done ($PROBLEM_LOCATIONS)."
echo ""

#==================================== WHITELIST HEADER ====================================#

echo "# This file was created using the ModSecurity CRS Whitelist Generator script, version $VERSION" >> $WHITELIST_FILE
echo "# Save this file to /etc/modsecurity/whitelists/domainname.conf and include it in the relevant" >> $WHITELIST_FILE
echo "# VirtualHost configuration with \"Include /etc/modsecurity/whitelists/domainname.conf\"" >> $WHITELIST_FILE
echo "# See https://samhobbs.co.uk for more information" >> $WHITELIST_FILE
echo "" >> $WHITELIST_FILE
echo "" >> $WHITELIST_FILE

#==================================== MATCH EVERYWHERE ====================================#
# Rule matches for the site's root will be whitelisted for the whole site:

echo "Now working on rules to whitelist everywhere"
# Generate a list of IDs that match the root regex
echo -n "> Generating a list of rule IDs for the root location"
# "##" is a placeholder first line; only numeric lines are written to the whitelist below
echo "##" > $ROOT_IDS
grep -o '\[id "[0-9][0-9]*"\]' $ROOT_LOG | cut -d '"' -f 2 | sort -u >> $ROOT_IDS
echo "...done."
echo -n "> Writing site-wide SecRuleRemoveById statements to whitelist file..."
grep -E '^[0-9]+$' $ROOT_IDS | sed 's/^/SecRuleRemoveById /' >> $WHITELIST_FILE
echo "" >> $WHITELIST_FILE
echo "...done."
echo ""

#=================================== SPECIAL LOCATIONS ====================================#

COUNT=1
for LOCATION in $SPECIAL_LOCATIONS; do
  echo "Now working on the following location: $LOCATION"
  # Generate a list of problem locations for $LOCATION
  echo -n "> Generating a list of matching locations"
  TEMPLOCATION="$OUTPUT_DIR/location_${COUNT}"
  grep -E "^$LOCATION" $PROBLEM_LOCATIONS >> "$TEMPLOCATION"
  echo "...done."

  # Generate list of rule IDs for $LOCATION
  echo -n "> Generating a list of rule IDs for this location"
  echo "#" > $TEMPFILE3 # just to clear tempfile3
  while read -r LINE; do
    # Match the exact [uri "..."] field so that /node/1 does not also pick up /node/12
    grep -F "[uri \"$LINE\"]" $COMBINED_LOG | grep -o '\[id "[0-9][0-9]*"\]' | cut -d '"' -f 2 >> $TEMPFILE3
  done < "$TEMPLOCATION"
  TEMPIDFILE="$OUTPUT_DIR/id_${COUNT}"
  sort -u $TEMPFILE3 | grep -vxF -f $ROOT_IDS > "$TEMPIDFILE"
  echo "...done."
  let COUNT=COUNT+1
  
  # Add $LOCATION to whitelist file
  echo -n "> Writing LocationMatch statement to whitelist file"
  # If defined location ends in $ don't add *, if not then do add a *
  if [[ "$LOCATION" =~ \$$ ]]; then
    echo "<LocationMatch \"^$LOCATION\">" >> $WHITELIST_FILE
  else
    echo "<LocationMatch \"^$LOCATION.*\">" >> $WHITELIST_FILE
  fi
  # Only numeric lines are rule IDs; the "#" placeholder line is skipped
  grep -E '^[0-9]+$' "$TEMPIDFILE" | sed 's/^/SecRuleRemoveById /' >> $WHITELIST_FILE
  echo "</LocationMatch>" >> $WHITELIST_FILE
  echo "" >> $WHITELIST_FILE
  echo "...done."
  echo ""
done

#================================ REMAINING LOCATIONS LIST ================================#

# Generate a list of all locations that are covered by the group statements
echo -n "Find locations already covered by the group statements"
for file in $OUTPUT_DIR/*; do
  if [[ "$file" =~ location_[0-9] ]]; then
  cat $file >> $GROUPED_LOCATIONS
  fi
done
echo "...done ($GROUPED_LOCATIONS)."

# Now remove those locations from the master list of problem locations to leave a list of remaining problem locations. Also remove root, since this is dealt with separately.
echo -n "Remove these locations from the master list"
grep -vxF -f $GROUPED_LOCATIONS $PROBLEM_LOCATIONS | grep -v ^/$ > $PROBLEM_LOCATIONS_REMAINING
echo "...done ($PROBLEM_LOCATIONS_REMAINING)."

#================================== REMAINING LOCATIONS ===================================#

# Now write the remaining locations to the whitelist file
echo -n "Now writing the remaining problem locations to the whitelist file"
cat $PROBLEM_LOCATIONS_REMAINING | while read -r line; do
  # list all the rule IDs logged for exactly this URI that we have not already removed globally
  grep -F "[uri \"$line\"]" $COMBINED_LOG | grep -o '\[id "[0-9][0-9]*"\]' | cut -d '"' -f 2 | sort -u | grep -vxF -f $ROOT_IDS | sed 's/^/SecRuleRemoveById /'> $TEMPFILE4
  # count the number of rule IDs - if there are none then we don't want to write an empty locationmatch statement
  LINES=$(wc -l $TEMPFILE4 | cut -f1 -d ' ')
  if [[ $LINES != 0 ]] ; then
    # since some of these are .php scripts, .php?foo=bar needs to match, so don't add $
    echo "<LocationMatch \"^$line.*\">" >> $WHITELIST_FILE
    cat $TEMPFILE4 >> $WHITELIST_FILE
    echo "</LocationMatch>" >> $WHITELIST_FILE
    echo "" >> $WHITELIST_FILE
  fi
done
echo "...done."
echo ""
echo "Your whitelist file has been created at $WHITELIST_FILE"
echo ""

#====================================== CLEAN UP ==========================================#

# Clean up temporary files
echo -n "Cleaning up..."
rm -f $TEMPFILE1 $TEMPFILE2 $TEMPFILE3 $TEMPFILE4
echo "...done"
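Once the script has run, a quick sanity check on the output can save a failed Apache reload. The special-locations loop writes a LocationMatch block for every location you list, whether or not any rule IDs were found for it, so some blocks may be empty. The following is a minimal sketch of such a check; check_whitelist is a hypothetical helper and is not part of the script above.

```shell
#!/bin/bash
# Sketch: report how many LocationMatch blocks a whitelist file opens and closes,
# and how many of them contain no SecRuleRemoveById lines.
# check_whitelist is a hypothetical helper, not part of the generator script.
check_whitelist() {
  awk '
    /^<LocationMatch /   { opened++; rules = 0; inblock = 1; next }
    /^<\/LocationMatch>/ { closed++; if (inblock && rules == 0) empty++; inblock = 0; next }
    inblock && /^SecRuleRemoveById / { rules++ }
    END {
      printf "opened=%d closed=%d empty=%d\n", opened, closed, empty
      # Unbalanced tags would stop Apache from loading the file
      if (opened != closed) exit 1
    }
  ' "$1"
}
```

With the OUTPUT_DIR used above you would run check_whitelist ~/modsec-whitelist-samhobbs-new-1/whitelist. Empty blocks are harmless, but they are a hint that a special location never showed up in your logs.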

Instructions

  1. Copy the script and save it to ~/bin/generate-modsec-whitelist.sh
  2. Make a copy of your Apache error log files, which are found on Debian systems at /var/log/apache2/error.log (plus any rotated copies alongside it), and put the copies in a folder in your home directory, for example ~/error-logs/2014-05-30
  3. Make the log files owned by your user, i.e. chown -R user:user ~/error-logs/2014-05-30
  4. Open the script in a text editor and go to the "USER INPUT" section. Fill in the list of friendly IP addresses (if you are hosting at home, start with just your router's IP address).
  5. Fill in some SPECIAL_LOCATIONS where you would like the script to group the rules. For example, if you have URLs like yourdomain.com/foo/bar/a and yourdomain.com/foo/bar/b that you think are similar (rules tripped on one will probably be tripped on the other), add /foo/bar/ and all URLs matching yourdomain.com/foo/bar/.* will be grouped together
  6. Fill in LOG_DIR with the path to the error logs you copied over earlier
  7. Fill in the OUTPUT_DIR, which is the directory the script will write its output files to. As well as the whitelist file, it writes a few other intermediates to help you understand how it has built the whitelist.
  8. Save your changes, and then make the script executable: chmod +x ~/bin/generate-modsec-whitelist.sh
  9. Run the script generate-modsec-whitelist.sh (or use the full path if you haven't added ~/bin to your PATH)
  10. Review the whitelist file that was generated in the output folder, look for sensible ways to group locations, add or modify the special locations as necessary and run the script again. Repeat until you are happy.
  11. Copy the whitelist file to /etc/modsecurity/whitelists/yourdomain.com.conf
  12. Use the Include directive to make Apache process the whitelist file as part of the relevant virtualhost's configuration, i.e.
        <IfModule mod_security2.c>
        # SecRuleEngine On, Off or DetectionOnly
        SecRuleEngine On           
          
        # Include personalised whitelist file for ModSecurity
        Include /etc/modsecurity/whitelists/samhobbs.co.uk.conf
        </IfModule>

Only set SecRuleEngine On if you really want to turn ModSecurity on. You can still test your rules in DetectionOnly mode by running this command to watch your log file as you browse around the site:

tail -f /var/log/apache2/error.log

or, for more information:

tail -f /var/log/apache2/modsec_audit.log
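Watching raw log lines scroll past gets tiring quickly, so it can help to boil them down to a count of which rule fired where. A minimal sketch, assuming the error log format shown earlier in this post (summarise_hits is a made-up helper name):

```shell
#!/bin/bash
# Sketch: count how often each rule ID fired at each URI, most frequent first.
# summarise_hits is a hypothetical helper; it reads error-log lines on stdin and
# assumes the [id "..."] field comes before the [uri "..."] field on each line.
summarise_hits() {
  sed -n -E 's/.*\[id "([0-9]+)"\].*\[uri "([^"]*)"\].*/\1 \2/p' | sort | uniq -c | sort -rn
}
```

For example, grep ModSecurity /var/log/apache2/error.log | summarise_hits prints one line per rule and URI pair with a count in front. Anything that still shows up here after you have included your whitelist is a rule the whitelist does not cover.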

Finally, reload Apache to make your changes take effect:

sudo service apache2 reload
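If you find yourself repeating steps 2 and 3 every time you regenerate the whitelist, the copying can be wrapped in a small function. This is only a sketch: collect_logs is a hypothetical helper, and it assumes the logs are named error.log, error.log.1, error.log.2.gz and so on, as they are on a default Debian setup.

```shell
#!/bin/bash
# Sketch: copy Apache error logs (current and rotated) into a working directory.
# collect_logs is a hypothetical helper; both directories are passed as arguments.
collect_logs() {
  local src="$1" dest="$2"
  mkdir -p "$dest" || return 1
  # The glob picks up error.log plus rotated copies such as error.log.1 and error.log.2.gz
  cp "$src"/error.log* "$dest"/
}
```

Something like collect_logs /var/log/apache2 ~/error-logs/2014-05-30 would do the job, although you will probably need root privileges to read the logs, in which case the chown from step 3 is still needed afterwards.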

Final points

This script is not perfect, but it will help you to get started. There are cleverer ways to build a whitelist file, but they require more knowledge of ModSecurity and more time. This is a quick and dirty way to get ModSecurity up and running without breaking everything.

It's important to recognise that this is a bit of a sledgehammer approach to the problem. The crudest way of dealing with false positives is to remove the rule files entirely (the ones in /etc/modsecurity/). Slightly better than that is removing rules by their IDs, and slightly better again is removing them by ID for specific locations, which is what the script does.

There are better ways of doing things though. Most of the rules do some kind of pattern matching with regular expressions (regex), looking for naughty terms or patterns in the request data. If you're clever, you can update the rule so that it no longer inspects the particular argument that is causing the false positive:

SecRuleUpdateTargetById 958895 !ARGS:email

The above stops rule 958895 from inspecting the argument "email", without requiring direct modification of the CRS. This is ideal, because you don't want to have to manually hack the CRS each time a new set comes out. If you place directives like this inside a custom rules file, e.g. /etc/modsecurity/modsecurity_crs_60_customrules.conf, then they will be unaffected by upgrades.

Better yet, you can do the above but only for specific locations:

SecRule REQUEST_FILENAME "@streq /path/to/file.php" \
"phase:1,t:none,nolog,pass,ctl:ruleUpdateTargetById=958895;!ARGS:email"

This would exclude the argument "email" from the rule, but only at /path/to/file.php. Cool eh? The more time you have, the better you can make your whitelist.

Comparing the whitelist that my script generates to the two methods above, you can see that it's quite "loose": you remove the false positive at that location, but you also stop the rule from catching any other requests that would have tripped it at that location.

This post gives a very good description of the different whitelisting methods available in ModSecurity. It is written by SpiderLabs, the sponsor of ModSecurity, and I found it very useful (it's also where I lifted those two examples of better whitelisting methods from). If you're after more information on ModSecurity, a great place to start is the ModSecurity Reference Manual.
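If you want to experiment with the tighter SecRuleUpdateTargetById approach, the log files can give you a head start. The sketch below is not something my script does, and it rests on an assumption about the log format: that entries for pattern-matching rules name the offending argument in the form "at ARGS:name." just before the [file "..."] field. The helper name suggest_target_exclusions is made up.

```shell
#!/bin/bash
# Sketch: turn ModSecurity log entries (on stdin) into candidate
# SecRuleUpdateTargetById lines, one per rule ID and argument.
# suggest_target_exclusions is a hypothetical helper. It ASSUMES each relevant
# entry contains ' at ARGS:name. [file ...' followed later by '[id "NNNNNN"]'.
suggest_target_exclusions() {
  sed -n -E 's/.* at (ARGS(_NAMES)?:[^ ]+)\. \[file .*\[id "([0-9]+)"\].*/SecRuleUpdateTargetById \3 !\1/p' | sort -u
}
```

You could feed it the combined_log file that the script writes to its output directory. Treat the result as a list of suggestions to review by hand, not something to paste in blindly: excluding an argument everywhere is still a judgement call about whether that argument can ever carry an attack.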
