Apache

Article by on August 30, 2013, last modified on September 19, 2013

The Apache web server is a proven cross-platform application. It is arguably bloated and difficult to work with, but it works and it has a large community of support.

Installation

Apache can be installed on almost any platform. On Debian, run:

$ sudo apt-get install apache2

On Mac, you have a number of options. You can use the built-in apache, MAMP, or the MacPorts apache (among many other options, like brew). Here

Caching

There are three types of caching in apache:

For now, I will only cover disk cache and I will list alternatives.

1. Disk Cache

Disk cache is just what it sounds like: it stores HTTP responses on disk. This can significantly reduce load on servers where there is a lot of computation per request, I/O per request, or simply lots of requests. The disk cache isn't just a switch you flip on, though. It obeys the HTTP protocol's cache headers, so you have to make sure you either set CacheIgnoreNoLastMod to On or send an "ETag", "Last-Modified", or "Expires" header with your responses (see "What Can be Cached?" in the "Caching Overview" and read this StackOverflow thread).
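
A quick way to see whether a response carries any of those headers is to inspect it from the command line. The following is only a sketch: has_cache_validator is a hypothetical helper name (not part of Apache), it assumes a POSIX shell with grep, and the URL in the example is a placeholder.

```shell
# Hypothetical helper: succeeds if the response headers on stdin include
# ETag, Last-Modified, or Expires (header names matched case-insensitively).
has_cache_validator() {
  grep -qiE '^(etag|last-modified|expires):'
}

# Example (assumes curl is installed; the URL is a placeholder):
#   curl -sI http://localhost/ | has_cache_validator && echo "validator present"
```

If the check fails for a page you expect to be cached, that page needs one of the headers added, or CacheIgnoreNoLastMod set to On.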

To set up disk cache you have to:

  1. Enable the cache and disk_cache modules (on Apache 2.4, disk_cache was renamed to cache_disk):
    $ sudo a2enmod cache
    $ sudo a2enmod disk_cache
  2. Create your cache directory, wherever you want it for the site:
    $ sudo mkdir -p /var/cache/apache2/mod_disk_cache
  3. Configure your site's apache config. Here is an example:
    <IfModule mod_cache.c>
        <IfModule mod_disk_cache.c>
            CacheRoot "/var/cache/apache2/mod_disk_cache"
            CacheEnable disk /
            CacheIgnoreNoLastMod On
            CacheDefaultExpire 1800
            CacheIgnoreHeaders Set-Cookie
            CacheDisable /login
            CacheDisable /admin
        </IfModule>
    </IfModule>

    A few pointers: CacheIgnoreNoLastMod, as mentioned, needs to be On unless you manage cache control yourself; CacheDefaultExpire is the default expiry in seconds (here it is 30 min); CacheIgnoreHeaders needs to include Set-Cookie to prevent sticky cookies (yuk!); and CacheDisable lets you exclude portions of your site from caching.

For some debugging help, see the "Debugging Disk Cache" section.

Further Reading

2. Alternatives

Redirects and Rewrites

Very often you will want to modify inbound requests by rewriting the request, redirecting to another URL, or both. The two main modules are mod_alias and mod_rewrite. mod_alias is preferred for performance, though in most cases you don't have a choice and must use mod_rewrite.

mod_alias

mod_alias has two main directives for redirecting: Redirect and RedirectMatch.

Redirect

Redirect is the simplest:

Redirect /incoming-path/file.html http://outgoing.com/file.html

Note: Redirect statements match on the URL path only; the query string is ignored when matching.
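
To confirm that a redirect does what you expect, you can request the old URL and look at the Location header that comes back. This is a sketch only: redirect_target is a hypothetical helper name, and the URL in the example is a placeholder for your own site.

```shell
# Hypothetical helper: prints the value of the first Location header found
# in the response headers on stdin (prints nothing if there is none).
redirect_target() {
  tr -d '\r' | awk '!seen && tolower($1) == "location:" { print $2; seen = 1 }'
}

# Example (assumes curl is installed; the URL is a placeholder):
#   curl -sI http://localhost/incoming-path/file.html | redirect_target
```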

RedirectMatch

RedirectMatch is the same as Redirect, except that it matches the path with a regular expression:

RedirectMatch ^/incoming-path/.*\.html$ http://outgoing.com/file.html

Note: RedirectMatch statements also ignore the query string when matching, and the original query string is carried over to the destination. To strip off the query string add a '?' to the destination URL.
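
Regular expressions are easy to get subtly wrong, so it can help to try a pattern against a few sample paths before putting it in the config. The sketch below uses grep -E, which is POSIX extended regex rather than the PCRE that Apache uses; the two agree for simple patterns like the one in the example, but not for every pattern. path_matches is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if the URL path ($2) matches the pattern ($1).
# Uses POSIX extended regex (grep -E), not Apache's PCRE engine.
path_matches() {
  printf '%s\n' "$2" | grep -qE -- "$1"
}

# Example:
#   path_matches '^/incoming-path/.*\.html$' /incoming-path/old/page.html && echo match
```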

mod_rewrite

mod_rewrite is incredibly powerful.

SSL

SSL with Apache deserves its own section at the very least. There is a lot to talk about here. For now, I will link to two articles:

Debugging

Apache does have docs on debugging, but I will highlight a few things from experience.

1. Testing for Syntax Errors

$ apachectl configtest
Syntax OK
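
One habit that follows from this is to reload only when the syntax check passes. A minimal sketch, assuming apachectl is on your PATH and supports the graceful argument; safe_reload and the APACHECTL variable are hypothetical names used for illustration.

```shell
# Command used to control Apache. Override it for other installs
# (for example APACHECTL=apache2ctl on Debian).
APACHECTL="${APACHECTL:-apachectl}"

# Hypothetical helper: reload gracefully only if the syntax check passes.
safe_reload() {
  "$APACHECTL" configtest && "$APACHECTL" graceful
}
```

You will most likely need to run this as root (or put sudo in front of the commands) for the reload itself to succeed.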

2. Reading Log Files

a. Error Logs

Perhaps the easiest way to read a log file is just to tail it:

$ tail -f /var/log/apache/error_log

Open up a terminal, run the above command, and watch as new entries come in. Or, you can run:

$ less +G /var/log/apache/error_log

which will allow you to page through the whole file, starting at the bottom (plain "less" and "more" both start you at the top). To find out where apache is logging errors, you will need to search through the configuration for an ErrorLog directive that looks like:

ErrorLog /var/log/apache/error_log

It could be that each site logs to a different directory, so heads up.
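
Searching the configuration by hand gets tedious when there are many site files, so here is a sketch that greps the whole config tree for a directive. find_directive is a hypothetical helper name, it relies on grep -r (available in GNU and BSD grep), and the config root in the example is Debian's; yours may differ.

```shell
# Hypothetical helper: list every line under a config directory ($2) that
# starts with the given directive ($1), with file name and line number.
# Commented-out lines are skipped because they start with '#'.
find_directive() {
  grep -rnE "^[[:space:]]*$1[[:space:]]" "$2"
}

# Example (the config root varies by platform; /etc/apache2 is Debian's):
#   find_directive ErrorLog /etc/apache2
```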

b. Access Logs

The access logs can show you information about requests, such as the HTTP response code or the user agent that requested the page. To find out where apache is logging requests, you will need to search through the configuration for a CustomLog directive that looks like:

CustomLog /var/log/apache/access_log vhost_combined

Similarly, run tail -f, less, or more. But perhaps the most value from access logs comes from "mining" them: pulling out IPs to see if you are getting a DoS attack, looking at traffic patterns (requests per second), and the like.
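
As a starting point for that kind of mining, the sketch below counts requests per client IP and prints the busiest ones. top_ips is a hypothetical helper name. Which field holds the IP is an assumption about your log format: it is the first field in the common and combined formats, and I believe it is the second in Debian's vhost_combined format, so check a line of your own log first.

```shell
# Hypothetical helper: reads an access log on stdin and prints
# "count ip" lines, busiest first.
#   $1 = field number that holds the client IP (default 1)
#   $2 = how many IPs to print (default 10)
top_ips() {
  awk -v f="${1:-1}" '{ count[$f]++ } END { for (ip in count) print count[ip], ip }' \
    | sort -k1,1nr -k2,2 \
    | head -n "${2:-10}"
}

# Example:
#   top_ips 2 20 < /var/log/apache/access_log
```

A single IP far ahead of the rest is worth a closer look, though a proxy or a monitoring check can produce the same pattern.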

Here is a list of log parsers, most of which I haven't used:

3. Debugging Disk Cache

  1. Ensure that apache has write access to /var/cache/apache2/mod_disk_cache (or wherever your cache root is).
  2. Confirm it can even cache something by setting CacheIgnoreNoLastMod On. If that doesn't work, I suspect the issue is not with mod_cache/mod_disk_cache.
  3. Ensure that the request is a GET. mod_cache only caches responses to GET requests, so POST, PUT, and DELETE will never be cached.
  4. Ensure there is no "Authorization" header.
  5. Ensure there is not already a Cache-Control private or no-store header. If there is, then you can remove that header using:
    Header unset Cache-Control
  6. Ensure there is an "ETag", "Last-Modified", or "Expires" header.
  7. Finally, read through the "What Can Be Cached?" section of the apache caching docs overview.
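
The Cache-Control check in step 5 is easy to script. This is a sketch only: cache_blockers is a hypothetical helper name, and the URL in the example is a placeholder.

```shell
# Hypothetical helper: prints any Cache-Control header on stdin that carries
# private or no-store, and succeeds only if it found one.
cache_blockers() {
  grep -iE '^cache-control:.*(private|no-store)'
}

# Example (assumes curl is installed; the URL is a placeholder):
#   curl -sI http://localhost/ | cache_blockers || echo "no blocking Cache-Control header"
```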

You can also add %{cache-status}e to a LogFormat to log the cache decision for each request (this is documented for mod_cache in Apache 2.4), but I haven't been able to get that to work.

4. Process Trace

Sometimes things get desperate. Perhaps apache is giving you a segfault error and you have no idea why apache is silently dying. A process trace will hopefully give some clarity in these moments.

a. Using strace (or dtruss on Mac)

  1. Put apache in single process mode so that we have only one process to trace:
    $ sudo /usr/sbin/httpd -k start -X -f /etc/apache2/httpd.conf

    Or, for MAMP:

    $ sudo /Applications/MAMP/Library/bin/httpd -k start -X -f /Applications/MAMP/conf/apache/httpd.conf
  2. Determine what the apache process id is:
    $ ps aux | grep http
    jpurcell 11242 0.0 0.0 2432768 620 s002 R+ 3:13PM 0:00.00 grep http
    _www 10255 0.0 0.1 2455572 12060 s001 S+ 3:12PM 0:00.11 /usr/sbin/httpd -k start -X -f /etc/apache2/httpd.conf
    root 10254 0.0 0.0 2432908 828 s001 S+ 3:12PM 0:00.01 sudo /usr/sbin/httpd -k start -X -f /etc/apache2/httpd.conf
  3. Attach strace or dtruss to the process. In this case I will show dtruss:
    $ sudo dtruss -p 10255
    SYSCALL(args) = return
    read(0x11, "\2742Wk\0", 0x541) = 1345 0
    open_nocancel(".\0", 0x0, 0x1) = 18 0
    fstat64(0x12, 0x7FFF54AFF0E0, 0x0) = 0 0
    fcntl_nocancel(0x12, 0x32, 0x7FFF54AFF2E0) = 0 0
    close_nocancel(0x12) = 0 0
    stat64("/var/www/vhosts/mysite.com/htdocs\0", 0x7FFF54AFF050, 0x0) = 0 0
    ...
    (and on and on and on the trace goes. Where it stops, that's where you can guess you have a problem.)
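
Step 2 above can be scripted so that you don't have to pick the process id out by eye. A sketch, assuming the ps aux column layout shown above (process id in column 2, command in column 11) and a binary named httpd (on Debian it is apache2, so adjust the pattern); httpd_pids is a hypothetical helper name.

```shell
# Hypothetical helper: reads `ps aux` output on stdin and prints the PID
# (column 2) of every row whose command (column 11) is an httpd binary.
# This skips both the grep row and the sudo wrapper row.
httpd_pids() {
  awk '$11 ~ "(^|/)httpd$" { print $2 }'
}

# Example:
#   ps aux | httpd_pids
#   sudo dtruss -p "$(ps aux | httpd_pids | head -n 1)"
```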

b. Using gdb

For gdb, I will simply add this link for now: http://stackoverflow.com/a/7752606. I found gdb to be significantly better in terms of being able to understand the output, and it gives the option to step through each step of the process.
