Apache
Article by
on August 30, 2013, last modified on September 19, 2013
The Apache web server is a proven cross-platform application. It is arguably bloated and difficult to work with, but it works and it has a large support community.
Installation
Apache can be installed on almost any platform. On Debian, run:
$ sudo apt-get install apache2
On Mac, you have a number of options. You can use the built-in Apache, MAMP, or the MacPorts Apache (among many other options, like brew).
Caching
There are three types of caching in apache:
For now, I will only cover disk cache and I will list alternatives.
1. Disk Cache
Disk cache is just as it sounds: it stores HTTP responses on disk. This can significantly reduce load on servers where there is a lot of computation per request, a lot of I/O per request, or simply lots of requests. The disk cache isn't just a switch you flip on, though. It obeys HTTP's cache headers, so you have to either set CacheIgnoreNoLastMod to On or make sure your responses carry an "ETag", "Last-Modified" or "Expires" header (see "What Can be Cached?" in the "Caching Overview" and read this StackOverflow thread).
To set up disk cache you have to:
- Enable the cache and disk_cache modules:
$ sudo a2enmod cache
$ sudo a2enmod disk_cache
- Create your cache directory, wherever you want it for the site:
$ sudo mkdir -p /var/cache/apache2/mod_disk_cache
- Configure your site's apache config. Here is an example:
<IfModule mod_cache.c>
    <IfModule mod_disk_cache.c>
        CacheRoot "/var/cache/apache2/mod_disk_cache"
        CacheEnable disk /
        CacheIgnoreNoLastMod On
        CacheDefaultExpire 1800
        CacheIgnoreHeaders Set-Cookie
        CacheDisable /login
        CacheDisable /admin
    </IfModule>
</IfModule>
A few pointers: CacheIgnoreNoLastMod, as mentioned, needs to be On unless you manage cache control yourself; CacheDefaultExpire is the default expiry in seconds (here, 30 minutes); CacheIgnoreHeaders needs to include Set-Cookie to prevent sticky cookies (yuk!); and CacheDisable lets you exclude portions of your site from caching.
For some debugging help, see the "Debugging Disk Cache" section.
Further Reading
- http://www.softslate.com/blog/2011/07/apache-modcache-in-real-world.html (setup caching)
- http://www.philchen.com/2009/02/09/some-tuning-tips-for-apache-mod_cache-mod_disk_cache (setup caching)
- http://www.mabishu.com/blog/2009/12/08/using-memcache-server-as-apache-content-cach/ (setup caching)
- http://www.askapache.com/hacking/speed-site-caching-cache-control.html (configuring cache headers)
- http://www.vitki.net/story/logging-apaches-cache-modules-efficiency (logging cache info)
- http://docs.oracle.com/cd/A97329_03/bi.902/a90500/admin-05.htm (logging cache info)
2. Alternatives
- https://developers.google.com/speed/pagespeed/module
- put a caching application in front of apache: varnish, nginx, etc
Redirects and Rewrites
Very often you will want to modify inbound requests by rewriting the request, redirecting to another URL, or both. The two main modules are mod_alias and mod_rewrite. mod_alias is preferred for performance, though in most cases you don't have a choice and must use mod_rewrite.
mod_alias
mod_alias has two main directives for redirecting: Redirect and RedirectMatch.
Redirect
Redirect is the simplest:
Redirect /incoming-path/file.html http://outgoing.com/file.html
Note: Redirect statements match on the URL path only; the query string is not part of the match.
RedirectMatch
RedirectMatch is the same as Redirect, except that it matches the path with a regular expression (not a shell-style wildcard):
RedirectMatch ^/incoming-path/.*\.html$ http://outgoing.com/file.html
Note: RedirectMatch statements also match on the URL path only; the query string is not part of the match. To strip off the query string add a '?' to the destination URL.
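Because RedirectMatch takes a regular expression rather than a glob, it is worth checking a pattern against a few sample paths before putting it in the config. The sketch below uses Python's re module as a stand-in for Apache's regex engine, on the assumption that the two agree for patterns this simple. Both patterns and all sample paths are made up for illustration.

```python
import re

# A glob-style pattern, as one might write it by habit (hypothetical).
GLOB_STYLE = r"/incoming-path/*.html$"
# An anchored regular expression for "any .html file under /incoming-path/".
REGEX_STYLE = r"^/incoming-path/.*\.html$"


def matches(pattern, path):
    """Return True if the pattern matches anywhere in the path (unanchored search)."""
    return re.search(pattern, path) is not None


if __name__ == "__main__":
    for path in ("/incoming-path/page.html", "/incoming-path/xhtml"):
        for name, pattern in (("glob-style", GLOB_STYLE), ("regex", REGEX_STYLE)):
            print(f"{name:10} {path:28} {matches(pattern, path)}")
```

In the glob-style pattern, `/*` means "zero or more slashes" and `.` means "any one character", so it misses /incoming-path/page.html and matches /incoming-path/xhtml, which is the opposite of what was intended.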
mod_rewrite
mod_rewrite is incredibly powerful.
SSL
SSL with Apache deserves its own section at the very least. There is a lot to talk about here. For now, I will link to two articles:
- http://blog.andyhunt.info/2011/11/26/apache-ssl-on-max-osx-lion-10-7/
- http://thesimplesynthesis.com/article/ssl
Debugging
Apache does have docs on debugging, but I will highlight a few things from experience.
1. Testing for Syntax Errors
$ apachectl configtest
Syntax OK
2. Reading Log Files
a. Error Logs
Perhaps the easiest way to read a log file is just to tail it:
$ tail -f /var/log/apache/error_log
Open up a terminal and run the above command and watch as new requests come in. Or, you can run:
$ less /var/log/apache/error_log
which will allow you to scroll through the whole file. Note that less opens at the top of the file; press G to jump to the bottom, where the newest entries are. To find out where apache is logging errors, you will need to search through the configuration for an ErrorLog directive that looks like:
ErrorLog /var/log/apache/error_log
It could be that each site logs to a different directory, so heads up.
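Searching the configuration by hand gets tedious when there are many vhost files. Here is a minimal sketch that scans configuration text for ErrorLog and CustomLog directives. The function names and the /etc/apache2 root are my own assumptions, it does not follow Include directives, and variables such as ${APACHE_LOG_DIR} are returned unexpanded.

```python
import re
from pathlib import Path

# Directive names are case-insensitive; the target may or may not be quoted.
DIRECTIVE_RE = re.compile(r'^\s*(ErrorLog|CustomLog)\s+("[^"]+"|\S+)', re.IGNORECASE)


def find_log_directives(config_text):
    """Return (directive, target) pairs found in the text, skipping commented lines."""
    found = []
    for line in config_text.splitlines():
        m = DIRECTIVE_RE.match(line)
        if m:
            found.append((m.group(1), m.group(2).strip('"')))
    return found


def scan_config_tree(config_root):
    """Yield (file, directive, target) for every regular file under config_root."""
    for path in sorted(Path(config_root).rglob("*")):
        if path.is_file():
            text = path.read_text(errors="replace")
            for directive, target in find_log_directives(text):
                yield str(path), directive, target


if __name__ == "__main__":
    # /etc/apache2 is an assumption; point this at wherever your config lives.
    for file_name, directive, target in scan_config_tree("/etc/apache2"):
        print(f"{file_name}: {directive} -> {target}")
```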
b. Access Logs
The access logs can show you information about requests, such as the HTTP response code or the user agent that requested the page. To find out where apache is logging requests, you will need to search through the configuration for a CustomLog directive that looks like:
CustomLog /var/log/apache/access_log vhost_combined
Similarly, run tail -f, less, or more. But perhaps the most value from access logs comes from "mining" them for IPs to see if you are getting a DoS attack, traffic patterns (requests per second), and the like.
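As a rough illustration of that kind of mining, here is a minimal sketch that counts requests per client IP and per second. It assumes lines in the common/combined layout, optionally prefixed with a "host:port" field as in Debian's vhost_combined format; the regular expression, function name, and sample lines are mine, and lines that don't fit are simply skipped.

```python
import re
from collections import Counter

# Optional "vhost:port " prefix, then client IP, identd, user, [timestamp],
# "request line", status code, and response size.
LINE_RE = re.compile(
    r'^(?:\S+:\d+ )?(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+'
)


def summarize(lines):
    """Return two Counters: requests per client IP and requests per timestamp (one-second buckets)."""
    by_ip = Counter()
    by_second = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not in the assumed format
        by_ip[m.group("ip")] += 1
        by_second[m.group("time")] += 1
    return by_ip, by_second


# Made-up sample lines using documentation-reserved IP addresses.
SAMPLE = [
    'example.com:80 203.0.113.5 - - [30/Aug/2013:15:12:01 -0400] "GET / HTTP/1.1" 200 512 "-" "curl/7.30.0"',
    'example.com:80 203.0.113.5 - - [30/Aug/2013:15:12:01 -0400] "GET /a HTTP/1.1" 404 210 "-" "curl/7.30.0"',
    '198.51.100.7 - - [30/Aug/2013:15:12:02 -0400] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    by_ip, by_second = summarize(SAMPLE)
    print("Top IPs:", by_ip.most_common(5))
    print("Busiest seconds:", by_second.most_common(5))
```

To run it against a real log, pass an open file object to summarize() instead of SAMPLE.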
Here is a list of log parsers, most of which I haven't used:
- https://github.com/deviantintegral/apache_rps (Bash)
- http://code.google.com/p/apachelog/ (Python)
- https://github.com/rytis/Apache-access-log-parser (Python)
- https://github.com/lethain/apache-log-parser (Python)
- https://github.com/basuke/Apache-Access-Log-Parse-Library-for-Python (Python)
- http://code.google.com/p/apache-scalp/ (C)
- https://github.com/wvanbergen/request-log-analyzer (Ruby)
- https://gist.github.com/saas786/2516978 (Ruby)
- https://github.com/weppos/apachelog2feed (PHP)
- https://gist.github.com/seratch/1297691 (Scala)
- http://www.apacheviewer.com/ (Windows EXE)
- http://sourceforge.net/projects/mindtreeinsight/ (Windows EXE)
3. Debugging Disk Cache
- Ensure that apache has write access to /var/cache/apache2/mod_disk_cache (or wherever your cache root is).
- Confirm it can cache anything at all by forcing it to cache everything with CacheIgnoreNoLastMod On. If that doesn't work, I suspect the issue is not with mod_cache/mod_disk_cache.
- Ensure that the requests are GET requests. According to the Apache caching docs, responses to other methods are not cached.
- Ensure there is no "Authorization" header.
- Ensure there is not already a Cache-Control private or no-store header. If there is, then you can remove that header using:
Header unset Cache-Control
- Ensure there is an "ETag", "Last-Modified" or "Expires" header.
- Finally, read through the "What Can Be Cached?" section of the apache caching docs overview.
You can also add a %{cache-status flag} to the logs, but I haven't been able to get that to work.
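The header-related items in the checklist above can be sketched as a small function that reports what would block caching for a given request/response pair. This is only my reading of the checklist, not mod_cache's actual logic: the function name and parameters are made up, it treats only GET as cacheable (as the Apache caching guide states), and it ignores status codes, query strings, and max-age handling.

```python
def cache_blockers(method, request_headers, response_headers, ignore_no_last_mod=False):
    """Return a list of reasons the response would likely not be cached (empty if none found)."""
    # Header names are case-insensitive, so normalize them first.
    req = {k.lower(): v for k, v in request_headers.items()}
    resp = {k.lower(): v for k, v in response_headers.items()}
    problems = []

    if method.upper() != "GET":
        problems.append("request method is not GET")
    if "authorization" in req:
        problems.append("request carries an Authorization header")

    # Split "private, max-age=0" into bare directive names.
    directives = {
        part.strip().split("=")[0].lower()
        for part in resp.get("cache-control", "").split(",")
        if part.strip()
    }
    for directive in ("private", "no-store"):
        if directive in directives:
            problems.append(f"response has Cache-Control: {directive}")

    # CacheIgnoreNoLastMod On lifts the validator/expiry requirement.
    has_validator = any(h in resp for h in ("etag", "last-modified", "expires"))
    if not ignore_no_last_mod and not has_validator:
        problems.append("response has no ETag, Last-Modified or Expires header")

    return problems


if __name__ == "__main__":
    print(cache_blockers("GET", {}, {"Cache-Control": "private"}))
```

Feed it the headers you see from a tool such as curl -I to get a quick list of things to fix.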
4. Process Trace
Sometimes things get desperate. Perhaps apache is giving you a segfault error and you have no idea why apache is silently dying. A process trace will hopefully give some clarity in these moments.
a. Using strace (or dtruss on Mac)
- Put apache in single process mode so that we have only one process to trace:
$ sudo /usr/sbin/httpd -k start -X -f /etc/apache2/httpd.conf
Or, for MAMP:
$ sudo /Applications/MAMP/Library/bin/httpd -k start -X -f /Applications/MAMP/conf/apache/httpd.conf
- Determine what the apache process id is:
$ ps aux | grep http
jpurcell 11242 0.0 0.0 2432768 620 s002 R+ 3:13PM 0:00.00 grep http
_www 10255 0.0 0.1 2455572 12060 s001 S+ 3:12PM 0:00.11 /usr/sbin/httpd -k start -X -f /etc/apache2/httpd.conf
root 10254 0.0 0.0 2432908 828 s001 S+ 3:12PM 0:00.01 sudo /usr/sbin/httpd -k start -X -f /etc/apache2/httpd.conf
- Attach strace or dtruss to the process. In this case I will show dtruss:
$ sudo dtruss -p 10255
SYSCALL(args) = return
read(0x11, "\2742Wk\0", 0x541) = 1345 0
open_nocancel(".\0", 0x0, 0x1) = 18 0
fstat64(0x12, 0x7FFF54AFF0E0, 0x0) = 0 0
fcntl_nocancel(0x12, 0x32, 0x7FFF54AFF2E0) = 0 0
close_nocancel(0x12) = 0 0
stat64("/var/www/vhosts/mysite.com/htdocs\0", 0x7FFF54AFF050, 0x0) = 0 0
...
(and on and on and on the trace goes. Where it stops, that's where you can guess you have a problem.)
b. Using gdb
For gdb, I will simply add this link for now: http://stackoverflow.com/a/7752606. I found gdb's output significantly easier to understand, and it lets you step through the process as it runs.