Three sysadmin mistakes start-ups make

We often get the opportunity to help out our fellow start-ups with their servers. While being a start-up with scaling issues is a sign that things are going well, a small team does not always have the expertise to make sure all their servers are in order. We wanted to share a few pitfalls that we have helped diagnose, in the hope of preventing other start-ups from making the same mistakes.

1. Switching web servers when in trouble

All of a sudden your web server has a 10.0 load average and the Apache process is hogging memory and pegging the CPU. What do you do? Switch out Apache for nginx, of course. Just kidding! Although, you would be surprised how often the web server gets swapped out during a time of peril. Under extreme loads (such as benchmarks) some web servers outperform others in subtle ways. However, under normal "growing start-up" load, slowdowns are more likely due to misconfiguration or poorly written application code. If you switch to another web server, you might luck out and get a better default configuration, but that is about it.

A few things to keep in mind while administering a web server:

    * Swap is a killer. If you hit swap, your entire machine (including your web server) will slow down dramatically. To diagnose what is going on: run top, press capital "M" (on current Ubuntu/Debian), and you will get a list of the top memory users. If your web server is using more than 20-40MB RSS per process, you probably have something configured incorrectly (too many modules loaded, wacky app code, etc.).
    * Memory usage of a web server is just a math equation. If you are using Apache with the prefork MPM, each connection requires its own Apache process. If you have something like PHP loaded up in each of those processes, each one will come in around 10-20MB. This means you could only sustain roughly 50-100 concurrent connections before running out of memory on a 1GB machine. To get around this particular example, use the worker MPM with Apache, which adds about 1MB overhead per connection via threading.
    * Serving static content is the easiest possible task for any web server. Doing a reverse proxy from Apache to lighttpd (for example) for just static content will get you nothing but complexity -- unless, that is, you have Apache misconfigured to serve static content through your framework. Do not offload your static content unless it is to get load off your application server -- such as by using a reverse proxy cache.
    * Check your error log! Error logs are a great way to see what is going wrong, yet often go overlooked.
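
The memory math from the list above can be sketched in a few lines of Python. This is only a rough estimate, assuming every prefork process uses about the same RSS. The function name and the reserved_mb parameter (memory set aside for the OS and other daemons) are made up for illustration.

```python
def max_prefork_connections(total_ram_mb, rss_per_process_mb, reserved_mb=0):
    """Estimate how many prefork processes fit in RAM before hitting swap."""
    if rss_per_process_mb <= 0:
        raise ValueError("rss_per_process_mb must be positive")
    # Memory left over for the web server after the (hypothetical) reservation.
    usable_mb = max(total_ram_mb - reserved_mb, 0)
    return usable_mb // rss_per_process_mb


if __name__ == "__main__":
    # 1GB machine, 10-20MB per Apache+PHP process (figures from the list above).
    print(max_prefork_connections(1024, 10))  # 102
    print(max_prefork_connections(1024, 20))  # 51
    # Hypothetical: keep 256MB free for the OS and other services.
    print(max_prefork_connections(1024, 20, reserved_mb=256))  # 38
```

A number like this is a reasonable starting point for a cap such as Apache's MaxClients directive, so that the server refuses extra connections rather than pushing the machine into swap.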

2. Using sqlite in production

Many frameworks, including Django and Rails, make it easy to use sqlite as a test database. However, sqlite should never be used in production. It is important to remember that sqlite is a single flat file, which means any write requires a lock on the entire database. Locks will inevitably cause points of contention if the database gets even remotely busy. On top of that, your web server will appear to peg the CPU when under load. This is because of all that contention around the lock.
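
The lock contention is easy to reproduce with Python's built-in sqlite3 module. The sketch below is an illustration, not a benchmark: two connections stand in for two web requests, and the function name, table name, and file name are hypothetical.

```python
import os
import sqlite3
import tempfile


def second_writer_is_locked_out(db_path):
    """Return True if a second connection fails to write while the first
    one holds an open write transaction on the same database file."""
    # isolation_level=None leaves transaction control to the explicit
    # BEGIN/ROLLBACK statements below. timeout=0 makes the second
    # connection fail immediately instead of waiting for the lock.
    first = sqlite3.connect(db_path, isolation_level=None)
    second = sqlite3.connect(db_path, isolation_level=None, timeout=0)
    try:
        first.execute("CREATE TABLE IF NOT EXISTS hits (path TEXT)")
        first.execute("BEGIN IMMEDIATE")  # take the write lock
        first.execute("INSERT INTO hits VALUES ('/a')")
        try:
            second.execute("INSERT INTO hits VALUES ('/b')")
        except sqlite3.OperationalError:
            return True  # typically reported as "database is locked"
        return False
    finally:
        if first.in_transaction:
            first.execute("ROLLBACK")
        first.close()
        second.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(second_writer_is_locked_out(os.path.join(tmp, "app.db")))
```

With the default timeout the second connection would wait and retry rather than fail right away, which is where the stalled requests under load come from.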

The moral of the story is to use mysql or postgres in production. These relational databases are heavily optimized for production environments, and will make your app much more responsive.

3. Forking in app code

Sometimes the first pass at a webapp does some crazy stuff. This crazy stuff might involve system calls, such as renaming a file or checking the last modified date of an image. While issuing system calls per web request might not be the best idea in the first place, under no circumstances should you fork. If you fork inside an app server, such as mod_python, you will fork the entire parent process (Apache!). This could happen by calling something like os.system("mv foo bar") from a Python application. It is important to remember that os.system uses the "system" libc function, meaning that it forks and passes the args to your default shell. Leaving aside the security implications of this (of which there are many), it causes big performance problems. Fork is one of the most expensive system calls, and should never be used on a per-request basis.

Moral here: If you have to use system calls (which you should try not to anyway), never fork. Use the native stuff, like (in Python) os.rename, os.stat, etc.
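
As a minimal sketch of "the native stuff", here is what the two examples above might look like without a shell. The helper names move_file and last_modified are hypothetical wrappers; os.rename and os.stat are the standard library calls doing the work.

```python
import os
import tempfile


def move_file(src, dst):
    """Rename a file in-process; no shell and no fork involved."""
    os.rename(src, dst)


def last_modified(path):
    """Return the file's modification time as seconds since the epoch."""
    return os.stat(path).st_mtime


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        foo = os.path.join(tmp, "foo")
        bar = os.path.join(tmp, "bar")
        with open(foo, "w") as handle:
            handle.write("hello")
        move_file(foo, bar)  # instead of os.system("mv foo bar")
        print(os.path.exists(bar), last_modified(bar) > 0)
```

One caveat: os.rename generally fails when the source and destination are on different filesystems, where mv would copy instead. If that matters, shutil.move from the standard library covers that case without spawning a process.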