So Many Choices: Web App Deployment with Perl, Python, and Ruby
End Point Corporation
Utah Open Source Conference 2010
Who am I?
CTO at End Point, a consultancy focused on open source in the ecommerce and database arenas.
We deploy and support web applications for our clients, so new happenings in that field are always of interest to us.
In recent years the web app deployment options have gotten much more diverse for Perl, Python, and Ruby.
There are several excellent open source options for scaling up websites.
In general, website performance and scalability have become hot topics.
Why should you care?
If you already have a working website that is reliable, handles its peak traffic times with ease, and you're on the current versions of all your software, you probably don't need to change anything.
Google said a year or two ago that they would start considering site performance in their search rankings.
Some research has shown that even speedups of a few tenths of a second increase ecommerce sales.
So it's good to squeeze out more performance, or maybe use less hardware, and deliver a better experience to users.
The original web app interface: CGI
CGI (Common Gateway Interface) is the oldest dynamic server-side web technology. It's a simple shim between a web server and a program.
- Simple - request information passed in via environment variables and stdin, response returned on stdout
- Stateless - program is invoked anew for each request, so programs start with a clean slate each time, and memory leaks get cleaned up as each run exits
- Slow - requires a fork & exec for each request, plus script compilation time for anything not precompiled
- Local - just put the program somewhere on the same server and tell the web server
- Security - can be good when used with set-UID scripts, or suEXEC to run as a separate user, or SELinux to compartmentalize
- Quick & dirty - useful for porting ad-hoc reporting utilities to the web
- Slow - requires a fork & exec for each request, plus script compilation and startup time for anything not precompiled
- Local - can only run programs on the same local machine as the web server
- Security - can be bad in poorly written programs that are allowed to access way too much on the machine either by running as web server user, or with lots of world-readable files
- Memory - can be hidden source of cache thrashing as the CGI program can use a lot of memory, then die, leaving no evidence that it did so
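To make the CGI contract concrete, here is a minimal sketch in Python. The function name `handle_cgi_request` and the `name` query parameter are illustrative assumptions, not part of any standard; the environment variable names (`REQUEST_METHOD`, `QUERY_STRING`, `CONTENT_LENGTH`) come from the CGI specification.

```python
import os
import sys
from urllib.parse import parse_qs


def handle_cgi_request(environ, stdin, stdout):
    """Handle one CGI request: read the environment and stdin, write to stdout."""
    # The web server passes request metadata in environment variables.
    method = environ.get("REQUEST_METHOD", "GET")
    query = parse_qs(environ.get("QUERY_STRING", ""))

    # A request body (e.g. a POSTed form) arrives on stdin;
    # CONTENT_LENGTH says how much of it to read.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stdin.read(length) if length else ""

    # "name" is a hypothetical query parameter used only for this demo.
    name = query.get("name", ["world"])[0]

    # The response is headers, a blank line, then the body, all on stdout.
    stdout.write("Content-Type: text/plain\r\n\r\n")
    stdout.write(f"Hello, {name}! ({method}, body length {len(body)})\n")


if __name__ == "__main__":
    # The web server starts a fresh process for every request,
    # so this runs exactly once and then the process exits.
    handle_cgi_request(os.environ, sys.stdin, sys.stdout)
```

Because the whole interpreter starts, compiles the script, answers, and exits on every hit, this one file shows both the appeal (nothing to clean up) and the cost (startup time per request).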
FastCGI is a way to start up persistent CGI-like processes to mitigate many of the downsides of CGI.
- It works more or less as advertised
- Adapters available for most web servers
- Requires a custom webserver module
- Ancient, mostly unmaintained code
- Anecdotally not very stable especially under heavy load
- Feels like an awkward glue piece
Application inside web server
Apache mod_perl, mod_python, mod_ruby
In this model the entire application lives directly inside the web server.
For a long time, mod_perl was the standard way to do this, with mod_python having some popularity and mod_ruby never really catching on.
- Can access all parts of the Apache request/response lifecycle
- Code is loaded and compiled at Apache startup time before forking, so much of child process memory starts out shared
- Requests are handled directly by the web server, for high efficiency
- Local - you can still easily put the program somewhere on the same server and tell the web server
- CGI compatibility module makes adapting CGIs fairly easy
- Proven in production at many sites over many years
- Security - everything runs in the webserver itself, as that UID, so it's tough to compartmentalize applications
- State - programs need to be written to clean up after themselves because they live on for the next request
- Memory usage in general grows, and memory leaks can be a problem requiring occasional recycling of Apache children
- Bloated Apache children mean slower static file serving, so running a separate lightweight webserver for static files is recommended
- Tied to Apache, so you're stuck with it - even just switching from Apache 1.3 to 2.0 required some rewriting
HTTP is already designed to pass requests from one machine to another.
Why not just use HTTP to connect the web server to the app?
Because most web app frameworks didn't have a built-in web server.
HTTP reverse proxying
Good idea! Benefits include:
- Already widely used
- Machine independence
- Easy to test and write health checks
- Load balancing
- Content compression
- One protocol to rule them all
- Lots of interoperable software on all platforms
What can turn my app into a standalone webserver?
The renaissance started with Python's WSGI (Web Server Gateway Interface), a "standardized interface between Web servers and Python Web frameworks/applications".
Now various HTTP servers implemented in Python had a uniform way to communicate with web frameworks.
Having a standard allowed many new compatible alternatives to bloom.
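As a rough illustration of what the WSGI standard asks of an application, here is a minimal sketch. The `application` callable follows the interface defined in the WSGI spec; the `serve` helper and its default host and port are assumptions for local experimentation, using the reference server from Python's standard library rather than anything meant for production.

```python
from wsgiref.simple_server import make_server


def application(environ, start_response):
    """A minimal WSGI application: a callable taking environ and start_response."""
    path = environ.get("PATH_INFO", "/")
    body = f"Hello from WSGI at {path}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # The return value is an iterable of byte strings.
    return [body]


def serve(host="127.0.0.1", port=8000):
    """Serve the app with the stdlib reference server (development use only)."""
    with make_server(host, port, application) as httpd:
        httpd.serve_forever()
```

Calling `serve()` starts a server on the assumed local port; any other WSGI-compliant server should be able to run the same `application` callable unchanged, which is the point of having a standard.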
Runs WSGI apps in Apache in either embedded mode (like mod_python) or daemon mode (where work is done in separate child processes).
Widely used. Probably close to "the standard" for deploying Python web apps right now.
In the early days of Rails, FastCGI was really the only production-quality deployment option. Along came Mongrel, a high-performance web server to run Ruby apps.
- It was a big advance beyond FastCGI.
- No master process.
- Each worker was a separately-started app so there was almost no shared memory usage and it used a lot of memory.
- Each backend listened on its own TCP port, so you had to coordinate port ranges between your HTTP proxy and the Mongrel configuration every time you changed the number of children.
For a few years all was quiet, but others wanted to try their hand at it too.
Also known as mod_rails aka mod_rack.
- Runs inside either Apache or nginx
- Starts as root, then forks separate master and worker processes set-UID appropriate to each application
- Configurable: can set maximum instances per app, idle child timeout, etc.
- Can restart each app individually without restarting web server
- Very popular, easy to get started with
- Upcoming version will support standalone operation too, accessible via HTTP proxy
% ps u # trimmed
PID VSZ RSS COMMAND
14889 153360 47356 Passenger ApplicationSpawner: /path/to/app
14897 182588 70924 Rails: /path/to/app
- A port of Mongrel to EventMachine and Rack.
- No master process; uses a cluster of independent child processes like Mongrel, so you still need to define port ranges.
- Lighter on memory usage.
- Supports all Rack apps.
- Also based on Mongrel.
- Preforking daemon.
- Master listener distributes traffic, so no range of TCP ports necessary -- just one.
- Can preload app to share more memory between forked children.
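The preforking model in the list above can be sketched in a few lines of Python. This is not Unicorn's code, just an illustration of the idea under simplifying assumptions: the function names are hypothetical, workers answer a fixed number of requests and exit, and it relies on `os.fork`, so it runs on Unix-like systems only.

```python
import os
import socket


def open_listener(host="127.0.0.1", port=0):
    """The master opens ONE listening socket; port 0 lets the OS pick a free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(128)
    return listener


def worker_loop(listener, max_requests):
    """Each worker accepts connections on the shared socket it inherited."""
    for _ in range(max_requests):
        conn, _addr = listener.accept()
        with conn:
            conn.recv(65536)  # read and ignore the request in this sketch
            body = f"handled by pid {os.getpid()}\n".encode()
            conn.sendall(
                b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n"
                + f"Content-Length: {len(body)}\r\n\r\n".encode()
                + body
            )


def prefork(listener, workers=2, max_requests=1):
    """Fork workers after the listener (and any app code) is set up; return their PIDs."""
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            # Child process: serve requests, then exit without returning to the caller.
            try:
                worker_loop(listener, max_requests)
            finally:
                os._exit(0)
        pids.append(pid)
    return pids
```

Since every worker inherits the same socket, the proxy in front needs only one address, and anything loaded before `prefork` is called starts out shared between children through copy-on-write memory.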
Unicorn logo win
"This isn't really the official Unicorn mascot, but it should be."
An aside: Names with personality matter.
HTML::Mason vs. Mason
Starman vs. HTTP::Server::PSGI::Net::Server::PreFork
Port of Unicorn to Perl.
What can do general web service and proxying?
Wealth of free software options
- and others
Runs 5-6% of all websites, and is the 4th most popular web server.
Very efficient with large numbers of clients -- it doesn't need many processes or threads the way Apache does.
Easy to set up.
Lots of features.
- Spoonfeed slow clients
- Reverse proxying with optional simple load balancing
- HTTP keepalive
- Low demand on CPU, memory, and process resources
- No CGI support built in!
- Not the "standard", Apache, which everyone's used to
- No mod_php, but can run PHP via FastCGI
- Different rewrite syntax
- But easy to proxy to Apache when Apache is really needed
- Spoonfeed slow clients
- Reverse proxying with optional load balancing
- HTTP keepalive
- Highly configurable via Varnish Configuration Language (VCL)
- Nice introspection tools
- Low demand on CPU, memory, and process resources
- No gzip, though can cache gzip responses
- No SSL: use nginx, stunnel, etc.
- Only talks HTTP: no FastCGI etc.
- Not recommended on 32-bit operating systems
Reverse proxy considerations
- Use HTTP keepalive? Going without it may be OK. Benchmark your pages!
- How to handle gzip/deflate? Storing both gzip and plain versions may be ok, but uses up more cache.
- Send X-Forwarded-For header and adjust upstream logging?
- Load balancing and failover?
- Consider logging verbosity
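For the X-Forwarded-For question above, one possible approach on the app side is a small WSGI middleware. This is a sketch under stated assumptions: the class name and the `trusted_proxies` parameter are hypothetical, and it assumes a single reverse proxy that appends the connecting client's address as the last entry of the header.

```python
class ForwardedForMiddleware:
    """WSGI middleware that restores the client address behind a trusted proxy."""

    def __init__(self, app, trusted_proxies=("127.0.0.1",)):
        self.app = app
        self.trusted_proxies = set(trusted_proxies)

    def __call__(self, environ, start_response):
        forwarded = environ.get("HTTP_X_FORWARDED_FOR", "")
        # Only believe the header when the direct peer is a proxy we control;
        # otherwise any client could spoof its address.
        if forwarded and environ.get("REMOTE_ADDR") in self.trusted_proxies:
            # The header is a comma-separated chain. The last entry is the one
            # our own proxy appended, so it is the one we can trust.
            environ["REMOTE_ADDR"] = forwarded.split(",")[-1].strip()
        return self.app(environ, start_response)
```

Wrapping an app as `ForwardedForMiddleware(app)` means upstream logging that reads `REMOTE_ADDR` sees the real client rather than the proxy's address.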
Reverse proxy caching
- Usually want to cache at least static content
- Set Cache-Control header
- Cache lifetime of 1s, 1 hour, 1 year?
- Serve cached files when cookies exist?
- Should you cache 301, 302, and 404 responses?
- Cache https at all?
- Unwrap https at the proxy?
- Is network to/from app server secure?
- Share cache pool for http and https?
- Indicate https to app server:
or nonstandard port and set on server side
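As one way to act on the "set Cache-Control header" item, here is a hedged sketch of a WSGI middleware that tells the caching proxy what it may keep. The function name, the `/static/` prefix, the one-hour lifetime, and the `private, no-cache` default for dynamic pages are all assumptions for illustration; the right values depend on your site.

```python
def add_cache_control(app, static_prefix="/static/", max_age=3600):
    """Wrap a WSGI app so every response carries a Cache-Control header."""

    def wrapped(environ, start_response):
        is_static = environ.get("PATH_INFO", "").startswith(static_prefix)

        def start_with_cache_header(status, headers, exc_info=None):
            # Respect a Cache-Control header the app already set itself.
            if not any(name.lower() == "cache-control" for name, _ in headers):
                if is_static:
                    value = f"public, max-age={max_age}"
                else:
                    # Assumed conservative default for dynamic pages.
                    value = "private, no-cache"
                headers = list(headers) + [("Cache-Control", value)]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_cache_header)

    return wrapped
```

Setting the policy in the app keeps the proxy configuration simple: the proxy can just honor the header instead of duplicating per-path rules.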
Operating system considerations
- May need to increase ulimit -n in e.g. /etc/sysconfig/varnish to increase number of file descriptors.
- Firewall state tables may fill up on very busy servers -- increase size in /proc/sys/net/ipv4/ip_conntrack_max and persist in /etc/sysctl.conf or go stateless
- Tune TCP/IP network stack for many short connections: more ephemeral source ports, shorter FIN_WAIT state
- Filesystems tuned for heavy traffic: noatime, syslog async
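To check whether the file descriptor limit mentioned above is likely to be a problem, a process can inspect its own limit with Python's standard `resource` module (Unix only). The helper names and the rule of thumb of two descriptors per proxied connection plus a reserve are assumptions for illustration, not a tuning recommendation.

```python
import resource


def file_descriptor_limits():
    """Return the (soft, hard) limits on open file descriptors, as set by ulimit -n."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def has_enough_descriptors(expected_connections, per_connection=2, reserve=64):
    """Rough check: a proxy holds one descriptor for the client and one for the backend."""
    soft, _hard = file_descriptor_limits()
    if soft == resource.RLIM_INFINITY:
        return True
    return soft >= expected_connections * per_connection + reserve
```

For example, `has_enough_descriptors(10000)` returning False would suggest raising the limit before expecting that many concurrent connections.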
Other deployment considerations
- 32- vs 64-bit: memory usage
- JVM: JRuby and Jython - but no native C code, which excludes many libraries
- Mongrel2: Zed Shaw's new effort to reinvent web app deployment - http://mongrel2.org/
The benefits are amply noted by marketing folks on the web.
By all means, give them a try!
Multi-tenant cloud drawbacks
- Vendor lock-in
- PCI compliance impossible?
- No filesystem writes
- Can't use native machine-level code
- Control over everything running on the (virtual) machine
- Choose your neighbors
- Control of security: firewalling, intrusion detection, SELinux, auditing
- Control of software versions: OS, libraries, web & app server, database
We have an embarrassment of riches in the open source web application world:
Perl, Python, and Ruby all have very nice modern frameworks for developing web applications.
They also have several equivalent solid options for deploying web applications.
There are several excellent web servers and reverse proxies.
Think at the system level
There are a lot of pieces that make up a web application. Consider the whole stack:
- client (browser, benchmark tools, manual HTTP)
- firewalls and network stack
- load balancers and reverse proxies
- app servers
- web app framework
- programming language
- session storage
If you're at least somewhat familiar with each component, you'll be better able to spend your energy wisely. Benchmark and test several candidates in your own environment before committing to one.
Check out webpagetest.org, an easy way to test real-world performance in Internet Explorer and share results.
Here are some reports of a utosc.com test:
Try something new!
A challenge: Try out one of the following microframeworks that you haven't used before. Install it on your local machine and put together a little test application.
- Perl: Dancer, Starman
- Python: Flask
- Ruby: Sinatra, Unicorn
Then set up nginx or Varnish to proxy it and serve the static content.
These slides were presented with Slippy, an HTML presentation library.