Pelican and nginx’s static gzip feature

Recently I set up nginx as the web server for this blog. It has been working great so far, but there is a way to make it a bit more efficient: decreasing the amount of data a client needs to download to view the website.

Why compress?

Part of the web browsing experience is waiting for a website to load. The time this takes depends greatly on the speed of the Internet connection being used and the amount of data that needs to be transferred in total. There are other factors of course, but most of those we cannot influence from our end: the server side.

So what can we do?

A common method to reduce the number of bytes sent down the wire is HTTP compression: we compress the files before sending them out to a client. Usually this is done “on-the-fly” using gzip or (a variant of) the Deflate algorithm. This requires support on the client side, specifically in the browser. Luckily most browsers support it nowadays, one notable exception being Internet Explorer 6. But don’t worry, you will not break compatibility with it if you include a statement that disables compression for such clients. An example can be found in the nginx wiki - HttpGzipModule#gzip_disable.

Compressing data on-the-fly does use some CPU cycles, but modern CPUs are powerful enough to cope with the little extra load this generates. On the client side, decompression is far less intensive, so no worries there. But why should we compress our static files on the fly? It makes perfect sense for dynamic pages, generated with PHP for example, because those are… well… dynamic, meaning we cannot store them pre-compressed to send out to a user. We can do exactly that for static files, though, such as the HTML generated by Pelican: nginx supports serving pre-compressed files instead of compressing static files on the fly, and mblayman wrote the gzip_cache plugin for Pelican so it can create gzip-ed copies of the output.
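In essence, the plugin does the equivalent of running gzip over each output file while keeping the original in place. A minimal sketch of that idea, with example file names (gzip -c writes to stdout, so the source file stays intact):

```shell
# Stand-in for a file Pelican would generate (example path).
mkdir -p output
echo '<html><body>Hello, world</body></html>' > output/index.html

# Compress at the highest level while keeping the original;
# gzip_static later looks for index.html.gz next to index.html.
gzip -9 -c output/index.html > output/index.html.gz

ls output/
```

Level 9 trades a little extra CPU at build time for the smallest transfer size, a one-time cost that is well worth it for static files.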

Configuring Pelican to output gzip-ed copies

The plugin required for this has been included in the Pelican base, so there is no need to install anything.

UPDATE: Pelican plugins have been separated from the Pelican base into a GitHub repository containing all of them. In order to use them, you just have to check out the plugins from there using git, preferably somewhere relative to your Pelican configuration files, where your content and output directories will probably also reside. In my case:

$ cd ~/proj/blog
$ git clone https://github.com/getpelican/pelican-plugins

Now all you have to do is tell Pelican where the plugins can be found and which plugin to use. When informing Pelican about the plugins’ path you can use a relative path if you keep the plugins in a directory below the directory of your configuration file. I added this to my Pelican configuration file:

### Plugins 
PLUGIN_PATH = 'pelican-plugins'
PLUGINS = ['gzip_cache',]

Save the file, ask Pelican to generate new output, and have a look at your output directory:

(blog)stefan@aether:~/proj/blog$ pelican -s ~/proj/blog/pelicanconf.py ~/proj/blog/content/
(blog)stefan@bacchus:~/proj/blog$ ls output/
archives.html     categories.html.gz   git-blog-setup.html.gz  index2.html.gz    nginx-setup.html.gz    qjail-setup.html     tag
archives.html.gz  category             hello-world.html        index.html        pages                  qjail-setup.html.gz  tags.html
author            feeds                hello-world.html.gz     index.html.gz     pelican-setup.html     robots.txt           tags.html.gz
categories.html   git-blog-setup.html  index2.html             nginx-setup.html  pelican-setup.html.gz  robots.txt.gz        theme

Notice how Pelican created .gz versions of the .html files? These are the compressed equivalents of those .html files. If they show up, it’s time to configure nginx to use the existing .gz variants whenever they are available.
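One thing to keep an eye on: if a .gz file is ever stale, gzip_static will happily serve the old content. A quick sanity check, sketched here with a self-created example pair standing in for Pelican’s output, is to decompress each .gz and compare it against its plain counterpart:

```shell
# Create an example pair, standing in for Pelican's output.
mkdir -p output
echo '<html><body>Hello</body></html>' > output/index.html
gzip -9 -c output/index.html > output/index.html.gz

# Compare every .gz in output/ against its uncompressed counterpart.
for gz in output/*.gz; do
  plain="${gz%.gz}"
  if gzip -dc "$gz" | cmp -s - "$plain"; then
    echo "OK    $plain"
  else
    echo "STALE $plain"
  fi
done
```

With Pelican’s gzip_cache plugin this should never report STALE, since the .gz files are regenerated on every run; the check is mainly useful if you ever compress files by hand.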

Configure nginx with gzip_static

Now that we have our files standing by, we have to tell nginx to make use of them. This is accomplished by enabling the gzip_static module in the nginx configuration. If you used my configuration from Serving a blog with nginx, you can simply uncomment it; I also recommend changing the gzip_min_length setting to 150. If you wrote your own, I recommend you at least have a look at mine for some of the related options.

gzip_static       on;
gzip_min_length   150;      # 150 bytes is a good starting point when using pre-compressed files.
#gzip_disable "MSIE [1-6]\.(?!.*SV1)";  # Do not use compression for IE6 and older (except SV1 clients)

Save your file and restart nginx:

satyr /root > /usr/local/etc/rc.d/nginx restart

There, all done!

Testing

How can we be sure nginx is serving out our pre-compressed files instead of compressing our files on-the-fly? There are multiple ways to test it.

  1. Set gzip off in the config to disable “on-the-fly” compression, leaving gzip_static on, and check whether your server is still serving compressed data. A lazy way of checking that you are serving compressed data is to run Google PageSpeed Insights: enter your URL and have a look at the results. If it recommends enabling compression, it’s not working. ;-)
  2. Or, leaving gzip on, enable access_log and add $gzip_ratio to log_format. Generate some GET requests by browsing your website and check the log. Remember to remove any buffer setting on access_log so you can see the results immediately. Is the $gzip_ratio equal to 0.000 while you are receiving compressed replies? Hurray!
  3. Use strace to see what files nginx is accessing. See: stackoverflow - How can I check that the nginx gzip_static module is working?.
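For option 2, a log_format along these lines (the format name and log path are examples) makes the ratio visible per request:

```nginx
log_format gzip '$remote_addr [$time_local] "$request" '
                '$status $body_bytes_sent "$gzip_ratio"';

access_log  /var/log/nginx/access.log  gzip;  # no buffer= parameter, so entries appear immediately
```

When gzip_static serves a pre-compressed file, no on-the-fly compression takes place, which is why the logged ratio differs from that of replies compressed on the fly.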
