Static Sites: Configuration and Optimization

Today, we’ll be talking about hosting static sites in Cloud Storage. More specifically, we’ll be looking at how they can be set up and optimized.

From a user’s point of view, one of the most important criteria for a site is its load time. If a site takes too long to load, it will lose visitors who simply don’t want to wait. The way to make a site faster is to optimize how its content is delivered.

Below are some of our tips for optimizing static sites in Cloud Storage and decreasing load times.

Tip 1. Take Advantage of the CDN

Selectel’s Cloud Storage is connected to the Akamai content delivery network (CDN). The CDN saves all static content (images, text files, JS, CSS, etc.) to caching servers located all over the world.

When a website or its resources are accessed, the request is processed by the caching server closest to the user. With a CDN, sites load faster on both mobile and desktop devices.

The CDN caches data for 24 hours by default. In our cloud, we’ve recently added the ability to purge the CDN cache at any time.

All you have to do is click “Purge CDN cache” and enter the address of the page you want to clear the cache for. The cache will be cleared about 15 minutes after the form is submitted.

Tip 2. Don’t Forget about Caching Settings

Web pages include lots of elements: images, scripts, style files (CSS), and so on. Users accessing a site for the first time receive these elements by sending a series of HTTP requests. To avoid downloading the same files over and over again, browsers and intermediate servers cache them.

The basic caching model for HTTP uses special headers, called validators, to help clients determine whether or not a cached document is still relevant. Using validators, a client can check the status of a document without transferring the entire cached copy to the server. The server returns a document only if the validator it receives confirms that the client’s cached version is outdated.

Validators are divided into two categories: “strong” and “weak”. Strong validators first appeared in HTTP/1.1. They’re called “strong” because they change every time the file changes. ETags (entity tags) are usually considered “strong validators” and they identify a document’s contents; they change even when a document changes by just one bit. An example of this could be the MD5 checksum of a document’s contents. When a client requests a document from a server, the ETag value is returned in the response, for example:

HTTP/1.1 200 OK
Server: Selectel_Storage/1.0
Accept-Ranges: bytes
Last-Modified: Mon, 18 Aug 2014 12:25:38 GMT
X-Timestamp: 1408364738.80296
Content-Type: image/jpeg
Content-Length: 458073
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: Last-Modified, ETag, X-Timestamp
ETag: "ebef3343a7b152ea7302eef75bea46c3"
Date: Wed, 20 Aug 2014 11:52:48 GMT

When that document is requested again, the value of the saved validator is sent in the If-None-Match header:

GET / HTTP/1.1
Host: example.org
If-None-Match: "ebef3343a7b152ea7302eef75bea46c3"

If the document has not changed, then the server will only return the headers and code “304 Not Modified”. Otherwise, the server will return a 200 code and send a new version of the document along with its new ETag value.

In our cloud, ETags are generated as the MD5 hash of a file’s contents as soon as it’s uploaded. If the content changes, the ETag changes.
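The exchange above can be sketched in a few lines of Python. This is only an illustration of the idea, not the storage’s actual implementation: the function names compute_etag and status_for are hypothetical, and the only detail taken from the text is that the ETag is the MD5 hash of the file’s contents.

```python
import hashlib
from typing import Optional


def compute_etag(content: bytes) -> str:
    """Return the MD5 hex digest of the content in quotes, as it appears in an ETag header."""
    return '"' + hashlib.md5(content).hexdigest() + '"'


def status_for(content: bytes, if_none_match: Optional[str]) -> int:
    """Hypothetical server-side check: 304 if the client's validator still matches, else 200."""
    if if_none_match is not None and if_none_match.strip() == compute_etag(content):
        return 304  # headers only, the body is not sent again
    return 200      # full response with the current ETag


old_version = b"body { color: red; }"
new_version = b"body { color: blue; }"
saved_validator = compute_etag(old_version)

print(status_for(old_version, saved_validator))  # file unchanged
print(status_for(new_version, saved_validator))  # file changed
```

Because the validator is derived from the contents, any change to the file produces a different ETag and therefore a full 200 response.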

Weak validators are those that don’t necessarily change every time a file changes.

An example of a weak validator would be the Last-Modified header. This header’s value is the date the file was last modified. In our storage, this is set automatically. If the date in the If-Modified-Since request header is the same as or later than the file’s Last-Modified date (that is, the file has not changed since the client saved its copy), then a 304 Not Modified code will be returned.
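The date comparison behind If-Modified-Since can be sketched as follows. The function name not_modified is hypothetical; the sketch assumes both values are HTTP dates in GMT, like the ones in the response shown earlier, and it treats a file as unchanged when it was last modified no later than the date the client sends.

```python
from email.utils import parsedate_to_datetime


def not_modified(last_modified: str, if_modified_since: str) -> bool:
    """True if the file has not been modified after the date sent by the client.

    Both arguments are HTTP dates such as "Mon, 18 Aug 2014 12:25:38 GMT".
    """
    return parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since)


last_modified = "Mon, 18 Aug 2014 12:25:38 GMT"

# The client saved its copy two days after the last change: 304 is appropriate.
print(not_modified(last_modified, "Wed, 20 Aug 2014 11:52:48 GMT"))

# The client's copy is older than the last change: a full 200 response is needed.
print(not_modified(last_modified, "Sun, 17 Aug 2014 09:00:00 GMT"))
```

HTTP dates have one-second resolution, which is one reason Last-Modified counts as a weak validator: two changes within the same second are indistinguishable.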

Strong validators can be used in any situation, whereas weak validators can only be used in situations that don’t depend on specific file content.

For example, both types of validators can be used in conditional GET requests (If-Modified-Since and If-None-Match). However, only a strong validator should be used when a file is downloaded in parts (range requests); otherwise the client may assemble an inconsistent file from segments of different versions.

Tip 3. Pay Attention to the Cache-Control Header

To set the amount of time a file should be kept in a browser’s cache, the file in our storage should be served with a Cache-Control header containing a max-age directive. This header can significantly reduce load times: while the cached copy is still fresh, the browser displays it immediately without sending a request to the site. A file’s caching period is specified in seconds:

Cache-Control: max-age=7200

In the given example, it’s set to 7200 seconds (2 hours). This is how CSS, JS, and image files are usually cached. These files could be cached indefinitely; if the content changes, just update the links in the HTML. RFC 2616 recommends that the caching time for such files not exceed 1 year:

Cache-Control: max-age=31536000
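The two values above are simply durations converted to seconds. A small helper makes the arithmetic explicit; the name max_age_header is hypothetical and only serves this illustration.

```python
from datetime import timedelta


def max_age_header(**duration) -> str:
    """Build a Cache-Control header line from timedelta keywords (hours=, days=, ...)."""
    seconds = int(timedelta(**duration).total_seconds())
    return f"Cache-Control: max-age={seconds}"


print(max_age_header(hours=2))   # 2 * 60 * 60 seconds
print(max_age_header(days=365))  # 365 * 24 * 60 * 60 seconds
```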

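If a cached copy should never be used without first checking with the server, the following value should be set for the Cache-Control header:

Cache-Control: no-cache

With this value, the browser may still store a copy, but it must revalidate that copy with the server every time the file is requested. If the file has not changed, the server answers with 304 Not Modified and the body is not sent again, so the cost is one extra round trip per request. To forbid storing a copy altogether, so that the body is downloaded every time, use the no-store directive instead.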

Another way to ensure that the latest version of a file is always sent involves adding the content’s checksum to the file name.

If the content changes even one bit, then the checksum will also change. If nothing changes, then the browser will use the file from the cache. When a file changes, its URL will change and the latest version will be downloaded.

You can get a checksum using md5sum, sha1sum, or other more specialized utilities.
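As a minimal sketch of this approach in Python, the helper below inserts a shortened MD5 digest between the file’s base name and its extension. The function name fingerprinted_name, the 8-character digest length, and the name.digest.ext layout are assumptions made for this example; any scheme works as long as the name changes whenever the content does.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename: str, content: bytes, length: int = 8) -> str:
    """Return the file name with a content checksum inserted before the extension."""
    digest = hashlib.md5(content).hexdigest()[:length]  # shortened for readable URLs
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


v1 = fingerprinted_name("js/script.js", b"console.log('v1');")
v2 = fingerprinted_name("js/script.js", b"console.log('v2');")

print(v1)
print(v2)  # a different name, so browsers will fetch the new version
```

Files named this way can be served with a long max-age, since an unchanged name guarantees unchanged content.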

You can also add an arbitrary set of characters to URLs, like a timestamp (http://example.com/script.js?timestamp_here). Here, you just have to update links to the file every time the site is deployed. With this method though, there’s no guarantee that the browser won’t make any unnecessary requests: a different link will lead to files whose contents haven’t changed and they will be downloaded again (here, the whole URL, including query parameters, is the cache key).

For HTML pages, the preferred Cache-Control header value is no-cache. Otherwise, if you need to make an urgent change to a page and the client has already cached it (modern browsers do this by default), the client may not see the change until the cached copy expires.

This is especially important when using a CDN: the Akamai CDN caches files without relevant headers for 24 hours by default. Naturally, you can purge the cache (see above), but that takes about 15 minutes after the request has been sent. Setting no-cache helps avoid potential problems, since the client always checks for the latest version of the page. Browsers in this case will send the If-None-Match (or If-Modified-Since) header anyway, so a page that hasn’t changed won’t be downloaded unnecessarily.

In some cases, it’s better to base the cache time of an HTML page on how often the page changes. For example, if a page contains news that updates every hour, then the max-age value should be set to 3600 (1 hour).
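The recommendations in this tip can be summarized as a small policy function. This is a sketch under stated assumptions, not a rule enforced by the storage: the name cache_control_for, the extension lists, and the 2-hour fallback for other file types are choices made for this example, and the one-year value assumes that links to CSS, JS, and image files are updated whenever their content changes.

```python
import posixpath
from typing import Optional

ONE_YEAR = 31536000  # seconds, the upper bound mentioned above
TWO_HOURS = 7200     # assumed fallback for everything else

HTML_EXTENSIONS = {".html", ".htm"}
ASSET_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif"}


def cache_control_for(path: str, html_max_age: Optional[int] = None) -> str:
    """Pick a Cache-Control value for a file based on its extension."""
    extension = posixpath.splitext(path)[1].lower()
    if extension in HTML_EXTENSIONS:
        # Revalidate on every request unless the page changes on a known schedule.
        return "no-cache" if html_max_age is None else f"max-age={html_max_age}"
    if extension in ASSET_EXTENSIONS:
        return f"max-age={ONE_YEAR}"
    return f"max-age={TWO_HOURS}"


print(cache_control_for("index.html"))
print(cache_control_for("news.html", html_max_age=3600))
print(cache_control_for("css/style.css"))
```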

Cache-Control header values (just like other HTTP headers) in our cloud can be set from our web interface.

Here, header values can only be set for entire containers. To set header values for individual files, you will have to use the API or external clients.

The Expires header can be used instead of Cache-Control. The value is given in RFC 1123 date and time format and indicates when the file is no longer relevant (for example: Tue, 31 Jan 2012 15:02:53 GMT). The browser will not make a request prior to that date, referring to the cached file instead. The file will only be downloaded again after the given date.
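For illustration, an Expires value in this format can be produced with Python’s standard library. The function name expires_header is hypothetical, and the sketch assumes the starting time is given as a timezone-aware UTC datetime.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(now: datetime, lifetime: timedelta) -> str:
    """Build an Expires header line for a file that stays relevant for `lifetime`.

    `now` must be timezone-aware and in UTC, otherwise format_datetime raises ValueError.
    """
    return "Expires: " + format_datetime(now + lifetime, usegmt=True)


uploaded_at = datetime(2012, 1, 31, 13, 2, 53, tzinfo=timezone.utc)
print(expires_header(uploaded_at, timedelta(hours=2)))
```

Unlike max-age, which is relative to the moment the response is received, Expires is an absolute date, so it has to be recalculated whenever the intended lifetime should start over.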

Tip 4. Use GZIP Compression

You can speed up a site’s load time by compressing its files before they are sent. Since HTTP/1.1, clients can indicate supported compression methods in the Accept-Encoding header:

Accept-Encoding: gzip, deflate

The server responds with information on the compression methods used in the Content-Encoding header:

Content-Encoding: gzip

One of the most popular and widely used methods today is, of course, gzip. Using gzip, you can significantly decrease load times. Gzip is especially effective with text files: HTML, CSS, and JS. Compression reduces text file sizes (and the amount of traffic exchanged) by a factor of about 5-10 on average. This dramatically increases how quickly a site loads, which is especially pertinent for mobile clients with slow connections.

There’s no point in using gzip for graphics: formats such as JPEG and PNG are already compressed, so gzip doesn’t noticeably decrease their size and can even increase it.

Gzip is used by default for the majority of text files in the Akamai CDN.
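The difference between text and already-compressed data is easy to check with Python’s gzip module. In this sketch, a repetitive CSS snippet stands in for a text file and pseudo-random bytes stand in for image data; both inputs are made up for the example, and real files will give different ratios.

```python
import gzip
import random


def gzip_ratio(data: bytes) -> float:
    """Return original size divided by gzip-compressed size (higher means better compression)."""
    return len(data) / len(gzip.compress(data))


# Stand-in for a text file: CSS rules repeat the same keywords many times.
css = b".button { color: #333; margin: 0 auto; padding: 4px 8px; }\n" * 200

# Stand-in for an already-compressed image: bytes with no repeating patterns.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(css)))

print(f"text:  {gzip_ratio(css):.1f}x smaller")
print(f"noise: {gzip_ratio(noise):.2f}x smaller")
```

A ratio at or below 1 for the second input means gzip saved nothing, which is why servers normally skip compression for images.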

Tip 5. Minify JS and CSS

Minification means removing unnecessary characters from files in order to reduce their size and load times. File sizes decrease by a factor of 1.5-3 on average using this method. Today, it’s common practice to minify not only JS and CSS files, but other file types as well (HTML, images, etc.).

Special tools are used for minification, in particular:

Minification doesn’t only delete meaningless spaces and line breaks (optional in CSS and JS); it also shortens the code itself, for example by renaming functions and parameters. A JS function that looks like

function summ(first_param, second_param) {
  return (first_param + second_param);
}

can become function s(a,b){return(a+b)} and s will be used everywhere in the code instead of summ, completely preserving the logic. You can see how JavaScript minification works on the site http://lisperator.net/uglifyjs/ in the Open Demo section.
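To show the whitespace-removal part of the process, here is a deliberately naive CSS minifier written in Python. It is a sketch only: the function name minify_css is hypothetical, and the regular expressions ignore edge cases (string literals, selectors where a space before a colon is significant) that real minifiers handle.

```python
import re


def minify_css(css: str) -> str:
    """Naive CSS minification: drop comments and whitespace that CSS does not need."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)  # remove /* comments */
    css = re.sub(r"\s+", " ", css)                        # collapse runs of whitespace
    css = re.sub(r"\s*([{};:,])\s*", r"\1", css)          # no spaces around punctuation
    css = css.replace(";}", "}")                          # last semicolon in a block is optional
    return css.strip()


source = """
body {
  color: red; /* main text color */
  margin: 0 auto;
}
"""

minified = minify_css(source)
print(minified)
print(f"{len(source)} -> {len(minified)} characters")
```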

Tip 6. Use Concatenation

Modern browsers make about 6 parallel requests to a single domain. If a site contains many small files, the rest have to wait in a queue, and the page may lag, which is especially noticeable on a slow or unstable connection.

In these cases, concatenation can help by combining several same-type files (such as JS or CSS files) into one. This can lower the number of requests and at the same time increase how quickly pages load.
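At its simplest, concatenation is just joining file contents in the right order. The sketch below works on strings already read from disk; the function name concatenate is hypothetical. For JS, a semicolon is often used as the separator so that a file without a trailing semicolon does not run into the next one.

```python
from typing import Iterable


def concatenate(parts: Iterable[str], separator: str = "\n") -> str:
    """Join the contents of several same-type files into one, in the given order."""
    return separator.join(part.strip() for part in parts) + "\n"


reset_css = "html, body { margin: 0; }\n"
theme_css = ".button { color: #333; }\n"
print(concatenate([reset_css, theme_css]))

menu_js = "var menu = 1"
slider_js = "var slider = 2"
print(concatenate([menu_js, slider_js], separator=";\n"))
```

Order matters: a later CSS rule overrides an earlier one, and a script must come after the code it depends on.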

Concatenation can also be used to lower the load times of graphics. This can be done in two ways: by using sprites or by embedding data into the URL.

Data can be embedded with a particular kind of URI: the data URI. A data URI (URI stands for Uniform Resource Identifier) can be used as the src attribute of an img tag or as the URL of a background image in CSS.

A number of online tools can be used to convert images to data URIs.
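The conversion itself is short enough to do without an online tool. The following Python sketch builds a base64 data URI from raw bytes; the function name to_data_uri is hypothetical, and the sample bytes are a placeholder rather than a real image.

```python
import base64


def to_data_uri(content: bytes, mime_type: str) -> str:
    """Encode file contents as a base64 data URI with the given MIME type."""
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes; in practice, read them from an image file opened in binary mode.
icon_bytes = b"hello"
uri = to_data_uri(icon_bytes, "image/png")

print(f'<img src="{uri}">')
print(f".icon {{ background-image: url({uri}); }}")
```

Base64 makes the data about a third larger than the original file, so this technique suits small images such as icons.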

A sprite is a collection of several images saved in one picture. Different software can be used to create sprites. Using CSS, you can refer to the necessary part of the image and place it wherever you want on the site.

Sprites help reduce load times, but it’s worth noting that they often cause a number of complications. If you make even the slightest change to a sprite, relevant changes need to be made to the CSS.

Minification and concatenation can be automated using modern JS build tools (Brunch, Grunt, Gulp and others). Executing all file operations with one command (including full deployment onto a server) can be done by creating a small configuration file describing the compilation order and settings.

If You’d Like to Learn More

The particulars of developing and configuring static sites are ever expanding. We will of course revisit this subject in future publications. For those of you who would like to delve deeper into the theory and practice of this topic, please visit the following links:

We’d be happy to answer all of your questions and comments below. Keep your eyes peeled for more articles about current trends in static site development.