Caching upstream web assets forever with Nginx

I'm going to describe everything I had to do to get Nginx to cache images forever (or at least for one year).

Before I start, there is one thing you should know: everything I point out here is already documented in the official Caching Guide for Nginx. I was lazy and didn't read it the way I should have, so I hope I can save you some of your precious time.

If you're already running Nginx in production with decent traffic, you can probably guess whether a response came from your upstream server or from the Nginx cache based on how long the request takes. If it's slow, the request went to the upstream: that's a MISS. If Nginx finds the response in the cache and replies with it, that's a HIT.

To avoid guessing, it's better to have Nginx tell you whether each request hit or missed the cache. We do that by adding an HTTP response header that carries the cache status. You can pick any name you want; I like X-Cache-Status, just as the official guide suggests:

location / {
  # . . . 
  add_header X-Cache-Status $upstream_cache_status;
  # . . .
}
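One caveat worth knowing: by default add_header only attaches the header to a fixed set of success and redirect status codes, so error responses would come back without it. A minimal sketch, assuming you also want to see the cache status on responses such as 404 or 500:

```nginx
location / {
    # "always" adds the header regardless of the response status code;
    # without it, add_header skips error responses such as 404 or 500.
    add_header X-Cache-Status $upstream_cache_status always;
}
```

Besides HIT and MISS, $upstream_cache_status can also report BYPASS, EXPIRED, STALE, UPDATING, or REVALIDATED.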

We can now issue requests to our server and see what really happens with the cache for each request:

$ curl -v localhost:8080
. . . 
>
< HTTP/1.1 200 OK
< Server: nginx/1.15.7
< Date: Fri, 20 Dec 2018 00:46:43 GMT
< Content-Type: image/png
< Transfer-Encoding: chunked
< Connection: keep-alive
< Cache-Control: public, max-age=31536000
< X-Cache-Status: HIT

Moving on to the first snippet of the guide:

proxy_cache_path /path/to/cache levels=1:2 keys_zone=my_cache:10m max_size=10g 
                 inactive=60m use_temp_path=off;

server {
    # ...
    location / {
        proxy_cache my_cache;
        proxy_pass http://my_upstream;
    }
}

This configuration may or may not work depending on your setup. A few things to consider:

Cookies in the header

If Nginx sees a Set-Cookie header in the upstream response, it won't cache the response.

I didn't have this problem, but if you want to make sure a MISS is not caused by cookies, you can simply tell Nginx to ignore the Set-Cookie header:

location / {
  # . . . 
  proxy_ignore_headers "Set-Cookie";
  # . . . 
}
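Keep in mind that proxy_ignore_headers only changes the caching decision. The Set-Cookie header is still stored with the response and replayed to every client served from the cache. A hedged sketch, assuming the cached assets never need to set cookies, and reusing the placeholder zone and upstream names from the guide's snippet:

```nginx
location / {
    proxy_cache my_cache;
    proxy_pass http://my_upstream;

    # Let responses that carry Set-Cookie be cached...
    proxy_ignore_headers Set-Cookie;

    # ...but do not pass the cookie on to clients, so one visitor's
    # cookie is never served to another visitor from the cache.
    proxy_hide_header Set-Cookie;
}
```

If the upstream really does rely on cookies for these URLs, it is safer to leave the default behavior alone and accept the MISS.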

Cache-Control header

If Nginx sees a Cache-Control header set to private, no-cache, or no-store, it won't cache the response.

I didn't have this problem either, but if you want to rule out Cache-Control headers leaking from your upstream responses, you can ignore this header too via proxy_ignore_headers. For example, the following configuration from the official guide ignores Cache-Control from the upstream and caches responses with ANY status code (not just 200 OK) for 30 minutes (keep reading):

location / {
    proxy_ignore_headers Cache-Control;
    proxy_cache_valid any 30m;
}

Caching every status code doesn't sound like a good idea. You probably want to cache only 200 OK responses instead, like this:

location / {
    proxy_ignore_headers Cache-Control;
    proxy_cache_valid 200 30m;
}

While you're at it, it may be a good idea to ignore the Expires header too:

proxy_ignore_headers Cache-Control Expires;
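Put together, this section's settings might look like the sketch below. It reuses the placeholder zone and upstream names from the guide's snippet, and the 30m value is only illustrative:

```nginx
location / {
    proxy_cache my_cache;
    proxy_pass http://my_upstream;

    # Stop upstream Cache-Control and Expires headers from deciding
    # whether, and for how long, a response is cached.
    proxy_ignore_headers Cache-Control Expires;

    # With those headers ignored, 200 responses are cached for 30 minutes.
    proxy_cache_valid 200 30m;
}
```

The ignored headers are still passed on to the client; they just no longer influence what Nginx stores.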

Proxy Buffering

If proxy_buffering is set to off, Nginx won't cache.

It's on by default so I'm not sure why it'd be off, but just in case:

proxy_buffering on;

Here's a good explanation why it should be on and why it affects caching.

Tweaking Nginx to cache for longer

At this point this Nginx configuration was caching resources for 30 minutes, but how do we make it cache for longer?

My first move was to change proxy_cache_valid any 30m; to use 1y instead:

proxy_cache_valid any 1y;

Good. With Nginx reloaded, I issued a new request after 30 minutes and got:

$ curl -v localhost:8080
. . . 
>
< HTTP/1.1 200 OK
< Server: nginx/1.15.7
< Content-Type: image/png
< Transfer-Encoding: chunked
< Connection: keep-alive
< Cache-Control: public, max-age=31536000
< X-Cache-Status: HIT

So X-Cache-Status: HIT, that's good.

How about after 1 hour?

$ curl -v localhost:8080
. . . 
>
< HTTP/1.1 200 OK
< Server: nginx/1.15.7
< Content-Type: image/png
< Transfer-Encoding: chunked
< Connection: keep-alive
< Cache-Control: public, max-age=31536000
< X-Cache-Status: MISS

We get X-Cache-Status: MISS, which is obviously not what we want. It should have been a HIT.

It turns out proxy_cache_valid tells Nginx that the resource may be cached for 1y only IF it doesn't become inactive first. When you request a resource that has not yet expired but was removed from the cache for lack of requests, the result is a cache miss.

The inactive parameter of proxy_cache_path (declared at the http level) specifies how long an item can remain in the cache without being accessed. So, in order to cache resources for 1 year, the configuration should be:

proxy_cache_path . . .  
                 inactive=1y
                 . . . ;

server {
    location / {
        # . . .
        proxy_cache_valid 200 1y;
        # . . . 
    }
}
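For reference, here is a sketch of how the pieces from this post could fit together in one file. The cache path, zone name, sizes, and upstream name are the placeholder values from the guide's snippet rather than recommendations, and the listen port is an assumption taken from the curl examples:

```nginx
# http context
proxy_cache_path /path/to/cache levels=1:2 keys_zone=my_cache:10m max_size=10g
                 inactive=1y use_temp_path=off;

server {
    listen 8080;  # assumed from the curl examples

    location / {
        proxy_cache my_cache;
        proxy_pass http://my_upstream;

        # Caching requires buffering (on by default).
        proxy_buffering on;

        # Do not let upstream headers shorten or prevent caching.
        proxy_ignore_headers Cache-Control Expires;

        # Keep 200 responses valid for one year.
        proxy_cache_valid 200 1y;

        # Report HIT, MISS, etc. to the client.
        add_header X-Cache-Status $upstream_cache_status;
    }
}
```

Note that max_size still applies: once the cache grows past it, Nginx removes the least recently used entries no matter how long inactive is.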

Conclusion

The inactive time in proxy_cache_path should be at least as long as the validity time set with proxy_cache_valid. Otherwise, entries are removed for inactivity before they ever expire.