Monday, 25 June 2012

Using the CachingHandler in ASP.NET Web API


Introduction


[NOTE: Please also see updated post and the new framework called CacheCow
However, WebApiContrib code that appears below will be maintained and supported. Please send the bugs, issues to WebApiContrib project on GitHub or contact me directly]


[Level T3] Caching is an important concept in HTTP and comprises a sizeable chunk of the spec. ASP.NET Web API exposes full goodness of the HTTP spec and caching can be implemented as a message handler explained in Part 6 and 7. I have implemented a CachingHandler and contributed the code to the WebApiContrib in GitHub.

This post will have two sections: first a primer on HTTP caching and then how to use the handler. This code uses ASP.NET Web API RC and with .NET 4 (VS 2010 or 2012).

Background

NOTE: This topic is fairly advanced and complex. You do not necessarily need to know all this in order to use CachingHandler and are welcomed to skip it but more in-depth knowledge of HTTP could go a long way. 

Caching is a very important feature in the HTTP. RFC 2616 extensively covers this topic in section 13. Review of all the spec is beyond the scope of this post but we will briefly touch on the subject.

First of all let's get this straight: this is not about putting an object in memory as we do with HttpRuntime.Cache. In fact server does not store anything (more on this below), it only tells the client (or mid-stream cache or proxy servers) what can be cached, how long and validates the caching.

Basically, HTTP provides semantics and mechanism for the origin server, client and mid-stream proxy/cache servers to effectively reduce traffic by validating the version of the resource they have against the server and retrieve the resource only if it has changed. This process is usually referred to as cache validation.

In HTTP 1.0, server would return a LastModified header with the resource. A client/user agent could use this value and send it in the If-Modified-Since header in the subsequent GET requests. If the resource was not changed, server would respond with a 304 (Not Modified) otherwise the resource would be sent back with a new LastModified header. This would be also useful in PUT scenarios where a client sends a PUT request to update a resource only if it has not changed: a If-Unmodified-Since header is sent. If the resource has not changed, server fulfils the request and sends a 2xx response (usually 202 Accepted) otherwise a 412 (Precondition Failed) is sent.

For many reasons (including the fact that HTTP dates do not have milliseconds) it was felt that this mechanism was not adequate. In HTTP 1.1, ETag (Entity Tag) was introduced which is an opaque Id in the format of a quoted string that is returned with the resource. ETags can be strong (refer to RFC 2616 for more info) or weak in which case they start with w/ such as w/"12345". ETag works like a version ID for a resource - if two ETags (for the same resource) are equal, it means those two versions of the resource are the same.

ETag for the same resource could be different according to various headers. For example a client can send an Accept-Language header of de-DE or en-GB and server would send different ETags. Server can define this variability for each resource by using Vary header.

Cache validation for GET and PUT requests are similar but instead will use If-Match (for PUT) and If-None-Match (for GET).

Well... complex? Yeah, pretty much and yet I have not covered some other aspects and the edge cases. But don't worry! You will be abstracted from a lot of it if you use the CachingHandler.

Using CachingHandler right out of the box

OK, using the CachingHandler is straightforward especially if you go with the default settings. You can have a look at the CarManager sample in the WebApiContrib project. CarManager.Web sample demonstrate server side setup of the caching and CarManager.CachingClient for making calls to the server and testing various caching scenarios.

All you have to do is to create an ASP.NET Web API project and add the CachingHandler as a delegating handler as we learnt in Part 6:

GlobalConfiguration.Configuration.MessageHandlers.Add(cachingHandler);

And you are done! Now let's try the server with some scenarios. I would suggest that you use the CarManager sample, otherwise create a simple handler and implement GET, PUT and POST on it.

Now let's make some request and look at the response. I would suggest using Fiddler or Chrome's Postman to send requests and view the response.

So if we send a GET request:

GET http://localhost:8031/api/Cars HTTP/1.1
User-Agent: Fiddler
Host: localhost:8031

We get back this response (or similar; some headers removed and body truncated for clarity):

HTTP/1.1 200 OK
ETag: "54e9a75f2dbb4edca672f7a2c4a73dca"
Vary: Accept
Cache-Control: no-transform, must-revalidate, max-age=604800, private
Last-Modified: Thu, 21 Jun 2012 23:35:46 GMT
Content-Type: application/json; charset=utf-8

[{"Id":1,"Make":"Vauxhall","Model":"Astra","BuildYear":1997,"Price":175.0....

So we see the ETag header here along with important caching headers. Now if we make another GET call, we get back the same response but ETag and last modified stays the same.

Generating same ETag is fine (showing our CachingHandler is doing something) but we have not yet seen any caching. That is where client has to do some work to do. It has to use If-None-Match with the ETag to conditionally ask for the resource: if it matches server, it will get back 304 but if not, server will return the new resource:
GET http://localhost:8031/api/Cars HTTP/1.1
User-Agent: Fiddler
Host: localhost:8031
If-None-Match: "54e9a75f2dbb4edca672f7a2c4a73dca"
Here we get back 304 (Not modified) as expected:
HTTP/1.1 304 Not Modified
Cache-Control: no-cache
Pragma: no-cache
Expires: -1
ETag: "54e9a75f2dbb4edca672f7a2c4a73dca"
Server: Microsoft-IIS/8.0
Date: Sun, 24 Jun 2012 07:34:29 GMT

A typical server that can use CachingHandler

CachingHandler makes a few RESTful assumptions about the server for effective caching. Some of these assumptions can be overridden but generally not recommended. These assumptions are:

  • HTTP verbs (POST/GET/PUT/DELETE for CRUD operations) are used to modify resources - and not RPC-style resources (such as POST /api/Cars/Add). This is the most fundamental assumption.
  • All resources are to be modified through the same HTTP pipeline that implements caching. If a resource modified outside the pipeline, cache state needs to be updated by the same process.
  • Resource are organised in a natural cache-friendly manner: invalidation of related resources can be done with a minimal setup (more details upcoming).

Defining cache state

As we said, this has nothing to do with HttpRuntime.Cache! Unfortunately ASP.NET implementation makes it really confusing between the HTTP cache (where the resource gets cached on the client or midstream cache servers) and server caching (when the rendered output gets cached on the server).

Cache state is a collection of data that helps keep track of each resource along with its last modified date and ETag. It might initially seem that for a resource, there exists a single of such pieces of information. But as we touched upon above, a resource can have different representations each of which needs to be stored separately on the client while they will most likely invalidated together. 

For example, resource /api/disclaimer can exist in multiple languages as such client has to cache each representation separately but when disclaimer changes, all such representations need to be invalidated. That will require a storage of some sort to keep track of all this data. Current implementation comes with an in-memory storage but in a web farm scenario this needs to be a persistent store.

Cache state storage and cache invalidation

So we do not need to store the cache on the server, but we DO need to store ETag and various states on the server. If we only have a single server, this state can be stored in the memory. In a web farm (or even web garden) scenario a persisted store is needed. This store is called Entity Tag Store and is represented by a simple interface:

public interface IEntityTagStore
{
 bool TryGetValue(EntityTagKey key, out TimedEntityTagHeaderValue eTag);
 void AddOrUpdate(EntityTagKey key, TimedEntityTagHeaderValue eTag);
 bool TryRemove(EntityTagKey key);
 int RemoveAllByRoutePattern(string routePattern);
 void Clear();
}

We need to have an implementation of this interface so that the cache management can be abstracted away from controllers and instead done in the DelegatingHandlers.

Introducing some concepts (you may skip and come back to it later)

This store will keep track of the Etag (and related state stored in TimedEntityTagHeaderValue) based on an Entity Tag key which is calculated based on the resource URI and content of the important headers (which their list will be on the Vary header). In-Memory implementation for a single server is provided out of the box with CachingHandler - I would be creating a SQL-Server implementation of it very soon; watch this space.

It is important to note that change in the resource will most likely invalidate all forms of the resource so all permutations of important headers will be invalidated. So invalidation is usually performed at the resource level. Sometimes several related resources can be represented as a RoutePattern. By default, URI of a resource is its RoutePattern.

Also in some cases, change in a resource will invalidate linked resources. For example, a POST to /api/cars to add a car will invalidate /api/cars/fastest and /api/cars/mostExpensive. In this case, "/api/cars/*" can be defined as the linked RoutePattern (since /api/cars does not qualify for /api/cars/*).

Some assumptions in CachingHandler

  1. Re-emphasising that resource can only change through the HTTP API (using PUT, POST and DELETE verbs). If resources are to be changed outside the API, it is responsibility of the application to use IEntityTageStore to invalidate the cache for those resources.
  2. If no Vary header is defined by the application, CachingHandler creates weak ETags.
  3. Change in the resource will invalidates all forms of the resources (all permutations of important header values)

CarManager sample

I have considered a pretty complex and interrelated routing and caching requirement for the sample to display what is possible. Resources available are:

  1. /api/Car/{id}: GET, PUT and DELETE
  2. /api/Cars: GET and POST
  3. /api/Cars/MostExpensive: GET
  4. /api/Cars/Fastest: GET
So creating a car would invalidate cache for 2, 3 and 4. Updating car with id=1 invalidates 2,3 and 4 in addition to the car itself at /api/Car/1. Deleting a car will invalidate 2, 3 and 4.

The client sample (CarManager.CachingClinet) displays how to call the server with headers to validate the cache.

Configuring CachingHandler

Best place to start is the CarManager.Web sample to give an idea on how various setup can be used to configure complex caching requirements. Basically the points below can be used to configure the CachingHandler

Constructor

A list of request header names can be passed that will define the vary headers. If no vary header is passed, system only generates weak ETags. Also optionally an implementation of IEntityTagStore otherwise by default InMemoryEntityTagStore will be used.


EntityTagKeyGenerator

This is an opportunity to provide linked resource for a resource.

 Other customisation points

CachingHandler provides properties in the form of functions with default implementation that can be changed  to override the default behaviour which we will cover in the upcoming post on "Extending CachingHandler".


Conclusion

CachingHandler is a server-side DelegatingHandler which can be used to abstract away caching from individual ApiControllers so that controller logic can be coded without having to worry about caching. 

The  code is hosted on GitHub (currently sitting at my fork waiting to be merged merged!) and comes with both a client and server sample - CarManager.Web and CarManager.CachingClient.

10 comments:

  1. Thanks very much for the post. Very helpful and will make life simpler person working with Asp.Net web api.

    However one thing i am worried about. After caching my OData query support has gone and it simply ignores the $top=3 or $skip=3 in querystring.

    ReplyDelete
    Replies
    1. Thanks Sachin. Have you got a repro or a bit more information? So you mean you had a query "/api/foo" and called once and cached so now you call "/api/foo?$top=3&$skip=3" and you get the 304 "NotModified"?

      I will have a look anyway.

      Delete
    2. Ya, correct. I am getting a 304 "NotModified" everytime. Url /api/foo?$top=3&$skip=3 returns the default page result as of /api/foo

      Delete
    3. Are you using CarManager sample? Since that one does not have an OData provider on any of the actions.

      If not, I have tried and could not reproduce it. Please send a repro in a GitHub Gist or send me the code to aliostad//at//gmail//dot//com.

      Delete
    4. Hi. aliostad's question is interesting for me. How would caching handler works with odata query parameters?

      Delete
    5. First of all, best is to use CacheCow as is a full-blown project. Also OData queries contain the consditions on the URL so caching would work appropriately as URL is part of the Cache Key.

      Delete
    6. CacheCow.Server 0.5.0-alpha has dependency on "CacheCow.Common (≥ 0.5.0)", but it is not available and installation of CacheCow.Server 0.5.0-alpha is failing.

      Delete
  2. Hi i am not able to open the sample. it gives "Page not found error"

    ReplyDelete
  3. Hi,

    The link to the "Update Post" (http://byterot.blogspot.co.uk/2012/07/cachecow-http-caching-framework-server.html) in first paragraph in breaking. Can you please provide the correct link?

    ReplyDelete