Sunday, 13 October 2013

CacheCow update - Moving to MemoryCache for in-memory stores on both Client and Server

As you might know, CacheCow is a framework that implements HTTP Caching for ASP.NET Web API both for the client and server. If you are familiar with it, feel free to jump to the section "What's the update?".

HTTP caching, defined by the HTTP specification (currently at version 1.1 according to RFC 2616, although work on HTTP 2.0 is very close to completion), is an important feature of HTTP and a major reason the web scales as it does.

It is important to remember that resources are cached on the client, not on the server. In ASP.NET we can use Output Caching and System.Web.Caching.Cache to cache items on the server, but HTTP caching differs in that resources are stored on the client. A resource is

  • either not cacheable (identified by no-store value in the Cache-Control header)
  • or can be only cached by the client (identified by private value in the Cache-Control header)
  • or can be cached by the client and all intermediaries (identified by public value in the Cache-Control header)
CacheCow.Client stores cached items in an implementation of the ICacheStore interface. These implementations currently exist:
  • In-memory
  • Memcached
  • Redis
  • SQL Server
  • File-based
The server, on the other hand, also needs to store cache metadata. This is normally information such as the Last-Modified and ETag values of the resources. You configure CacheCow.Server to use one of several implementations of IEntityTagStore. These implementations currently exist:
  • In-memory
  • Memcached
  • RavenDB
  • MongoDB
  • SQL Server

What's the update?

In the latest release of CacheCow, 0.4.12, the in-memory implementations of ICacheStore and IEntityTagStore have been changed from being based on ConcurrentDictionary<TKey, TValue> to being based on MemoryCache. The problem with the dictionary-based implementation is that the store just grows and items are never freed. MemoryCache, on the other hand, is designed to keep its memory usage under a threshold and to evict old or least frequently used items from the cache.

To use the in-memory stores, all you have to do is create the CachingHandler with its default constructor, since the in-memory stores are the default. The good thing about these implementations is the ability to set a maximum memory limit. This is done in the app.config or web.config of your application:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <system.runtime.caching>
    <memoryCache>
      <namedCaches>
        <add name="TheName" cacheMemoryLimitMegabytes="40" pollingInterval="00:05:00" />
      </namedCaches>
    </memoryCache>
  </system.runtime.caching>
</configuration>

The configuration above sets the memory usage limit to 40 MB, and this limit is checked every 5 minutes. CacheCow uses these names for its MemoryCache objects:

  • CacheCow.Client: "###InMemoryCacheStore_###"
  • CacheCow.Server uses 2 MemoryCache objects:
    • Storing ETag and LastModified: "###_InMemoryEntityTagStore_ETag_###"
    • Storing route patterns: "###_InMemoryEntityTagStore_RoutePattern_###"

You can use the configuration to limit the memory used by these MemoryCache objects. Since MemoryCache is in-memory and is in the same AppDomain as the application, it provides the fastest and most efficient storage.
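
As a sketch of how that might look, the fragment below gives each of CacheCow's in-memory stores its own entry. The cache names are copied verbatim from the list above. The limit values and polling intervals are illustrative assumptions rather than recommendations, and you only need entries for the stores your application actually uses (the client store in a client application, the two server stores in a Web API host).

```xml
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <system.runtime.caching>
    <memoryCache>
      <namedCaches>
        <!-- CacheCow.Client: cached responses (40 MB is an illustrative value) -->
        <add name="###InMemoryCacheStore_###" cacheMemoryLimitMegabytes="40" pollingInterval="00:05:00" />
        <!-- CacheCow.Server: ETag and LastModified metadata (illustrative value) -->
        <add name="###_InMemoryEntityTagStore_ETag_###" cacheMemoryLimitMegabytes="20" pollingInterval="00:05:00" />
        <!-- CacheCow.Server: route patterns (illustrative value) -->
        <add name="###_InMemoryEntityTagStore_RoutePattern_###" cacheMemoryLimitMegabytes="10" pollingInterval="00:05:00" />
      </namedCaches>
    </memoryCache>
  </system.runtime.caching>
</configuration>
```

A shorter pollingInterval makes MemoryCache check the limit more often, at the cost of more frequent trimming work.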

Use in-memory storage wherever you can; it will generally perform better than out-of-process stores such as SQL Server.
The only caveat with these implementations is that, even though you set a memory limit, memory usage can rise above your threshold. This is not an issue: it is related to garbage collection, and after a call to GC.Collect() the memory returns to its actual usage.

7 comments:

  1. I was going through the code. What is the rationale for generating an etag only for GET and PUT in the ? Returning an entity upon a POST with an autogenerated id is a legitimate scenario. It would be good to have an etag generated for that.

    Nice work, by the way.

    Thanks.

    Replies
    1. Thanks!

      On the POST, what is the use case of ETag for a POST request? It is not safe nor idempotent.

  2. This is my scenario:

    If you were to post a new resource to a collection, e.g.

    POST /api/things
    {"name":"thing 1"}

    The server responds with:
    200 OK
    {
    "id": 1,
    "name":"thing 1"
    }

    Given that, it makes sense that an etag should accompany the response so that the client could subsequently perform a conditional GET or PUT based upon the result of the POST. The lack of idempotency is fine since a subsequent request such as

    POST /api/things
    {"name":"thing 1"}

    would result in a different entity (differentiated by a unique id) and therefore a different etag:
    200 OK
    {
    "id": 2,
    "name":"thing 1"
    }

    There doesn't seem to be any reason to think that when adding an item to a collection that item can't immediately be put into the cache. And it seems to me that a REST api can reflect that accurately by returning an etag for a newly posted item. If there are additional uniqueness constraints imposed by the domain a non-2xx status would be adequate communication back to the client.

    Is that flawed thinking?

    Replies
    1. I am afraid it is. While your performance-minded approach of having POST also return the item is appreciated, it is not correct, since the response to a POST normally has no content. So if the operation is synchronous you send back 204 (No Content), and if it is asynchronous you send back 202 (Accepted).

      The best approach is to return the URL of the resource in the Location header, so that the client issues another GET, whose response will contain the resource and the cache headers.

    2. Something I did not explain well was why POST returns no content: the POST response must be the response to the POST request, not to POST+GET. If you return the newly created object, you have mixed POST and GET.

  3. Ok. It definitely is a performance consideration. Specifically with a mobile app where it is better to conserve bandwidth. I believe atompub handles POSTs similarly to how I describe above. Since it is often pointed to as an exemplar of a restful service I'll probably continue down this path. Any problem with me forking the project and making modifications to suit our use cases?

    Thanks for the nice discussion.

    Replies
    1. By all means. That is why it is open source.
      Having the content in the POST response is not necessarily a problem, but that response, at least in AtomPub, does not have cache headers.

