Friday, 24 August 2012

Server-side TPL Async: Don't risk learning these lessons the hard way

[Level T2]

There have been more than a few times when I felt I knew all about TPL, only to realise some time later that I was wrong, very wrong. Now you might read this and say to yourself "Come on, this is basic stuff. I know it well, thank you very much". Well, you could be right, but I advise you to carry on reading; what follows might surprise you.

None of what I am going to talk about is new, nor is it being blogged about for the first time. Brad Wilson has an excellent series on the topic here, but this post is meant to serve as a digest of his posts targeted at a broader audience, with a few other points added.

While this post is not directly related to ASP.NET Web API, most examples (and cases) are related to day-to-day scenarios we encounter in ASP.NET Web API.

Remember that this post covers TPL before the async/await keywords of .NET 4.5, that is, what you need to do if you are using .NET 4.0 without async/await. Using async/await will cover you for some of the problems described below but not all.

Don't fire and forget

Tasks are ideal for decoupling pieces of functionality. For example, I can perform a database operation and at the same time audit the operation by outputting a log entry, writing to a file, etc. Using tasks I can decouple these operations so that my database task returns without having to wait for the audit to finish. This makes sense since the database operation is high priority but the audit is low priority:

private void DoDbStuff()
{
   CallDatabase();
   // doing audit entry asynchronously not to bog down database operation
   Task.Factory.StartNew(()=> AuditEntry("Database stuff was done"));
}

In fact, let's say we do not even care whether the audit is successful or not, so we just fire and forget; at most the audit will fail, which is low priority. OK, it all seems innocent?

No! This innocent operation can bring down your application. The reason is that all async exceptions must be observed even if you do not care about them. If you don't, they will haunt you when you least expect it: at the time the finalizer for the task is run by the GC. In .NET 4.0, such an unobserved exception will kill your app.

The link above talks about various ways of observing an exception. The most practical is to use a continuation and access the .Exception property of the task (just accessing the property is enough; you do not need to do anything with the exception itself).

private void DoDbStuff()
{
   CallDatabase();
   // doing audit entry asynchronously not to bog down database operation
   Task.Factory.StartNew(()=> AuditEntry("Database stuff was done"))
      .ContinueWith(t => t.Exception); // fire and forget!
}
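If you fire and forget in more than one place, the continuation can be wrapped in a small helper. What follows is only a minimal sketch: the Forget extension method, the FireAndForgetDemo class and the optional log callback are names I made up for illustration, not part of the TPL. It attaches a continuation that runs only when the task faults and reads t.Exception so the exception counts as observed.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FireAndForgetExtensions
{
    // Hypothetical helper (not part of the TPL): observes a fault so it cannot
    // surface later as an unobserved exception when the task is finalized.
    public static void Forget(this Task task, Action<Exception> log = null)
    {
        task.ContinueWith(t =>
        {
            // Reading t.Exception is what marks the exception as observed.
            AggregateException error = t.Exception;
            if (log != null && error != null) log(error.Flatten());
        }, TaskContinuationOptions.OnlyOnFaulted);
    }
}

public static class FireAndForgetDemo
{
    // Stand-in for the AuditEntry method used above, rigged to fail.
    private static void AuditEntry(string message)
    {
        throw new InvalidOperationException("audit store unavailable");
    }

    // Returns the message that the log callback received from the failed audit.
    public static string Run()
    {
        string logged = null;
        var done = new ManualResetEventSlim();

        Task.Factory.StartNew(() => AuditEntry("Database stuff was done"))
            .Forget(ex => { logged = ex.InnerException.Message; done.Set(); });

        done.Wait(); // only so the console demo does not exit before the continuation runs
        return logged;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

Using OnlyOnFaulted means the continuation is not scheduled at all when the audit succeeds, so the common case pays for nothing beyond registering it.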

Another option, which is more of a safeguard against accidentally unobserved exceptions, is to subscribe to the UnobservedTaskException event on TaskScheduler:

 TaskScheduler.UnobservedTaskException +=
  (sender, e) => { LogException(e.Exception); e.SetObserved(); };

So we register a handler for unobserved exceptions, and calling SetObserved() on the event arguments marks them as "observed". Note that the first parameter of the handler is the sender and the second carries the exception. If you need to read more on this, have a look at Jon Skeet's post here.

This problem has made Ayende Rahien run for the hills.

Respect SynchronizationContext

Uncle Jeffrey Richter tells us that

By default, the CLR automatically causes the first thread's execution context to flow to any helper threads.

And then we also learn that we can use ExecutionContext.SuppressFlow() to suppress flow of the thread context.

Now, what happens when we use ContinueWith()? It turns out that, unlike standard thread switches, context does not flow (I do not have a reference; if you do, please let me know). This helps the performance of asynchronous tasks, as we know context switching is expensive (and a big part of it is context flow).

So why is it important? It is important because so many developers are used to HttpContext.Current. This context is stored in the thread storage area and passed along at the time of context switching. So if the context does not flow, HttpContext.Current will be null.

SynchronizationContext is a similar (but not the same) concept. It is about state that can be shared and used by different threads at the time of switching. I cannot explain this better than Stephen here. Using Post on SynchronizationContext ensures that the continuation will execute in the same context, though not necessarily on the same thread.
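To see a continuation being posted to a SynchronizationContext outside of ASP.NET, here is a minimal console sketch. CountingSynchronizationContext and SyncContextDemo are hypothetical names of mine; the context only counts Post calls and otherwise keeps the base class behaviour. The continuation is scheduled through TaskScheduler.FromCurrentSynchronizationContext(), which is one built-in way of asking the TPL to Post a continuation to the captured context.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical context, used only to make Post calls visible.
public sealed class CountingSynchronizationContext : SynchronizationContext
{
    private int _postCount;

    public int PostCount { get { return _postCount; } }

    public override void Post(SendOrPostCallback d, object state)
    {
        Interlocked.Increment(ref _postCount);
        base.Post(d, state); // base behaviour: queue the callback to the thread pool
    }
}

public static class SyncContextDemo
{
    // Runs one continuation through the context and reports how often Post was called.
    public static int PostsForOneContinuation()
    {
        SynchronizationContext previous = SynchronizationContext.Current;
        var context = new CountingSynchronizationContext();
        SynchronizationContext.SetSynchronizationContext(context);
        try
        {
            // Capture a scheduler bound to the current SynchronizationContext.
            TaskScheduler scheduler = TaskScheduler.FromCurrentSynchronizationContext();

            Task<int> work = Task.Factory.StartNew(() => 21);
            Task<int> continuation = work.ContinueWith(t => t.Result * 2, scheduler);

            continuation.Wait(); // blocking only to keep the console demo simple
            return context.PostCount;
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }

    public static void Main()
    {
        Console.WriteLine(PostsForOneContinuation());
    }
}
```

A plain ContinueWith without the scheduler argument would leave the counter at zero, because nothing would be posted to the context.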

So basically the idea is that if you are in a Task pipeline (best example being MessageHandlers in ASP.NET Web API), you need to take responsibility for passing the context along the pipeline.

This is a snippet from the ASP.NET Web API source code that shows the steps. First you capture the current context and check whether it is null. If it is not, you have to use Post() to flow the context:

SynchronizationContext syncContext = SynchronizationContext.Current;

    TaskCompletionSource<Task<TOuterResult>> tcs = new TaskCompletionSource<Task<TOuterResult>>();

    task.ContinueWith(innerTask =>
    {
        if (innerTask.IsFaulted)
        {
            tcs.TrySetException(innerTask.Exception.InnerExceptions);
        }
        else if (innerTask.IsCanceled || cancellationToken.IsCancellationRequested)
        {
            tcs.TrySetCanceled();
        }
        else
        {
            if (syncContext != null)
            {
                syncContext.Post(state =>
                {
                    try
                    {
                        tcs.TrySetResult(continuation(task));
                    }
                    catch (Exception ex)
                    {
                        tcs.TrySetException(ex);
                    }
                }, state: null);
            }
            else
            {
                tcs.TrySetResult(continuation(task));
            }
        }
    }, runSynchronously ? TaskContinuationOptions.ExecuteSynchronously : TaskContinuationOptions.None);

    return tcs.Task.FastUnwrap();

There is a horrifying fact here. Most of the DelegatingHandler code out there (including some of mine) in various samples around the internet does not respect this. Of course, looking at the ASP.NET Web API source code reveals that they do indeed take care of this in their TaskHelper implementations, and Brad tried to make us aware of it in his blog series. But I think we have not paid enough attention to the implications of ignoring SynchronizationContext.

Now my suggestion is to use the TaskHelpers and its extensions in the ASP.NET Web API (it is open source) or use the one provided in Brad's post. In any case,

Don't use Task for CPU-bound operations

The overhead of asynchronous operations is not negligible. You should only use async if you are doing an IO-bound operation (calling another web service/API, reading a file, reading a lot of data from a database or running a slow query). I personally think that even for normal IO operations, sync is more performant and scalable.

As we have talked about here, the point of asynchronous programming on the server side is releasing the thread so that it can serve another request. Tasks are normally served by the CLR thread pool. If the server already needs managed threads for its operations, it will be using the CLR thread pool too. This means that by doing async operations you could be stealing threads needed for the server's normal operations. A classic example is ASP.NET, so you should be careful to use async only if needed.
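To make the IO-bound case concrete, here is a minimal sketch for .NET 4.0 that wraps the BeginRead/EndRead pair of FileStream with Task<int>.Factory.FromAsync. The IoBoundDemo class, its method names and the temporary file are my own choices for illustration. The point is that the task represents the pending read itself rather than a thread pool thread sitting in a blocking call.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class IoBoundDemo
{
    // Starts an asynchronous read and exposes it as a Task<int> (bytes read).
    public static Task<int> ReadChunkAsync(FileStream stream, byte[] buffer)
    {
        return Task<int>.Factory.FromAsync(
            stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null);
    }

    public static string Run()
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, "hello");

            // FileOptions.Asynchronous asks for overlapped IO where the platform supports it.
            using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                FileShare.Read, 4096, FileOptions.Asynchronous))
            {
                var buffer = new byte[16];

                // The continuation runs once the read has completed.
                Task<string> text = ReadChunkAsync(stream, buffer)
                    .ContinueWith(t => Encoding.UTF8.GetString(buffer, 0, t.Result));

                return text.Result; // blocking only to keep the console demo simple
            }
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

By contrast, wrapping a CPU-bound loop in Task.Factory.StartNew on the server simply moves the work to another thread from the same pool, which is the thread stealing described above.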

ContinueWith is Evil!

I think by now you should know why the standard ContinueWith can be evil. First of all, it does not flow the context. It also makes it easy for unobserved exceptions to creep into your code. My suggestion is to use .Then() from ASP.NET Web API's TaskHelpers.
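To show the idea behind a Then-style helper, here is a deliberately simplified sketch. It is not the Web API implementation: ThenSketch, TaskSketchExtensions and ThenSketchDemo are hypothetical names, and cancellation tokens, non-generic tasks and the synchronous fast paths are all left out. It captures the SynchronizationContext up front, forwards faults and cancellation without running the next step, and posts the next step to the captured context when there is one.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskSketchExtensions
{
    // Hypothetical, simplified stand-in for a Then-style helper.
    public static Task<TResult> ThenSketch<T, TResult>(this Task<T> task, Func<T, TResult> next)
    {
        SynchronizationContext context = SynchronizationContext.Current; // captured up front
        var tcs = new TaskCompletionSource<TResult>();

        task.ContinueWith(t =>
        {
            if (t.IsFaulted)
            {
                // Reading t.Exception observes it; the fault is forwarded and next is skipped.
                tcs.TrySetException(t.Exception.InnerExceptions);
            }
            else if (t.IsCanceled)
            {
                tcs.TrySetCanceled();
            }
            else if (context != null)
            {
                // Run the next step in the captured context.
                context.Post(_ => Complete(tcs, next, t.Result), null);
            }
            else
            {
                Complete(tcs, next, t.Result);
            }
        });

        return tcs.Task;
    }

    private static void Complete<T, TResult>(
        TaskCompletionSource<TResult> tcs, Func<T, TResult> next, T value)
    {
        try { tcs.TrySetResult(next(value)); }
        catch (Exception ex) { tcs.TrySetException(ex); }
    }
}

public static class ThenSketchDemo
{
    public static int Run()
    {
        return Task.Factory.StartNew(() => 20)
            .ThenSketch(x => x + 1)
            .ThenSketch(x => x * 2)
            .Result; // blocking only to keep the console demo simple
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

Note that a fault is forwarded to the returned task, so whoever ends the chain still has to observe it.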

Performance comparison

I think it is still early days - but I must say I would love to do a benchmark to quantify overhead of server-side asynchronous programming. Well if I do, this place will be where the result will first appear :)

So. Do I think I know all about TPL now? Hardly!

Monday, 6 August 2012

CacheCow.Client, using the benefits of HTTP Caching on the client

[Level T2]

Browsers are very sophisticated HTTP machines. We often fail to remember how much of the HTTP spec is implemented by the browsers.

As I have said before, ASP.NET Web API is a very powerful server-side framework but there is a client-side burden in using it or generally implementing a RESTful system - although Web API does not restrict you to a RESTful style.

Because of the client burden, we need more and more client-side libraries to implement the features that browsers have had for such a long time, one of which is HTTP caching. If you use HttpClient out of the box, it will not do any caching even when the resources are cacheable. Also, all of the work for conditional GET or PUT calls (using if-none-match, etc.), cache validation (if there is must-revalidate) or checking whether your cache is stale has to be done in your own code.
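To give a feel for the work a plain HttpClient leaves to you, here is a minimal sketch of a hand-written conditional GET. Everything specific in it is an assumption for illustration: FakeOriginHandler is a made-up in-memory stand-in for a server, the ETag value "v1" and the example.test URL are invented, and ConditionalGetDemo is my own name. Only the System.Net.Http types are real.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a server: answers 304 when the client's ETag still matches.
public sealed class FakeOriginHandler : HttpMessageHandler
{
    public const string CurrentETag = "\"v1\"";

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        bool matches = request.Headers.IfNoneMatch.Any(tag => tag.Tag == CurrentETag);

        var response = new HttpResponseMessage(
            matches ? HttpStatusCode.NotModified : HttpStatusCode.OK);
        response.RequestMessage = request;
        response.Headers.ETag = new EntityTagHeaderValue(CurrentETag);
        if (!matches) response.Content = new StringContent("car 5");

        var tcs = new TaskCompletionSource<HttpResponseMessage>();
        tcs.SetResult(response);
        return tcs.Task;
    }
}

public static class ConditionalGetDemo
{
    // Sends a GET, adding If-None-Match when we hold a cached ETag.
    public static HttpStatusCode Get(HttpClient client, string cachedETag)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, "http://example.test/api/Car/5");
        if (cachedETag != null)
        {
            request.Headers.IfNoneMatch.Add(new EntityTagHeaderValue(cachedETag));
        }
        return client.SendAsync(request).Result.StatusCode;
    }

    public static void Main()
    {
        var client = new HttpClient(new FakeOriginHandler());
        Console.WriteLine(Get(client, null));      // nothing cached yet
        Console.WriteLine(Get(client, "\"v1\""));  // revalidating with the cached ETag
    }
}
```

On a 304 the client is expected to serve the body it stored earlier, and storing that body, tracking its freshness and deciding when to revalidate is exactly the part you would otherwise have to write yourself.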

CacheCow is an HTTP caching library for client and server in ASP.NET Web API that does all of the above - see my earlier post on that. Storage of the cache is abstracted in ICacheStore, and for now we can use the in-memory implementation (see below). The features in the client library include:

  • Caching GET responses according to their caching headers
  • Verifying cached items for their staleness
  • Validating cached items if the must-revalidate directive of the Cache-Control header is set. It will use ETag or Expires, whichever exists
  • Making conditional PUT calls for resources that are cached, based on their ETag or Expires header, whichever exists

Today I released v0.1.3 of CacheCow.Client on NuGet. This library implements advanced HTTP caching with little or no configuration or hassle. All you have to do is add the CachingHandler as a delegating handler to your HttpClient:

var client = new HttpClient(new CachingHandler()
       {
           InnerHandler = new HttpClientHandler()
       });

This code will create an HttpClient that implements caching and stores the cache in memory. By implementing ICacheStore, you can store the cache in your custom repository. CacheCow is going to have persistent cache stores such as FileCacheStore, SqlCeCacheStore and SqliteCacheStore as a minimum. FileCacheStore will be similar to browser implementation of cache storage. Each of these cache stores will be implemented and released under its own NuGet package. To add an alternative cache store, you need to pass the store as a constructor parameter.

Usage

In order to use CacheCow.Client, use the package manager in Visual Studio to download it and add a reference to it:

PM> Install-Package CacheCow.Client

This will also download and add a reference to the ASP.NET Web API client package, if you have not already added one. Make sure you try v0.1.3 or above (by the time of reading this).

After this you just need to create an HttpClient as above and add the CachingHandler as a delegating handler. That's it, you are ready to call services and cache the responses!

Sample

I am working on a sample project but for now, it is easiest to use the code below to call my CarManager Azure website which implements HTTP Caching. The code can be pasted from this GitHub gist.

CacheCow.Client adds a special header to the response which helps with debugging its various features. The header's name is x-cachecow and it carries various flags describing the operations done on the request/response. In the code below, we will use this header to demonstrate the features of this library.

var client = new HttpClient(new CachingHandler()
                    {
                        InnerHandler = new HttpClientHandler()
                    }
 );
var initialResponse = client.GetAsync(
      "http://carmanager.azurewebsites.net/api/Car/5").Result;
var initialResponseHeader = initialResponse.Headers.Single(
       x => x.Key == CacheCowHeader.Name).Value.First();
Console.WriteLine(initialResponse.Headers.ETag.Tag);
Console.WriteLine(initialResponseHeader);

And we will see this printed:

"02e677a7799e484fb49447f8a600247d"
0.1.3.0;did-not-exist=true

As you can probably figure out, we have the ETag and the CacheCowHeader: the first value is the version, and did-not-exist means that the item did not exist in the cache - which is understandable as this is the first call.

Now let's try this again:

var secondResponse = client.GetAsync("http://carmanager.azurewebsites.net/api/Car/5").Result;
var secondResponseHeader = secondResponse.Headers.Single(
      x => x.Key == CacheCowHeader.Name).Value.First();
Console.WriteLine(secondResponseHeader);

And what will print is:

0.1.3.0;did-not-exist=false;cache-validation-applied=true;retrieved-from-cache=true

So in fact the item existed in the cache, was retrieved from the cache, and cache validation was applied. Cache validation is the process by which the client makes a conditional call to retrieve/update a resource only if the condition is met (see the Background section in this post). For example, in GET calls it will send the ETag in an if-none-match header so that the resource is retrieved only if it has changed.

If you call a PUT on a resource that is cached, CacheCow.Client will use its ETag or Expires value to make a conditional PUT, unless you set UseConditionalPut property to false.

By-passing caching

There are some cases where you might not want the result to be cached or retrieved from the cache, regardless of the caching logic. All you have to do is set the CacheControl header to no-cache or no-store:

var nocacheRequest = new HttpRequestMessage(HttpMethod.Get, 
  "http://carmanager.azurewebsites.net/api/Car/5");
nocacheRequest.Headers.CacheControl = new CacheControlHeaderValue()
 {
  NoCache = true
 };
var nocacheResponse = client.SendAsync(nocacheRequest).Result;
var nocacheResponseHeader = nocacheResponse.Headers.FirstOrDefault(
 x => x.Key == CacheCowHeader.Name);
Console.WriteLine(nocacheResponseHeader);

This will print an empty header since we have bypassed the caching.

Last but not least

Thanks for trying out and using CacheCow. Please send me your feedback and bug reports. Just ping me on Twitter or use GitHub's issue tracker.