Thursday, April 26, 2012

The Database As Queue Anti-Pattern

“When all you have is a hammer, every problem looks like a nail.”

When all you know is SQL Server, it’s tempting to try and solve every problem with a relational database, but often it’s not the best tool for the job. Today I want to introduce you to an excellent example of this, the ‘Database As Queue’ anti-pattern. It’s a common specimen that you’ll find in the wild in many enterprise development shops.

It works like this: Usually there’s a piece of data that needs to be processed through some workflow, often by a number of different applications or processes.

Database_as_queue

There’s usually a table with some data and a status column. Each process is designed to process data at a particular status. Process 1 in this example inserts records with the status ‘New’, Process 2 picks up ‘New’ records and updates the status to ‘P2 Complete’, Process 3 picks up ‘P2 Complete’ records… you get the picture.

When I say ‘picks up’, what I mean is that each process sits there polling the database at some interval running a select statement something like this:

SELECT * FROM MyWorkflow WHERE Status = 'New'

Why is it an anti-pattern?

Firstly, there’s the polling. Either you have a short interval and hammer your database into submission, or you have a long interval and even the fastest route through your system is the sum of all the (long) intervals.

Secondly there’s the fact that SQL Server is very efficient at insertions, or updates, or queries, but rarely all three on the same table. To make the polling queries perform, you need to put an index on the status, but indexes make for slow inserts. Locks can become a major problem.

Thirdly there’s the issue of how you clear records once the workflow is complete. If you don’t clear the table down at intervals, it will keep growing until it starts to hit the performances of your queries and eventually become a serious operations problem. The problem is, deletes are inefficient on SQL Server.

Fourthly, sharing a database between applications (or services) is a bad thing. It’s just too tempting to put amorphous shared state in there and before you know it you’ll have a hugely coupled monster.

So what should I do?

Simple, use the right tool for the job: this scenario is crying out for a messaging system. It solves all the problems described above; no more polling, efficient message delivery, no need to clear completed messages from queues, and no shared state.

Tuesday, April 24, 2012

A Useful Linq Extension Method: Intersperse

Have you ever had a case where you want to insert a constant item in-between each element of a list of items? For example, if I’ve got a list of numbers:

1,2,3,4,5

And I want to insert 0 between each one to get:

1,0,2,0,3,0,4,0,5

Meet my friend Intersperse (stolen from Haskell, if you hadn’t guessed):

var items = new[] {1, 2, 3, 4, 5}.Intersperse(0).ToArray();

foreach (var item in items)
{
Console.Write(item);
}

This code outputs:

102030405

And here is the Intersperse extension method:

public static IEnumerable<T> Intersperse<T>(this IEnumerable<T> items, T separator)
{
var first = true;
foreach (var item in items)
{
if (first) first = false;
else
{
yield return separator;
}
yield return item;
}
}

Some interesting uses. System.String is, of course, an array of char, so you can do this:

var doubleSpaced = new string("hello world".Intersperse(' ').ToArray());
Console.WriteLine(doubleSpaced);
// outputs 'h e l l o w o r l d'

Or how about this perennial problem, creating a comma separated list:

var strings = new[] {"one", "two", "three"};
var commaSeprated = new string(strings.Intersperse(",").Concat().ToArray());
Console.WriteLine(commaSeprated);
// outputs one,two,three