Parallelism in .NET – Part 2, Simple Imperative Data Parallelism

In my discussion of Decomposition of the problem space, I mentioned that Data Decomposition is often the simplest abstraction to use when trying to parallelize a routine.  If a problem can be decomposed based off the data, we will often want to use what MSDN refers to as Data Parallelism as our strategy for implementing our routine.  The Task Parallel Library in .NET 4 makes implementing Data Parallelism, for most cases, very simple.

Data Parallelism is the main technique we use to parallelize a routine which can be decomposed based off data.  Data Parallelism refers to taking a single collection of data, and having a single operation be performed concurrently on elements in the collection. 

One side note here: Data Parallelism is also sometimes referred to as the Loop Parallelism Pattern or Loop-level Parallelism.  In general, for this series, I will try to use the terminology used in the MSDN Documentation for the Task Parallel Library.  This should make it easier to investigate these topics in more detail.

Once we’ve determined we have a problem that, potentially, can be decomposed based on data, implementation using Data Parallelism in the TPL is quite simple.  Let’s take our example from the Data Decomposition discussion – a simple contrast stretching filter.  Here, we have a collection of data (pixels), and we need to run a simple operation on each element of the pixel.  Once we know the minimum and maximum values, we most likely would have some simple code like the following:

for (int row=0; row < pixelData.GetUpperBound(0); ++row)
{
    for (int col=0; col < pixelData.GetUpperBound(1); ++col)
    {
        pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel);
    }
}

This simple routine loops through a two dimensional array of pixelData, and calls the AdjustContrast routine on each pixel.

As I mentioned, when you’re decomposing a problem space, most iteration statements are potentially candidates for data decomposition.  Here, we’re using two for loops – one looping through rows in the image, and a second nested loop iterating through the columns.  We then perform one, independent operation on each element based on those loop positions.

This is a prime candidate – we have no shared data, no dependencies on anything but the pixel which we want to change.  Since we’re using a for loop, we can easily parallelize this using the Parallel.For method in the TPL:

Parallel.For(0, pixelData.GetUpperBound(0), row =>
{
    for (int col=0; col < pixelData.GetUpperBound(1); ++col)
    {
        pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel);
    }
});

Here, by simply changing our first for loop to a call to Parallel.For, we can parallelize this portion of our routine.  Parallel.For works, as do many methods in the TPL, by creating a delegate and using it as an argument to a method.  In this case, our for loop iteration block becomes a delegate creating via a lambda expression.  This lets you write code that, superficially, looks similar to the familiar for loop, but functions quite differently at runtime.

We could easily do this to our second for loop as well, but that may not be a good idea.  There is a balance to be struck when writing parallel code.  We want to have enough work items to keep all of our processors busy, but the more we partition our data, the more overhead we introduce.  In this case, we have an image of data – most likely hundreds of pixels in both dimensions.  By just parallelizing our first loop, each row of pixels can be run as a single task.  With hundreds of rows of data, we are providing fine enough granularity to keep all of our processors busy.

If we parallelize both loops, we’re potentially creating millions of independent tasks.  This introduces extra overhead with no extra gain, and will actually reduce our overall performance.  This leads to my first guideline when writing parallel code:

Partition your problem into enough tasks to keep each processor busy throughout the operation, but not more than necessary to keep each processor busy.

Also note that I parallelized the outer loop.  I could have just as easily partitioned the inner loop.  However, partitioning the inner loop would have led to many more discrete work items, each with a smaller amount of work (operate on one pixel instead of one row of pixels).  My second guideline when writing parallel code reflects this:

Partition your problem in a way to place the most work possible into each task.

This typically means, in practice, that you will want to parallelize the routine at the “highest” point possible in the routine, typically the outermost loop.  If you’re looking at parallelizing methods which call other methods, you’ll want to try to partition your work high up in the stack – as you get into lower level methods, the performance impact of parallelizing your routines may not overcome the overhead introduced.

Parallel.For works great for situations where we know the number of elements we’re going to process in advance.  If we’re iterating through an IList<T> or an array, this is a typical approach.  However, there are other iteration statements common in C#.  In many situations, we’ll use foreach instead of a for loop.  This can be more understandable and easier to read, but also has the advantage of working with collections which only implement IEnumerable<T>, where we do not know the number of elements involved in advance.

As an example, lets take the following situation.  Say we have a collection of Customers, and we want to iterate through each customer, check some information about the customer, and if a certain case is met, send an email to the customer and update our instance to reflect this change.  Normally, this might look something like:

foreach(var customer in customers)
{
    // Run some process that takes some time...
    DateTime lastContact = theStore.GetLastContact(customer);
    TimeSpan timeSinceContact = DateTime.Now - lastContact;

    // If it's been more than two weeks, send an email, and update...
    if (timeSinceContact.Days > 14)
    {
       theStore.EmailCustomer(customer); 
       customer.LastEmailContact = DateTime.Now;
    }
}

Here, we’re doing a fair amount of work for each customer in our collection, but we don’t know how many customers exist.  If we assume that theStore.GetLastContact(customer) and theStore.EmailCustomer(customer) are both side-effect free, thread safe operations, we could parallelize this using Parallel.ForEach:

Parallel.ForEach(customers, customer =>
{
    // Run some process that takes some time...
    DateTime lastContact = theStore.GetLastContact(customer);
    TimeSpan timeSinceContact = DateTime.Now - lastContact;

    // If it's been more than two weeks, send an email, and update...
    if (timeSinceContact.Days > 14)
    {
       theStore.EmailCustomer(customer); 
       customer.LastEmailContact = DateTime.Now;
    }
});

Just like Parallel.For, we rework our loop into a method call accepting a delegate created via a lambda expression.  This keeps our new code very similar to our original iteration statement, however, this will now execute in parallel.  The same guidelines apply with Parallel.ForEach as with Parallel.For.

The other iteration statements, do and while, do not have direct equivalents in the Task Parallel Library.  These, however, are very easy to implement using Parallel.ForEach and the yield keyword.

Most applications can benefit from implementing some form of Data Parallelism.  Iterating through collections and performing “work” is a very common pattern in nearly every application.  When the problem can be decomposed by data, we often can parallelize the workload by merely changing foreach statements to Parallel.ForEach method calls, and for loops to Parallel.For method calls.  Any time your program operates on a collection, and does a set of work on each item in the collection where that work is not dependent on other information, you very likely have an opportunity to parallelize your routine.

About Reed
Reed Copsey, Jr. - http://www.reedcopsey.com - http://twitter.com/ReedCopsey

Comments

9 Responses to “Parallelism in .NET – Part 2, Simple Imperative Data Parallelism”
  1. Chris says:

    You emphasized these two sentences:

    Partition your problem into enough tasks to keep each processor busy throughout the operation, but not more than necessary to keep each processor busy.
    Partition your problem in a way to place the most work possible into each task.

    So if I am making a library for re-use, are their any design considerations I need to take into account before using parallelism? For example, if I am using my library to accomplish some tasks and I feel that doing these tasks in parallel would improve performance, will I experience a performance decrease if I use TPL because of my library?

    Thanks,
    Chris

    • Reed says:

      Chris,

      It’s definitely something you need to take into consideration in the design. There are really two options here – and the best depends a bit on how you’d want your library to be used.

      If the library provides a high level of abstraction, you might want to use the TPL within your library to allow the performance gains available to surface. However, if its a lower level API, it could easily be better to design your library to be thread safe, so the caller can parallelize as needed.

      -Reed

  2. Ashish Gupta says:

    Parallel.For/ForEach would block threads and probably not fit for server side applications (Windows services). Could you please provide examples of Data parallelism without blocking threads?

    • Reed says:

      Ashish –

      You can mix data parallelism with task parallelism to handle this – ie: spawn the Parallel.For loop within a Task. However, if you’re talking about WCF services, it’s often fine to block/use the server thread, as each request is spawned on a separate thread (if configured properly), and using that thread as part of your Parallel processing will keep the overhead to the minimum in that scenario.

Trackbacks

Check out what others are saying about this post...
  1. [...] This post was mentioned on Twitter by Reed Copsey, Jr., Caspar Kleijne. Caspar Kleijne said: #mustread Parallelism in .NET. Part 2, Simple Imperative Data Parallelism: http://bit.ly/7fr0r7 #csharp #dotnet /via @ReedCopsey [...]

  2. [...] simple data parallelism allows us to easily parallelize many of our iteration statements, there are cases that it does not [...]

  3. [...] the article on simple data parallelism, I described how to perform an operation on an entire collection of elements in parallel.  [...]

  4. [...] how this can be parallelized using the Task Parallel Library and imperative programming using imperative data parallelism via the Parallel class.  While this provides a huge step forward in terms of power and [...]

  5. [...] We configure the Parallel class by setting the ParallelOptions.MaxDegreeOfParallelism property.  For example, let’s revisit one of the simple data parallel examples from Part 2: [...]



Speak Your Mind

Tell us what you're thinking...
and oh, if you want a pic to show with your comment, go get a gravatar!