Parallelism in .NET – Part 6, Declarative Data Parallelism

When working with a problem that can be decomposed by data, we have a collection, and some operation being performed upon the collection.  I’ve demonstrated how this can be parallelized using the Task Parallel Library and imperative programming using imperative data parallelism via the Parallel class.  While this provides a huge step forward in terms of power and capabilities, in many cases, special care must still be given for relative common scenarios.

C# 3.0 and Visual Basic 9.0 introduced a new, declarative programming model to .NET via the LINQ Project.  When working with collections, we can now write software that describes what we want to occur without having to explicitly state how the program should accomplish the task.  By taking advantage of LINQ, many operations become much shorter, more elegant, and easier to understand and maintain.  Version 4.0 of the .NET framework extends this concept into the parallel computation space by introducing Parallel LINQ.

Before we delve into PLINQ, let’s begin with a short discussion of LINQ.  LINQ, the extensions to the .NET Framework which implement language integrated query, set, and transform operations, is implemented in many flavors.  For our purposes, we are interested in LINQ to Objects.  When dealing with parallelizing a routine, we typically are dealing with in-memory data storage.  More data-access oriented LINQ variants, such as LINQ to SQL and LINQ to Entities in the Entity Framework fall outside of our concern, since the parallelism there is the concern of the data base engine processing the query itself.

LINQ (LINQ to Objects in particular) works by implementing a series of extension methods, most of which work on IEnumerable<T>.  The language enhancements use these extension methods to create a very concise, readable alternative to using traditional foreach statement.  For example, let’s revisit our minimum aggregation routine we wrote in Part 4:

double min = double.MaxValue;
foreach(var item in collection)
    double value = item.PerformComputation();
    min = System.Math.Min(min, value);

Here, we’re doing a very simple computation, but writing this in an imperative style.  This can be loosely translated to English as:

Create a very large number, and save it in min 
Loop through each item in the collection. 
For every item: 
    Perform some computation, and save the result 
    If the computation is less than min, set min to the computation 

Although this is fairly easy to follow, it’s quite a few lines of code, and it requires us to read through the code, step by step, line by line, in order to understand the intention of the developer.

We can rework this same statement, using LINQ:

double min = collection.Min(item => item.PerformComputation());

Here, we’re after the same information.  However, this is written using a declarative programming style.  When we see this code, we’d naturally translate this to English as:

Save the Min value of collection, determined via calling item.PerformComputation() 

That’s it – instead of multiple logical steps, we have one single, declarative request.  This makes the developer’s intentions very clear, and very easy to follow.  The system is free to implement this using whatever method required.

Parallel LINQ (PLINQ) extends LINQ to Objects to support parallel operations.  This is a perfect fit in many cases when you have a problem that can be decomposed by data.  To show this, let’s again refer to our minimum aggregation routine from Part 4, but this time, let’s review our final, parallelized version:

// Safe, and fast!
double min = double.MaxValue;
// Make a "lock" object
object syncObject = new object();
    // First, we provide a local state initialization delegate.
    () => double.MaxValue,
    // Next, we supply the body, which takes the original item, loop state,
    // and local state, and returns a new local state
    (item, loopState, localState) =>
        double value = item.PerformComputation();
        return System.Math.Min(localState, value);
    // Finally, we provide an Action<TLocal>, to "merge" results together
    localState =>
        // This requires locking, but it's only once per used thread
            min = System.Math.Min(min, localState);

Here, we’re doing the same computation as above, but fully parallelized.  Describing this in English becomes quite a feat:

Create a very large number, and save it in min
Create a temporary object we can use for locking
Call Parallel.ForEach, specifying three delegates
    For the first delegate:
        Initialize a local variable to hold the local state to a very large number
    For the second delegate:
        For each item in the collection, perform some computation, save the result
        If the result is less than our local state, save the result in local state
    For the final delegate:
        Take a lock on our temporary object to protect our min variable
        Save the min of our min and local state variables

Although this solves our problem, and does it in a very efficient way, we’ve created a set of code that is quite a bit more difficult to understand and maintain.

PLINQ provides us with a very nice alternative.  In order to use PLINQ, we need to learn one new extension method that works on IEnumerable<T> – ParallelEnumerable.AsParallel().

That’s all we need to learn in order to use PLINQ: one single method.  We can write our minimum aggregation in PLINQ very simply:

double min = collection.AsParallel().Min(item => item.PerformComputation());

By simply adding “.AsParallel()” to our LINQ to Objects query, we converted this to using PLINQ and running this computation in parallel! 

This can be loosely translated into English easily, as well:

Process the collection in parallel 
Get the Minimum value, determined by calling PerformComputation on each item 

Here, our intention is very clear and easy to understand.  We just want to perform the same operation we did in serial, but run it “as parallel”.  PLINQ completely extends LINQ to Objects: the entire functionality of LINQ to Objects is available.  By simply adding a call to AsParallel(), we can specify that a collection should be processed in parallel.  This is simple, safe, and incredibly useful.

About Reed
Reed Copsey, Jr. - -


5 Responses to “Parallelism in .NET – Part 6, Declarative Data Parallelism”
  1. Leo says:

    Thanks for posting the series. It’s really well written and easy to follow. Very useful reading indeed. I just wanted to make one suggestion. Could you consider adding a link at the end of each part that points to the next part? This would allow people to read through the series continuously without going back to the index page after each part. In some parts you do have trackbacks that point to other parts but it’s not as convenient as a simple Next Part/Previous Part link at the end of the article.
    Thanks again for posting this,

    • Reed says:


      I will most likely do this at some point. I often go back and update the posts to link through for the entire series once I’ve finished it – however, this series, being as long as it is, has taken me a while.

      I do have a single index page, however, which should make a it a bit easier to navigate.

      Thanks for the suggestion!



Check out what others are saying about this post...
  1. […] This post was mentioned on Twitter by Reed Copsey, Jr., Caspar Kleijne. Caspar Kleijne said: #mustreed Parallelism in .NET. Published Part 6, Declarative Data Parallelism #csharp #dotnet /via @ReedCopsey […]

  2. […] my previous post on Declarative Data Parallelism, I mentioned that PLINQ extends LINQ to Objects to support parallel operations.  Although […]

  3. […] Let’s revisit our declarative data parallelism sample from Part 6: […]

Speak Your Mind

Tell us what you're thinking...
and oh, if you want a pic to show with your comment, go get a gravatar!