Deep Dive into PLINQ Operations
In this article, I take a deep dive into PLINQ operations with examples. Please read our previous article, Getting Started with PLINQ, before proceeding. By the end of this article, you will understand the following topics:
- Deep Dive into PLINQ Operations
- How Does PLINQ Work?
- Ordered vs. Unordered PLINQ Queries
- Example Showing the Performance Implications between Ordered vs. Unordered PLINQ Queries
- Thread Safety in PLINQ Queries
- Aggregation Operations in PLINQ Queries
Deep Dive into PLINQ Operations
Parallel LINQ (PLINQ) is an extension of LINQ (Language Integrated Query) that enables parallel processing of queries to improve performance on large datasets or complex operations that can be parallelized. PLINQ can significantly speed up CPU-bound operations by efficiently utilizing multiple processors or cores on a system. However, it’s important to use PLINQ with proper performance testing, as not all operations will benefit from parallelization, and, in some cases, it might even lead to worse performance than sequential LINQ due to overheads associated with thread management and synchronization.
Key Features of PLINQ
- Automatic Parallelization: PLINQ queries are automatically parallelized by dividing the source data into partitions and processing them concurrently on multiple threads.
- Sequential Fallback: PLINQ may run a query sequentially when it judges that parallelization would not help, and you can force part of a query to run sequentially with the AsSequential method.
- Preservation of Order: By default, PLINQ does not preserve the order of the source sequence. However, you can enforce order preservation using the AsOrdered method.
- Aggregation Operations: PLINQ supports various aggregation operations (Sum, Count, Average, etc.) that can be parallelized for performance gains.
- Cancellation Support: PLINQ queries support cancellation. Pass a CancellationToken to the WithCancellation method; a canceled query throws an OperationCanceledException.
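The cancellation and sequential-fallback features listed above can be sketched as follows. This is a minimal illustration rather than code from the original article: the class FeatureSketch and its method names are hypothetical, while WithCancellation, AsOrdered, and AsSequential are the PLINQ methods being demonstrated.

```csharp
using System;
using System.Linq;
using System.Threading;

public static class FeatureSketch
{
    // Runs a PLINQ query that observes the given token.
    // Returns true if the query was canceled instead of completing.
    public static bool WasCanceled(CancellationToken token)
    {
        try
        {
            _ = Enumerable.Range(1, 1000)
                .AsParallel()
                .WithCancellation(token)
                .Select(n => n * n)
                .ToArray();
            return false;
        }
        catch (OperationCanceledException)
        {
            // Cancellation is reported directly, not inside an AggregateException.
            return true;
        }
    }

    // Filters in parallel, then switches to sequential LINQ for the rest.
    // AsOrdered keeps the source order so Take(3) sees the first three matches.
    public static int[] FirstThreeEvenSquares()
    {
        return Enumerable.Range(1, 1000)
            .AsParallel()
            .AsOrdered()
            .Where(n => n % 2 == 0)
            .AsSequential()          // everything below runs as ordinary LINQ
            .Select(n => n * n)
            .Take(3)
            .ToArray();
    }
}
```

Calling WasCanceled with a token that has already been canceled returns true, while CancellationToken.None lets the query complete normally.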
How Does PLINQ Work?
PLINQ operates by breaking down the source collection into multiple partitions, which are then processed in parallel across different threads. The PLINQ engine determines the optimal number of partitions and threads based on the workload and system capabilities. After processing, the results are combined and returned to the caller.
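To make the partition-and-combine behavior more concrete, here is a small sketch. The class PartitionSketch and the method DoubleAndSum are hypothetical names chosen for illustration. The sketch records which managed threads touched the data, so you can observe that the work was spread across threads before the partial results were combined into one total.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

public static class PartitionSketch
{
    // Doubles every number from 1 to count in parallel and sums the results.
    // Also reports how many distinct threads processed at least one element.
    public static (long Total, int ThreadCount) DoubleAndSum(int count)
    {
        // Thread-safe set of the thread ids seen while processing elements.
        var threadIds = new ConcurrentDictionary<int, bool>();

        long total = Enumerable.Range(1, count)
            .AsParallel()
            .Select(n =>
            {
                threadIds.TryAdd(Environment.CurrentManagedThreadId, true);
                return (long)n * 2;
            })
            .Sum(); // PLINQ combines the per-partition sums into one value

        return (total, threadIds.Count);
    }
}
```

The total is the same on every run, but the number of threads observed depends on the machine and on how PLINQ chooses to schedule the work.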
Ordered vs. Unordered PLINQ Queries
Whether a PLINQ query is ordered or unordered affects both its performance and the order in which results come back, so it is important to understand the difference.
Unordered PLINQ Queries
Unordered PLINQ queries do not preserve the order of the source sequence. This is the default behavior when you parallelize your LINQ queries using PLINQ. These queries can often run more efficiently than ordered queries because they have fewer constraints to satisfy. The runtime can more freely distribute work across multiple threads without worrying about maintaining the original order of elements. This is useful when the order of results is not important for the subsequent operation or result interpretation.
Syntax of an unordered PLINQ query:
var unorderedResults = myCollection.AsParallel()
    .Where(x => SomePredicate(x))
    .Select(x => SomeTransformation(x));
Ordered PLINQ Queries
Ordered PLINQ queries preserve the order of the source sequence. This can be important when the sequence order impacts the result. For example, maintaining order is crucial if you are processing time-series data or when the order affects how the data is interpreted or displayed. However, preserving order introduces overhead because PLINQ must keep track of each element's original position and emit the results in that sequence, which can reduce the benefits of parallelism.
You can make a PLINQ query ordered by calling the AsOrdered method:
var orderedResults = myCollection.AsParallel().AsOrdered()
    .Where(x => SomePredicate(x))
    .Select(x => SomeTransformation(x));
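The difference between the two query shapes is easiest to see by comparing their results with a plain sequential LINQ query. The following is a minimal sketch; the class OrderingSketch and its method names are hypothetical and stand in for SomePredicate and SomeTransformation above.

```csharp
using System;
using System.Linq;

public static class OrderingSketch
{
    // Ordered: results come back in the same order as the source elements.
    public static int[] EvenSquaresOrdered(int[] source) =>
        source.AsParallel().AsOrdered()
              .Where(n => n % 2 == 0)
              .Select(n => n * n)
              .ToArray();

    // Unordered: same set of results, but their order is not guaranteed.
    public static int[] EvenSquaresUnordered(int[] source) =>
        source.AsParallel()
              .Where(n => n % 2 == 0)
              .Select(n => n * n)
              .ToArray();
}
```

The ordered version always matches the sequential result element by element. The unordered version contains the same values, but you may need to sort it before comparing.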
Example Showing the Performance Implications between Ordered vs. Unordered PLINQ Queries
Let’s look at an example to demonstrate the performance implications of using ordered vs. unordered PLINQ queries. In this example, we’ll perform a simple operation on a large collection of integers, filtering and then transforming the numbers. We’ll compare the execution times of ordered and unordered queries to illustrate the performance differences.
Please note that the actual performance difference will depend on several factors, including the nature of the operations, the size of the data, and the hardware (such as the number of processors) on which the code is running.
using System;
using System.Diagnostics;
using System.Linq;

namespace PLINQDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a large array of integers
            int[] numbers = Enumerable.Range(1, 10000000).ToArray();

            // Measure unordered PLINQ query performance
            MeasurePLINQPerformance(numbers, ordered: false);

            // Measure ordered PLINQ query performance
            MeasurePLINQPerformance(numbers, ordered: true);

            Console.ReadKey();
        }

        // This function will execute both unordered and ordered PLINQ queries
        // and measure their execution time:
        public static void MeasurePLINQPerformance(int[] numbers, bool ordered)
        {
            Stopwatch stopwatch = new Stopwatch();
            stopwatch.Start();

            if (ordered)
            {
                var result = numbers.AsParallel().AsOrdered()
                    .Where(n => n % 2 == 0)
                    .Select(n => n * 2).ToArray();
            }
            else
            {
                var result = numbers.AsParallel()
                    .Where(n => n % 2 == 0)
                    .Select(n => n * 2).ToArray();
            }

            stopwatch.Stop();
            Console.WriteLine($"{(ordered ? "Ordered" : "Unordered")} PLINQ query completed in {stopwatch.ElapsedMilliseconds} ms.");
        }
    }
}
This example first generates a large array of integers using Enumerable.Range. It then defines a method, MeasurePLINQPerformance, that takes this array and a boolean indicating whether to use an ordered or unordered PLINQ query. The method filters even numbers (n % 2 == 0) and then doubles them (n * 2), using AsParallel().AsOrdered() for ordered processing and AsParallel() alone for unordered processing.
When you run this program, it will execute the unordered and ordered queries on the array and print out the time taken for each. Generally, you'll observe that the unordered query completes faster than the ordered query because it has fewer constraints to satisfy (i.e., it doesn't need to preserve the order of elements), allowing for more efficient parallel execution. Keep in mind that the first query measured also pays one-time start-up costs such as JIT compilation and thread pool warm-up, so repeat the measurements several times before drawing conclusions.
Considerations
- Performance vs. Order: Choose between unordered and ordered queries based on whether the order of elements is significant for your application and whether the performance trade-off is worth it.
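- Mixing Ordered and Unordered: You can mix ordered and unordered operations in the same query so that you pay for ordering only where it matters. AsOrdered has to be called at the start of the query, right after AsParallel, so begin ordered for the operators that depend on position (for example, Take) and then call AsUnordered to drop the ordering constraint for the remaining operators. If you need sorted output at the end of an unordered query, use OrderBy.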
- ForAll for Unordered Processing: When you do not need to preserve the source order, consider using the ForAll method for terminal operations. ForAll does not preserve ordering but can be more efficient for certain operations.
- Behavioral Differences: The behavior of some operations can differ significantly between ordered and unordered queries, especially for operations that inherently depend on the sequence order, such as Take or Skip.
Choosing between ordered and unordered PLINQ queries requires balancing the need to preserve the sequence order against the potential performance gains from parallel execution. Always consider testing both approaches to see which offers the best performance while meeting your application’s requirements.
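The considerations above about mixing ordering modes and about ForAll can be sketched as follows. This is an illustrative example, not code from the original article: the class MixedOrderingSketch and its methods are hypothetical names. The first method keeps ordering only for the Take step, which depends on position, and the second uses ForAll with an atomic update so no results need to be merged.

```csharp
using System;
using System.Linq;
using System.Threading;

public static class MixedOrderingSketch
{
    // Ordering matters only for picking the first `count` even numbers.
    // After Take, AsUnordered lets the remaining work run without ordering cost.
    public static int[] SquaresOfFirstEvens(int[] source, int count) =>
        source.AsParallel()
              .AsOrdered()
              .Where(n => n % 2 == 0)
              .Take(count)
              .AsUnordered()
              .Select(n => n * n)
              .ToArray();

    // ForAll runs the delegate on the worker threads and skips the final merge.
    // The shared total is updated atomically to stay thread-safe.
    public static long SumWithForAll(int[] source)
    {
        long total = 0;
        source.AsParallel().ForAll(n => Interlocked.Add(ref total, n));
        return total;
    }
}
```

Because the query ends unordered, SquaresOfFirstEvens returns the squares of the first matches in no particular order. Sort the array if you need to display it in sequence.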
Ensuring Thread Safety in PLINQ Queries
Ensuring thread safety in PLINQ, or Parallel LINQ, is crucial for writing efficient, error-free parallelized code in .NET applications. PLINQ enables you to execute LINQ queries in parallel, potentially improving performance on multi-core or multi-processor systems. However, parallel execution can introduce thread safety issues if not handled properly. Here are several strategies to ensure thread safety when using PLINQ:
- Use Immutable Data Structures: Immutable data structures cannot be modified once they’re created. This makes them naturally thread-safe, as concurrent operations on these data structures don’t result in race conditions.
- Employ Thread-Safe Collections: .NET provides several thread-safe collections in the System.Collections.Concurrent namespace, such as ConcurrentBag<T>, ConcurrentQueue<T>, and ConcurrentDictionary<TKey, TValue>. These collections are designed to handle concurrent access and modifications efficiently.
- Aggregate Results Properly: PLINQ offers aggregation operations (e.g., Aggregate, Sum, Max) that are designed to be thread-safe. When aggregating results, ensure you use these operations correctly to avoid race conditions. For custom aggregations, use the Aggregate overload that takes a seed, an updateAccumulatorFunc, a combineAccumulatorsFunc, and a resultSelector, so that each partition accumulates independently and the partial results are combined at the end.
- Limit Shared State: Avoid sharing state across threads as much as possible. If the shared state is necessary, ensure access to the shared state is synchronized using locking mechanisms (e.g., lock statement in C#) or by using concurrent collections.
- Use AsParallel Wisely: Not all LINQ queries benefit from parallelization. Apply AsParallel only where measurement shows a gain, typically CPU-bound work over large collections. Overuse or improper use can lead to performance degradation and increased complexity. Test your application to determine if parallelization is beneficial.
- Handle Exceptions Properly: PLINQ wraps exceptions thrown inside a query in an AggregateException, because more than one thread may fail during the same query. Handle exceptions using try-catch blocks around the code that executes the PLINQ query, and inspect the InnerExceptions property of AggregateException to address individual exceptions.
- Specify Merge Options: PLINQ allows you to specify merge options via the WithMergeOptions method, which takes a ParallelMergeOptions value (Default, NotBuffered, AutoBuffered, or FullyBuffered). Merge options determine how PLINQ buffers and combines the results from different partitions before handing them to the consuming thread. They affect how soon results become available and the overall throughput rather than correctness, so choose the option that matches how the results are consumed.
- Control Degree of Parallelism: Use the WithDegreeOfParallelism method to limit the number of concurrent operations. This can be helpful in reducing resource contention and ensuring that your application doesn’t overwhelm the system with too many concurrent threads.
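Several of the strategies above can be sketched in a few lines. The class SafetySketch and its method names are hypothetical and chosen only for illustration. The sketch contrasts a synchronized shared counter with a query that needs no shared state at all, and shows how exceptions thrown inside a query surface as an AggregateException.

```csharp
using System;
using System.Linq;
using System.Threading;

public static class SafetySketch
{
    // Shared state, but updated atomically. A plain counter++ here would be a race.
    // WithDegreeOfParallelism(2) caps the query at two concurrent tasks.
    public static int CountEvensWithInterlocked(int[] source)
    {
        int counter = 0;
        source.AsParallel()
              .WithDegreeOfParallelism(2)
              .ForAll(n => { if (n % 2 == 0) Interlocked.Increment(ref counter); });
        return counter;
    }

    // No shared state: PLINQ's own Count operator does the aggregation.
    public static int CountEvensWithoutSharedState(int[] source) =>
        source.AsParallel().Count(n => n % 2 == 0);

    // Returns the messages of all exceptions thrown while the query ran.
    public static string[] CollectErrorMessages(int[] source)
    {
        try
        {
            _ = source.AsParallel()
                      .WithMergeOptions(ParallelMergeOptions.FullyBuffered)
                      .Select(n => n == 0
                          ? throw new InvalidOperationException("zero is not allowed")
                          : 100 / n)
                      .ToArray();
            return Array.Empty<string>();
        }
        catch (AggregateException ae)
        {
            // Flatten unwraps nested AggregateExceptions before inspection.
            return ae.Flatten().InnerExceptions.Select(e => e.Message).ToArray();
        }
    }
}
```

Where a built-in operator such as Count fits the problem, it is usually the simpler choice, because there is no shared variable to protect.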
Let’s create a simple .NET Console Application that demonstrates how to ensure thread safety when using PLINQ. In this example, we’ll perform a parallel query on a collection of numbers, showcasing how to use thread-safe operations and handle exceptions gracefully.
Our goal is to:
- Find the squares of all even numbers in a large list.
- Use a thread-safe collection to store the results.
- Handle any potential exceptions that may occur during the parallel query.
Open the Program.cs file and replace its content with the following code:
using System;
using System.Collections.Concurrent;
using System.Linq;

namespace PLINQDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            // Example list of numbers
            var numbers = Enumerable.Range(1, 10000);

            // Thread-safe collection to store results
            var safeCollection = new ConcurrentBag<int>();

            try
            {
                // Parallel query to find squares of all even numbers
                numbers.AsParallel().Where(n => n % 2 == 0).ForAll(n =>
                {
                    var square = n * n;
                    safeCollection.Add(square); // Thread-safe operation
                });

                Console.WriteLine("Squares of even numbers have been calculated and stored successfully.");
            }
            catch (AggregateException ae)
            {
                // Handling exceptions gracefully
                Console.WriteLine("One or more exceptions occurred:");
                foreach (var ex in ae.InnerExceptions)
                {
                    Console.WriteLine($" {ex.Message}");
                }
            }

            // Optional: Displaying a few results
            Console.WriteLine("Displaying first 10 results:");
            safeCollection.Take(10).ToList().ForEach(s => Console.WriteLine(s));

            Console.ReadKey();
        }
    }
}
Explanation:
- Thread Safety: It uses a ConcurrentBag<int> to store the squares of even numbers. ConcurrentBag is a thread-safe collection suitable for scenarios where multiple threads are adding items concurrently.
- Parallel Query: The AsParallel() method processes the sequence in parallel. The Where clause filters even numbers, and ForAll runs the supplied delegate for each matching element on the worker threads, without merging the results back onto the calling thread.
- Exception Handling: The parallel query is wrapped in a try-catch block to handle AggregateException, which can contain one or more exceptions thrown during the parallel execution.
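Now, run the above application code. It prints the success message followed by ten of the computed squares. Because ConcurrentBag<T> is an unordered collection and the items are added from multiple threads, the ten values shown can differ from run to run.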
Aggregation Operations in PLINQ Queries
PLINQ supports most of the standard query operators provided by LINQ and adds some additional operations for parallel execution. When working with PLINQ, understanding how to use aggregation operations effectively is crucial for processing collections of data in parallel to produce a single, aggregated result.
Key Aggregation Operations in PLINQ
Aggregation operations in PLINQ are similar to those in LINQ but are designed to work efficiently in a parallel environment. Here are some of the key aggregation operations:
- Aggregate(): Applies an accumulator function over a sequence. This function is useful for producing a single cumulative value from a sequence of values by applying a function you define. The parallel version tries to partition the source elements and then merge the partitions at the end.
- Sum(), Average(), Min(), Max(): These standard aggregate functions compute the sum, average, minimum, and maximum of a numeric sequence, respectively. PLINQ executes these operations in parallel, dividing the work across multiple threads to speed up the computation.
- Count() and LongCount(): Counts the number of elements in a sequence. Like other aggregation operations, PLINQ performs this operation in parallel, potentially counting different segments of the sequence concurrently.
- First(), FirstOrDefault(), Last(), LastOrDefault(): These operations return elements from the sequence based on their position. While not aggregations in the traditional sense, they are often used in conjunction with aggregation operations. Because position only has meaning in an ordered query, combine them with AsOrdered() when you need the actual first or last element of the source; on an unordered query, PLINQ may return an arbitrary element.
Now, replace the contents of Program.cs with the following code. This code demonstrates using various PLINQ aggregation operations on an array of integers.
using System;
using System.Linq;

namespace PLINQDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            // Example array
            int[] numbers = Enumerable.Range(1, 100).ToArray();

            // Demonstrating Aggregate()
            int sumAggregate = numbers.AsParallel().Aggregate(
                0,                                    // seed for each partition
                (subtotal, n) => subtotal + n,        // accumulate within a partition
                (total1, total2) => total1 + total2,  // combine partition results
                total => total);                      // final projection
            Console.WriteLine($"Aggregate Sum: {sumAggregate}");

            // Demonstrating Sum()
            int sum = numbers.AsParallel().Sum();
            Console.WriteLine($"Sum: {sum}");

            // Demonstrating Average()
            double average = numbers.AsParallel().Average();
            Console.WriteLine($"Average: {average}");

            // Demonstrating Min()
            int min = numbers.AsParallel().Min();
            Console.WriteLine($"Min: {min}");

            // Demonstrating Max()
            int max = numbers.AsParallel().Max();
            Console.WriteLine($"Max: {max}");

            // Demonstrating Count()
            int count = numbers.AsParallel().Count();
            Console.WriteLine($"Count: {count}");

            // Demonstrating LongCount()
            long longCount = numbers.AsParallel().LongCount();
            Console.WriteLine($"LongCount: {longCount}");

            // The positional operators below use AsOrdered() so that "first" and
            // "last" refer to the source order rather than an arbitrary element.

            // Demonstrating First()
            int first = numbers.AsParallel().AsOrdered().First();
            Console.WriteLine($"First: {first}");

            // Demonstrating FirstOrDefault()
            int firstOrDefault = numbers.AsParallel().AsOrdered().FirstOrDefault();
            Console.WriteLine($"FirstOrDefault: {firstOrDefault}");

            // Demonstrating Last()
            int last = numbers.AsParallel().AsOrdered().Last();
            Console.WriteLine($"Last: {last}");

            // Demonstrating LastOrDefault()
            int lastOrDefault = numbers.AsParallel().AsOrdered().LastOrDefault();
            Console.WriteLine($"LastOrDefault: {lastOrDefault}");

            Console.ReadKey();
        }
    }
}
Explanation:
- Aggregate(): Demonstrates how to sum up numbers using a more manual approach, providing a clear example of how custom aggregate functions can be implemented.
- Sum(), Average(), Min(), Max(): These methods compute the sum, average, minimum, and maximum of the numbers array in parallel.
- Count() and LongCount(): Count the number of elements in the array, demonstrating both integer and long integer counts.
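- First(), FirstOrDefault(), Last(), LastOrDefault(): Showcase retrieving elements by position, with and without default values for empty collections. The result is guaranteed to be the first or last element of the array only when the query is ordered with AsOrdered().
Now, run the above application code. The aggregate values do not depend on element order, so every run reports a sum of 5050, an average of 50.5, a minimum of 1, a maximum of 100, and a count of 100.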
Considerations for Using Aggregation Operations in PLINQ
- Order Preservation: PLINQ preserves the order of the source sequence only when you request it with AsOrdered(). For operations like Min(), Max(), Count(), and Sum(), the order does not affect the outcome, so you can leave the query unordered and let PLINQ optimize performance.
- Associativity and Commutativity: For the Aggregate() operation, the function you provide should be associative and commutative because PLINQ may apply it to elements in parallel and in any order. Addition qualifies; subtraction, for example, does not, and would produce results that vary between runs.
- Parallelism Control: You can control the degree of parallelism using the WithDegreeOfParallelism() method, which sets the maximum number of concurrently executing tasks for the query. This is useful if you want to leave processor cores free for other work.
- AsParallel() and AsSequential(): Use AsParallel() to convert a LINQ query to PLINQ for parallel execution. If you must ensure that part of your query executes sequentially, use AsSequential().
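To illustrate the point about custom aggregation, here is a minimal sketch of the four-argument Aggregate overload computing a mean in a single pass. The class AggregateSketch and the method Mean are hypothetical names. The accumulator is a (sum, count) tuple, and both the accumulate step and the combine step use only addition, which is associative and commutative.

```csharp
using System;
using System.Linq;

public static class AggregateSketch
{
    // Computes the arithmetic mean with a custom parallel aggregation.
    public static double Mean(int[] source) =>
        source.AsParallel().Aggregate(
            // Seed: the starting accumulator for each partition.
            (Sum: 0L, Count: 0),
            // Accumulate one element into a partition's running totals.
            (acc, n) => (acc.Sum + n, acc.Count + 1),
            // Combine the totals of two partitions.
            (a, b) => (a.Sum + b.Sum, a.Count + b.Count),
            // Turn the final totals into the result; guard against no elements.
            acc => acc.Count == 0 ? 0.0 : (double)acc.Sum / acc.Count);
}
```

For the array 1 to 100 used earlier, Mean returns 50.5, matching the built-in Average(). In practice you would prefer Average() for this case; the custom form is useful when no built-in operator computes what you need.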
In the next article, I will discuss Partitioning Strategies for Data Parallelism in PLINQ with Examples. In this article, I took a deep dive into PLINQ operations with examples, and I hope you found it useful.