Search results

  1. Jan 25, 2018 · A ValueTask<T>-based async method is a bit faster than a Task<T>-based method if the method completes synchronously, and a bit slower otherwise. The performance overhead of async methods that await a non-completed task is far more substantial (~300 bytes per operation on the x64 platform). And, as always, measure first.

    • Overview
    • Getting the Right Mental Model
    • Think Chunky, Not Chatty
    • Know When Not to Use Async
    • Care About Context
    • Lift Your Way out of Garbage Collection
    • Avoid Complexity
    • Asynchronicity and Performance

    October 2011

    Volume 26 Number 10

    Asynchronous Programming - Async Performance: Understanding the Costs of Async and Await

    By Stephen Toub | October 2011

    Asynchronous programming has long been the realm of only the most skilled and masochistic of developers—those with the time, inclination and mental capacity to reason about callback after callback of non-linear control flow. With the Microsoft .NET Framework 4.5, C# and Visual Basic deliver asynchronicity for the rest of us, such that mere mortals can write asynchronous methods almost as easily as writing synchronous methods. No more callbacks. No more explicit marshaling of code from one synchronization context to another. No more worrying about the flowing of results or exceptions. No more tricks that contort existing language features to ease async development. In short, no more hassle.

    Of course, while it’s now easy to get started writing asynchronous methods (see the articles by Eric Lippert and Mads Torgersen in this issue of MSDN Magazine), doing it really well still requires an understanding of what’s happening under the covers. Any time a language or framework raises the level of abstraction at which a developer can program, it invariably also encapsulates hidden performance costs. In many cases, such costs are negligible and can and should be ignored by the vast number of developers implementing the vast number of scenarios. However, it still behooves more advanced developers to really understand what costs exist so they can take any necessary steps to avoid those costs if they do eventually become visible. Such is the case with the asynchronous methods feature in C# and Visual Basic.

    For decades, developers have used high-level languages like C#, Visual Basic, F# and C++ to develop efficient applications. This experience has informed those developers about the relevant costs of various operations, and that knowledge has informed best development practices. For example, for most use cases, calling a synchronous method is relatively cheap, even more so when the compiler is able to inline the callee into the call site. Thus, developers learn to refactor code into small, maintainable methods, in general without needing to think about any negative ramifications from the increased method invocation count. These developers have a mental model for what it means to call a method.

    With the introduction of asynchronous methods, a new mental model is needed. While the C# and Visual Basic languages and compilers are able to provide the illusion of an asynchronous method being just like its synchronous counterpart, under the covers it’s no such thing. The compiler ends up generating a lot of code on behalf of the developer, code akin to the quantities of boilerplate code that developers implementing asynchronicity in days of yore would’ve had to have written and maintained by hand. Further still, the compiler-generated code calls into library code in the .NET Framework, again increasing the work done on behalf of the developer. To get the right mental model, and then to use that mental model to make appropriate development decisions, it’s important to understand what the compiler is generating on your behalf.

    When working with synchronous code, methods with empty bodies are practically free. This is not the case for asynchronous methods. Consider the following asynchronous method, which has a single statement in its body (and which due to lack of awaits will end up running synchronously):
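
    The code listing did not survive in this excerpt. A minimal sketch consistent with the description (a single statement in the body, no awaits, so every invocation completes synchronously) might look like this; the method name is illustrative:

```csharp
using System;
using System.Threading.Tasks;

public static class Example
{
    // A single-statement async method. With no awaits, it always
    // runs to completion synchronously before returning its Task.
    public static async Task SimpleBodyAsync()
    {
        Console.WriteLine("Hello, Async World!");
    }
}
```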

    An intermediate language (IL) decompiler will reveal the true nature of this function once compiled, with output similar to what’s shown in Figure 1. What was a simple one-liner has been expanded into two methods, one of which exists on a helper state machine class. First, there’s a stub method that has the same basic signature as that written by the developer (the method is named the same, it has the same visibility, it accepts the same parameters and it retains its return type), but that stub doesn’t contain any of the code written by the developer. Rather, it contains setup boilerplate. The setup code initializes the state machine used to represent the asynchronous method and then kicks it off using a call to the secondary MoveNext method on the state machine. This state machine type holds state for the asynchronous method, allowing that state to be persisted across asynchronous await points, if necessary. It also contains the body of the method as written by the user, but contorted in a way that allows for results and exceptions to be lifted into the returned Task; for the current position in the method to be maintained so that execution may resume at that location after an await; and so on.

    Figure 1 Asynchronous Method Boilerplate
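
    The figure's listing is missing from this excerpt. The following is an illustrative sketch of the shape the compiler generates (all identifiers here are hypothetical; the real generated names are compiler-internal and not valid C#): a stub with the original signature that starts a state machine, and a MoveNext method holding the user's code wrapped in try/catch:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public static class Example
{
    // Stub with the developer's original signature: sets up the
    // state machine, kicks it off, and returns the builder's Task.
    public static Task SimpleBodyAsync()
    {
        var stateMachine = new SimpleBodyStateMachine();
        stateMachine.builder = AsyncTaskMethodBuilder.Create();
        stateMachine.builder.Start(ref stateMachine);
        return stateMachine.builder.Task;
    }

    // A struct, so the boilerplate allocates nothing unless the
    // method actually suspends and the struct must be boxed.
    private struct SimpleBodyStateMachine : IAsyncStateMachine
    {
        public int state;
        public AsyncTaskMethodBuilder builder;

        public void MoveNext()
        {
            try
            {
                Console.WriteLine("Hello, Async World!"); // the user's body
            }
            catch (Exception e)
            {
                builder.SetException(e);
                return;
            }
            builder.SetResult();
        }

        public void SetStateMachine(IAsyncStateMachine sm)
        {
            builder.SetStateMachine(sm);
        }
    }
}
```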

    When thinking through what asynchronous methods cost to invoke, keep this boilerplate in mind. The try/catch block in the MoveNext method will likely prevent it from getting inlined by the just-in-time (JIT) compiler, so at the very least we’ll now have the cost of a method invocation where in the synchronous case we likely would not (with such a small method body). We have multiple calls into Framework routines (like SetResult). And we have multiple writes to fields on the state machine type. Of course, we need to weigh all of this against the cost of the Console.WriteLine, which will likely dominate all of the other costs involved (it takes locks, it does I/O and so forth). Further, notice that there are optimizations the infrastructure does for you. For example, the state machine type is a struct. That struct will only be boxed to the heap if this method ever needs to suspend its execution because it’s awaiting an instance that hasn’t yet completed, and in this simple method, it never will. As such, the boilerplate of this asynchronous method won’t incur any allocations. The compiler and runtime work hard together to minimize the number of allocations involved in the infrastructure.

    The .NET Framework attempts to generate efficient implementations for asynchronous methods, applying multiple optimizations. However, developers often have domain knowledge that can yield optimizations that would be risky and unwise for the compiler and runtime to apply automatically, given the generality they target. With this in mind, it can actually benefit a developer to avoid using async methods in a certain, small set of use cases, particularly for library methods that will be accessed in a fine-grained manner. Typically, this is the case when it’s known that the method may actually be able to complete synchronously because the data it’s relying on is already available.

    When designing asynchronous methods, the Framework developers spent a lot of time optimizing away object allocations. This is because allocations represent one of the largest performance costs possible in the asynchronous method infrastructure. The act of allocating an object is typically quite cheap. Allocating objects is akin to filling your shopping cart with merchandise, in that it doesn’t cost you much effort to put items into your cart; it’s when you actually check out that you need to pull out your wallet and invest significant resources. While allocations are usually cheap, the resulting garbage collection can be a showstopper when it comes to the application’s performance. The act of garbage collection involves scanning through some portion of objects currently allocated and finding those that are no longer referenced. The more objects allocated, the longer it takes to perform this marking. Further, the larger the allocated objects and the more of them that are allocated, the more frequently garbage collection needs to occur. In this manner, then, allocations have a global effect on the system: the more garbage generated by asynchronous methods, the slower the overall program will run, even if micro benchmarks of the asynchronous methods themselves don’t reveal significant costs.

    For asynchronous methods that actually yield execution (due to awaiting an object that’s not yet completed), the asynchronous method infrastructure needs to allocate a Task object to return from the method, as that Task serves as a unique reference for this particular invocation. However, many asynchronous method invocations can complete without ever yielding. In such cases, the asynchronous method infrastructure may return a cached, already completed Task, one that it can use over and over to avoid allocating unnecessary Tasks. It’s only able to do this in limited circumstances, however, such as when the asynchronous method returns a non-generic Task, a Task<Boolean>, or a Task<TResult> where TResult is a reference type and the result of the asynchronous method is null. While this set may expand in the future, you can often do better if you have domain knowledge of the operation being implemented.
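
    As an illustration of what such domain-specific caching can look like (a sketch, not from the original listing; the fast-path helper is hypothetical), a method that tends to produce the same value repeatedly can hand back the same completed Task<int> rather than allocating a fresh one per call:

```csharp
using System.Threading.Tasks;

public class CachedValueSource
{
    // A task cached for the common zero result, plus the most
    // recently returned task for any other repeating value.
    private static readonly Task<int> s_zeroTask = Task.FromResult(0);
    private Task<int> m_lastTask;

    public Task<int> GetValueAsync()
    {
        int value = ComputeValue(); // hypothetical synchronous fast path
        if (value == 0) return s_zeroTask;

        // Reuse the previous task if it carries the same result;
        // otherwise allocate one new completed task and remember it.
        var last = m_lastTask;
        return (last != null && last.Result == value) ?
            last :
            (m_lastTask = Task.FromResult(value));
    }

    private int ComputeValue() { return 42; } // stand-in for real work
}
```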

    Consider implementing a type like MemoryStream. MemoryStream derives from Stream, and thus can override Stream’s new .NET 4.5 ReadAsync, WriteAsync and FlushAsync methods to provide optimized implementations for the nature of MemoryStream. Because the operation of reading is simply going against an in-memory buffer and is therefore just a memory copy, better performance results if ReadAsync runs synchronously. Implementing this with an asynchronous method would look something like the following:
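
    The original listing was not captured here; a sketch consistent with the description (an async-method-based override whose only operation, Read, completes synchronously) would be:

```csharp
using System.Threading;
using System.Threading.Tasks;

public class MyMemoryStream : System.IO.MemoryStream
{
    // Sketch: because Read is a synchronous memory copy and there are
    // no awaits, every call returns an already-completed Task<int>.
    public override async Task<int> ReadAsync(
        byte[] buffer, int offset, int count, CancellationToken cancellationToken)
    {
        cancellationToken.ThrowIfCancellationRequested();
        return this.Read(buffer, offset, count);
    }
}
```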

    Easy enough. And because Read is a synchronous call, and because there are no awaits in this method that will yield control, all invocations of ReadAsync will actually complete synchronously. Now, let’s consider a standard usage pattern of streams, such as a copy operation:
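
    The listing is missing from this excerpt; the standard copy pattern the text refers to looks roughly like the following (buffer size is illustrative):

```csharp
using System.IO;
using System.Threading.Tasks;

public static class StreamCopier
{
    // Repeatedly read a chunk from the source and write it to the
    // destination until the source reports zero bytes read.
    public static async Task CopyStreamToStreamAsync(Stream source, Stream destination)
    {
        var buffer = new byte[0x1000];
        int numRead;
        while ((numRead = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            await destination.WriteAsync(buffer, 0, numRead);
        }
    }
}
```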

    Notice here that ReadAsync on the source stream for this particular series of calls is always invoked with the same count parameter (the buffer’s length), and thus it’s very likely that the return value (the number of bytes read) will also repeat. Except in rare circumstances, the asynchronous method infrastructure won’t be able to use a cached Task for that return value, but you can.
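
    A hand-rolled override exploiting that domain knowledge might look like the following sketch: cache the last returned task and reuse it whenever the byte count repeats, falling back to TaskCompletionSource for the cancellation and error paths:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public class CachingMemoryStream : System.IO.MemoryStream
{
    private Task<int> m_lastTask; // most recently returned read task

    public override Task<int> ReadAsync(
        byte[] buffer, int offset, int count, CancellationToken cancellationToken)
    {
        if (cancellationToken.IsCancellationRequested)
        {
            var tcs = new TaskCompletionSource<int>();
            tcs.SetCanceled();
            return tcs.Task;
        }

        try
        {
            // The read itself is a synchronous memory copy.
            int numRead = this.Read(buffer, offset, count);

            // Reuse the cached task when the result repeats, which in a
            // copy loop with a fixed-size buffer it almost always does.
            var last = m_lastTask;
            return (last != null && last.Result == numRead) ?
                last :
                (m_lastTask = Task.FromResult(numRead));
        }
        catch (Exception e)
        {
            var tcs = new TaskCompletionSource<int>();
            tcs.SetException(e);
            return tcs.Task;
        }
    }
}
```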

    There are many kinds of “context” in the .NET Framework: LogicalCallContext, SynchronizationContext, HostExecutionContext, SecurityContext, ExecutionContext and more (from the sheer number you might expect that the developers of the Framework are monetarily incentivized to introduce new contexts, but I assure you we’re not). Some of these contexts are very relevant to asynchronous methods, not only in functionality, but also in their impact on asynchronous method performance.

    SynchronizationContext SynchronizationContext plays a big role in asynchronous methods. A “synchronization context” is simply an abstraction over the ability to marshal delegate invocation in a manner specific to a given library or framework. For example, WPF provides a DispatcherSynchronizationContext to represent the UI thread for a Dispatcher: posting a delegate to this synchronization context causes that delegate to be queued for execution by the Dispatcher on its thread. ASP.NET provides an AspNetSynchronizationContext, which is used to ensure that asynchronous operations that occur as part of the processing of an ASP.NET request are executed serially and are associated with the right HttpContext state. And so on. All told, there are around 10 concrete implementations of SynchronizationContext within the .NET Framework, some public, some internal.

    When awaiting Tasks and other awaitable types provided by the .NET Framework, the “awaiters” for those types (like TaskAwaiter) capture the current SynchronizationContext at the time the await is issued. Upon completion of the awaitable, if there was a current SynchronizationContext that got captured, the continuation representing the remainder of the asynchronous method is posted to that SynchronizationContext. With that, developers writing an asynchronous method called from a UI thread don’t need to manually marshal invocations back to the UI thread in order to modify UI controls: such marshaling is handled automatically by the Framework infrastructure.

    Unfortunately, such marshaling also involves cost. For application developers using await to implement their control flow, this automatic marshaling is almost always the right solution. Libraries, however, are often a different story. Application developers typically need such marshaling because their code cares about the context under which it’s running, such as being able to access UI controls, or being able to access the HttpContext for the right ASP.NET request. Most libraries, however, do not suffer this constraint. As a result, this automatic marshaling is frequently an entirely unnecessary cost. Consider again the code shown earlier to copy data from one stream to another:

    If this copy operation is invoked from a UI thread, every awaited read and write operation will force the completion back to the UI thread. For a megabyte of source data and Streams that complete reads and writes asynchronously (which is most of them), that means upward of 500 hops from background threads to the UI thread. To address this, the Task and Task<TResult> types provide a ConfigureAwait method. ConfigureAwait accepts a Boolean continueOnCapturedContext parameter that controls this marshaling behavior. If the default of true is used, the await will automatically complete back on the captured SynchronizationContext. If false is used, however, the SynchronizationContext will be ignored and the Framework will attempt to continue the execution wherever the previous asynchronous operation completed. Incorporating this into the stream-copying code results in the following more efficient version:
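
    The improved listing is absent from this excerpt; it is the same copy loop with context marshaling suppressed on both awaits (a sketch):

```csharp
using System.IO;
using System.Threading.Tasks;

public static class StreamCopier
{
    // ConfigureAwait(false) lets each continuation run wherever the
    // asynchronous operation completed, avoiding UI-thread hops.
    public static async Task CopyStreamToStreamAsync(Stream source, Stream destination)
    {
        var buffer = new byte[0x1000];
        int numRead;
        while ((numRead = await
            source.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false)) > 0)
        {
            await destination.WriteAsync(buffer, 0, numRead).ConfigureAwait(false);
        }
    }
}
```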

    For library developers, this performance impact alone is sufficient to warrant always using ConfigureAwait(false), except in the rare circumstance where the library has domain knowledge of its environment and does need to execute the body of the method with access to the captured context.

    Asynchronous methods provide a nice illusion when it comes to local variables. In a synchronous method, local variables in C# and Visual Basic are stack-based, such that no heap allocations are necessary to store those locals. However, in asynchronous methods, the stack for the method goes away when the asynchronous method suspends at an await point. For data to be available to the method after an await resumes, that data must be stored somewhere. Thus, the C# and Visual Basic compilers “lift” locals into a state machine struct, which is then boxed to the heap at the first await that suspends, so that locals may survive across await points.

    Earlier in this article, I discussed how the cost of garbage collection is influenced by the number of objects allocated, and how its frequency is influenced by the amount of memory being allocated: the more memory allocated, the more often garbage collection needs to run. Thus, in an asynchronous method, the more locals that need to be lifted to the heap, the more often garbage collections will occur.

    As of the time of this writing, the C# and Visual Basic compilers sometimes lift more than is truly necessary. For example, consider the following code snippet:
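
    The snippet is missing from this excerpt; a sketch consistent with the surrounding discussion (dto is consumed before the await and never read after it, yet still gets lifted) would be:

```csharp
using System;
using System.Threading.Tasks;

public static class Example
{
    // dto's value is fully consumed before the await, so it does not
    // logically need to survive the suspension; only dt does.
    public static async Task FooAsync()
    {
        var dto = DateTimeOffset.Now;
        var dt = dto.DateTime;
        await Task.Delay(1000);
        Console.WriteLine(dt);
    }
}
```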

    The dto variable isn’t read at all after the await point, and thus the value written to it before the await doesn’t need to survive the await. However, the state machine type generated by the compiler to store locals still contains a dto field, as shown in Figure 4.

    Figure 4 Local Lifting
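
    The figure itself was not captured; an illustrative sketch of the generated state machine's fields (names hypothetical, MoveNext elided) shows both locals lifted even though only dt is needed after the await:

```csharp
using System;
using System.Runtime.CompilerServices;

// Sketch of the compiler-generated state machine's layout.
internal struct FooAsyncStateMachine : IAsyncStateMachine
{
    public int state;
    public AsyncTaskMethodBuilder builder;
    public DateTimeOffset dto; // lifted unnecessarily: dead after the await
    public DateTime dt;        // genuinely needed across the await
    public TaskAwaiter awaiter;

    public void MoveNext() { /* body elided */ }
    public void SetStateMachine(IAsyncStateMachine sm) { }
}
```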

    This slightly bloats the size of that heap object beyond what’s truly necessary. If you find that garbage collections are occurring more frequently than you expect, take a look at whether you really need all of the temporary variables you’ve coded into your asynchronous method. This example could be rewritten as follows to avoid the extra field on the state machine class:
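
    The rewritten listing is absent from this excerpt; the idea is simply to fold the temporary away so no DateTimeOffset local needs lifting (a sketch):

```csharp
using System;
using System.Threading.Tasks;

public static class Example
{
    // Only dt survives the await, so only one field is lifted.
    public static async Task FooAsync()
    {
        var dt = DateTimeOffset.Now.DateTime;
        await Task.Delay(1000);
        Console.WriteLine(dt);
    }
}
```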

    The C# and Visual Basic compilers are fairly impressive in terms of where you’re allowed to use awaits: almost anywhere. Await expressions may be used as part of larger expressions, allowing you to await Task instances in places you might have any other value-returning expression. For example, consider the following code, which returns the sum of three tasks’ results:
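
    The listing is missing from this excerpt; a sketch consistent with the description, with awaits used directly as arguments (the Sum helper is illustrative), would be:

```csharp
using System.Threading.Tasks;

public static class Example
{
    // Each await expression feeds Sum directly, so the first two
    // results must be spilled across the later awaits.
    public static async Task<int> SumAsync(Task<int> a, Task<int> b, Task<int> c)
    {
        return Sum(await a, await b, await c);
    }

    private static int Sum(int x, int y, int z) { return x + y + z; }
}
```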

    The C# compiler allows you to use the expression “await b” as an argument to the Sum function. However, there are multiple awaits here whose results are passed as parameters to Sum, and due to order of evaluation rules and how async is implemented in the compiler, this particular example requires the compiler to “spill” the temporary results of the first two awaits. As you saw previously, locals are preserved across await points by having them lifted into fields on the state machine type. However, for cases like this one, where the values are on the CLR evaluation stack, those values aren’t lifted into the state machine but are instead spilled to a single temporary object and then referenced by the state machine. When you complete the await on the first task and go to await the second one, the compiler generates code that boxes the first result and stores the boxed object into a single <>t__stack field on the state machine. When you complete the await on the second task and go to await the third one, the compiler generates code that creates a Tuple from the first two values, storing that tuple into the same <>t__stack field. This all means that, depending on how you write your code, you could end up with very different allocation patterns. Consider instead writing SumAsync as follows:
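
    The alternative listing did not survive in this excerpt; per the description, it stores each result in an explicit local so the values are lifted as state machine fields rather than spilled (a sketch, with an illustrative Sum helper):

```csharp
using System.Threading.Tasks;

public static class Example
{
    // ra, rb and rc become fields on the state machine: a larger
    // state machine, but no per-await spill allocations.
    public static async Task<int> SumAsync(Task<int> a, Task<int> b, Task<int> c)
    {
        int ra = await a;
        int rb = await b;
        int rc = await c;
        return Sum(ra, rb, rc);
    }

    private static int Sum(int x, int y, int z) { return x + y + z; }
}
```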

    With this change, the compiler will now emit three more fields onto the state machine class to store ra, rb and rc, and no spilling will occur. Thus, you have a trade-off: a larger state machine class with fewer allocations, or a smaller state machine class with more allocations. The total amount of memory allocated will be larger in the spilling case, as each object allocated has its own memory overhead, but in the end performance testing could reveal that’s still better. In general, as mentioned previously, you shouldn’t think through these kinds of micro-optimizations unless you find that the allocations are actually the cause of grief, but regardless, it’s helpful to know where these allocations are coming from.

    Of course, there’s arguably a much larger cost in the preceding examples that you should be aware of and proactively consider. The code isn’t able to invoke Sum until all three awaits have completed, and no work is done in between the awaits. Each of these awaits that yields requires a fair amount of work, so the fewer awaits you need to process, the better. It would behoove you, then, to combine all three of these awaits into just one by waiting on all of the tasks at once with Task.WhenAll:

    The Task.WhenAll method here returns a Task<int[]> that won’t complete until all of the supplied tasks have completed, and it does so much more efficiently than awaiting each individual task. It also gathers up the result from each task and stores it into an array. If you want to avoid that array, you can do so by forcing binding to the non-generic WhenAll method that works with Task instead of Task<TResult>. For ultimate performance, you could also take a hybrid approach: first check whether all of the tasks have already completed successfully, and if they have, get their results individually; if they haven’t, await a WhenAll of those that haven’t. That avoids any allocations involved in the call to WhenAll when it’s unnecessary, such as allocating the params array to be passed into the method. And, as previously mentioned, we’d want this library function to also suppress context marshaling. Such a solution is shown in Figure 5.

    Figure 5 Applying Multiple Optimizations
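
    The figure's listing was not captured in this excerpt; a sketch combining the optimizations the text describes (fast path when all tasks already completed, otherwise a context-suppressed WhenAll over the non-generic Task overload) would be:

```csharp
using System.Threading.Tasks;

public static class Example
{
    // Fast path: if every task already ran to completion, return a
    // completed task with the sum and allocate nothing for WhenAll.
    public static Task<int> SumAsync(Task<int> a, Task<int> b, Task<int> c)
    {
        return (a.Status == TaskStatus.RanToCompletion &&
                b.Status == TaskStatus.RanToCompletion &&
                c.Status == TaskStatus.RanToCompletion) ?
            Task.FromResult(Sum(a.Result, b.Result, c.Result)) :
            SumAsyncInternal(a, b, c);
    }

    // Slow path: one await, non-generic WhenAll (no results array),
    // and context marshaling suppressed for library use.
    private static async Task<int> SumAsyncInternal(Task<int> a, Task<int> b, Task<int> c)
    {
        await Task.WhenAll((Task)a, b, c).ConfigureAwait(false);
        return Sum(a.Result, b.Result, c.Result);
    }

    private static int Sum(int x, int y, int z) { return x + y + z; }
}
```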

    Asynchronous methods are a powerful productivity tool, enabling you to more easily write scalable and responsive libraries and applications. It’s important to keep in mind, though, that asynchronicity is not a performance optimization for an individual operation. Taking a synchronous operation and making it asynchronous will invariably degrade the performance of that one operation, as it still needs to accomplish everything that the synchronous operation did, but now with additional constraints and considerations. A reason you care about asynchronicity, then, is performance in the aggregate: how your overall system performs when you write everything asynchronously, such that you can overlap I/O and achieve better system utilization by consuming valuable resources only when they’re actually needed for execution. The asynchronous method implementation provided by the .NET Framework is well-optimized, and often ends up providing as good or better performance than well-written asynchronous implementations using existing patterns and volumes more code. Any time you’re planning to develop asynchronous code in the .NET Framework from now on, asynchronous methods should be your tool of choice. Still, it’s good for you as a developer to be aware of everything the Framework is doing on your behalf in these asynchronous methods, so you can ensure the end result is as good as it can possibly be.

    Stephen Toub is a principal architect on the Parallel Computing Platform team at Microsoft.

  2. Apr 18, 2016 · In short, and in the very general case: no, it usually will not. But that needs a few more words, because “performance” can be understood in many ways. Async/await “saves time” only when the job is I/O-bound. Applying it to CPU-bound jobs will introduce some performance hit.

  3. Jun 10, 2024 · Asynchronous processing allows tasks to be executed independently of the main program flow. This means that tasks can run concurrently, enabling the system to handle multiple operations simultaneously. Unlike synchronous processing, where tasks are completed one after another, asynchronous processing helps in reducing idle time and improving ...

  4. Sep 22, 2024 · Asynchronous programming and caching can be combined to further improve performance. For example, you can asynchronously fetch data from the cache and, if the cache misses, asynchronously retrieve the data from the original source and store it in the cache. public async Task<string> GetDataAsync() { string cachedData = await _cache ...

  5. Mar 16, 2023 · There are two awaits in the async method: one for a Task<int> returned by ReadAsync, and one for a Task returned by WriteAsync. Task.GetAwaiter() returns a TaskAwaiter, and Task<TResult>.GetAwaiter() returns a TaskAwaiter<TResult>, both of which are distinct struct types.

  7. Jun 14, 2024 · There are four main ways of returning a value from a Task asynchronously: Task<T> and await: most straightforward. ContinueWith: use it to chain execution with another Task. WhenAll/WhenAny: Use it ...
