Replaced async model with true async #337

henkish · 2013-09-02T15:02:06Z

Rewrote DoAsyncRequest to use real async web methods instead of starting
new tasks and allocating thread pool resources for waits and callbacks.

Removed the MaximumAsyncConnections Semaphore because connection limits are already enforced in the System.Net.ServicePointManager...

Rewrote DoAsyncRequest to use real async web methods instead of starting new tasks and allocating thread pool resources for waits and callbacks.

Mpdreamz · 2013-09-02T21:06:20Z

I love pull requests like this!

May I ask what prompted the rewrite? I have not run this implementation yet and gathered numbers but at first glance its the same threadpool usage wise.

I use Task.Factory.FromAsync to wrap over the BeginXXX/EndXXX pattern which is async and should use IO completion ports the same as manually calling BeginXXX and EndXXX so besides the minimal overhead of creating the task it should be just as async / threadpool friendly.

The original implementation iterates over the yielded tasks in a seperate thread (LongRunning hint) thats not part of the treadpool, which is the reason for the semaphore. (I also do not like System.Net.ServicePointManager.DefaultConnectionLimit because this rule applies to the AppDomain globally and you might have more moving parts that require different numbers).

The RegisterForSingleObject on the threadpool is sadly the only way to implement timeouts in a controllable way, I've had many cases where the timeout property on requests itself are not adhered which your new routine does not have an alternative implementation for.

henkish · 2013-09-02T22:08:33Z

On line 230 you allocate a worker thread to run the async code for you using Task.Factory.StartNew();
The worker thread will be blocked until the async code is completed on line 235.

henkish · 2013-09-02T22:24:55Z

And besides creating a whole bunch of threads, I think it seems like be a bad idea to create them outside the thread pool. LongRunning on threads that only live for the length of a web request sounds wrong to me.

Mpdreamz · 2013-09-03T00:06:51Z

Explictly starting threads outside the threadpool is generally a good idea in something like asp.net where you are competing with the server for the 200 max threadpool threads that are actually serving the requests. Threadpool thread starvation is a real issue there.

Task.Factory.StartNew() is non blocking:

http://localghost.io/articles/oss-development,-a-continuous-lesson-in-humility-2013-03-07/

The LongRunning hint prevents it from being inlined on the calling thread.

Before arguing the pros/cons too much I better pull the bits in and let the numbers speak for themselves, I'll pull this in one way or another. If the numbers are around the same throughput wise i'll pull this in as 'NoTasksHttpConnection' implentation of IConnection.

Again thanks for taking the time @henkish

henkish · 2013-09-03T07:53:18Z

Why start threads at all? The whole point of async is to free resources.
LongRunning only hints that the thread should be created outside the thread pool, the thread is still created and running in the process. http://stackoverflow.com/questions/7915947/task-longrunning-side-effects

I like your lib and use it in a lot of projects, but you need to understand that the current async implementation is dangerous and needs to be fixed. I don't care if you use my code or fix your current code, as long as it gets fixed!

The code below emulates 1000 concurrent requests, each taking 1 second.
Please look at the thread and memory usage for both tests runs.

class Program
{
    static void Main( string[] args )
    {
        Console.WriteLine("Wrapped in LongRunning task:");
        RunTest( () => Task.Factory.StartNew( () => Task.Delay( 1000 ).Wait(), TaskCreationOptions.LongRunning ) );

        Console.WriteLine( "Not wrapped:" );
        RunTest( () => Task.Delay( 1000 ) );
    }

    private static void RunTest( Func<Task> createTask )
    {
        var tasks = Enumerable.Range( 0, 1000 ).Select( i => createTask() ).ToArray();

        var process = Process.GetCurrentProcess();
        Console.WriteLine( "\tThreads: " + process.Threads.Count );
        Console.WriteLine( "\tVirtual memory: " + process.VirtualMemorySize64 );

        Task.WhenAll( tasks ).Wait();
    }
}

Mpdreamz · 2013-09-03T11:42:36Z

Hey @henkish I'm well aware of the fact that threads are not cheap. Dangerous is a bit of an overreach here since we do have a semaphore in place preventing us from creating a 1000 threads at once.

I'm not trying to argue which version is better though i'm trying to shed some light on why things are the way they are.

The two main reasons for actually running in threads is concurrency throttling not maximizing, i.e single connection taking to long at elasticsearch itself because we are slamming it too hard (yeah i know bigger cluster). Which I've seen cause hard to debug connection issues locally.

I'm trying to argue (mostly with myself :)) if that fear was warranted enough to keep it in or not.

I get similar thread statistics using the concurrency visualizer (shift+alt+f5 on the protocolloadtest) to your aysnc routine if i simply remove the wrapping in a LongRunning task:

https://github.com/Mpdreamz/NEST/blob/978737797bcca0567f85e2d107ca1a89bbb65747/src/Nest/Domain/Connection/Connection.cs#L230

One other thing is that async web requests don't listen to Timeout

http://stackoverflow.com/questions/7368737/task-fromasync-timeout

So the ThreadPool.RegisterWaitForSingleObject really have to be in there.

What I'm thinking now is to set the default MaximumAsyncConnections() to 0 in which case it will not wrap in a LongRunningTask nor use the semaphore.

Love your thoughts on this!

Mpdreamz · 2013-09-09T19:57:16Z

This is now merged into master, ty for alerting me to this @henkish.

The default Connection no longer wraps the async stuff in a thread needlessly if no MaximumAsyncConnections() is set explicitly, the default for the latter is also set to 0 instead of 25.

I created a separate IConnection for your async implementation called NoTasksHttpConnection which from my tests performs 1-2% better then the default Connection implementation.

henkish · 2013-09-15T21:45:04Z

Thanks!

But I disagree with the naming of the class. Yes, it does not use tasks, that's true. But the main difference/advantage is that the NoTasksHttpConnection is not using any unnecessary resources (threads).

If the codebase was .NET 4.5 i would have used Tasks and the await, but when that's not an option I went for the classic Begin/End pattern to maintain readability.

The default Connection implementation is still creating threads using TaskCreationOptions.LongRunning.

You wrote: "Explictly starting threads outside the threadpool is generally a good idea in something like asp.net where you are competing with the server for the 200 max threadpool threads that are actually serving the requests. Threadpool thread starvation is a real issue there."

I don't agree with you, if you use async the correct way there is no need to allocate threads, not inside nor outside the threadpool (while waiting on resources). There will be no threadpool starvation, or starvation of any other threads for that matter.

Whether or not the Timeout property works in HttpWebRequest while performing async calls I don't know. But maybe you could try the System.Net.Http.HttpClient class instead? It's async all the way and the API for doing HTTP calls is in many ways improved from HttpWebRequest. From what I can see there are Timeout settings in the API.

Regarding the test results, maybe you should consider monitoring other resources while running the performance comparison tests, how does the thread and memory usage stand in comparison for the process?

Please consider once more the importance of removing the TaskFactory.StartNew method from the Connection async implementation. Ask other developers of their opinion!!

Generally I think it's a bad idea of maintaining two similar implementations of the some code, and if the thread issue is solved then I will gladly use the default implementation again.

Cheers,
Henrik

Mpdreamz · 2013-09-15T22:22:00Z

Yeah I realized after the last release that the StartNew is still not
necessary even for throttling since we have a semaphore in place before
actually kicking off the async routine .. DOH!.

I realize there's a difference between threads and socket IO completion
ports and the Task.FromAsync should be functionally equivalent too manually
calling Begin_/Start_ thats what I had memory/cpu profiled and found to be
equivalent as expected :)

Even so, the slight overhead of yielding io tasks and iterating over them
instead of the 'oldschool' callback based Begin_/End_ pattern still means
the latter is ever so slightly (but consistently) faster.

Thats why for now I included both, the hipster dev in me loves the task
yielding approach (readability) but in the end numbers matter and I need
some more of those before I can cut one implementation loose.

So to recap if you dont set MaximumAsyncConnections in the current version
you are not using threads but IO completion ports all the way. In the next
version it will not start new threads ever irregardless of the connection
implementation.

HttpClient is awesome but .net 4.5/mono 3.0 only and the
System.Net.HttpClient nuget packages available are not very mono/pcl
friendly or atleast that used to be so. I believe that situation has
now improved with the Microsoft.Bcl.async package though.

Thanks for getting back to me Henrik, appreciated.

On Sunday, September 15, 2013, Henrik Jönsson wrote:

Thanks!

But I disagree with the naming of the class. Yes, it does not use tasks,
that's true. But the main difference/advantage is that the
NoTasksHttpConnection is not using any unnecessary resources (threads).

If the codebase was .NET 4.5 i would have used Tasks and the await, but
when that's not an option I went for the classic Begin/End pattern to
maintain readability.

The default Connection implementation is still creating threads using
TaskCreationOptions.LongRunning.

You wrote: "Explictly starting threads outside the threadpool is generally
a good idea in something like asp.net where you are competing with the
server for the 200 max threadpool threads that are actually serving the
requests. Threadpool thread starvation is a real issue there."

I don't agree with you, if you use async the correct way there is no need
to allocate threads, not inside nor outside the threadpool (while waiting
on resources). There will be no threadpool starvation, or starvation of any
other threads for that matter.

Whether or not the Timeout property works in HttpWebRequest while
performing async calls I don't know. But maybe you could try the
System.Net.Http.HttpClient class instead? It's async all the way and the
API for doing HTTP calls is in many ways improved from HttpWebRequest. From
what I can see there are Timeout settings in the API.

Regarding the test results, maybe you should consider monitoring other
resources while running the performance comparison tests, how does the
thread and memory usage stand in comparison for the process?

Please consider once more the importance of removing the
TaskFactory.StartNew method from the Connection async implementation. Ask
other developers of their opinion!!

Generally I think it's a bad idea of maintaining two similar
implementations of the some code, and if the thread issue is solved then I
will gladly use the default implementation again.

Cheers,
Henrik

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/337#issuecomment-24481025
.

Mpdreamz · 2013-09-17T09:29:05Z

These are the current numbers (I'm running the NoTasksHttpConnection first)

HTTP (IndexManyAsync): 17.197/s 21 Threads 854593536 Virtual memory
HTTP (IndexManyAsyc using NoTasksHttpConnection): 16.469/s 21 Threads 854593536 Virtual memory

HTTP (IndexManyAsync): 16.988/s 21 Threads 854601728 Virual memory
HTTP (IndexManyAsyc using NoTasksHttpConnection): 16.968/s 21 Threads 854601728
Virual memory

These are 2 random outputs of many runs but the IndexManyAsync using the builtin connection was always faster (although not by much) for each run.

(this is on a VM agains ES 0.90.1)

I'm seeing the exact same Memory and Thread statistics as your implementation when running a concurrency profiler and inspecting the .cvtrace file.

Combined with the fact I think the task yielding approach reads a bit better then the Begin_/End_ pattern I'm opting for the removal of the NoTaskHttpConnection as you said maintaining both is not KISS/DRY whatever :)

Many many thanks for the slap on the wrist @henkish

…ng it alltogether instead of removing it so we keep on testing it in the future

Replaced async model with true async

4709561

Rewrote DoAsyncRequest to use real async web methods instead of starting new tasks and allocating thread pool resources for waits and callbacks.

Mpdreamz merged commit 4709561 into elastic:master Sep 9, 2013

Mpdreamz added a commit that referenced this pull request Sep 17, 2013

#337 Moved NoTaskHttpConnection to ProtocolLoadTest instead of removi…

ecee021

…ng it alltogether instead of removing it so we keep on testing it in the future

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replaced async model with true async #337

Replaced async model with true async #337

henkish commented Sep 2, 2013

Mpdreamz commented Sep 2, 2013

henkish commented Sep 2, 2013

henkish commented Sep 2, 2013

Mpdreamz commented Sep 3, 2013

henkish commented Sep 3, 2013

Mpdreamz commented Sep 3, 2013

Mpdreamz commented Sep 9, 2013

henkish commented Sep 15, 2013

Mpdreamz commented Sep 15, 2013

Mpdreamz commented Sep 17, 2013

Replaced async model with true async #337

Replaced async model with true async #337

Conversation

henkish commented Sep 2, 2013

Mpdreamz commented Sep 2, 2013

henkish commented Sep 2, 2013

henkish commented Sep 2, 2013

Mpdreamz commented Sep 3, 2013

henkish commented Sep 3, 2013

Mpdreamz commented Sep 3, 2013

Mpdreamz commented Sep 9, 2013

henkish commented Sep 15, 2013

Mpdreamz commented Sep 15, 2013

Mpdreamz commented Sep 17, 2013