[BUG] Possible Bug when training causes a "never-ending-training" when VS-F5 Debugging/attached but works OK without debugger attached #2099
Comments
I added .AppendCacheCheckpoint(mlContext); right after https://github.com/dotnet/machinelearning-samples/blob/09migration/samples/csharp/getting-started/MulticlassClassification_Iris/IrisClassification/IrisClassificationConsoleApp/Program.cs#L63
It's a workaround. We should be able to train the model in VS without caching, though.
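To illustrate why a cache checkpoint can help, here is a minimal, language-neutral sketch in Python. It is not ML.NET code: `CountingSource`, `CachedView`, and `train` are hypothetical names, and the assumption is simply that a multi-pass trainer re-reads a lazy, file-backed source on every pass unless something materializes the data first.

```python
class CountingSource:
    """Hypothetical stand-in for a file-backed loader; counts how often it is opened."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.opens = 0

    def __iter__(self):
        self.opens += 1  # every new pass "reopens the file"
        return iter(self._rows)


class CachedView:
    """Hypothetical cache checkpoint: reads the source once, then serves from memory."""

    def __init__(self, source):
        self._source = source
        self._cache = None

    def __iter__(self):
        if self._cache is None:
            self._cache = list(self._source)  # the only pass over the source
        return iter(self._cache)


def train(data, passes):
    """Toy multi-pass trainer: sums the first feature of every row on each pass."""
    total = 0.0
    for _ in range(passes):
        for row in data:
            total += row[0]
    return total


rows = [(5.1, 3.5), (4.9, 3.0), (6.2, 2.9)]

uncached = CountingSource(rows)
train(uncached, passes=100)

cached_source = CountingSource(rows)
train(CachedView(cached_source), passes=100)

print(uncached.opens, cached_source.opens)
```

In this model the uncached source is opened once per pass, while the cached one is opened a single time. Whether ML.NET's trainers behave exactly this way is an assumption, not something this sketch demonstrates.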
Yep! That's a nice workaround, but we still need to fix this issue/bug. 👍
I think the bug may be in VS debugging, with ML.NET v0.9 simply triggering/surfacing it.
It looks like we removed the "auto-cache" mechanism in So that's why this issue didn't repro in I assume something funky is going on with the debugger - and we may need the VS debugger team to investigate. /cc @TomFinley @wschin
My guess, as @TomFinley quoted in #2137, is that this is actually due (at least partially) to the hundreds of thousands of exceptions getting thrown without the caching.
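As a rough illustration of how per-reader exceptions add up, the following Python sketch compares two ways of ending a consumer loop: catching an "empty" exception versus stopping on a sentinel value. It is only a generic model. Python's `queue.Queue` is not .NET's BlockingCollection, `SENTINEL` and the helper names are hypothetical, and this is not a description of what ML.NET or #2138 actually does.

```python
import queue

SENTINEL = object()  # hypothetical end-of-stream marker


def drain_until_empty_error(q):
    """End the loop by catching queue.Empty: one exception per reader."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items, 1  # one exception was raised and caught


def drain_until_sentinel(q):
    """End the loop when the producer's sentinel arrives: no exception."""
    items = []
    while True:
        item = q.get_nowait()
        if item is SENTINEL:
            return items, 0
        items.append(item)


def run_readers(reader_count, drain, with_sentinel):
    """Create one queue per simulated reader and total the exceptions raised."""
    exceptions = 0
    for _ in range(reader_count):
        q = queue.Queue()
        for line in ("a", "b", "c"):
            q.put(line)
        if with_sentinel:
            q.put(SENTINEL)
        items, raised = drain(q)
        assert items == ["a", "b", "c"]
        exceptions += raised
    return exceptions


exception_style = run_readers(1000, drain_until_empty_error, with_sentinel=False)
sentinel_style = run_readers(1000, drain_until_sentinel, with_sentinel=True)
print(exception_style, sentinel_style)
```

The exception count grows linearly with the number of readers in the first style and stays at zero in the second, which is why many short-lived readers can turn a cheap-looking pattern into a large number of exceptions.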
Eh, my guess contributed, but it's definitely not the whole cause.
Ok, @danmosemsft's guess is also a significant contributor. This demo is also creating ~110K threads (to go along with the ~110K exceptions I was seeing). That's... not good.
This should be reopened. #2138 contributed but definitely doesn't fully address it. It seems I don't have permission to reopen it.
(Someone who understands the architecture and algorithms better than I do should tackle the overall threading issue, but I'm putting together a stop-gap solution that should help significantly for this particular issue.)
Ok, so there are many different costs here, and we can eliminate them piecemeal, but the root issue is that this demo is creating 15,755 LineReaders! That means 15,755 files being opened, 15,755 * 7 (on my machine) threads being created, 15,755 BlockingCollections being created, 15,755 OperationCanceledExceptions getting thrown and caught inside BlockingCollection to wake up in response to CompleteAdding (https://github.com/dotnet/corefx/issues/34602), 15,755 InvalidOperationExceptions from Take() (addressed in #2138), etc. These are very real costs regardless of whether the debugger is attached, but the debugger significantly amplifies the expense of some of them, e.g. exceptions and threads. That's a ton of overhead, and while we can try to address these issues piecemeal, this explosion of LineReaders (and whatever higher in the stack is creating them) seems like the real problem to address. Presumably this all has to do with the lack of caching that @eerhardt called out.
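For reference, the thread count implied by those figures can be checked with a few lines of Python. The two constants are taken from the comment above; the factor of 7 threads per reader was observed on one machine and will likely differ elsewhere.

```python
LINE_READERS = 15_755    # LineReaders reported in the comment above
THREADS_PER_READER = 7   # "on my machine"; an observation, not a fixed constant

threads_created = LINE_READERS * THREADS_PER_READER
print(threads_created)  # 110285
```

That product is consistent with the roughly 110K threads mentioned earlier in the thread.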
Just to mention that in v0.10 this issue happens not only with MultiClassClassification models but also with Regression models, such as the BikeSharingDemand sample.
cc @adamsitnik in case he is interested.
This is resolved in a previous PR.
This was working properly in ML.NET v0.8, so it looks like a bug introduced in v0.9?
CONTEXT:
I just migrated the Iris-MultiClassClassification sample to ML.NET v0.9.
Code is here, ONLY in branch 09migration:
https://github.com/dotnet/machinelearning-samples/tree/09migration/samples/csharp/getting-started/MulticlassClassification_Iris
PROBLEM:
If I run it without the debugger attached (e.g. with Ctrl+F5), it works/trains properly.
However, if I run the console app with F5 (debugging with VS attached to the process), the app keeps "training" the model "forever", meaning that instead of 20 seconds I've been waiting for more than 30 minutes and it "never ends".
I talked to @danmosemsft and we think it could be related to too many threads being created.
In the debug window we see a bunch of threads being created. Could that be the cause, with too many threads being created and causing problems for Visual Studio?
The thread 0x3ab4 has exited with code 0 (0x0).
The thread 0x5038 has exited with code 0 (0x0).
The thread 0x2994 has exited with code 0 (0x0).
The thread 0x4b18 has exited with code 0 (0x0).
The thread 0xf3c has exited with code 0 (0x0).
The thread 0x4e8 has exited with code 0 (0x0).
The thread 0x3cd0 has exited with code 0 (0x0).
The thread 0x28d4 has exited with code 0 (0x0).
The thread 0x56cc has exited with code 0 (0x0).
The thread 0x38d0 has exited with code 0 (0x0).
The thread 0x658 has exited with code 0 (0x0).
The thread 0x17d4 has exited with code 0 (0x0).
The thread 0x3ef0 has exited with code 0 (0x0).
The thread 0x6284 has exited with code 0 (0x0).
The thread 0x1664 has exited with code 0 (0x0).
The thread 0x2c5c has exited with code 0 (0x0).
The thread 0x536c has exited with code 0 (0x0).
The thread 0x1e04 has exited with code 0 (0x0).
The thread 0x5014 has exited with code 0 (0x0).
The thread 0x4f4c has exited with code 0 (0x0).
The thread 0x3444 has exited with code 0 (0x0).
The thread 0x670 has exited with code 0 (0x0).
The thread 0x42f0 has exited with code 0 (0x0).
The thread 0x2750 has exited with code 0 (0x0).
The thread 0x55e8 has exited with code 0 (0x0).
The thread 0x30a0 has exited with code 0 (0x0).
The thread 0x4810 has exited with code 0 (0x0).
The thread 0x4e7c has exited with code 0 (0x0).
The thread 0x6450 has exited with code 0 (0x0).
The thread 0x2a0 has exited with code 0 (0x0).
The thread 0x38d8 has exited with code 0 (0x0).
The thread 0x4708 has exited with code 0 (0x0).
The thread 0x6dc has exited with code 0 (0x0).
The thread 0x1880 has exited with code 0 (0x0).
The thread 0x4ffc has exited with code 0 (0x0).
The thread 0xd04 has exited with code 0 (0x0).
The thread 0x3ff8 has exited with code 0 (0x0).
The thread 0x6328 has exited with code 0 (0x0).
The thread 0x5030 has exited with code 0 (0x0).
The thread 0x6438 has exited with code 0 (0x0).
The thread 0x44a8 has exited with code 0 (0x0).
The thread 0x1eb8 has exited with code 0 (0x0).
The thread 0x6034 has exited with code 0 (0x0).
The thread 0x3f64 has exited with code 0 (0x0).
The thread 0x5788 has exited with code 0 (0x0).
The thread 0x1420 has exited with code 0 (0x0).
The thread 0x3ed4 has exited with code 0 (0x0).
The thread 0x66e4 has exited with code 0 (0x0).
The thread 0x30b4 has exited with code 0 (0x0).
The thread 0x2794 has exited with code 0 (0x0).
The thread 0x42d0 has exited with code 0 (0x0).
The thread 0x6504 has exited with code 0 (0x0).
The thread 0x2a90 has exited with code 0 (0x0).
The thread 0x4bfc has exited with code 0 (0x0).
The thread 0x1470 has exited with code 0 (0x0).
The thread 0x5e2c has exited with code 0 (0x0).
The thread 0x6580 has exited with code 0 (0x0).
The thread 0x6200 has exited with code 0 (0x0).
The thread 0x42d4 has exited with code 0 (0x0).
The thread 0x211c has exited with code 0 (0x0).
The thread 0x2724 has exited with code 0 (0x0).
The thread 0x20 has exited with code 0 (0x0).
The thread 0x30c has exited with code 0 (0x0).
The thread 0x28a8 has exited with code 0 (0x0).
The thread 0xba8 has exited with code 0 (0x0).
The thread 0x2810 has exited with code 0 (0x0).
The thread 0x64f0 has exited with code 0 (0x0).
The thread 0x5dd0 has exited with code 0 (0x0).
The thread 0x2fa8 has exited with code 0 (0x0).
The thread 0x57c8 has exited with code 0 (0x0).
The thread 0x72c has exited with code 0 (0x0).
The thread 0x5124 has exited with code 0 (0x0).
The thread 0x6310 has exited with code 0 (0x0).
The thread 0x3344 has exited with code 0 (0x0).
The thread 0x5fd4 has exited with code 0 (0x0).
The thread 0x35c has exited with code 0 (0x0).
The thread 0x435c has exited with code 0 (0x0).
The thread 0x6318 has exited with code 0 (0x0).
The thread 0x2838 has exited with code 0 (0x0).
The thread 0x2124 has exited with code 0 (0x0).
The thread 0x4a8 has exited with code 0 (0x0).
The thread 0x48a0 has exited with code 0 (0x0).
The thread 0x1f24 has exited with code 0 (0x0).
The thread 0x2334 has exited with code 0 (0x0).
The thread 0x4dd8 has exited with code 0 (0x0).
The thread 0x6330 has exited with code 0 (0x0).
The thread 0x49d8 has exited with code 0 (0x0).
The thread 0x4e10 has exited with code 0 (0x0).
The thread 0x22a8 has exited with code 0 (0x0).
The thread 0x673c has exited with code 0 (0x0).
The thread 0x3498 has exited with code 0 (0x0).
The thread 0x4fac has exited with code 0 (0x0).
The thread 0x5594 has exited with code 0 (0x0).
The thread 0x63a0 has exited with code 0 (0x0).
The thread 0x4584 has exited with code 0 (0x0).
The thread 0x2480 has exited with code 0 (0x0).
The thread 0x4afc has exited with code 0 (0x0).
The thread 0x5fe8 has exited with code 0 (0x0).
The thread 0x1908 has exited with code 0 (0x0).
The thread 0x70c has exited with code 0 (0x0).
The thread 0x46e0 has exited with code 0 (0x0).
The thread 0x33bc has exited with code 0 (0x0).
The thread 0x1d2c has exited with code 0 (0x0).
The thread 0x1ea8 has exited with code 0 (0x0).
The thread 0x1d84 has exited with code 0 (0x0).
The thread 0x4540 has exited with code 0 (0x0).
The thread 0x6634 has exited with code 0 (0x0).
The thread 0x4b24 has exited with code 0 (0x0).
The thread 0x315c has exited with code 0 (0x0).
The thread 0x2844 has exited with code 0 (0x0).
The thread 0x4c20 has exited with code 0 (0x0).
The thread 0x65f4 has exited with code 0 (0x0).
The thread 0x252c has exited with code 0 (0x0).
The thread 0x6484 has exited with code 0 (0x0).
The thread 0x4da8 has exited with code 0 (0x0).
The thread 0x4f9c has exited with code 0 (0x0).
The thread 0x56a0 has exited with code 0 (0x0).
The thread 0x5d3c has exited with code 0 (0x0).
The thread 0x519c has exited with code 0 (0x0).
The thread 0x1ae8 has exited with code 0 (0x0).
The thread 0x6234 has exited with code 0 (0x0).
The thread 0x90c has exited with code 0 (0x0).
The thread 0x4428 has exited with code 0 (0x0).
The thread 0x4310 has exited with code 0 (0x0).
The thread 0x3230 has exited with code 0 (0x0).
The thread 0x60d8 has exited with code 0 (0x0).
The thread 0x3288 has exited with code 0 (0x0).
The thread 0x1d28 has exited with code 0 (0x0).
The thread 0x5f80 has exited with code 0 (0x0).
The thread 0x216c has exited with code 0 (0x0).
The thread 0x3774 has exited with code 0 (0x0).
The thread 0x5b94 has exited with code 0 (0x0).
The thread 0x4c04 has exited with code 0 (0x0).
The thread 0x2c80 has exited with code 0 (0x0).
The thread 0x38c4 has exited with code 0 (0x0).
The thread 0x13dc has exited with code 0 (0x0).
The thread 0x6100 has exited with code 0 (0x0).
The thread 0xc60 has exited with code 0 (0x0).
The thread 0x5d40 has exited with code 0 (0x0).
The thread 0x5448 has exited with code 0 (0x0).
The thread 0x50bc has exited with code 0 (0x0).
The thread 0x16f8 has exited with code 0 (0x0).
The thread 0x381c has exited with code 0 (0x0).
The thread 0x2c84 has exited with code 0 (0x0).
The thread 0x5068 has exited with code 0 (0x0).
The thread 0x33f8 has exited with code 0 (0x0).
The thread 0xc98 has exited with code 0 (0x0).
The thread 0x598 has exited with code 0 (0x0).
The thread 0x5ee4 has exited with code 0 (0x0).
The thread 0x2438 has exited with code 0 (0x0).
The thread 0x4e30 has exited with code 0 (0x0).
The thread 0x1d24 has exited with code 0 (0x0).
The thread 0x443c has exited with code 0 (0x0).
The thread 0x3e98 has exited with code 0 (0x0).
The thread 0x5c60 has exited with code 0 (0x0).
The thread 0x23f0 has exited with code 0 (0x0).
The thread 0x2eec has exited with code 0 (0x0).
The thread 0x3a74 has exited with code 0 (0x0).
The thread 0x560 has exited with code 0 (0x0).
The thread 0x2088 has exited with code 0 (0x0).
The thread 0x2b04 has exited with code 0 (0x0).
The thread 0x129c has exited with code 0 (0x0).
The thread 0x3f18 has exited with code 0 (0x0).
The thread 0x627c has exited with code 0 (0x0).
The thread 0x33e0 has exited with code 0 (0x0).
The thread 0x4010 has exited with code 0 (0x0).
The thread 0x1f48 has exited with code 0 (0x0).
The thread 0x2374 has exited with code 0 (0x0).
The thread 0x4598 has exited with code 0 (0x0).
The thread 0x5c64 has exited with code 0 (0x0).
The thread 0x4a40 has exited with code 0 (0x0).
The thread 0x8d4 has exited with code 0 (0x0).
The thread 0x3ee8 has exited with code 0 (0x0).
The thread 0x161c has exited with code 0 (0x0).
The thread 0x1818 has exited with code 0 (0x0).
The thread 0x2b9c has exited with code 0 (0x0).
The thread 0x376c has exited with code 0 (0x0).
The thread 0x4ee8 has exited with code 0 (0x0).
The thread 0x1894 has exited with code 0 (0x0).
The thread 0x5e80 has exited with code 0 (0x0).
The thread 0x6420 has exited with code 0 (0x0).
The thread 0x55d8 has exited with code 0 (0x0).
The thread 0x47f0 has exited with code 0 (0x0).
The thread 0x1e8c has exited with code 0 (0x0).
The thread 0x5c28 has exited with code 0 (0x0).
The thread 0x212c has exited with code 0 (0x0).
The thread 0x3998 has exited with code 0 (0x0).
The thread 0x4914 has exited with code 0 (0x0).
The thread 0x2ce0 has exited with code 0 (0x0).
The thread 0x2544 has exited with code 0 (0x0).
The thread 0x4570 has exited with code 0 (0x0).
The thread 0x6040 has exited with code 0 (0x0).
The thread 0x8a8 has exited with code 0 (0x0).
The thread 0x2a7c has exited with code 0 (0x0).
The thread 0x54a8 has exited with code 0 (0x0).
The thread 0x2e3c has exited with code 0 (0x0).
The thread 0x38d4 has exited with code 0 (0x0).
The thread 0x6538 has exited with code 0 (0x0).
The thread 0x65b8 has exited with code 0 (0x0).
The thread 0x2b88 has exited with code 0 (0x0).
The thread 0x21b0 has exited with code 0 (0x0).
The thread 0x5bf4 has exited with code 0 (0x0).
The thread 0x5d2c has exited with code 0 (0x0).
The thread 0x5650 has exited with code 0 (0x0).
The thread 0x4398 has exited with code 0 (0x0).
The thread 0x243c has exited with code 0 (0x0).
The thread 0x4f60 has exited with code 0 (0x0).
The thread 0x1874 has exited with code 0 (0x0).
The thread 0x26d0 has exited with code 0 (0x0).
The thread 0x1a34 has exited with code 0 (0x0).
The thread 0x3d84 has exited with code 0 (0x0).
The thread 0x5140 has exited with code 0 (0x0).
The thread 0x3170 has exited with code 0 (0x0).
The thread 0x4608 has exited with code 0 (0x0).
The thread 0x356c has exited with code 0 (0x0).
The thread 0x2758 has exited with code 0 (0x0).
The thread 0x4c4 has exited with code 0 (0x0).
The thread 0x5a04 has exited with code 0 (0x0).
The thread 0xe6c has exited with code 0 (0x0).
The thread 0x4348 has exited with code 0 (0x0).
This issue or possible bug needs to be researched as soon as possible, please. 👍