Skip to content

Remaining Memory leaks and application crash at more than 2000 tree entries #1001

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dupeNoob opened this issue Dec 1, 2020 · 87 comments
Closed
Assignees
Labels
Milestone

Comments

@dupeNoob
Copy link

dupeNoob commented Dec 1, 2020

Hi.

When there are more than 2000 entries in the Virtual-TreeView and the window is closed on double click then the application crashes with memory leaks.

FastMM memory leaks:
5 - 12 bytes: TObject x 2
13 - 20 bytes: TList x 1, TThreadList x 1
21 - 28 bytes: Unbekannt x 1
37 - 44 bytes: TBaseVirtualTree.ChangeTreeStatesAsync$ActRec x 1
53 - 60 bytes: TWorkerThread x 1

Delphi Version: 10.2 (also in 10.4)
Virtual Tree View Version: 7.4

A small demo program is attached: VTBug.zip.

My assumption is that the TWorkerThread is performing something with the internal cache while the tree is destroyed.
The application doesn't crash at less than 2001 tree entries or when an Application.ProcessMessages is performed before the tree will be destroyed.

@joachimmarder
Copy link
Contributor

joachimmarder commented Dec 1, 2020

the application crashes with memory leaks.

We should focus in the crash first, the mem leaks are most like only a consequence of the crash.

But I don't see a crash here with latest Virtual TreeView source code and Delphi 10.4.1. I opened the project, pressed F9, pressed "Fill VT", double clicked the VT control. The application terminated without errors.

Can you send the call stack and exception type of the crash please?
Which version of Virtual TreeView are you using?

@joachimmarder joachimmarder added the Repro Steps Missing A smple project is needed to reprodcue and analyze the issue. See our guidelines for opening issues! label Dec 1, 2020
@antsiparovich
Copy link

antsiparovich commented Dec 2, 2020

We have same problem.
...
VirtualStringTree1.EndUpdate; // activate TWorkerThread with items more 2000
...
TBaseVirtualTreeCracker(lCurrentTree).ChangeTreeStatesAsync([tsValidating], [tsUseCache, tsValidationNeeded]); // execute TThread.Synchronize in ChangeTreeStatesAsync
Then switch to main thread, where VT is destroyed (see FormClose). After second call execute TBaseVirtualTreeCracker(lCurrentTree).ChangeTreeStatesAsync(EnterStates, [tsValidating, tsStopValidation]);
Thus sometimes we have AV

Delphi Version: 10.3 Update 1
Virtual Tree View Version: 7.3
Projects.zip

Stack from another programm

02.12.2020 19:25:32.004 [7488] OS exception occured: Access violation at address 0000000000000015 in module 'Console.exe'. Execution of address 0000000000000015
02.12.2020 19:25:32.005 [7488] (FFFFFFFFFFFFF015){Console.exe} [0000000000000015]
02.12.2020 19:25:32.005 [7488] [000000000041224E]{Console.exe} System.@DelphiExceptionHandler (Line 20257, "System.pas" + 53) + $D
02.12.2020 19:25:32.005 [7488] [00007FFB690C11CF]{ntdll.dll   } Unknown function at __chkstk + $11F
02.12.2020 19:25:32.005 [7488] [00007FFB6908A209]{ntdll.dll   } Unknown function at RtlRaiseException + $399
02.12.2020 19:25:32.005 [7488] [00007FFB690BFE3E]{ntdll.dll   } KiUserExceptionDispatcher + $2E
02.12.2020 19:25:32.005 [7488] (FFFFFFFFFFFFF015){Console.exe} [0000000000000015]
02.12.2020 19:25:32.101 [7308] EAccessViolation occured: Access violation at address 0000000000000015 in module 'Console.exe'. Execution of address 0000000000000015 at address $000000000054F611
02.12.2020 19:25:32.101 [7308] [0000000000412AF3]{Console.exe} System.@RaiseAtExcept (Line 21541, "System.pas" + 32) + $0
02.12.2020 19:25:32.101 [7308] [0000000000412B11]{Console.exe} System.@RaiseExcept (Line 21630, "System.pas" + 2) + $0
02.12.2020 19:25:32.101 [7308] [000000000054F611]{Console.exe} System.Classes.TThread.Synchronize (Line 15666, "System.Classes.pas" + 40) + $0
02.12.2020 19:25:32.101 [7308] [000000000054F7B0]{Console.exe} System.Classes.TThread.Synchronize (Line 15697, "System.Classes.pas" + 5) + $16
02.12.2020 19:25:32.101 [7308] [000000000118F094]{Console.exe} VirtualTrees.TBaseVirtualTree.ChangeTreeStatesAsync (Line 18407, "VirtualTrees.pas" + 3) + $1C
02.12.2020 19:25:32.101 [7308] [0000000001134C34]{Console.exe} VirtualTrees.WorkerThread.TWorkerThread.Execute (Line 177, "VirtualTrees.WorkerThread.pas" + 31) + $0
02.12.2020 19:25:32.101 [7308] [000000000054E673]{Console.exe} System.Classes.ThreadProc (Line 15048, "System.Classes.pas" + 18) + $E
02.12.2020 19:25:32.101 [7308] [00000000004136CD]{Console.exe} System.ThreadWrapper (Line 24888, "System.pas" + 9) + $7
02.12.2020 19:25:32.101 [7308] [00007FFB678E7BD4]{KERNEL32.DLL} BaseThreadInitThunk + $14
02.12.2020 19:25:32.101 [7308] [00007FFB6908CE51]{ntdll.dll   } RtlUserThreadStart + $21

@joachimmarder joachimmarder added Bug and removed Repro Steps Missing A smple project is needed to reprodcue and analyze the issue. See our guidelines for opening issues! labels Dec 3, 2020
@joachimmarder joachimmarder self-assigned this Dec 3, 2020
@joachimmarder joachimmarder added this to the V7.5 milestone Dec 3, 2020
@joachimmarder
Copy link
Contributor

I just committed a patch, could you please check if it solves the problem for you?

@Petz73
Copy link

Petz73 commented Dec 3, 2020

Hi Joachim,

Unfortunately the patch doesn't help. The problem still exists...
I think the Virtual Tree doesn't crash.
The crash is a result of the memory leaks and comes a little bit later in our program, because it's accesses the formerly freed memory of the VT objects.
And the VT Workerthread is still working and also accesses this freed memory...
With an Application.Processmessages (Delphi) before destroying the VT, I think, the WorkerThread has enought time (or can get an event?) to finish his work...

Henrik

@modersohn
Copy link

Hi,

out of interest, I had a look at the demo project too and could reproduce the crashes with FastMM4 and FullDebugMode.

From my understanding, it looks like .InterruptValidation which is called in TBaseVirtualTree.Destroy does not properly ensure that no validation code is execute anymore - the synchronized call in TBaseVirtualTree.ChangeTreeStatesAsync has not yet been processed and when it does later, the tree is already destroyed.

I was able to fix this by adding a call to CheckSynchronize after .InterruptValidation in TBaseVirtualTree.Destroy. I'm not sure if this is the right way to fix this, but with the current design (shared worker thread calling Trees via Synchronize) this seems to be required.

Let me know what you think.

Sebastian

@VladimirBartosh
Copy link

I just committed a patch, could you please check if it solves the problem for you?

It could not, because the code passed to Synchronize is executed asynchronusly, and if it happens when the tree has already been destroyed, Self becomes invalid and both FStates and HandleAllocated will fail, or even worse: you might got 'lucky' so that the memory Self points at will still be available, HandleAllocated and FStates will still contain some valid-looking values, the check will pass, but executing DoStateChange will fail, because object's memory will have been overwritten already by that time, with VMT reference leading to some random memory.

To solve this, you must ensure that no code like this is scheduled to run or already running at the time TBaseVirtualTree.Destroy executes

@modersohn
Copy link

@VladimirBartosh: hence my suggestion of adding CheckSynchronize - that should ensure that the asynchronously called code is executed before the destructor continues.

@VladimirBartosh
Copy link

CheckSynchronize can also be not enough. Imagine this:

  1. Thread 1(main): TWorkerThread.AddTree puts a tree to the worker list, and wakes up the worker thread
  2. Thread 2(worker): TWorkerThread.Execute gets the tree from list, unlocks the list, and is now performing the check in ChangeTreeStatesAsync:
    if (Self.HandleAllocated) then
  3. Thread 1: the tree is destoryed, with CheckSynchronize being run against an epmty synchronize list. Also InterruptValidation would not remove the tree from worker list, because it isn't there already, and TWorkerThread.ReleaseThreadReference(); would not prevent TWorkerThread from continuing execution past while not Terminated do, check, but would rather 'leak' it, setting its FreeOnTerminate value to True.
  4. Any thread reuses the former TBaseVirtualTree memory
  5. Thread 2: TBaseVirtualTreeCracker(lCurrentTree).ChangeTreeStatesAsync resumes, putting the already invalid Self pointer to synchronize list
  6. Some unpredictable, but most probably wrong things happen.

@modersohn
Copy link

Thanks for your input @VladimirBartosh - that is why I had my doubts about CheckSynchronize too.

The more I look at the code, the more it feels like the local lCurrentTree in TWorkerThread.Execute was added at a later stage and before that, it used FCurrentTree. Because if you look at TWorkerThread.RemoveTree, it calls WaitForValidationTermination, which also calls CheckSynchronize - but only if the FCurrentTree is removed. That now no longer happens because of the local lCurrentTree.

If I'm not mistaken that was introduced in #844 - so hopefully @joachimmarder can shed some light on why this was done.

@joachimmarder
Copy link
Contributor

Unfortunately Delphi lacks the concept of a cancellation token:, which would help here.
https://quality.embarcadero.com/browse/RSP-10044

The more I look at the code, the more it feels like the local lCurrentTree in TWorkerThread.Execute was added at a later stage
[...]
If I'm not mistaken that was introduced in #844 - so hopefully @joachimmarder can shed some light on why this was done

lCurrentTree was already there in the initial commit of the file in 2015. I don't see a connection to #844 here.
There is a comment which brings in relation to #434, which unfortunately was not moved from Google code for unknown reasons.

That now no longer happens because of the local lCurrentTree.

Sorry, but I don't see why. lCurrentTree is assigned to FCurrentTree at the moment this tree gets processed. And even if you make the assignment of nil the last line, the problem reported here persists.

joachimmarder added a commit that referenced this issue Dec 3, 2020
joachimmarder added a commit that referenced this issue Dec 3, 2020
* TWorkerThread.ReleaseThreadReference() calls CheckSynchronize() to Make sure code queued in the main thread by TBaseVirtualTree.ChangeTreeStatesAsync() get processed before the tree is being destroyed.
* Removed unused property CurrentTree from TWorkerThread.
@joachimmarder
Copy link
Contributor

Using CheckSynchronize() solved the problem here. The race condition above should not be an issue as the handle is already deallocated at this point.

@modersohn
Copy link

modersohn commented Dec 4, 2020

@joachimmarder, yes, you're right - lCurrentTree was already there before commit 0838277, but it was pretty much unused until then.

I'm trying to understand the intent the original code had. And RemoveTree clearly tries to ensure that the currently processed tree is finished by calling WaitForValidationTermination, which in turn calls the same CheckSynchronize you've now added to TWorkerThread.ReleaseThreadReference. (The extra condition that this is only done with toVariableNodeHeight was also added with the same commit)

The change in commit 0838277 as part of #844 is quite severe. Before that FCurrentTree was always the currently processed tree - now it is not anymore, because the local lCurrentTree is used. In fact Execute only sets FCurrentTree after it called ChangeTreeStatesAsync - so that CheckSynchronize call in WaitForValidationTermination won't help.

I can fix the crash of the test-case by @antsiparovich by changing the following things:

  1. Move the FCurrentTree assignment in Execute before the ChangeTreeStatesAsynch call
  2. Remove the if condition before the CheckSynchronize call in WaitForValidationTermination

Wouldn't you agree that this solution is easier to understand than your CheckSynchronize call in TWorkerThread.ReleaseThreadReference? In fact, if this is done, I think lCurrentTree can be removed completely, which would further simplify the code.

What do you think about adding this test-case to a Test project with DUnit or DUnitX? Sure it's not really a unit-test and you wouldn't be able to run these kind of tests from the command line, but I think it could help to avoid regressions in the future.

Last but not least: I don't understand why HandleAllocated is used as a sanity check (in your first commit for this issue but also in ChangeTreeStatesAsync). Is this still a leftover from back when window messages were used for synchronization? Because now with Synchronize I don't see why the window handle has any relevance.

@modersohn
Copy link

I was wondering why the test project from @antsiparovich was hanging on shutdown when used with FastMM and it turns out it wasn't hanging, but actually trying to report a leak if 10k UnicodeStrings.

The following code needs to be added to Unit3 to fix this:

procedure TForm3.VirtualStringTree1FreeNode(Sender: TBaseVirtualTree;
  Node: PVirtualNode);
var
  nodedata: PNodeData;
begin
  nodedata:= Sender.GetNodeData(Node);
  Finalize(nodedata^.item);
end;

@joachimmarder
Copy link
Contributor

Remove the if condition before the CheckSynchronize call in WaitForValidationTermination

CheckSynchronize() can cause odd call stacks and even side effects. So I use CheckSynchronize() only when absolutely necessary. That means, the condition should stay in my opinion.

Last but not least: I don't understand why HandleAllocated is used as a sanity check (in your first commit for this issue but also in ChangeTreeStatesAsync). Is this still a leftover from back when window messages were used for synchronization?

That may be possible. But I still think it is necessary to prevent the code to be executed in case the control is being destroyed. In a quick test with the project above I got AVs when I simply commented this line. We could of course something if csDestroying in ComponentState.

modersohn pushed a commit to modersohn/Virtual-TreeView that referenced this issue Dec 5, 2020
modersohn pushed a commit to modersohn/Virtual-TreeView that referenced this issue Dec 5, 2020
* TWorkerThread.ReleaseThreadReference() calls CheckSynchronize() to Make sure code queued in the main thread by TBaseVirtualTree.ChangeTreeStatesAsync() get processed before the tree is being destroyed.
* Removed unused property CurrentTree from TWorkerThread.
@modersohn
Copy link

modersohn commented Dec 5, 2020

Remove the if condition before the CheckSynchronize call in WaitForValidationTermination

CheckSynchronize() can cause odd call stacks and even side effects. So I use CheckSynchronize() only when absolutely necessary. That means, the condition should stay in my opinion.

The thing is, with that condition being left there, my fix of getting rid of lCurrentTree causes a hang, because WaitForValidationTermination loops forever (understandably).

Your fix OTOH is quite fragile - it calls CheckSynchronize only once - and there's absolutely no guarantee that it will finish all synchronized code with that Tree.

That may be possible. But I still think it is necessary to prevent the code to be executed in case the control is being destroyed. In a quick test with the project above I got AVs when I simply commented this line. We could of course something if csDestroying in ComponentState.

There is no way to find out if a control already has been destroyed, so it doesn't help much if you can find out if it's just in the process of being destroyed.

I'm pretty sure the reason why the HandleAllocated checks had been added back when, was not for checking for controls being destroyed - they were added as a safeguard before actually getting the Handle for window message synchronization - because of the obnoxious "Control '' has no parent window" exception.

And it's perfectly fine for a tree to not have a handle nor a parent window. Imagine having a DropDown component with a tree which you do all sorts of things with, before the user eventually drops down the control and the tree gets a parent, a window handle and finally becomes visible.

As the code is now, I agree with @VladimirBartosh: there are quite a few circumstances where the current WorkerThread code can possibly crash and IMHO the only way forward is to add test-cases, so you can gain a bit of confidence that new changes don't break things.

I've started a simple test-case based on the test project here and as soon as the worker thread is involved, crashes start to occur (because I'm repeating the test 100 times). The goal now is to get this running reliably. For anyone interested, I've checked in just the test-case on the develop/thread-tests branch of my fork (link below my comment).

modersohn pushed a commit to modersohn/Virtual-TreeView that referenced this issue Dec 5, 2020
@joachimmarder
Copy link
Contributor

so it doesn't help much if you can find out if it's just in the process of being destroyed.

This is not correct, it is important to ensure that no new method is queued in the main thread for this control if it is being destroyed, because CheckSynchronize is called in the context of the destructor.

Your fix OTOH is quite fragile - it calls CheckSynchronize only once - and there's absolutely no guarantee that it will finish all synchronized code with that Tree.

I don't think calling it multiple times helps here. It is important to call it at the right moment (the destructor) and to make sure that no new methods are queued for this control afterwards.

I'm pretty sure the reason why the HandleAllocated checks had been added back when, was not for checking for controls being destroyed - they were added as a safeguard before actually getting the Handle for window message synchronization

OK, I will change it to if not (csDestroying in ComponentState)

@joachimmarder
Copy link
Contributor

I've started a simple test-case based on the test project here and as soon as the worker thread is involved, crashes start to occur (because I'm repeating the test 100 times). The goal now is to get this running reliably. For anyone interested, I've checked in just the test-case on the develop/thread-tests branch of my fork (link below my comment).

Thanks for creating this. I added it to the official repository and ran the tests project several times, the tests always passed. Is your fork maybe missing some changes? Or has additional changes?

@modersohn
Copy link

Thanks for creating this. I added it to the official repository and ran the tests project several times, the tests always passed. Is your fork maybe missing some changes? Or has additional changes?

Unfortunately that's not the reason you're not seeing the crashes. In the mean-time I found out that for me, it only crashes on Win32 in the debugger. It does not crash when run without the debugger and it does not crash with Win64, regardless of the debugger. I was really puzzled by this and even started considering a bug in the debugger, but what I found very odd is that I see the exact same behavior both with RX 10.1 Berlin and RX 10.4 Sydney. And as soon as I remove the SetChildCount, i.e. avoid that any WorkerThread is started, the crashes go away with Win32 too.

So it seems I was bit too optimistic about adding reliable test-cases, but I have not given up yet. I'll definitely also check the good old DUnit GUI Runner and if all else fails will actually create and display the forms from the test case here.

@pyscripter
Copy link
Contributor

pyscripter commented Dec 29, 2020

The bigger question I wanted to ask is whether this threaded validation is worth the trouble.

  • Only used when VisibleCount > 2000 - How common that is? Is the count the total number of nodes in the tree or just the
    visible ones (currently shown)?
  • ThreadWorker.Execute often calls indirectly Synchronize and InterruptValidation with the loop containing Sleep and CheckSynchronize is costly. Do you have any benchmark or at least a feeling of the benefits of threaded validation on the user experience?

@modersohn
Copy link

I agree. To me it seems the best solution would be if CheckSynchronize() would retain the order of the queued elements and would dequeue only one element. If we agree here, I will open an issue at Embarcadero's Quality Portal.

Honestly, I didn't even consider this possibility, also because I think the chances are very high that Embarcadero will just consider this "as designed". If you want to take the chances and report it, I would highly recommend including a solid test-case, but even then, my hopes aren't high. Also I'd refrain from suggesting how this should be fixed and leave it up to them, but that is just my personal experience with bug reporting.

@modersohn
Copy link

  • Only used when VisibleCount > 2000 - How common that is? Is the count the total number of nodes in the tree or just the
    visible ones (currently shown)?

In ValidateCache, CacheThreshold is compared to FVisibleCount, so yes, only the ones currently shown.

  • ThreadWorker.Execute often calls indirectly Synchronize and InterruptValidation with the loop containing Sleep and CheckSynchronize is costly. Do you have any benchmark or at least a feeling of the benefits of threaded validation on the user experience?

This is not about absolute objective performance, but about subjective performance for the user. If you manipulate a large tree while positioned at the top, there is little that needs to be done to be able to update the GUI and let normal operation continue.

The positional cache is necessary to speed up things like vertical scrolling and hit-testing, which gets even more complicated, if variable node height is involved. Imagine a tree with a million visible nodes and you're positioned at the very end. Just adding a single node makes it necessary to iterate through all of them to determine the exact position of the last node. If you'd do this synchronously, the tree would feel very sluggish.

If you want to see this in action, have a look at the Advanced Demo and enable state tracking in the secondary window (this is also a demonstration why state change events need to be synchronized).

Can we do somethiing similar for ChangeTreeStatesAsync to avoid using synchronize?

Even if we could, that doesn't solve the problem. The validation of the position cache is walking the tree nodes in a separate thread. It has to be aborted before any changes are made to the nodes, otherwise crashes and undetermined behavior are guaranteed to happen, even if not always reproducible.

joachimmarder added a commit that referenced this issue Dec 29, 2020
…deadlock in case CheckSynchronize() is called nested.
joachimmarder added a commit that referenced this issue Dec 29, 2020
* Partial fix for issue #1013: Changing the Check State from InitNode may leave the tree in "updating" state.
joachimmarder added a commit that referenced this issue Dec 29, 2020
…onized calls from ChangeTreeStatesAsync() in the queue.
@joachimmarder
Copy link
Contributor

so yes, only the ones currently shown.

Just to avoid a misunderstanding here: This does not mean the nodes visible in the current viewport, but all nodes that are not in collapsed branches. So it's not unlikely.

Can we do somethiing similar for ChangeTreeStatesAsync to avoid using synchronize?

That's unfortunately not sufficient, as @modersohn wrote earlier. And I also agree with him that this code makes sense.

I don't think it would be an overall benefit to remove this code. For what it's worth: With the latest code I am unable to replicate any of the AVs or mem leaks reported above.

@modersohn
Copy link

For what it's worth: With the latest code I am unable to replicate any of the AVs or mem leaks reported above.

Yes, but the fixes feel very circumventional (i.e. 2 calls to CheckSynchronize because WorkerThread.Execute calls Synchronize twice - an assumption that is not necessarily true all the time etc.).

But I understand you need a workable solution right now - for a long term solution, I'd like to invite you all to post some feedback to the proposed thread-less alternative I described in #1014

@pyscripter
Copy link
Contributor

pyscripter commented Dec 30, 2020

The catch22 is that the WorkerThread is using Synchronize indirectly in two places:

  • ChangeTreeStatesAsync
  • MeasureItemHeight for variable height trees

This is why InterruptValidation, called from the main thread has to use CheckSynchronize and the issues that this has.

The solution is to avoid using Synchronize. Then, InterruptValidation can do a standard wait using some simple synchronization mechanism.

ChangeTreeStatesAsync can be changed to not use Synhronize as indicated above. It can use the suggested lock-free mechanism and UpdateEditBounds is not an issue since it does nothing when called from a thread different than main.

So the only problem is MeasureItemHeight . I have not looked into this, but you use the WorkerThread only for trees with uniform height.

If you agree that this is the right direction and I can submit a PR.

@modersohn
Copy link

The solution is to avoid using Synchronize. Then, InterruptValidation can do a standard wait using some simple synchronization mechanism.

Your idea certainly has merit, but the task of completely avoiding Synchronize is not easy to achieve (see below).

ChangeTreeStatesAsync can be changed to not use Synhronize as indicated above.

While your proposed code would be able to change FState in a thread-safe manner, it would no longer call DoStateChange, therefore not firing the OnStateChange event (and not doing the Assert there).

It can use the suggested lock-free mechanism and UpdateEditBounds is not an issue since it does nothing when called from a thread different than main.

The fact that UpdateEditBounds does nothing when called from a different thread is just a safety measure. But the edit position needs to be updated with the new position information once the position cache is updated.

So the only problem is MeasureItemHeight . I have not looked into this, but you use the WorkerThread only for trees with uniform height.

One could certainly disable asynchronous validation when toVariableNodeHeight is set, but that is a limitation that doesn't exist right now and I'm not sure how common that would be.

Let's see if and how we can work out the remaining issues.

@modersohn
Copy link

Oh and there's one case for Synchronize that I forgot: how do you do exception handling in the thread without Synchronize, given the fact that you a) want the thread to carry on working after an exception and b) it might be released with FreeOnTerminate = True, meaning there's nobody who could re-raise TThread.FatalException in the main thread?

@pyscripter
Copy link
Contributor

how do you do exception handling in the thread without Synchronize,

You can always shut the thread down raise the Exception and start a new WorkerThread when needed, or use Queue instead of synchronize.

In fact this is another change I want to see implemented: Use queue instead of synchronize in the WorkerThread exception handling.

@modersohn
Copy link

You can always shut the thread down raise the Exception and start a new WorkerThread when needed, or use Queue instead of synchronize.

Who would shut it down and raise the exception? Imagine the thread already has a list of several trees waiting for validation - how would you be able to manage all of this? I can't see how.

In fact this is another change I want to see implemented: Use queue instead of synchronize in the WorkerThread exception handling.

Again, that would not solve the synchronization problems, I think. The exception is acquired to a local variable in Execute and then captured for the anonymous method passed to Synchronize. How would this work if Queue is used instead and by the time someone processes this with CheckSynchronize, the thread has long been gone?

@pyscripter
Copy link
Contributor

pyscripter commented Dec 30, 2020

Who would shut it down and raise the exception? Imagine the thread already has a list of several trees waiting for validation - how would you be able to manage all of this? I can't see how.

In this case the list of trees would be stored inside a global thread-safe list. The Terminate event could schedule the start of a new worker thread.

Again, that would not solve the synchronization problems, I think. The exception is acquired to a local variable in Execute and then captured for the anonymous method passed to Synchronize. How would this work if Queue is used instead and by the time someone processes this with CheckSynchronize, the thread has long been gone?

This is if, as now, the WorkerThread stays around till shutdown.

t would no longer call DoStateChange, therefore not firing the OnStateChange

This however is a problem but may be acceptable since the only states that ChangeTreeStatesAsync changes have to do with validation and could be considered as internal and of no interest to the user. You also could use a different enumeration TValidationState or TCacheState, that is stored in a separate field instead of including them in State.

@modersohn
Copy link

The Terminate event could schedule the start of a new worker thread.

The OnTerminate event is called with Synchronize, so we're back to square one.

The exception handling with Queue might actually work if the compiler is smart enough to capture the exception correctly even when the thread has already been freed. I'm going to try that later with a test-case and memory leak detection enabled.

This however is a problem but may be acceptable since the only states that ChangeTreeStatesAsync changes have to do with validation and could be considered as internal and of no interest to the user. You also could use a different enumeration TValidationState or TCacheState, that is stored in a separate field instead of including them in State.

That is certainly something to consider, but it does change the current behavior in quite a substantial way - since you never know who uses the OnStateChange event for whatever reason. Therefore renaming and moving the state to a separate enum is certainly the preferable approach as it would intentionally break any code out there that does work with these states.

@pyscripter
Copy link
Contributor

pyscripter commented Dec 30, 2020

The OnTerminate event is called with Synchronize, so we're back to square one.

But there is no need to wait for that. You could instead start a new FreeOnTerminate WorkerThread before exiting the existing one.

The exception handling with Queue might actually work if the compiler is smart enough to capture the exception correctly even when the thread has already been freed.

You could instead store the Exception in a private field of the Tree. and raise it in the queued anonymous method. It is important to avoid using synchronize.

@joachimmarder
Copy link
Contributor

I would like to close this issue in case none of the above issues can be reproduced anymore. I don't claim that this is a good solution, but it is at least some solution. Please continue discussions about alternative designs in issue #1014.

@obones
Copy link
Contributor

obones commented May 4, 2021

I'm sorry to be the bringer of bad news, but I found about this issue because we are facing its exact consequences, despite having updated to today's git content.
What I have found, though, is that the code in the destructor only needs a little alteration to fix our issue. Currently, the code is like this:

  WasValidating := (tsValidating in FStates);
  InterruptValidation(True);
  if WasValidating then
  begin
    // Make sure we dequeue the two synchronized calls from ChangeTreeStatesAsync(), fixes mem leak and AV reported in issue #1001, but is more a workaround.
    CheckSynchronize();
    CheckSynchronize();
  end;// if

Our fix is to change it to this:

  WasValidating := ([tsValidating, tsValidationNeeded] * FStates <> []);
  InterruptValidation(True);
  if WasValidating then // Make sure we dequeue the synchronized calls from ChangeTreeStatesAsync(), to avoid mem leak and AV reported in issue #1001
    while CheckSynchronize() do
      Sleep(1);

First, we discovered that our our VTV instance was in the tsStopValidation, tsValidationNeeded state, not the tsValidating one. Maybe it's the presence of tsStopValidation that should be checked along tsValidating, I'm not sure.
Second, the CheckSynchronize method returns True if it has processed a method call. It is much safer to continue calling it while it returns True because there might be many other calls from outside the VTV pending in the queue. Doing a very short Sleep in the loop is here to ensure we yield to the other threads.

With this, the issue is fixed for good.

@joachimmarder
Copy link
Contributor

First, we discovered that our our VTV instance was in the tsStopValidation, tsValidationNeeded state, not the tsValidating one.

Are you sure that the TWorkerThread was blocked in a ChangeTreeStatesAsync()? I currently cannot imagine how this could happen. The whole thing is done to ensure that the tsValidating flag correctly and thread-safe indicates if the tree is being validated.

Second, the CheckSynchronize method returns True if it has processed a method call. It is much safer to continue calling it while it returns True because there might be many other calls from outside the VTV pending in the queue. Doing a very short Sleep in the loop is here to ensure we yield to the other threads.

Yes, this makes a lot of sense, and I am committing this code change now. One idea would be to do this as long as the tsValidating flag is set. Thanks for submitting this idea.

joachimmarder added a commit that referenced this issue May 5, 2021
@obones
Copy link
Contributor

obones commented May 5, 2021

Are you sure that the TWorkerThread was blocked in a ChangeTreeStatesAsync()? I currently cannot imagine how this could happen. The whole thing is done to ensure that the tsValidating flag correctly and thread-safe indicates if the tree is being validated.

At the time the destructor is called, TWorkerThread is blocked waiting for the Synchronize call inside ChangeTreeStatesAsync() to return. And when this happens, the VTV instance has its state property set to [tsStopValidation, tsValidationNeeded]. With the current code, the calls to CheckSynchronize are never done and seconds later I get an access violation inside the anonymous method of ChangeTreeStatesAsync() right on the call to DoStateChange because the VTV has already been destroyed.

Note that the blocked call to ChangeTreeStatesAsync() originates from TWorkerThread.Execute and more precisely the first one that does this:

TBaseVirtualTreeCracker(lCurrentTree).ChangeTreeStatesAsync([tsValidating], [tsUseCache, tsValidationNeeded]);

So, at the time the destructor is called, tsValidating has not yet been added to the states of the VTV instance because the worker thread is waiting for our main thread to process the Synchronize calls. This also means that looping while tsValidating is in FStates would not loop at all because it will only appear in FStates after the first Synchronize call has had a chance to occur.

Note that out of tsStopValidation and tsValidationNeeded that are both present in FStates, I decided to settle on tsValidationNeeded for my change to WasValidating value, but it would work just as well with tsStopValidation. This is where I have a doubt and I believe you have a better understanding of the various flags and the more appropriate one to use on top of tsValidating which clearly is not enough on its own.

joachimmarder added a commit that referenced this issue May 5, 2021
…troy() is not enough, the TWorkerThread may be stuck in the first call to ChangeTreeStatesAsync() where tsValidating is not set.
@joachimmarder
Copy link
Contributor

Note that the blocked call to ChangeTreeStatesAsync() originates from TWorkerThread.Execute and more precisely the first one that does this:

OK, yes, although it seems unlikely, the TWorkerThread may of course get stuck here in the first call to ChangeTreeStatesAsync(). I changed TBaseVirtualTree.Destroy() accordingly.

@joachimmarder joachimmarder added this to the V7.6 milestone May 5, 2021
@joachimmarder joachimmarder removed In Development Open for Discussion There are several possibilites to address the issue and anyone is invited for comments. labels May 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants