-
Notifications
You must be signed in to change notification settings - Fork 897
Streaming Filter Support #1030
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Streaming Filter Support #1030
Conversation
[MarshalAs(UnmanagedType.FunctionPtr)] | ||
public free_fn free; | ||
|
||
public delegate int write_fn(IntPtr stream, IntPtr buffer, uint len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should rather marshal size_t
into a UIntPtr
I think this looks great. Thanks @whoisj |
{ | ||
Ensure.ArgumentNotNullOrEmptyString(filterName, "filterName"); | ||
|
||
if (!filterName.Contains(AttributeFilterDefinition)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is Contains(...)
the right check here? I was kinda expecting a StartsWith(...)
with an explicit string comparer...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I'll change this to be .StartsWith(..., Ordinal)
👀 will test this out against GHfW |
@jamill Yes, agreed - literals (aka magic numbers) should be moved into constants. My bad, tired brains take shortcuts. I'll fix this up ASAP. As for the exception, I'm using it as flow control. I suppose I should use something cheaper like a In general, is it better in a native-managed-native call stack to just avoid exceptions all together? Every project seems to have its own set of rules about this. I don't see anything in the .md files about this. What is expected in Libgit2Sharp? /cc @nulltoken , @ethomson , @shiftkey |
Thanks for all of the great feedback. Fixes pushed for all with the exception of the discussion @ethomson and @jamill are having with regards to requiring "filter=". I'm not sure if that should be part of this PR or not. I agree with the explicit requirement, not sure what kinds of errors it should produce, etc. |
I think @ethomson and I are in agreement on always adding the |
@whoisj this is wonderful! Thank you so much for taking this on, I have been swamped. 😍 We ended up making a fork of libgit2sharp without the api changes in libgit2, which GitHub for Windows is currently using. https://github.com/github/libgit2sharp/tree/ghfw-p-flow-plus-filters It includes one change that didn't make it into the original libgit2sharp PR github@dbea75e which allows us to pass the working directory of the repository to the clean and smudge overloads on the filter. This turns out to be important because most filter CLI tools (or at least the Git LFS CLI) require the working directory of the repository. Do you have any objections to adding this to the API contract? |
@ammeep I think that makes sense, no objections here. |
Updated the PR with changes requested by @ammeep as well as added a necessary, missing callback on filter close to allow the filter to do any final, necessary work before being closed. |
Just ran this through our existing test suite in GitHub for Windows - I'm 👍 on this PR. Thanks for the hard work @whoisj! |
Dammit! The PR isn't mergeable any longer. Could you please rebase it? |
Added complete callback to facilitate long running or transforming filters like Git-LFS. Added repository's working directory to callbacks to support extern process like Git-LFS, where knowledge of a repository's root maybe necissary but unobtainable in another fasion. Removed unused NativeMethods
The filters are run in-line with the rest of the system, and libgit2 is set up to have its last filter/writestream write to the filesystem. Instead of using an intermediate buffer/file/pipe, we can make the writes from the user's Smude() or Clean() filter into their output stream propagate immediately into the next filter, leading up the chain to the filesystem, removing any intermediate buffering other than that which is necessary for the filters themselves to work. In the general case, we only have the one clean or smudge filter, which means that even in the case of out-of-band storage for large files, where a small blob can cause very large outputs, we would still write directly to the filesystem. by J Wyman (@whoisj): Buffered writes to next filter to minimize managed to native marshaling for performance optimizations by J Wyman (@whoisj): Split `WriteStream` into its own file and cleaned it up (checks, style, etc)
@nulltoken oh that was a fun merge (in the text sense) 😑 |
🎉 🎈 🍻 😌 |
Epic. |
🎉 Nice work @whoisj!! |
@whoisj This AppVeyor job shows some strange failures related this PR. Any idea?
|
Some more detail in this other one
|
@nulltoken it's very possible something went wrong in the series of rebases the PR took just before merge. I'm not able to reproduce these failures locally. Do they only happen in AppVeyor or can you reproduce them locally? |
I'm not either. Weirder, see https://ci.appveyor.com/project/libgit2/libgit2sharp/history
Just restarted #859 build to see if this is consistent (although I'm not sure to understand why). Maybe AppVeyor prefers @ethomson's code over @jamill's code. |
Very strange. #859 is now passing. This kind of random issues makes me feel very uneasy especially when the tests do not rely on external dependencies (eg. network). |
@nulltoken I see trancient test failures commonly in the This has been happening since before I started on the project and since they were always transient and always resolved by a re-run I ignored them. |
@whoisj Wow. That is very weird. I've never seen With regards to your failures, are you still running the tests against a RAM disk? Could we be experiencing IO concurrency issues? |
@nulltoken I am using the RAM disk locally, but the transient failures have been happening since before I switched from SSD to RAM. I do not think they're related. One piece of information I do have is it feels like it's more common to see transient failures immediately after building. In fact, I do not believe I've ever seen them when using an existing DLL. JIT issues? |
Regarding #1030 (comment) @jeffhostetler was hinting at a possible GC related reclaim. |
CI |
One of those errors are from writing to the next stream, which sets its own error message. It should provide more information about the underlying cause if |
Completes the work started by @ammeep for getting filters working in Libgit2Sharp (thanks for the awesome starting point!). Makes use of the streaming filter infrastructure added by @ethomson in Libgit2 to minimize the memory footprint. The goal is to support the Git-LFS filter, so one has to assume the streams won't fit in memory.
Fairly simple design. Added a new
GitWriteStream
to wrap the nativegit_writestream
, then implements all logic for doing the actual stream work in theFilter
base class. The rest of the work was done by @ammeep 🆒Fixes #1025