Skip to content

Streaming Filter Support #1030

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 10, 2015
Merged

Streaming Filter Support #1030

merged 4 commits into from
Jun 10, 2015

Conversation

whoisj
Copy link

@whoisj whoisj commented Apr 23, 2015

Completes the work started by @ammeep for getting filters working in Libgit2Sharp (thanks for the awesome starting point!). Makes use of the streaming filter infrastructure added by @ethomson in Libgit2 to minimize the memory footprint. The goal is to support the Git-LFS filter, so one has to assume the streams won't fit in memory.

Fairly simple design. Added a new GitWriteStream to wrap the native git_writestream, then implements all logic for doing the actual stream work in the Filter base class. The rest of the work was done by @ammeep 🆒

Fixes #1025

[MarshalAs(UnmanagedType.FunctionPtr)]
public free_fn free;

public delegate int write_fn(IntPtr stream, IntPtr buffer, uint len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should rather marshal size_t into a UIntPtr

@nulltoken
Copy link
Member

Wow! That's a pretty nice interop job. Thanks for giving some love to those filters.

/cc @shiftkey @ammeep @ethomson @jamill @Therzok Could you please also take a peek at this PR?

@ethomson
Copy link
Member

I think this looks great. Thanks @whoisj

{
Ensure.ArgumentNotNullOrEmptyString(filterName, "filterName");

if (!filterName.Contains(AttributeFilterDefinition))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Contains(...) the right check here? I was kinda expecting a StartsWith(...) with an explicit string comparer...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I'll change this to be .StartsWith(..., Ordinal)

@shiftkey
Copy link
Contributor

👀 will test this out against GHfW

@whoisj
Copy link
Author

whoisj commented Apr 24, 2015

@jamill Yes, agreed - literals (aka magic numbers) should be moved into constants. My bad, tired brains take shortcuts. I'll fix this up ASAP. As for the exception, I'm using it as flow control. I suppose I should use something cheaper like a continue or goto and keep the exception for exceptional circumstances.

In general, is it better in a native-managed-native call stack to just avoid exceptions all together? Every project seems to have its own set of rules about this. I don't see anything in the .md files about this. What is expected in Libgit2Sharp?

/cc @nulltoken , @ethomson , @shiftkey

@whoisj
Copy link
Author

whoisj commented Apr 24, 2015

Thanks for all of the great feedback. Fixes pushed for all with the exception of the discussion @ethomson and @jamill are having with regards to requiring "filter=". I'm not sure if that should be part of this PR or not.

I agree with the explicit requirement, not sure what kinds of errors it should produce, etc.

@jamill
Copy link
Member

jamill commented Apr 24, 2015

@ethomson and @jamill having with regards to requiring "filter=". I'm not sure if that should be part of this PR or not.

I think @ethomson and I are in agreement on always adding the filter= prefix to the filter name, and expect that clients will never specify the filter= prefix. I would prefer to have this behavior in the original version, if it is not too much work (unless @ammeep or others have objections), but not too strongly. I could see this going in in a separate PR.

@ammeep
Copy link
Member

ammeep commented Apr 25, 2015

@whoisj this is wonderful! Thank you so much for taking this on, I have been swamped. 😍

We ended up making a fork of libgit2sharp without the api changes in libgit2, which GitHub for Windows is currently using. https://github.com/github/libgit2sharp/tree/ghfw-p-flow-plus-filters

It includes one change that didn't make it into the original libgit2sharp PR github@dbea75e which allows us to pass the working directory of the repository to the clean and smudge overloads on the filter. This turns out to be important because most filter CLI tools (or at least the Git LFS CLI) require the working directory of the repository.

Do you have any objections to adding this to the API contract?

@whoisj
Copy link
Author

whoisj commented Apr 25, 2015

@ammeep I think that makes sense, no objections here.

@whoisj
Copy link
Author

whoisj commented Apr 27, 2015

Updated the PR with changes requested by @ammeep as well as added a necessary, missing callback on filter close to allow the filter to do any final, necessary work before being closed.

@shiftkey
Copy link
Contributor

Just ran this through our existing test suite in GitHub for Windows - I'm 👍 on this PR. Thanks for the hard work @whoisj!

@nulltoken
Copy link
Member

Dammit! The PR isn't mergeable any longer. Could you please rebase it?

ammeep and others added 4 commits June 10, 2015 12:20
Added complete callback to facilitate long running or transforming filters like Git-LFS.

Added repository's working directory to callbacks to support extern process like Git-LFS, where knowledge of a repository's root maybe necissary but unobtainable in another fasion.

Removed unused NativeMethods
The filters are run in-line with the rest of the system, and libgit2 is
set up to have its last filter/writestream write to the
filesystem. Instead of using an intermediate buffer/file/pipe, we can
make the writes from the user's Smude() or Clean() filter into their output
stream propagate immediately into the next filter, leading up the chain
to the filesystem, removing any intermediate buffering other than that
which is necessary for the filters themselves to work.

In the general case, we only have the one clean or smudge filter, which
means that even in the case of out-of-band storage for large files,
where a small blob can cause very large outputs, we would still write
directly to the filesystem.

by J Wyman (@whoisj): Buffered writes to next filter to minimize managed to native marshaling for performance optimizations

by J Wyman (@whoisj): Split `WriteStream` into its own file and cleaned it up (checks, style, etc)
@whoisj
Copy link
Author

whoisj commented Jun 10, 2015

@nulltoken oh that was a fun merge (in the text sense) 😑

nulltoken added a commit that referenced this pull request Jun 10, 2015
@nulltoken nulltoken merged commit 431e9c6 into libgit2:vNext Jun 10, 2015
@nulltoken
Copy link
Member

@whoisj @ammeep

love

@nulltoken nulltoken added this to the v0.22 milestone Jun 10, 2015
@whoisj
Copy link
Author

whoisj commented Jun 10, 2015

🎉 🎈 🍻 😌

@whoisj whoisj deleted the streaming-filter-support branch June 10, 2015 19:58
@nulltoken nulltoken mentioned this pull request Jun 10, 2015
5 tasks
@dahlbyk
Copy link
Member

dahlbyk commented Jun 10, 2015

Epic.

@ammeep
Copy link
Member

ammeep commented Jun 10, 2015

🎉 Nice work @whoisj!!

@shiftkey
Copy link
Contributor

Nice work @ammeep and @whoisj!

@whoisj
Copy link
Author

whoisj commented Jun 10, 2015

Thanks to @ethomson for the base work in Libgit2, @ammeep for the base work in this library, and to @carlosmn for being awesome and putting my head on straight there at the end. 🙇

@nulltoken
Copy link
Member

@whoisj This AppVeyor job shows some strange failures related this PR. Any idea?

LibGit2Sharp.Tests.FilterFixture.CanFilterLargeFiles [FAIL]
   System.NullReferenceException : Object reference not set to an instance of an object.

LibGit2Sharp.Tests.FilterFixture.WhenStagingFileApplyIsCalledWithCleanForCorrectPath [FAIL]
   System.NullReferenceException : Object reference not set to an instance of an object.

LibGit2Sharp.Tests.FilterFixture.CleanFilterWritesOutputToObjectTree [FAIL]
   System.NullReferenceException : Object reference not set to an instance of an object.

LibGit2Sharp.Tests.FilterFixture.WhenCheckingOutAFileFileSmudgeWritesCorrectFileToWorkingDirectory [FAIL]
   System.NullReferenceException : Object reference not set to an instance of an object.

@nulltoken
Copy link
Member

Some more detail in this other one

LibGit2Sharp.Tests.FilterFixture.CanFilterLargeFiles [FAIL]
   System.NullReferenceException : Object reference not set to an instance of an object.

LibGit2Sharp.Tests.FilterFixture.CleanFilterWritesOutputToObjectTree [FAIL]
   System.NullReferenceException : Object reference not set to an instance of an object.

LibGit2Sharp.Tests.FilterFixture.WhenCheckingOutAFileFileSmudgeWritesCorrectFileToWorkingDirectory [FAIL]
   LibGit2Sharp.LibGit2SharpException : Unexpected IntPtr value
   Parameter name: stream
   Stack Trace:
      c:\projects\libgit2sharp\LibGit2Sharp\Core\Ensure.cs(154,0): at LibGit2Sharp.Core.Ensure.HandleError(Int32 result)
      c:\projects\libgit2sharp\LibGit2Sharp\Repository.cs(1722,0): at LibGit2Sharp.Repository.Stage(IEnumerable`1 paths, StageOptions stageOptions)
      c:\projects\libgit2sharp\LibGit2Sharp.Tests\FilterFixture.cs(319,0): at LibGit2Sharp.Tests.FilterFixture.StageNewFile(IRepository repo, String contents)
      c:\projects\libgit2sharp\LibGit2Sharp.Tests\FilterFixture.cs(293,0): at LibGit2Sharp.Tests.FilterFixture.CheckoutFileForSmudge(String repoPath, String branchName, String content)
      c:\projects\libgit2sharp\LibGit2Sharp.Tests\FilterFixture.cs(170,0): at LibGit2Sharp.Tests.FilterFixture.WhenCheckingOutAFileFileSmudgeWritesCorrectFileToWorkingDirectory()

LibGit2Sharp.Tests.FilterFixture.WhenStagingFileApplyIsCalledWithCleanForCorrectPath [FAIL]
   System.NullReferenceException : Object reference not set to an instance of an object.

@whoisj
Copy link
Author

whoisj commented Jun 11, 2015

@nulltoken it's very possible something went wrong in the series of rebases the PR took just before merge. I'm not able to reproduce these failures locally. Do they only happen in AppVeyor or can you reproduce them locally?

@nulltoken
Copy link
Member

I'm not able to reproduce these failures locally.

I'm not either.

Weirder, see https://ci.appveyor.com/project/libgit2/libgit2sharp/history

Just restarted #859 build to see if this is consistent (although I'm not sure to understand why). Maybe AppVeyor prefers @ethomson's code over @jamill's code.

@nulltoken
Copy link
Member

Very strange. #859 is now passing.

This kind of random issues makes me feel very uneasy especially when the tests do not rely on external dependencies (eg. network).

@whoisj
Copy link
Author

whoisj commented Jun 11, 2015

@nulltoken I see trancient test failures commonly in the BlobFixture, the FilterFixture, the ArchiveFixture, and the CheckoutFixture several times a day and a re-run of the tests nearly always resolves the issue. The errors are always the same: either NullReferenceException or DirectoryNotFound exceptions - at least for me.

This has been happening since before I started on the project and since they were always transient and always resolved by a re-run I ignored them.

@nulltoken
Copy link
Member

@whoisj Wow. That is very weird. I've never seen BlobFixture/ArchiveFixture/CheckoutFixture failing. Neither locally, nor in the CI logs.

With regards to your failures, are you still running the tests against a RAM disk? Could we be experiencing IO concurrency issues?

@whoisj
Copy link
Author

whoisj commented Jun 11, 2015

@nulltoken I am using the RAM disk locally, but the transient failures have been happening since before I switched from SSD to RAM. I do not think they're related. One piece of information I do have is it feels like it's more common to see transient failures immediately after building. In fact, I do not believe I've ever seen them when using an existing DLL. JIT issues?

@nulltoken
Copy link
Member

Regarding #1030 (comment) @jeffhostetler was hinting at a possible GC related reclaim.

@dahlbyk
Copy link
Member

dahlbyk commented Jun 18, 2015

@carlosmn
Copy link
Member

One of those errors are from writing to the next stream, which sets its own error message. It should provide more information about the underlying cause if WriteStream.Write() were to use Ensure.ZeroResult() as it could simply be that we're running out of disk space, but in any case it should tell us why libgit2 returned an error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants