Skip to content

https://github.com/o2sh/onefetch/issues/310 #341

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Dec 15, 2020
Merged

https://github.com/o2sh/onefetch/issues/310 #341

merged 9 commits into from
Dec 15, 2020

Conversation

HallerPatrick
Copy link
Contributor

Resolve Issue #310 and #338

Please review, because this includes a lot of changes. Sorry for that.

Copy link
Collaborator

@spenserblack spenserblack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome! Thanks for your hard work!

@o2sh
Copy link
Owner

o2sh commented Dec 11, 2020

Very impressive @HallerPatrick,

I tried running your branch on a few repos, everything seems to work fine except for the linux repo where I got a:

hread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/onefetch/git_utils.rs:40:45
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

if failed here: let author_name = author.name().unwrap().to_string(); I don't see how a commit can exist without an author name and why it works when using git sys calls..but anyway, I fixed it locally and it worked just fine.

However, the response time has increased quite a bit, still on the linux repo, we went from:

$ time onefetch_master >

real	0m15,742s
user	0m21,599s
sys	0m3,985s

to:

$ time onefetch >

real	0m28,633s
user	0m33,951s
sys	0m1,097s

By the way, the git_utils.rs file was created to isolate functions that were still relying on git sys calls, now everything can be moved into the repo.rs file.

@spenserblack
Copy link
Collaborator

if failed here: let author_name = author.name().unwrap().to_string();

git2::Signature::name returns None if the name isn't valid UTF-8. Perhaps name_bytes should be used instead (and also email ➡️ email_bytes)?

@HallerPatrick
Copy link
Contributor Author

Very impressive @HallerPatrick,

I tried running your branch on a few repos, everything seems to work fine except for the linux repo where I got a:

hread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/onefetch/git_utils.rs:40:45
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

if failed here: let author_name = author.name().unwrap().to_string(); I don't see how a commit can exist without an author name and why it works when using git sys calls..but anyway, I fixed it locally and it worked just fine.

Alright also fixed the Signature utf-8 problem with using from_utf8_lossy which should return � for invalid utf-8 chars.

However, the response time has increased quite a bit, still on the linux repo, we went from:

$ time onefetch_master >

real	0m15,742s
user	0m21,599s
sys	0m3,985s

to:

$ time onefetch >

real	0m28,633s
user	0m33,951s
sys	0m1,097s

I also tested this now with Hyperfine (which seems to be a good tool) and had the same results. Its interesting that using git sys calls seem to be much faster than using git2. Maybe commands like object-count are more efficiently implemented in the git cli tools, then I did.

I definitely wanted to dig deeper into this.

By the way, the git_utils.rs file was created to isolate functions that were still relying on git sys calls, now everything can be moved into the repo.rs file.

I would probably move this to another pull requests to keep the diff small and separation of purpose. If that's okay.

@o2sh
Copy link
Owner

o2sh commented Dec 13, 2020

Maybe commands like object-count are more efficiently implemented in the git cli tools.

It's the get_git_history function that takes most of the response time:

before (linux repo):

get_git_history --> 11.77949631s
get_files_count --> 18.192162ms
get_repo_size --> 1.709682ms

now:

get_git_history --> 25.771483865s
get_repo_size --> 49.508897ms

By the way, if you remove revwalk.set_sorting(git2::Sort::TIME)?; from get_git_history, its RT drops to:

get_git_history --> 15.868057149s

@HallerPatrick
Copy link
Contributor Author

@o2sh Yeah , I think we reached a little bit of a dead end with git2. With the sys call we just used the logs to parse the commits, but with git2 we can only use the actual refs/commit, which is much more expensive.

IMO, we are dealing with the linux repo with a edge case. With something like tensorflow the performance diff is not really noticeable, which is already a big repo with 100k commits and 2,6 M LOC.

$ time onefetch
3.69s user 1.86s system 212% cpu 2.605 total

$ time ../../Projects/onefetch/target/release/onefetch  
3.84s user 1.34s system 200% cpu 2.586 total

But in the end it is your decision if that is enough or if performance is more important to you. We would have to wait for git2 bindings to logs.

@o2sh
Copy link
Owner

o2sh commented Dec 14, 2020

what about this part:

By the way, if you remove revwalk.set_sorting(git2::Sort::TIME)?; from get_git_history, its RT drops to:

get_git_history --> 15.868057149s

I think it can be removed without issue: hence getting us pretty close to git sys calls RT.

We can start from there and look for other optimizations afterwards - if possible.

@HallerPatrick
Copy link
Contributor Author

what about this part:

By the way, if you remove revwalk.set_sorting(git2::Sort::TIME)?; from get_git_history, its RT drops to:
get_git_history --> 15.868057149s

I think it can be removed without issue: hence getting us pretty close to git sys calls RT.

It looks like the commit list is already sorted, at least are creation date and last commit the same with the removed line. So yeah I removed it

- Merge git_utils with repo into a single file
- Move get_repo_size_field into info file
- Remove blocking dependency with Git binary from onefetch
@o2sh
Copy link
Owner

o2sh commented Dec 14, 2020

I took the liberty to make some minor changes e99c56f

@HallerPatrick
Copy link
Contributor Author

I took the liberty to make some minor changes e99c56f

The cleaner the code the happier I am.

Is anything now holding us back from merging?

@o2sh
Copy link
Owner

o2sh commented Dec 15, 2020

We'll wait for @spenserblack to finish his review.

On my side, I'm very happy with this PR.

Indeed, even if we loose in performance, I still believe that getting rid of the git sys calls - we still have one call in get_git_version - is the way to go as it makes for a more homogeneous code base and allows onefetch to be used even if the user doesn't have the Git binary installed on his machine.

@HallerPatrick
Copy link
Contributor Author

We'll wait for @spenserblack to finish his review.

On my side, I'm very happy with this PR.

Indeed, even if we loose in performance, I still believe that getting rid of the git sys calls - we still have one call in get_git_version - is the way to go as it makes for a more homogeneous code base and allows onefetch to be used even if the user doesn't have the Git binary installed on his machine.

I am on your side.

get_git_versaion/ is_git_installedare only there to get infos about the git binary, which cold never be replaced by git2.

Copy link
Collaborator

@spenserblack spenserblack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll wait for @spenserblack to finish his review.

Sorry, I didn't realize that you were still waiting for me.

This looks great, @HallerPatrick, thanks for all your help!
Once again, I left a suggested change that shouldn't change the logic at all, just the code style. Optional if you want to implement it.

Overall, LGTM 👍

Comment on lines 26 to 36
if let Ok(commit) = repo.find_commit(r) {
if no_merges {
let parents = commit.parents().len();
if parents > 1 {
return None;
}
}
Some(commit)
} else {
None
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is basically converting a Result to an Option that has been filtered, I think this can be simplified with Result::ok and Option::filter.
Something like

repo.find_commit(r).ok().filter(|commit| !(no_merges && commit.parents().len() > 1))

Once again, didn't test the above code, though 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright worked directly.

@o2sh o2sh merged commit 80424f4 into o2sh:master Dec 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants