Initializing Public Git Repo Fails/Unclear When Gitbase is Ready #313
Hi @elithrar, thanks for helping us polish source{d} Engine. I wonder if the problem could be related to the PGA subset you're using, or even to your infrastructure. I tried it with this small PGA subset with no problems:

# install pga from sources (requires `go` to be installed)
go get -u github.com/src-d/datasets/PublicGitArchive/...
# download all 'src-d' repos written in 'Go' into '$HOME/repos/pga/siva'
pga list -u /src-d/ -l Go -f json | jq -r ".sivaFilenames[]" | pga get -i -o $HOME/repos/pga
# init source{d} Engine from '$HOME/repos/pga/siva', as downloaded by pga in the previous step
srcd init $HOME/repos/pga/siva
# query the 'remotes' table to get some repo info
srcd sql 'select remote_fetch_url from remotes'

Could you share the subset of PGA projects you're using, to help us reproduce the issue?

Also, to check that everything is OK, rather than `srcd sql 'SELECT 1 from refs;'` I'd try:
srcd sql 'SELECT count(*) from refs;'
srcd sql 'SELECT * from refs limit 1'
to ensure there is no problem retrieving that much data.
Thanks for the quick reply @dpordomingo!
Will report back later today.
Logs from gitbase can be helpful too: `docker logs srcd-cli-gitbase`
OK, the (small) PGA dump worked. I may need to follow Francesc's advice and
put it into a tmpfs in RAM.
I will continue to debug the larger set and see how/why Gitbase is taking
multiple hours to be ready.
On Mon, Mar 11, 2019 at 8:31 AM Maxim Sukharev wrote:
logs from gitbase can be helpful too:
docker logs srcd-cli-gitbase
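I tried `pga get` and downloaded ~9k repos, then I hit the same problem as you. I found that some random repos take too much time to be initialized by Engine. This problem is partly addressed by src-d/go-mysql-server#631, but that fix has not yet been added to source{d} Engine.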
OK - thanks David!
Let me see if I can carve it down to 3-4K repos and debug further. I also
only need the tip of each repo, so finding a way to pull those using PGA
would also reduce what Gitbase has to parse.
On Tue, Mar 12, 2019 at 7:16 AM David Pordomingo wrote:
I tried pga get and downloaded ~9k repos, then I hit the same problem as
you.
I found that some random repos take too much time to be initialized by
Engine.
This problem is partly addressed by src-d/go-mysql-server#631, but it has not
yet been added to source{d} Engine.
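One way to carve a download down to a fixed number of repositories is to slice the `pga list` JSON stream before it reaches `pga get`. The helper below is a minimal sketch, not part of pga: the function name `first_n_siva_files` is hypothetical, and it assumes `pga list -f json` emits one JSON object per repository carrying a `sivaFilenames` array, which is what the `jq -r ".sivaFilenames[]"` pipeline in the first reply relies on. It also deduplicates names, on the assumption that several repositories may share a siva file.

```shell
# Hypothetical helper (not part of pga): keep only the first N repositories
# from a `pga list -f json` stream and print their siva file names, one per
# line, sorted and deduplicated. Requires jq >= 1.5 for --argjson.
first_n_siva_files() {
  if [ -z "$1" ]; then
    echo "usage: first_n_siva_files N" >&2
    return 2
  fi
  # -s slurps the whole stream into one array so it can be sliced;
  # `unique` both sorts and removes duplicate siva names.
  jq -rs --argjson n "$1" '[.[:$n] | .[] | .sivaFilenames[]] | unique | .[]'
}
```

For example, `pga list -u /src-d/ -l Go -f json | first_n_siva_files 500 | pga get -i -o "$HOME/repos/pga"` would fetch only the siva files of the first 500 listed repositories. Slurping with `-s` holds the whole listing in memory, which is likely fine for a filtered listing but may be heavy for the full index.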
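I was told this morning that PGA also contains all remote references of every repo (it even includes pull request references), which could be too much for your use case. I didn't confirm it myself, but maybe you could easily check it by inspecting the content of a repo with a big siva file (~100MB).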
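To check how much of a loaded dataset consists of pull request references, the refs can be counted through `srcd sql`. This is a minimal sketch, assuming gitbase's `refs` table exposes `repository_id` and `ref_name` and that pull request refs start with `refs/pull/` (the exact ref naming inside PGA siva files may differ). The helper names are hypothetical, and the queries have not been checked against gitbase v0.19.0.

```shell
# Hypothetical helpers: gauge how many refs are pull request refs.
# Assumes the gitbase `refs` table has `repository_id` and `ref_name`.
count_pull_refs() {
  srcd sql "SELECT COUNT(*) FROM refs WHERE ref_name LIKE 'refs/pull/%'"
}

count_all_refs() {
  srcd sql "SELECT COUNT(*) FROM refs"
}

# Repositories with the most refs may point at the ones slowing things down.
top_repos_by_refs() {
  limit=${1:-10}
  # Only accept digits so the value can be pasted into the query safely.
  case $limit in
    ''|*[!0-9]*) echo "limit must be a number" >&2; return 1 ;;
  esac
  srcd sql "SELECT repository_id, COUNT(*) AS n FROM refs
            GROUP BY repository_id ORDER BY n DESC LIMIT $limit"
}
```

Comparing `count_pull_refs` with `count_all_refs` gives a rough idea of how much of the dataset is pull request history rather than branches and tags.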
Alright. Any ideas on how to mutate the PGA manifest / configure it to be
more minimal here?
On Tue, Mar 12, 2019 at 10:04 AM David Pordomingo wrote:
I was told this morning that PGA also contains all remote references of
every repo (it even includes pull request references), which could be too
much for your use case.
I didn't confirm it myself, but maybe you could easily check it by inspecting
the content of a repo with a big siva file (~100MB).
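Finding the big siva files suggested for inspection can be done with `find`. This is a minimal sketch: the `list_large_siva` name is hypothetical, the default directory follows the `pga get -o $HOME/repos/pga` example from the first reply, and the 100M default mirrors the ~100MB figure above.

```shell
# Hypothetical helper: list siva files above a size threshold, largest first.
# Usage: list_large_siva [DIR] [MIN_SIZE]   (MIN_SIZE uses find units, e.g. 100M)
list_large_siva() {
  dir=${1:-"$HOME/repos/pga/siva"}
  min_size=${2:-100M}
  # `ls -lS` sorts the matched files by size (per batch passed by -exec ... +).
  find "$dir" -type f -name '*.siva' -size "+$min_size" -exec ls -lS {} +
}
```

Note that `find -size` rounds file sizes up to the unit given, so `+100M` matches files larger than 100 MiB.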
Update:
~$ docker logs -f srcd-cli-gitbase
But it seems the srcd-cli container can't connect to MySQL:
I don't see MySQL listening in that container:
Some insights about your logs:
Yep - that's what I expected. Querying directly off the tmpfs:
Looking back at the container logs:
Sorry for the delay, but investigation issues like this one are somewhat out of our scope. I ran it this weekend, and I also found that … What I did was to fetch only …
You can get the code of that modified version of borges, and instructions on how to run it, from this PR: dpordomingo/borges#1
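By the way: initing Engine for those 7.5K repos was fast (less than a minute), and the query to count the commits was ready in similar time, so I think the problem you hit was related to a corrupted database (as suggested by the `repository does not exist` error).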
Thanks David, I'm going to give the custom dataset approach a shot. Really
appreciate the help.
All ~15 attempts (not an exaggeration) at downloading a working siva
dataset, varying my jq filters to shrink it a little more each time,
seemingly returned a corrupted DB.
On Sun, Mar 31, 2019 at 8:21 AM David Pordomingo wrote:
By the way: initing Engine for those 7.5K repos was fast (less than a
minute), and the query to count the commits was ready in similar time, so
I think the problem you hit was related to a corrupted database
(as suggested by the `repository does not exist` error).
Glad to know you're trying hard.
Summary: Downloading a subset of the PGA and calling `srcd init` fails to provide a usable environment after several hours.
srcd Engine version: v0.11.0
Container image versions:
elithrar@matt-workstation:~$ docker images
REPOSITORY         TAG              IMAGE ID       CREATED       SIZE
srcd/cli-daemon    v0.11.0          e09406877f03   2 days ago    44MB
srcd/gitbase       v0.19.0          5df5a9b119a9   3 days ago    37.6MB
srcd/gitbase-web   v0.6.2           1c745ac11485   4 days ago    109MB
bblfsh/bblfshd     v2.11.8-drivers  ac9a79330aa9   2 weeks ago   1.42GB
Machine spec: 16 vCPU, 200GB RAM, 200GB SSD (custom GCP VM)
Steps to reproduce:
Call `srcd sql 'SHOW TABLES;'` until it no longer reports "waiting for Gitbase to be ready". This takes multiple hours; htop reports random cores pegged at 100% at intervals, but low memory usage.
Successfully call `srcd sql 'SHOW TABLES;'` and get a table listing.

Open to suggestions!
My initial feedback (noting the empathy sessions/issues on this repo) is that:
- determining the status of gitbase is near-impossible as a user without (likely) entering the container;
- how to initialize the engine on siva files is undocumented (an educated guess based on previous issues; it may still be wrong);
- errors are opaque and lack context (a filename and line number would be useful).
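The manual wait described in the steps to reproduce can at least be scripted rather than re-run by hand. This is a minimal sketch, assuming `srcd sql` either exits non-zero or prints the "waiting for Gitbase to be ready" message (on stdout or stderr) while gitbase is still loading; the `wait_for_gitbase` name and its attempt/delay defaults are hypothetical.

```shell
# Hypothetical helper: poll gitbase through `srcd sql` until SHOW TABLES works.
# Usage: wait_for_gitbase [MAX_ATTEMPTS] [DELAY_SECONDS]
wait_for_gitbase() {
  max_attempts=${1:-360}  # with the default delay: about one hour of sleeping
  delay=${2:-10}          # seconds to sleep between attempts
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Capture stdout and stderr; succeed only if srcd exited 0 and the
    # output no longer contains the "waiting" message.
    if out=$(srcd sql 'SHOW TABLES;' 2>&1) &&
      ! printf '%s\n' "$out" | grep -q 'waiting for Gitbase to be ready'; then
      printf '%s\n' "$out"
      return 0
    fi
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  echo "gitbase not ready after $max_attempts attempts" >&2
  return 1
}
```

If `srcd sql` itself blocks while gitbase loads, each attempt may take much longer than `DELAY_SECONDS`; following `docker logs -f srcd-cli-gitbase` in another terminal remains the more direct way to see what gitbase is doing.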