Commit c135ab0

1m commits
1 parent 47ee254 commit c135ab0

6 files changed, +74 -3 lines changed

generate-commits-fast.md

+2
@@ -2,6 +2,8 @@
 
 It is sometimes interesting to generate a ton of commits to test some edge case, but it is not trivial to go well above 1000 commits in a reasonable amount of time.
 
+Bottom line: don't use `git`. The manual Python code under [other-test-repos](other-test-repos/) gives a huge speedup. TODO: try libgit2.
+
 1000 operations take on my computer:
 
 - echo to file, add and commit: 43s
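
The speedup from bypassing `git` comes from writing objects directly instead of spawning a process per commit. A minimal sketch of how one loose object is named and stored, assuming the standard loose-object layout (`<type> <size>\0<body>`, named by SHA-1, zlib-compressed on disk). The helper names `git_object_id` and `write_loose_object` are hypothetical and are not taken from `other-test-repos/util.py`:

```python
import hashlib
import os
import tempfile
import zlib


def git_object_id(kind: bytes, body: bytes) -> str:
    """Return the SHA-1 id of a Git object of the given type and body."""
    store = kind + b' ' + str(len(body)).encode('ascii') + b'\0' + body
    return hashlib.sha1(store).hexdigest()


def write_loose_object(objects_dir: str, kind: bytes, body: bytes) -> str:
    """Write one zlib-compressed loose object and return its id."""
    store = kind + b' ' + str(len(body)).encode('ascii') + b'\0' + body
    oid = hashlib.sha1(store).hexdigest()
    # Loose objects live at <objects_dir>/<first 2 hex chars>/<other 38>.
    directory = os.path.join(objects_dir, oid[:2])
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, oid[2:]), 'wb') as f:
        f.write(zlib.compress(store))
    return oid


# A throwaway directory stands in for `.git/objects` (an assumption of
# this sketch; a real repository also needs HEAD, refs and so on).
objects_dir = tempfile.mkdtemp()
blob_id = write_loose_object(objects_dir, b'blob', b'hello world\n')
print(blob_id)
```

Each object costs one hash, one compression and one file write, with no process startup, which is where most of the time goes in the `echo`, `add`, `commit` loop.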

other-test-repos/.gitignore

+1
@@ -1,3 +1,4 @@
 *.pyc
+/tmp/
 /*.tmp/
 __pycache__/

other-test-repos/README.md

+2
@@ -7,6 +7,7 @@ Tests that are very large will not be included here to keep this repository smal
 - <https://github.com/cirosantilli/test-deep>
 - <https://github.com/cirosantilli/test-diff-many-files>
 - <https://github.com/cirosantilli/test-pr-many-commits>
+- <https://github.com/cirosantilli/test-many-commits-1m> [many-commits.py](many-commits.py)
 
 There are also some tests that could not be included here conveniently:
 
@@ -27,6 +28,7 @@ There are also some tests that could not be included here conveniently:
 - <https://github.com/cirosantilli/test-symlink-self>
 - <https://github.com/cirosantilli/test-symlink-start-null>
 - <https://gitlab.com/cirosantilli/test-GIT/tree/master> (fails on GitHub)
+- <https://gitlab.com/cirosantilli/test-commit-many-parents-1m>
 
 Other similar repos from other people:
 
other-test-repos/duplicate-parent-lowlevel.py

+7-1
@@ -3,6 +3,12 @@
 """
 Git does not let a commit have twice the same parent, but GitHub does, and normally shows it.
 But as of 2016-05-17 they didn't page this edge case, and it 502's the commit for large numbers of links.
+
+If you increase the value a lot, then when you clone the repo and cd into it, your computer may bog down
+if you show the Git status in Bash, because Git's memory usage explodes trying to parse that commit.
+The actual compressed size is very small though, since zlib compresses all that repeated data very efficiently.
+
+Pushing would require an obscene amount of memory (malloc fails under `ulimit -Sv`), so I couldn't test it.
 """
 
 import itertools
@@ -13,7 +19,7 @@
 
 tree = util.create_tree_with_one_file()
 commit, _, _ = util.save_commit_object(tree, author_name=b'a')
-commit, _, _ = util.save_commit_object(tree, itertools.repeat(commit, 1000000), author_name=b'b')
+commit, _, _ = util.save_commit_object(tree, itertools.repeat(commit, 10000000), author_name=b'b')
 
 # Finish.
 util.create_master(commit)
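
The docstring's point about compressed size can be checked without Git. A rough sketch, assuming a made-up parent id and only the header lines that dominate the size (author, committer and message are left out, so this is not a complete commit body). The count is far smaller than the script's value to keep it quick:

```python
import zlib

# Hypothetical values for illustration: any 40-character hex id
# repeats equally well.
tree_line = b'tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n'
parent_line = b'parent ' + b'0123456789abcdef0123456789abcdef01234567' + b'\n'
n_parents = 100000  # the script itself uses 10000000

# The same parent line repeated n_parents times, as in the test repo.
body = tree_line + parent_line * n_parents

raw_size = len(body)
compressed_size = len(zlib.compress(body))
print(raw_size, compressed_size, raw_size // compressed_size)
```

The on-disk object stays small, but anything that parses the commit has to inflate and hold every parent line, which is consistent with the memory blow-up described above.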

other-test-repos/many-commits.py

+55
@@ -0,0 +1,55 @@
+#!/usr/bin/env python3
+
+"""
+You may want to create this in a tmpfs or ramfs, since deleting the generated repository can take a **huge** amount of time.
+
+    ulimit -Sv 500000
+    sudo umount tmp && \
+    sudo mount -t tmpfs -o size=1g tmpfs tmp && \
+    sudo chown $USER:$USER tmp &&
+    ./imagine-all-the-people.py
+
+The tags can be used to push by parts to GitHub, which does not accept 1M at once:
+
+    remote='git@github.com:cirosantilli/test-many-commits-1m.git'
+    for i in `seq 10 10 100`; do
+        git --git-dir=tmp/repo.tmp/.git push -f "$remote" "$i:master"
+    done
+    # TODO for some reason I needed this afterwards.
+    git --git-dir=tmp/repo.tmp/.git push "$remote" 'master'
+"""
+
+import datetime
+import subprocess
+
+import util
+
+util.init()
+
+tree = util.create_tree_with_one_file()
+commit = None
+n = 1000000
+percent = (n / 100)
+p = 0
+for i in range(n):
+    commit, _, _ = util.save_commit_object(tree, (commit,),
+            message=(str(i).encode('ascii')))
+    if i % percent == 0:
+        print(p)
+        print(datetime.datetime.now())
+        p += 1
+
+        # Loose objects are too large and blow up the tmpfs.
+
+        # Does clean packets, but the calculation takes more and more memory,
+        # and slows down and blows up at the end. TODO which subcommand blows up exactly?
+        #subprocess.check_output(['git', 'gc'])
+
+        subprocess.check_output(['git', 'repack'])
+        subprocess.check_output(['git', 'prune-packed'])
+
+        subprocess.check_output(['git', 'tag', str(p), commit])
+
+# Finish.
+util.create_master(commit)
+util.clone()
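
To see which commits the numbered tags would land on, here is a small in-memory sketch of the same loop. It assumes the tag is created once per percent step, right after `p` is incremented. The function `commit_id`, the fixed identity and date, and the empty tree are made up for illustration and do not reproduce `util.save_commit_object`:

```python
import hashlib

EMPTY_TREE = b'4b825dc642cb6eb9a060e54bf8d69288fbee4904'


def commit_id(tree, parent, message):
    """Hash a minimal commit body and return its hex id as bytes."""
    lines = [b'tree ' + tree]
    if parent is not None:
        lines.append(b'parent ' + parent)
    # Fixed, made-up identity and date so that the ids are reproducible.
    lines.append(b'author a <a@a.com> 0 +0000')
    lines.append(b'committer a <a@a.com> 0 +0000')
    body = b'\n'.join(lines) + b'\n\n' + message + b'\n'
    store = b'commit ' + str(len(body)).encode('ascii') + b'\0' + body
    return hashlib.sha1(store).hexdigest().encode('ascii')


n = 1000            # the script uses 1000000
percent = n // 100  # integer step, so the modulo test stays in integers
tags = {}           # tag number -> (commit index, commit id)
commit = None
p = 0
for i in range(n):
    commit = commit_id(EMPTY_TREE, commit, str(i).encode('ascii'))
    if i % percent == 0:
        p += 1
        tags[p] = (i, commit)

print(len(tags), tags[1][0], tags[100][0], n - 1)
```

Under that assumption tag `1` sits on the root commit and tag `100` sits a full percent step before the tip, which may be why a final push of `master` was still needed after pushing the tags.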

other-test-repos/util.py

+7-2
@@ -1,3 +1,8 @@
+"""
+TODO: operate on packfiles instead of just individual objects. Could be more efficient.
+But that format is also harder to implement.
+"""
+
 import hashlib
 import os
 import shutil
@@ -30,7 +35,7 @@
 default_parents = ()
 
 def init():
-    repo = 'repo.tmp'
+    repo = 'tmp/repo.tmp'
     for d in (repo, 'clone.tmp'):
         shutil.rmtree(d, ignore_errors=True)
     os.mkdir(repo)
@@ -66,7 +71,7 @@ def save_commit_object(
         committer_email=default_committer_email,
         committer_date=default_committer_date,
         message=default_message):
-    if parents:
+    if parents and parents[0]:
         parents_bytes = b''
         sep = b'\nparent '
         parents_bytes = sep + sep.join(parents) + b'\n'
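
The `parents[0]` check lets a root commit be requested as `(None,)`, but it also assumes `parents` is an indexable sequence. An iterator such as the `itertools.repeat(...)` passed by `duplicate-parent-lowlevel.py` does not support indexing. One possible alternative, sketched here with a hypothetical helper `parents_header` that is not part of `util.py` and that emits one `parent <id>\n` line per entry rather than the separator layout used above:

```python
import itertools


def parents_header(parents=()):
    """Build the `parent` header lines for a commit body.

    Accepts any iterable (tuple, list, iterator) and skips None or
    empty entries, so `(None,)` yields no parent lines at all.
    """
    return b''.join(b'parent ' + p + b'\n' for p in parents if p)


# Made-up id used only for the demonstration calls below.
some_id = b'0123456789abcdef0123456789abcdef01234567'

root = parents_header((None,))                         # root commit
single = parents_header((some_id,))                    # ordinary commit
repeated = parents_header(itertools.repeat(some_id, 3))  # duplicate parents
print(len(root), single.count(b'parent '), repeated.count(b'parent '))
```

Filtering inside the generator avoids both the truthiness test on an iterator, which is always true, and the index lookup.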
