
Add benchmarks #59


Closed
anonrig opened this issue Mar 9, 2024 · 4 comments · Fixed by #61

Comments

@anonrig
Member

anonrig commented Mar 9, 2024

How does ada-python perform compared to urllib?

@lemire
Member

lemire commented Mar 9, 2024

Last time I ran some simple benchmarks, we were 50% slower than urllib (see #1 (comment)).

@TkTech had a faster alternative, which we did not adopt.

It is possible that the current version fares better.

Most of the running time is likely spent in the binding layer rather than in Ada itself.
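One rough way to see how much of the per-URL cost is call overhead rather than parsing is to time the parser against a trivial callable over the same inputs. The sketch below uses `urllib.parse.urlparse` as a stand-in because it needs only the standard library; the helper names `per_call_seconds` and `noop` and the sample URLs are hypothetical. Swapping in `ada_url.URL` or `can_ada.parse` (if installed) would give the corresponding numbers for the bindings. Note that the no-op baseline only captures the cost of a Python-level call: for a compiled binding, argument conversion and result-object creation still sit inside the measured parse time.

```python
from timeit import repeat
from urllib.parse import urlparse

# Hypothetical sample inputs, all unique so that any result caching
# inside urllib.parse does not flatter its timing.
URLS = [f"https://host{i}.example/path?q={i}#frag" for i in range(3000)]


def per_call_seconds(func, urls, repeats=5):
    """Best-of-`repeats` wall time for one pass over `urls`, per call."""
    best = min(repeat(lambda: [func(u) for u in urls], number=1, repeat=repeats))
    return best / len(urls)


def noop(url):
    # Baseline: a Python-level call that does no parsing at all.
    return url


if __name__ == "__main__":
    base = per_call_seconds(noop, URLS)
    parse = per_call_seconds(urlparse, URLS)
    print(f"no-op call : {base * 1e9:8.1f} ns/call")
    print(f"urlparse   : {parse * 1e9:8.1f} ns/call")
    print(f"difference : {(parse - base) * 1e9:8.1f} ns/call")
```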

@TkTech

TkTech commented Mar 9, 2024

I can update my version if you'd find it useful for comparison.

@bbayles
Collaborator

bbayles commented Mar 10, 2024

I updated the script from the earlier comment.

Benchmark
```python
import can_ada
from urllib.parse import urlparse
from ada_url import URL
from time import perf_counter

# Each section parses every line of the same input file and reports how
# many lines were accepted (no exception) plus the elapsed time.
# Note: the timed region also includes reading the file.

print("can_ada")

start_time = perf_counter()
total = 0
with open('/tmp/top100.txt', 'rt') as f:
    for line in f:
        try:
            can_ada.parse(line)
        except Exception:
            pass
        else:
            total += 1

end_time = perf_counter()
print(total, end_time - start_time, sep='\t')

print("urllib")

start_time = perf_counter()
total = 0
with open('/tmp/top100.txt', 'rt') as f:
    for line in f:
        try:
            urlparse(line)
        except Exception:
            pass
        else:
            total += 1

end_time = perf_counter()
print(total, end_time - start_time, sep='\t')

print("ada_url")

start_time = perf_counter()
total = 0
with open('/tmp/top100.txt', 'rt') as f:
    for line in f:
        try:
            URL(line)  # ada_url raises ValueError for invalid URLs
        except ValueError:
            pass
        else:
            total += 1

end_time = perf_counter()
print(total, end_time - start_time, sep='\t')
```

Here are the results I got:

```
can_ada
99999	0.1501040810253471

urllib
100031	0.5046708540758118

ada_url
99999	0.2472913929959759
```

With the changes I made after the last thread, ada_url is no longer slower than urllib; in this run it took about half as long (0.25 s vs 0.50 s).
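The totals differ because the libraries disagree on what counts as a failure: `urllib.parse.urlparse` splits almost any string into components without validating it and raises only in a few cases (such as an unbalanced IPv6 bracket in the host), whereas a WHATWG parser like Ada rejects invalid URLs. That is likely why urllib reports 100031 accepted lines against 99999 for the two Ada-based bindings. A small standard-library illustration, where `accepts` is a hypothetical helper:

```python
from urllib.parse import urlparse


def accepts(parse, url):
    """Return True if `parse(url)` completes without raising ValueError."""
    try:
        parse(url)
    except ValueError:
        return False
    return True


# urlparse does not validate: a string that is not a URL still "parses".
print(accepts(urlparse, "not a url"))      # True
print(urlparse("not a url").path)          # not a url

# One of the few inputs it rejects: an unbalanced IPv6 bracket.
print(accepts(urlparse, "http://[::1/"))   # False
```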

@TkTech

TkTech commented Mar 10, 2024

Version 1.1.1 of can_ada is now live, with the latest upstream Ada, py312 binaries, and idna_encode/idna_decode for parity with ada_url.
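For context, IDNA encoding maps a Unicode hostname to its ASCII ("Punycode") form, and decoding reverses it. Python's standard library has a built-in `idna` codec that shows the idea. It is only a stand-in here, not the can_ada or ada_url API: it implements the older IDNA 2003 rules, so its output may not match the WHATWG/UTS #46 processing used by URL parsers for every input.

```python
# Standard-library illustration only: Python's built-in "idna" codec
# implements IDNA 2003, not the UTS #46 rules used by WHATWG URL parsers.
ascii_host = "bücher.example".encode("idna")
print(ascii_host)                    # b'xn--bcher-kva.example'

unicode_host = ascii_host.decode("idna")
print(unicode_host)                  # bücher.example
```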

There is no change in general performance, which makes sense: when comparing the two, the benchmark in this thread mostly measures function-call overhead.
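Since the comparison mostly measures per-call overhead, it can help to keep file I/O out of the timed region and take the best of several runs. The sketch below does that under stated assumptions: the URL list is already in memory (the sample URLs and the `bench` and `available_parsers` helpers are hypothetical), and `can_ada.parse` and `ada_url.URL` are only added to the comparison when those packages are installed.

```python
from time import perf_counter
from urllib.parse import urlparse


def bench(parse, urls, repeats=5):
    """Return (accepted, best_seconds) for calling `parse` on every URL."""
    best = float("inf")
    accepted = 0
    for _ in range(repeats):
        accepted = 0
        start = perf_counter()
        for url in urls:
            try:
                parse(url)
            except Exception:  # invalid input; broad catch as in the script above
                continue
            accepted += 1
        best = min(best, perf_counter() - start)
    return accepted, best


def available_parsers():
    """Map library names to callables, skipping packages that are not installed."""
    parsers = {"urllib": urlparse}
    try:
        import can_ada
        parsers["can_ada"] = can_ada.parse
    except ImportError:
        pass
    try:
        from ada_url import URL
        parsers["ada_url"] = URL
    except ImportError:
        pass
    return parsers


if __name__ == "__main__":
    # Hypothetical in-memory sample; a real run would read the URL file
    # once, before any timing starts.
    urls = [f"https://host{i}.example/path?id={i}" for i in range(10_000)]
    for name, parse in available_parsers().items():
        accepted, seconds = bench(parse, urls)
        print(name, accepted, f"{seconds:.4f}", sep="\t")
```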

If your goal is performance at all costs, a quick, minimal C-only wrapper was even faster, but it takes drastically more work in the bindings than the ~60 lines they currently are. I wouldn't recommend it, for the sake of developer sanity.


4 participants