Commit e166e55

docs: update documentation for CLI
1 parent f558210 commit e166e55

1 file changed

+73
-79
lines changed

Diff for: docs/cli.md

+73-79
@@ -1,4 +1,4 @@
1-
## adaparse
1+
## Command line interface (CLI)
22

33
The adaparse command tool takes URL strings (ASCII/UTF-8) and it validates, normalizes and queries them efficiently.
44

@@ -13,56 +13,93 @@ The adaparse command tool takes URL strings (ASCII/UTF-8) and it validates, norm
1313
- `-p`, `--path`: Process all the URLs in a given file
1414
- `-o`, `--output`: Output the results of the parsing to a file
1515

16-
### Usage/Examples:
16+
### Performance
1717

18-
Well-formatted URL:
18+
Our `adaparse` tool may outperform other popular alternatives. We offer a [collection of
19+
sets of URLs](https://github.com/ada-url/url-various-datasets) for benchmarking purposes.
20+
The following results are on a MacBook Air 2022 (M2 processor) using LLVM 14. We
21+
compare against [trurl](https://github.com/curl/trurl) version 0.6 (libcurl/7.87.0).
1922

20-
```bash
21-
adaparse "http://www.google.com"
23+
<details>
24+
<summary>With the wikipedia_100k dataset, adaparse generates normalized URLs about <b>three times faster than trurl</b>.</summary>
2225
```
23-
Output:
26+
time cat url-various-datasets/wikipedia/wikipedia_100k.txt| trurl --url-file - &> /dev/null 1
27+
cat url-various-datasets/wikipedia/wikipedia_100k.txt 0,00s user 0,01s system 3% cpu 0,179 total
28+
trurl --url-file - &> /dev/null 0,14s user 0,03s system 98% cpu 0,180 total
29+
2430
31+
time cat url-various-datasets/wikipedia/wikipedia_100k.txt| ./build/tools/cli/adaparse -g href &> /dev/null
32+
cat url-various-datasets/wikipedia/wikipedia_100k.txt 0,00s user 0,00s system 10% cpu 0,056 total
33+
./build/tools/cli/adaparse -g href &> /dev/null 0,05s user 0,00s system 93% cpu 0,055 total
2534
```
26-
http://www.google.com
35+
</details>
36+
37+
<details>
38+
<summary>With the top100 dataset, the adaparse tool is about <b>twice as fast as trurl</b>.</summary>
39+
```
40+
time cat url-various-datasets/top100/top100.txt| trurl --url-file - &> /dev/null 1
41+
cat url-various-datasets/top100/top100.txt 0,00s user 0,00s system 4% cpu 0,115 total
42+
trurl --url-file - &> /dev/null 0,09s user 0,02s system 97% cpu 0,113 total
43+
44+
time cat url-various-datasets/top100/top100.txt| ./build/tools/cli/adaparse -g href &> /dev/null
45+
cat url-various-datasets/top100/top100.txt 0,00s user 0,01s system 11% cpu 0,062 total
46+
./build/tools/cli/adaparse -g href &> /dev/null 0,05s user 0,00s system 94% cpu 0,061 total
2747
```
48+
</details>
49+
50+
51+
#### Comparison
2852

29-
Ill-formatted URL:
53+
```
54+
wikipedia 100k
55+
ada ▏ 55 ms ███████▋
56+
trurl ▏ 180 ms █████████████████████████
57+
58+
top100
59+
ada ▏ 61 ms █████████████▍
60+
trurl ▏ 113 ms █████████████████████████
61+
```
62+
63+
The results will vary depending on your system. We invite you to run your own benchmarks.
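
As a rough check of the ratios quoted above, the speedup can be computed from the total times shown in the comparison chart (55 ms vs 180 ms, and 61 ms vs 113 ms). This is a minimal sketch; the `speedup` helper is hypothetical and is not part of adaparse or trurl:

```bash
# Hypothetical helper (not part of adaparse or trurl):
# prints how many times faster the first tool is, given both totals in ms.
speedup() {
  # LC_ALL=C keeps the decimal point independent of the user's locale.
  LC_ALL=C awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

speedup 55 180   # wikipedia_100k: adaparse 55 ms, trurl 180 ms
speedup 61 113   # top100: adaparse 61 ms, trurl 113 ms
```

With the figures above this gives roughly 3.3 and 1.9, in line with the "three times" and "twice" claims. Substitute your own measured totals to compare on your system.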
64+
65+
### Usage/Examples
66+
67+
#### Well-formatted URL
3068

3169
```bash
32-
adaparse "h^tp:ws:/www.g00g.com"
70+
adaparse "http://www.google.com"
3371
```
3472
Output:
3573

3674
```
37-
Invalid URL: h^tp:ws:/www.g00g.com
75+
http://www.google.com
3876
```
3977

40-
41-
Diagram flag:
78+
#### Diagram
4279

4380
```bash
4481
adaparse -d http://www.google.com/bal\?a\=\=11\#fddfds
45-
```
82+
```
4683

4784
Output:
4885

49-
```
50-
http://www.google.com/bal?a==11#fddfds [38 bytes]
51-
| | | | |
52-
| | | | `------ hash_start
53-
| | | `------------ search_start 25
54-
| | `---------------- pathname_start 21
55-
| | `---------------- host_end 21
56-
| `------------------------------ host_start 7
57-
| `------------------------------ username_end 7
58-
`-------------------------------- protocol_end 5
86+
```
87+
http://www.google.com/bal?a==11#fddfds [38 bytes]
88+
| | | | |
89+
| | | | `------ hash_start
90+
| | | `------------ search_start 25
91+
| | `---------------- pathname_start 21
92+
| | `---------------- host_end 21
93+
| `------------------------------ host_start 7
94+
| `------------------------------ username_end 7
95+
`-------------------------------- protocol_end 5
5996
```
6097

98+
#### Pipe Operator
6199

62-
63-
### Piping Example
64-
65-
Ada can process URLs from piped input, making it easy to integrate with other command-line tools that produce ASCII or UTF-8 outputs. Here's an example of how to pipe the output of another command into Ada. Given a list of URLs, one by line, we may query the normalized URL string (`href`) and detect any malformed URL:
100+
Ada can process URLs from piped input, making it easy to integrate with other command-line tools
101+
that produce ASCII or UTF-8 outputs. Here's an example of how to pipe the output of another command into Ada.
102+
Given a list of URLs, one per line, we may query the normalized URL string (`href`) and detect any malformed URL:
66103

67104
```bash
68105
cat dragonball_url.txt | adaparse --get href
@@ -95,14 +132,16 @@ www.gohan.com
95132
If you omit `-g`, it will only provide a list of invalid URLs. This might be
96133
useful if you want to quickly validate a list of URLs.
97134

135+
### Benchmark Runner
98136

99137
The benchmark flag can be used to output the time it takes to process piped input:
100138

101139
```bash
102140
cat wikipedia_100k.txt | adaparse -b
103141
```
104142

105-
```bash
143+
Output:
144+
```
106145
Invalid URL: 1968:_Die_Kinder_der_Diktatur
107146
Invalid URL: 58957:_The_Bluegrass_Guitar_Collection
108147
Invalid URL: 650luc:_Gangsta_Grillz
@@ -120,26 +159,29 @@ read 5209265 bytes in 32819917 ns using 100000 lines, used 160 loads
120159
0.1587226744053009 GB/s
121160
```
122161

162+
#### Saving result to file system
163+
123164
There is an option to output to a file on disk:
124165

125166
```bash
126-
127167
cat wikipedia_100k.txt | adaparse -o wiki_output.txt
128168
```
129169

130-
as well as read in from a file on disk without going through cat:
170+
It can also read from a file on disk without going through `cat`:
131171

132172
```bash
133173
adaparse -p wikipedia_top_100.txt
134174
```
135175

176+
#### Advanced Usage
177+
136178
You may also combine different flags. For example, to extract only the host from the URLs stored in wikipedia_top100.txt, write the result to test_write.txt, and report timing:
137179

138180
```bash
139181
adaparse -p wikipedia_top100.txt -o test_write.txt -g host -b
140182
```
141183
142-
Console output:
184+
Output:
143185
```bash
144186
read 5209265 bytes in 26737131 ns using 100000 lines, total_bytes is 5209265 used 160 loads
145187
0.19483260937757307 GB/s(base)
@@ -160,51 +202,3 @@ en.wikipedia.org
160202
en.wikipedia.org
161203
(---snip---)
162204
```
163-
164-
### Performance
165-
166-
Our `adaparse` tool may outperform other popular alternatives. We offer a [collection of
167-
sets of URLs](https://github.com/ada-url/url-various-datasets) for benchmarking purposes.
168-
The following results are on a MacBook Air 2022 (M2 processor) using LLVM 14. We
169-
compare against [trurl](https://github.com/curl/trurl) version 0.6 (libcurl/7.87.0).
170-
171-
<details><summary>
172-
With the wikipedia_100k dataset, we get that adaparse can generate normalized URLs about three
173-
times faster than trurl.</summary>
174-
<pre>
175-
time cat url-various-datasets/wikipedia/wikipedia_100k.txt| trurl --url-file - &> /dev/null 1
176-
cat url-various-datasets/wikipedia/wikipedia_100k.txt 0,00s user 0,01s system 3% cpu 0,179 total
177-
trurl --url-file - &> /dev/null 0,14s user 0,03s system 98% cpu 0,180 total
178-
179-
180-
time cat url-various-datasets/wikipedia/wikipedia_100k.txt| ./build/tools/cli/adaparse -g href &> /dev/null
181-
cat url-various-datasets/wikipedia/wikipedia_100k.txt 0,00s user 0,00s system 10% cpu 0,056 total
182-
./build/tools/cli/adaparse -g href &> /dev/null 0,05s user 0,00s system 93% cpu 0,055 total
183-
</pre>
184-
</details>
185-
186-
<details><summary>With the top100 dataset, the adaparse tool is twice as fast as the trurl.</summary>
187-
<pre>
188-
time cat url-various-datasets/top100/top100.txt| trurl --url-file - &> /dev/null 1
189-
cat url-various-datasets/top100/top100.txt 0,00s user 0,00s system 4% cpu 0,115 total
190-
trurl --url-file - &> /dev/null 0,09s user 0,02s system 97% cpu 0,113 total
191-
192-
time cat url-various-datasets/top100/top100.txt| ./build/tools/cli/adaparse -g href &> /dev/null
193-
cat url-various-datasets/top100/top100.txt 0,00s user 0,01s system 11% cpu 0,062 total
194-
./build/tools/cli/adaparse -g href &> /dev/null 0,05s user 0,00s system 94% cpu 0,061 total
195-
</pre>
196-
</details>
197-
198-
199-
200-
The results will vary depending on your system. We invite you to run your own benchmarks.
201-
202-
```
203-
wikipedia 100k
204-
ada ▏ 55 ms ███████▋
205-
trurl ▏ 180 ms █████████████████████████
206-
207-
top100
208-
ada ▏ 61 ms █████████████▍
209-
trurl ▏ 113 ms █████████████████████████
210-
```
