- ## adaparse
+ ## Command line interface (CLI)

The `adaparse` command-line tool takes URL strings (ASCII/UTF-8) and validates, normalizes, and queries them efficiently.

@@ -13,56 +13,93 @@ The adaparse command tool takes URL strings (ASCII/UTF-8) and it validates, norm
13
13
- `-p`, `--path`: Process all the URLs in a given file
14
14
- `-o`, `--output`: Output the results of the parsing to a file

- ### Usage/Examples:
+ ### Performance

- Well-formatted URL:
+ Our `adaparse` tool may outperform other popular alternatives. We offer a [collection of
+ sets of URLs](https://github.com/ada-url/url-various-datasets) for benchmarking purposes.
+ The following results are on a MacBook Air 2022 (M2 processor) using LLVM 14. We
+ compare against [trurl](https://github.com/curl/trurl) version 0.6 (libcurl/7.87.0).

- ``` bash
- adaparse "http://www.google.com"
+ <details>
+ <summary>With the wikipedia_100k dataset, adaparse generates normalized URLs about **three times faster than trurl**.</summary>
```
- Output:
+ time cat url-various-datasets/wikipedia/wikipedia_100k.txt| trurl --url-file - &> /dev/null 1
+ cat url-various-datasets/wikipedia/wikipedia_100k.txt 0,00s user 0,01s system 3% cpu 0,179 total
+ trurl --url-file - &> /dev/null 0,14s user 0,03s system 98% cpu 0,180 total
+

+ time cat url-various-datasets/wikipedia/wikipedia_100k.txt| ./build/tools/cli/adaparse -g href &> /dev/null
+ cat url-various-datasets/wikipedia/wikipedia_100k.txt 0,00s user 0,00s system 10% cpu 0,056 total
+ ./build/tools/cli/adaparse -g href &> /dev/null 0,05s user 0,00s system 93% cpu 0,055 total
```
- http://www.google.com
+ </details>
+
+ <details>
+ <summary>With the top100 dataset, the adaparse tool is **twice as fast as trurl**.</summary>
+ ```
+ time cat url-various-datasets/top100/top100.txt| trurl --url-file - &> /dev/null 1
+ cat url-various-datasets/top100/top100.txt 0,00s user 0,00s system 4% cpu 0,115 total
+ trurl --url-file - &> /dev/null 0,09s user 0,02s system 97% cpu 0,113 total
+
+ time cat url-various-datasets/top100/top100.txt| ./build/tools/cli/adaparse -g href &> /dev/null
+ cat url-various-datasets/top100/top100.txt 0,00s user 0,01s system 11% cpu 0,062 total
+ ./build/tools/cli/adaparse -g href &> /dev/null 0,05s user 0,00s system 94% cpu 0,061 total
```
+ </details>
+
+
+ #### Comparison

- Ill-formatted URL:
+ ```
+ wikipedia 100k
+ ada ▏ 55 ms ███████▋
+ trurl ▏ 180 ms █████████████████████████
+
+ top100
+ ada ▏ 61 ms █████████████▍
+ trurl ▏ 113 ms █████████████████████████
+ ```
+
+ The results will vary depending on your system. We invite you to run your own benchmarks.
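
The quoted speedups can be checked against the wall-clock totals listed in the comparison. The following is a minimal sketch, assuming a POSIX shell with `awk` available; `speedup` is a hypothetical helper name used only for illustration and is not part of adaparse or trurl.

```bash
# speedup FAST_MS SLOW_MS: print how many times faster the first timing is.
# Hypothetical helper for illustration; not part of adaparse or trurl.
speedup() {
  awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

# Wall-clock totals in milliseconds, taken from the comparison above.
speedup 55 180   # wikipedia 100k
speedup 61 113   # top100
```

This prints 3.27 and 1.85, which is consistent with the roughly threefold and twofold speedups quoted for the two datasets.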
+
+ ### Usage/Examples
+
+ #### Well-formatted URL

``` bash
- adaparse "h^tp:ws:/www.g00g.com"
+ adaparse "http://www.google.com"
```
Output:

```
- Invalid URL: h^tp:ws:/www.g00g.com
+ http://www.google.com
```

-
- Diagram flag:
+ #### Diagram

``` bash
adaparse -d http://www.google.com/bal\?a\=\=11\#fddfds
- ```
+ ```

Output:

- ```
- http://www.google.com/bal?a==11#fddfds [38 bytes]
-      | |             |   |     |
-      | |             |   |     `------ hash_start
-      | |             |   `------------ search_start 25
-      | |             `---------------- pathname_start 21
-      | |             `---------------- host_end 21
-      | `------------------------------ host_start 7
-      | `------------------------------ username_end 7
-      `-------------------------------- protocol_end 5
+ ```
+ http://www.google.com/bal?a==11#fddfds [38 bytes]
+      | |             |   |     |
+      | |             |   |     `------ hash_start
+      | |             |   `------------ search_start 25
+      | |             `---------------- pathname_start 21
+      | |             `---------------- host_end 21
+      | `------------------------------ host_start 7
+      | `------------------------------ username_end 7
+      `-------------------------------- protocol_end 5
```
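
To see how the offsets in the diagram map back onto the URL string, the sketch below slices the example URL at those positions. It assumes the offsets are 0-based and that each component spans a half-open byte range, which is an interpretation of the diagram rather than something stated here; `slice` is a hypothetical helper, not an adaparse feature.

```bash
# slice URL START END: print the bytes in the half-open range [START, END).
# Hypothetical helper; assumes an ASCII URL so that bytes and characters coincide.
slice() {
  printf '%s\n' "$1" | cut -c "$(( $2 + 1 ))-$3"
}

url='http://www.google.com/bal?a==11#fddfds'
slice "$url" 0 5     # protocol: everything before protocol_end
slice "$url" 7 21    # host: from host_start to host_end
slice "$url" 21 25   # pathname: from pathname_start to search_start
```

Under that reading, the three calls print `http:`, `www.google.com`, and `/bal`.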

+ #### Pipe Operator

-
- ### Piping Example
-
- Ada can process URLs from piped input, making it easy to integrate with other command-line tools that produce ASCII or UTF-8 outputs. Here's an example of how to pipe the output of another command into Ada. Given a list of URLs, one by line, we may query the normalized URL string (`href`) and detect any malformed URL:
+ Ada can process URLs from piped input, making it easy to integrate with other command-line tools
+ that produce ASCII or UTF-8 output. Here's an example of how to pipe the output of another command into Ada.
+ Given a list of URLs, one per line, we may query the normalized URL string (`href`) and detect any malformed URLs:

``` bash
cat dragonball_url.txt | adaparse --get href
@@ -95,14 +132,16 @@ www.gohan.com
If you omit `-g`, it will only provide a list of invalid URLs. This might be
useful if you want to quickly validate a list of URLs.
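
As a sketch of how that validation mode might be wrapped in a script, the helper below keeps only the rejected inputs. It assumes that rejected entries are reported in the `Invalid URL: <input>` form shown in the examples; whether they are written to stdout or stderr is not stated here, so both streams are captured. `invalid_urls` is a hypothetical name.

```bash
# invalid_urls: read URLs on stdin and print only those that adaparse rejects.
# Assumes rejected entries are reported as "Invalid URL: <input>".
# stderr is merged into stdout because the stream used is an unknown here.
invalid_urls() {
  adaparse 2>&1 | sed -n 's/^Invalid URL: //p'
}
```

For instance, `cat dragonball_url.txt | invalid_urls | wc -l` would then count the malformed entries in the list.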

+ ### Benchmark Runner

The benchmark flag can be used to output the time it takes to process piped input:

``` bash
cat wikipedia_100k.txt | adaparse -b
```

- ``` bash
+ Output:
+ ```
Invalid URL: 1968:_Die_Kinder_der_Diktatur
Invalid URL: 58957:_The_Bluegrass_Guitar_Collection
Invalid URL: 650luc:_Gangsta_Grillz
@@ -120,26 +159,29 @@ read 5209265 bytes in 32819917 ns using 100000 lines, used 160 loads
0.1587226744053009 GB/s
```
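
The GB/s figure follows directly from the byte and nanosecond counts on the summary line (`read 5209265 bytes in 32819917 ns`), since one byte per nanosecond equals 10^9 bytes per second. A minimal sketch of that arithmetic, assuming a POSIX shell with `awk`; `throughput` is a hypothetical helper name.

```bash
# throughput BYTES NANOSECONDS: bytes per nanosecond, which equals GB/s
# when a gigabyte is taken as 10^9 bytes.
throughput() {
  awk -v bytes="$1" -v ns="$2" 'BEGIN { printf "%.4f\n", bytes / ns }'
}

throughput 5209265 32819917
```

This prints 0.1587, matching the reported value to four decimal places.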

+ #### Saving result to file system
+
There is an option to output to a file on disk:

``` bash
-
cat wikipedia_100k.txt | adaparse -o wiki_output.txt
```

- as well as read in from a file on disk without going through cat:
+ You can also read from a file on disk without going through `cat`:

``` bash
adaparse -p wikipedia_top_100_txt
```

+ #### Advanced Usage
+
You may also combine different flags. For example, to extract only the host from the URLs stored in wikipedia_top100.txt, write the results to test_write.txt, and report benchmark timings:

``` bash
adaparse -p wikipedia_top100.txt -o test_write.txt -g host -b
```

- Console output:
+ Output:
``` bash
read 5209265 bytes in 26737131 ns using 100000 lines, total_bytes is 5209265 used 160 loads
0.19483260937757307 GB/s(base)
@@ -160,51 +202,3 @@ en.wikipedia.org
en.wikipedia.org
(---snip---)
```
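
Once the hosts are on disk, ordinary shell tools can summarize them. The sketch below counts the most frequent hosts; it assumes the output file contains exactly one host per line, as the excerpt suggests, and `top_hosts` is a hypothetical helper rather than an adaparse feature.

```bash
# top_hosts FILE [N]: print the N most frequent lines of FILE as "<host> <count>".
# Hypothetical helper; assumes FILE holds one host per line. N defaults to 10.
top_hosts() {
  grep -v '^$' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}" | awk '{ print $2, $1 }'
}
```

Running `top_hosts test_write.txt 5` would list the five hosts that appear most often, each followed by its count.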
-
- ### Performance
-
- Our `adaparse` tool may outperform other popular alternatives. We offer a [collection of
- sets of URLs](https://github.com/ada-url/url-various-datasets) for benchmarking purposes.
- The following results are on a MacBook Air 2022 (M2 processor) using LLVM 14. We
- compare against [trurl](https://github.com/curl/trurl) version 0.6 (libcurl/7.87.0).
-
- <details><summary>
- With the wikipedia_100k dataset, we get that adaparse can generate normalized URLs about three
- times faster than trurl.</summary>
- <pre>
- time cat url-various-datasets/wikipedia/wikipedia_100k.txt| trurl --url-file - &> /dev/null 1
- cat url-various-datasets/wikipedia/wikipedia_100k.txt 0,00s user 0,01s system 3% cpu 0,179 total
- trurl --url-file - &> /dev/null 0,14s user 0,03s system 98% cpu 0,180 total
-
-
- time cat url-various-datasets/wikipedia/wikipedia_100k.txt| ./build/tools/cli/adaparse -g href &> /dev/null
- cat url-various-datasets/wikipedia/wikipedia_100k.txt 0,00s user 0,00s system 10% cpu 0,056 total
- ./build/tools/cli/adaparse -g href &> /dev/null 0,05s user 0,00s system 93% cpu 0,055 total
- </pre>
- </details>
-
- <details><summary>With the top100 dataset, the adaparse tool is twice as fast as the trurl.</summary>
- <pre>
- time cat url-various-datasets/top100/top100.txt| trurl --url-file - &> /dev/null 1
- cat url-various-datasets/top100/top100.txt 0,00s user 0,00s system 4% cpu 0,115 total
- trurl --url-file - &> /dev/null 0,09s user 0,02s system 97% cpu 0,113 total
-
- time cat url-various-datasets/top100/top100.txt| ./build/tools/cli/adaparse -g href &> /dev/null
- cat url-various-datasets/top100/top100.txt 0,00s user 0,01s system 11% cpu 0,062 total
- ./build/tools/cli/adaparse -g href &> /dev/null 0,05s user 0,00s system 94% cpu 0,061 total
- </pre>
- </details>
-
-
-
- The results will vary depending on your system. We invite you to run your own benchmarks.
-
- ```
- wikipedia 100k
- ada ▏ 55 ms ███████▋
- trurl ▏ 180 ms █████████████████████████
-
- top100
- ada ▏ 61 ms █████████████▍
- trurl ▏ 113 ms █████████████████████████
- ```