Commit a41c5f6

1 parent fa8d5e6 commit a41c5f6

31 files changed (+1597 -1601 lines changed)

.rubocop.yml

+22-8
Original file line number | Diff line number | Diff line change
@@ -1,15 +1,29 @@
1+
inherit_mode:
2+
merge:
3+
- Exclude
4+
5+
require:
6+
- standard
7+
- standard-custom
8+
- standard-performance
9+
- rubocop-performance
10+
- rubocop-minitest
11+
- rubocop-packaging
12+
- rubocop-rake
13+
14+
inherit_gem:
15+
standard: config/base.yml
16+
standard-custom: config/base.yml
17+
standard-performance: config/base.yml
18+
119
inherit_from: .rubocop_todo.yml
220

321
AllCops:
422
Exclude:
5-
- '.*/**/*'
6-
- 'benchmark/**/*'
7-
- 'test/**/*'
8-
- 'tmp/**/*'
9-
TargetRubyVersion: 3.1.0
10-
11-
Metrics:
12-
Enabled: false
23+
- ".*/**/*"
24+
- "tmp/**/*"
25+
SuggestExtensions: false
26+
TargetRubyVersion: 3.1
1327

1428
Style/Documentation:
1529
Enabled: false

.standard.yml

+1
@@ -0,0 +1 @@
1+
ruby_version: 3.1

Gemfile

+4
@@ -7,4 +7,8 @@ group :development do
77
gem "bundler", "~> 2.6.2"
88
gem "minitest", "5.25.4"
99
gem "rake", "13.2.1"
10+
gem "rubocop-minitest", "0.36.0"
11+
gem "rubocop-packaging", "0.5.2"
12+
gem "rubocop-rake", "0.6.0"
13+
gem "standard", "1.43.0"
1014
end

README.md

+59-69
@@ -71,14 +71,14 @@ If you don't specify any configuration options, Sanitize will use its strictest
7171
```ruby
7272
html = '<b><a href="http://foo.com/">foo</a></b><img src="bar.jpg">'
7373
Sanitize.fragment(html)
74-
# => 'foo'
74+
# => "foo"
7575
```
7676

7777
To keep certain elements, add them to the element allowlist.
7878

7979
```ruby
80-
Sanitize.fragment(html, :elements => ['b'])
81-
# => '<b>foo</b>'
80+
Sanitize.fragment(html, elements: ['b'])
81+
# => "<b>foo</b>"
8282
```
8383

8484
### HTML Documents
@@ -94,14 +94,10 @@ html = %[
9494
]
9595

9696
Sanitize.document(html,
97-
:allow_doctype => true,
98-
:elements => ['html']
97+
allow_doctype: true,
98+
elements: ['html']
9999
)
100-
# => %[
101-
# <!DOCTYPE html><html>foo
102-
#
103-
# </html>
104-
# ]
100+
# => "<!DOCTYPE html><html>foo\n \n</html>"
105101
```
106102

107103
### CSS in HTML
@@ -119,11 +115,11 @@ html = %[
119115
]
120116

121117
Sanitize.fragment(html,
122-
:elements => ['div', 'style'],
123-
:attributes => {'div' => ['style']},
118+
elements: ['div', 'style'],
119+
attributes: {'div' => ['style']},
124120

125-
:css => {
126-
:properties => ['width']
121+
css: {
122+
properties: ['width']
127123
}
128124
)
129125
#=> %[
@@ -156,7 +152,6 @@ Sanitize::CSS.stylesheet(css, Sanitize::Config::RELAXED)
156152
# => %[
157153
#
158154
#
159-
#
160155
# a { text-decoration: none; }
161156
#
162157
# a:hover {
@@ -173,7 +168,6 @@ Sanitize::CSS.properties(%[
173168
#
174169
# text-decoration: underline;
175170
# ]
176-
177171
```
178172

179173
## Configuration
@@ -186,7 +180,7 @@ Allows only very simple inline markup. No links, images, or block elements.
186180

187181
```ruby
188182
Sanitize.fragment(html, Sanitize::Config::RESTRICTED)
189-
# => '<b>foo</b>'
183+
# => "<b>foo</b>"
190184
```
191185

192186
### Sanitize::Config::BASIC
@@ -215,14 +209,14 @@ If the built-in modes don't meet your needs, you can easily specify a custom con
215209

216210
```ruby
217211
Sanitize.fragment(html,
218-
:elements => ['a', 'span'],
212+
elements: ['a', 'span'],
219213

220-
:attributes => {
221-
'a' => ['href', 'title'],
214+
attributes: {
215+
'a' => ['href', 'title'],
222216
'span' => ['class']
223217
},
224218

225-
:protocols => {
219+
protocols: {
226220
'a' => {'href' => ['http', 'https', 'mailto']}
227221
}
228222
)
@@ -236,8 +230,8 @@ The built-in configs are deeply frozen to prevent people from modifying them (ei
236230
# Create a customized copy of the Basic config, adding <div> and <table> to the
237231
# existing allowlisted elements.
238232
Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
239-
:elements => Sanitize::Config::BASIC[:elements] + ['div', 'table'],
240-
:remove_contents => true
233+
elements: Sanitize::Config::BASIC[:elements] + ['div', 'table'],
234+
remove_contents: true
241235
))
242236
```
243237

@@ -246,8 +240,8 @@ The example above adds the `<div>` and `<table>` elements to a copy of the exist
246240
```ruby
247241
# Overwrite :elements instead of creating a copy with new entries.
248242
Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
249-
:elements => ['div', 'table'],
250-
:remove_contents => true
243+
elements: ['div', 'table'],
244+
remove_contents: true
251245
))
252246
```
253247

@@ -258,7 +252,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
258252
Attributes to add to specific elements. If the attribute already exists, it will be replaced with the value specified here. Specify all element names and attributes in lowercase.
259253

260254
```ruby
261-
:add_attributes => {
255+
add_attributes: {
262256
'a' => {'rel' => 'nofollow'}
263257
}
264258
```
@@ -276,28 +270,28 @@ Whether or not to allow well-formed HTML doctype declarations such as "<!DOCTYPE
276270
Attributes to allow on specific elements. Specify all element names and attributes in lowercase.
277271

278272
```ruby
279-
:attributes => {
280-
'a' => ['href', 'title'],
273+
attributes: {
274+
'a' => ['href', 'title'],
281275
'blockquote' => ['cite'],
282-
'img' => ['alt', 'src', 'title']
276+
'img' => ['alt', 'src', 'title']
283277
}
284278
```
285279

286280
If you'd like to allow certain attributes on all elements, use the symbol `:all` instead of an element name.
287281

288282
```ruby
289283
# Allow the class attribute on all elements.
290-
:attributes => {
284+
attributes: {
291285
:all => ['class'],
292-
'a' => ['href', 'title']
286+
'a' => ['href', 'title']
293287
}
294288
```
295289

296290
To allow arbitrary HTML5 `data-*` attributes, use the symbol `:data` in place of an attribute name.
297291

298292
```ruby
299293
# Allow arbitrary HTML5 data-* attributes on <div> elements.
300-
:attributes => {
294+
attributes: {
301295
'div' => [:data]
302296
}
303297
```
@@ -353,7 +347,7 @@ If you'd like to allow the use of relative URLs which don't have a protocol, inc
353347
Array of HTML element names to allow. Specify all names in lowercase. Any elements not in this array will be removed.
354348

355349
```ruby
356-
:elements => %w[
350+
elements: %w[
357351
a abbr b blockquote br cite code dd dfn dl dt em i kbd li mark ol p pre
358352
q s samp small strike strong sub sup time u ul var
359353
]
@@ -373,10 +367,10 @@ Array of HTML element names to allow. Specify all names in lowercase. Any elemen
373367
374368
#### :parser_options (Hash)
375369

376-
[Parsing options](https://github.com/rubys/nokogumbo/tree/master#parsing-options) to be supplied to `nokogumbo`.
370+
[Parsing options](https://nokogiri.org/tutorials/parsing_an_html5_document.html?h=parsing+options#parsing-options) to be supplied to Nokogiri.
377371

378372
```ruby
379-
:parser_options => {
373+
parser_options: {
380374
max_errors: -1,
381375
max_tree_depth: -1
382376
}
@@ -387,16 +381,16 @@ Array of HTML element names to allow. Specify all names in lowercase. Any elemen
387381
URL protocols to allow in specific attributes. If an attribute is listed here and contains a protocol other than those specified (or if it contains no protocol at all), it will be removed.
388382

389383
```ruby
390-
:protocols => {
391-
'a' => {'href' => ['ftp', 'http', 'https', 'mailto']},
392-
'img' => {'src' => ['http', 'https']}
384+
protocols: {
385+
'a' => {'href' => ['ftp', 'http', 'https', 'mailto']},
386+
'img' => {'src' => ['http', 'https']}
393387
}
394388
```
395389

396390
If you'd like to allow the use of relative URLs which don't have a protocol, include the symbol `:relative` in the protocol array:
397391

398392
```ruby
399-
:protocols => {
393+
protocols: {
400394
'a' => {'href' => ['http', 'https', :relative]}
401395
}
402396
```
@@ -407,7 +401,7 @@ If this is `true`, Sanitize will remove the contents of any non-allowlisted elem
407401

408402
If this is an Array or Set of element names, then only the contents of the specified elements (when filtered) will be removed, and the contents of all other filtered elements will be left behind.
409403

410-
The default value is `%w[iframe math noembed noframes noscript plaintext script style svg xmp]`.
404+
The default value can be seen in the [default config](lib/sanitize/config/default.rb).
411405

412406
#### :transformers (Array or callable)
413407

@@ -420,20 +414,14 @@ Hash of element names which, when removed, should have their contents surrounded
420414
Each element name is a key pointing to another Hash, which provides the specific whitespace that should be inserted `:before` and `:after` the removed element's position. The `:after` value will only be inserted if the removed element has children, in which case it will be inserted after those children.
421415

422416
```ruby
423-
:whitespace_elements => {
424-
'br' => { :before => "\n", :after => "" },
425-
'div' => { :before => "\n", :after => "\n" },
426-
'p' => { :before => "\n", :after => "\n" }
417+
whitespace_elements: {
418+
'br' => { before: "\n", after: "" },
419+
'div' => { before: "\n", after: "\n" },
420+
'p' => { before: "\n", after: "\n" }
427421
}
428422
```
429423

430-
The default elements with whitespace added before and after are:
431-
432-
```
433-
address article aside blockquote br dd div dl dt
434-
footer h1 h2 h3 h4 h5 h6 header hgroup hr li nav
435-
ol p pre section ul
436-
```
424+
The default elements with whitespace added before and after can be seen in [the default config](lib/sanitize/config/default.rb).
437425

438426
## Transformers
439427

@@ -442,7 +430,7 @@ Transformers allow you to filter and modify HTML nodes using your own custom log
442430
To use one or more transformers, pass them to the `:transformers` config setting. You may pass a single transformer or an array of transformers.
443431

444432
```ruby
445-
Sanitize.fragment(html, :transformers => [
433+
Sanitize.fragment(html, transformers: [
446434
transformer_one,
447435
transformer_two
448436
])
@@ -493,7 +481,7 @@ transformer = lambda do |env|
493481
end
494482

495483
# Prints "header", "span", "strong", "p", "footer".
496-
Sanitize.fragment(html, :transformers => transformer)
484+
Sanitize.fragment(html, transformers: transformer)
497485
```
498486

499487
Transformers have a tremendous amount of power, including the power to completely bypass Sanitize's built-in filtering. Be careful! Your safety is in your own hands.
@@ -503,20 +491,22 @@ Transformers have a tremendous amount of power, including the power to completel
503491
The following example demonstrates how to remove image elements unless they use a relative URL or are hosted on a specific domain. It assumes that the `<img>` element and its `src` attribute are already allowlisted.
504492

505493
```ruby
506-
require 'uri'
494+
require "uri"
507495

508496
image_allowlist_transformer = lambda do |env|
509497
# Ignore everything except <img> elements.
510-
return unless env[:node_name] == 'img'
498+
return unless env[:node_name] == "img"
511499

512-
node = env[:node]
513-
image_uri = URI.parse(node['src'])
500+
node = env[:node]
501+
image_uri = URI.parse(node["src"])
514502

515503
# Only allow relative URLs or URLs with the example.com domain. The
516504
# image_uri.host.nil? check ensures that protocol-relative URLs like
517-
# "//evil.com/foo.jpg".
518-
unless image_uri.host == 'example.com' || (image_uri.host.nil? && image_uri.relative?)
519-
node.unlink # `Nokogiri::XML::Node#unlink` removes a node from the document
505+
# "//evil.com/foo.jpg" are not allowed.
506+
unless image_uri.host == "example.com"
507+
unless image_uri.host.nil? && image_uri.relative?
508+
node.unlink # `Nokogiri::XML::Node#unlink` removes a node from the document
509+
end
520510
end
521511
end
522512
```
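
The URL check in the image transformer above can be exercised on its own, without Sanitize or Nokogiri. The sketch below is a minimal illustration that uses only Ruby's standard `uri` library. The helper name `allowed_image_src?` and its `allowed_host` parameter are hypothetical and do not appear in the README. Unlike the README example, it also assumes that a missing or unparseable `src` should be rejected rather than raise.

```ruby
require "uri"

# Hypothetical helper (not part of Sanitize): returns true when an image URL
# is either relative (no host, no scheme) or hosted on the allowed domain.
def allowed_image_src?(src, allowed_host: "example.com")
  # Assumption: a missing src attribute is treated as not allowed.
  return false if src.nil?

  uri = URI.parse(src)

  # Same condition as the transformer: the host must match, or the URL must
  # have no host at all and be relative. The host.nil? check rejects
  # protocol-relative URLs such as "//evil.com/foo.jpg".
  uri.host == allowed_host || (uri.host.nil? && uri.relative?)
rescue URI::InvalidURIError
  # Assumption: URLs that cannot be parsed are treated as not allowed.
  false
end

allowed_image_src?("/images/foo.jpg")            # relative path
allowed_image_src?("http://example.com/foo.jpg") # allowed host
allowed_image_src?("//evil.com/foo.jpg")         # protocol-relative, other host
```

A transformer could call such a helper and unlink the node when it returns `false`, which would keep the nested `unless` blocks out of the lambda body.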
@@ -527,40 +517,40 @@ The following example demonstrates how to create a transformer that will safely
527517

528518
```ruby
529519
youtube_transformer = lambda do |env|
530-
node = env[:node]
520+
node = env[:node]
531521
node_name = env[:node_name]
532522

533523
# Don't continue if this node is already allowlisted or is not an element.
534524
return if env[:is_allowlisted] || !node.element?
535525

536526
# Don't continue unless the node is an iframe.
537-
return unless node_name == 'iframe'
527+
return unless node_name == "iframe"
538528

539529
# Verify that the video URL is actually a valid YouTube video URL.
540-
return unless node['src'] =~ %r|\A(?:https?:)?//(?:www\.)?youtube(?:-nocookie)?\.com/|
530+
return unless %r{\A(?:https?:)?//(?:www\.)?youtube(?:-nocookie)?\.com/}.match?(node["src"])
541531

542532
# We're now certain that this is a YouTube embed, but we still need to run
543533
# it through a special Sanitize step to ensure that no unwanted elements or
544534
# attributes that don't belong in a YouTube embed can sneak in.
545535
Sanitize.node!(node, {
546-
:elements => %w[iframe],
536+
elements: %w[iframe],
547537

548-
:attributes => {
549-
'iframe' => %w[allowfullscreen frameborder height src width]
538+
attributes: {
539+
"iframe" => %w[allowfullscreen frameborder height src width]
550540
}
551541
})
552542

553543
# Now that we're sure that this is a valid YouTube embed and that there are
554544
# no unwanted elements or attributes hidden inside it, we can tell Sanitize
555545
# to allowlist the current node.
556-
{:node_allowlist => [node]}
546+
{node_allowlist: [node]}
557547
end
558548

559549
html = %[
560550
<iframe width="420" height="315" src="//www.youtube.com/embed/dQw4w9WgXcQ"
561551
frameborder="0" allowfullscreen></iframe>
562-
]
552+
].strip
563553

564-
Sanitize.fragment(html, :transformers => youtube_transformer)
554+
Sanitize.fragment(html, transformers: youtube_transformer)
565555
# => '<iframe width="420" height="315" src="//www.youtube.com/embed/dQw4w9WgXcQ" frameborder="0" allowfullscreen=""></iframe>'
566556
```
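
The `src` check in the YouTube transformer above changed from `=~` to `Regexp#match?`. As a rough illustration of what the pattern accepts and rejects, the sketch below isolates it in plain Ruby. The constant `YOUTUBE_EMBED_PATTERN` and the method `youtube_src?` are hypothetical names introduced here, not part of Sanitize or the README.

```ruby
# The pattern from the transformer, stored in a hypothetical constant.
YOUTUBE_EMBED_PATTERN = %r{\A(?:https?:)?//(?:www\.)?youtube(?:-nocookie)?\.com/}

# Hypothetical helper: true when src starts with a YouTube (or
# youtube-nocookie) origin. Regexp#match? returns false for nil, so an
# iframe without a src attribute is rejected without raising.
def youtube_src?(src)
  YOUTUBE_EMBED_PATTERN.match?(src)
end

youtube_src?("//www.youtube.com/embed/dQw4w9WgXcQ")  # protocol-relative embed URL
youtube_src?("https://www.youtube.com.evil.com/x")   # look-alike host
youtube_src?(nil)                                    # missing src attribute
```

The `\A` anchor and the trailing `/` after `.com` are what keep look-alike hosts and URLs that merely contain a YouTube address from matching.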
