Commit ee7df7b

cipherboytechknowlogick authored and committed
Markdown: Sanitizer Configuration (#9075)
* Support custom sanitization policy

Allowing the Gitea administrator to configure the sanitization policy lets them couple external renderers and custom templates to support more markup. In particular, the `pandoc` renderer can generate KaTeX annotations, wrapping them in `<span>` elements with the class `math` and either `inline` or `display` (depending on whether inline or block mode was requested).

This iteration gives the administrator whitelisting powers: carefully crafted regexes will let through only the attributes needed to support their custom markup.

Resolves: #9054

Signed-off-by: Alexander Scheel <[email protected]>

* Document new sanitization configuration

- Adds basic documentation to app.ini.sample,
- Adds an example to the Configuration Cheat Sheet, and
- Adds extended information to the External Renderers section.

Signed-off-by: Alexander Scheel <[email protected]>

* Drop extraneous length check in newMarkupSanitizer(...)

Signed-off-by: Alexander Scheel <[email protected]>

* Fix plural ELEMENT and ALLOW_ATTR in docs

These were left over from their initial names. Make them singular to conform with current expectations.

Signed-off-by: Alexander Scheel <[email protected]>
1 parent cecc319 commit ee7df7b

5 files changed: +148 -22 lines

custom/conf/app.ini.sample

+6

@@ -877,6 +877,12 @@ SHOW_FOOTER_VERSION = true
 ; Show template execution time in the footer
 SHOW_FOOTER_TEMPLATE_LOAD_TIME = true
 
+[markup.sanitizer]
+; The following keys can be used multiple times to define sanitization policy rules.
+;ELEMENT = span
+;ALLOW_ATTR = class
+;REGEXP = ^(info|warning|error)$
+
 [markup.asciidoc]
 ENABLED = false
 ; List of file extensions that should be rendered by an external command

docs/content/doc/advanced/config-cheat-sheet.en-us.md

+18

@@ -578,6 +578,24 @@ Two special environment variables are passed to the render command:
 - `GITEA_PREFIX_SRC`, which contains the current URL prefix in the `src` path tree. To be used as prefix for links.
 - `GITEA_PREFIX_RAW`, which contains the current URL prefix in the `raw` path tree. To be used as prefix for image paths.
 
+
+Gitea supports customizing the sanitization policy for rendered HTML. The example below will support KaTeX output from pandoc.
+
+```ini
+[markup.sanitizer]
+; Pandoc renders TeX segments as <span>s with the "math" class, optionally
+; with "inline" or "display" classes depending on context.
+ELEMENT = span
+ALLOW_ATTR = class
+REGEXP = ^\s*((math(\s+|$)|inline(\s+|$)|display(\s+|$)))+
+```
+
+- `ELEMENT`: The element this policy applies to. Must be non-empty.
+- `ALLOW_ATTR`: The attribute this policy allows. Must be non-empty.
+- `REGEXP`: A regex to match the contents of the attribute against. Must be present but may be empty for unconditional whitelisting of this attribute.
+
+You may redefine `ELEMENT`, `ALLOW_ATTR`, and `REGEXP` multiple times; each time all three are defined is a single policy entry.
+
 ## Time (`time`)
 
 - `FORMAT`: Time format to display on UI. i.e. RFC1123 or 2006-01-02 15:04:05
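The `REGEXP` value is compiled with Go's standard `regexp` package (`regexp.Compile` in `newMarkupSanitizer`, later in this commit), so the pandoc/KaTeX pattern can be checked in isolation. The following is a standalone sketch, not part of this commit; `classAllowed` is a hypothetical helper name, and it assumes the value is tested with `MatchString`.

```go
package main

import (
	"fmt"
	"regexp"
)

// classRe is the REGEXP value from the pandoc/KaTeX example in the docs above.
var classRe = regexp.MustCompile(`^\s*((math(\s+|$)|inline(\s+|$)|display(\s+|$)))+`)

// classAllowed is a hypothetical helper: it reports whether a class attribute
// value matches the example pattern using Go's regexp.MatchString.
func classAllowed(class string) bool {
	return classRe.MatchString(class)
}

func main() {
	samples := []string{"math inline", "math display", "mathematics", "katex", "math extra"}
	for _, class := range samples {
		fmt.Printf("%-14q allowed=%t\n", class, classAllowed(class))
	}
}
```

Because the pattern is anchored only at the start (it has no trailing `$`), `MatchString` also accepts a value such as `"math extra"`, while `"mathematics"` and `"katex"` are rejected. Appending `$` to the pattern would limit the attribute to whitespace-separated combinations of `math`, `inline`, and `display`.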

docs/content/doc/advanced/external-renderers.en-us.md

+18

@@ -68,4 +68,22 @@ RENDER_COMMAND = rst2html.py
 IS_INPUT_FILE = false
 ```
 
+If your external markup relies on additional classes and attributes on the generated HTML elements, you might need to enable custom sanitizer policies. Gitea uses the [`bluemonday`](https://godoc.org/github.com/microcosm-cc/bluemonday) package as our HTML sanitizer. The example below will support [KaTeX](https://katex.org/) output from [`pandoc`](https://pandoc.org/).
+
+```ini
+[markup.sanitizer]
+; Pandoc renders TeX segments as <span>s with the "math" class, optionally
+; with "inline" or "display" classes depending on context.
+ELEMENT = span
+ALLOW_ATTR = class
+REGEXP = ^\s*((math(\s+|$)|inline(\s+|$)|display(\s+|$)))+
+
+[markup.markdown]
+ENABLED = true
+FILE_EXTENSIONS = .md,.markdown
+RENDER_COMMAND = pandoc -f markdown -t html --katex
+```
+
+You may redefine `ELEMENT`, `ALLOW_ATTR`, and `REGEXP` multiple times; each time all three are defined is a single policy entry. All three must be defined, but `REGEXP` may be blank to allow unconditional whitelisting of that attribute.
+
 Once your configuration changes have been made, restart Gitea to have changes take effect.

modules/markup/sanitizer.go

+9

@@ -50,6 +50,15 @@ func ReplaceSanitizer() {
 
 	// Allow <kbd> tags for keyboard shortcut styling
 	sanitizer.policy.AllowElements("kbd")
+
+	// Custom keyword markup
+	for _, rule := range setting.ExternalSanitizerRules {
+		if rule.Regexp != nil {
+			sanitizer.policy.AllowAttrs(rule.AllowAttr).Matching(rule.Regexp).OnElements(rule.Element)
+		} else {
+			sanitizer.policy.AllowAttrs(rule.AllowAttr).OnElements(rule.Element)
+		}
+	}
 }
 
 // Sanitize takes a string that contains a HTML fragment or document and applies policy whitelist.

modules/setting/markup.go

+97 -22

@@ -9,11 +9,14 @@ import (
 	"strings"
 
 	"code.gitea.io/gitea/modules/log"
+
+	"gopkg.in/ini.v1"
 )
 
 // ExternalMarkupParsers represents the external markup parsers
 var (
-	ExternalMarkupParsers []MarkupParser
+	ExternalMarkupParsers  []MarkupParser
+	ExternalSanitizerRules []MarkupSanitizerRule
 )
 
 // MarkupParser defines the external parser configured in ini
@@ -25,42 +28,114 @@ type MarkupParser struct {
 	IsInputFile bool
 }
 
+// MarkupSanitizerRule defines the policy for whitelisting attributes on
+// certain elements.
+type MarkupSanitizerRule struct {
+	Element   string
+	AllowAttr string
+	Regexp    *regexp.Regexp
+}
+
 func newMarkup() {
-	extensionReg := regexp.MustCompile(`\.\w`)
 	for _, sec := range Cfg.Section("markup").ChildSections() {
 		name := strings.TrimPrefix(sec.Name(), "markup.")
 		if name == "" {
 			log.Warn("name is empty, markup " + sec.Name() + "ignored")
 			continue
 		}
 
-		extensions := sec.Key("FILE_EXTENSIONS").Strings(",")
-		var exts = make([]string, 0, len(extensions))
-		for _, extension := range extensions {
-			if !extensionReg.MatchString(extension) {
-				log.Warn(sec.Name() + " file extension " + extension + " is invalid. Extension ignored")
-			} else {
-				exts = append(exts, extension)
-			}
+		if name == "sanitizer" {
+			newMarkupSanitizer(name, sec)
+		} else {
+			newMarkupRenderer(name, sec)
 		}
+	}
+}
+
+func newMarkupSanitizer(name string, sec *ini.Section) {
+	haveElement := sec.HasKey("ELEMENT")
+	haveAttr := sec.HasKey("ALLOW_ATTR")
+	haveRegexp := sec.HasKey("REGEXP")
+
+	if !haveElement && !haveAttr && !haveRegexp {
+		log.Warn("Skipping empty section: markup.%s.", name)
+		return
+	}
+
+	if !haveElement || !haveAttr || !haveRegexp {
+		log.Error("Missing required keys from markup.%s. Must have all three of ELEMENT, ALLOW_ATTR, and REGEXP defined!", name)
+		return
+	}
+
+	elements := sec.Key("ELEMENT").ValueWithShadows()
+	allowAttrs := sec.Key("ALLOW_ATTR").ValueWithShadows()
+	regexps := sec.Key("REGEXP").ValueWithShadows()
+
+	if len(elements) != len(allowAttrs) ||
+		len(elements) != len(regexps) {
+		log.Error("All three keys in markup.%s (ELEMENT, ALLOW_ATTR, REGEXP) must be defined the same number of times! Got %d, %d, and %d respectively.", name, len(elements), len(allowAttrs), len(regexps))
+		return
+	}
 
-		if len(exts) == 0 {
-			log.Warn(sec.Name() + " file extension is empty, markup " + name + " ignored")
+	ExternalSanitizerRules = make([]MarkupSanitizerRule, 0, len(elements))
+
+	for index, pattern := range regexps {
+		if pattern == "" {
+			rule := MarkupSanitizerRule{
+				Element:   elements[index],
+				AllowAttr: allowAttrs[index],
+				Regexp:    nil,
+			}
+			ExternalSanitizerRules = append(ExternalSanitizerRules, rule)
 			continue
 		}
 
-		command := sec.Key("RENDER_COMMAND").MustString("")
-		if command == "" {
-			log.Warn(" RENDER_COMMAND is empty, markup " + name + " ignored")
+		// Validate when parsing the config that this is a valid regular
+		// expression. Then we can use regexp.MustCompile(...) later.
+		compiled, err := regexp.Compile(pattern)
+		if err != nil {
+			log.Error("In markup.%s: REGEXP at definition %d failed to compile: %v", name, index+1, err)
 			continue
 		}
 
-		ExternalMarkupParsers = append(ExternalMarkupParsers, MarkupParser{
-			Enabled:        sec.Key("ENABLED").MustBool(false),
-			MarkupName:     name,
-			FileExtensions: exts,
-			Command:        command,
-			IsInputFile:    sec.Key("IS_INPUT_FILE").MustBool(false),
-		})
+		rule := MarkupSanitizerRule{
+			Element:   elements[index],
+			AllowAttr: allowAttrs[index],
+			Regexp:    compiled,
+		}
+		ExternalSanitizerRules = append(ExternalSanitizerRules, rule)
+	}
+}
+
+func newMarkupRenderer(name string, sec *ini.Section) {
+	extensionReg := regexp.MustCompile(`\.\w`)
+
+	extensions := sec.Key("FILE_EXTENSIONS").Strings(",")
+	var exts = make([]string, 0, len(extensions))
+	for _, extension := range extensions {
+		if !extensionReg.MatchString(extension) {
+			log.Warn(sec.Name() + " file extension " + extension + " is invalid. Extension ignored")
+		} else {
+			exts = append(exts, extension)
+		}
+	}
+
+	if len(exts) == 0 {
+		log.Warn(sec.Name() + " file extension is empty, markup " + name + " ignored")
+		return
 	}
+
+	command := sec.Key("RENDER_COMMAND").MustString("")
+	if command == "" {
+		log.Warn(" RENDER_COMMAND is empty, markup " + name + " ignored")
+		return
+	}
+
+	ExternalMarkupParsers = append(ExternalMarkupParsers, MarkupParser{
+		Enabled:        sec.Key("ENABLED").MustBool(false),
+		MarkupName:     name,
+		FileExtensions: exts,
+		Command:        command,
+		IsInputFile:    sec.Key("IS_INPUT_FILE").MustBool(false),
+	})
 }
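`newMarkupSanitizer` reads the three shadowed keys back as parallel lists and pairs them by index, so the n-th `ELEMENT`, `ALLOW_ATTR`, and `REGEXP` together form the n-th rule. The sketch below restates that pairing with only the standard library so it can run outside Gitea. `buildRules` and `rule` are hypothetical names; the `[]string` arguments stand in for what `ValueWithShadows()` returns, the `HasKey` presence checks are omitted, and errors are returned instead of logged. As in the commit, a pattern that fails to compile drops only its own entry, and an empty pattern yields a `nil` `Regexp`.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule mirrors setting.MarkupSanitizerRule.
type rule struct {
	Element   string
	AllowAttr string
	Regexp    *regexp.Regexp
}

// buildRules is a hypothetical, stdlib-only restatement of the pairing logic in
// newMarkupSanitizer. elements, attrs and patterns stand in for the values of
// ELEMENT, ALLOW_ATTR and REGEXP; entry i of each slice forms rule i.
func buildRules(elements, attrs, patterns []string) ([]rule, []error, error) {
	if len(elements) != len(attrs) || len(elements) != len(patterns) {
		return nil, nil, fmt.Errorf("ELEMENT, ALLOW_ATTR and REGEXP must be defined the same number of times: got %d, %d and %d",
			len(elements), len(attrs), len(patterns))
	}

	rules := make([]rule, 0, len(elements))
	var skipped []error
	for i, pattern := range patterns {
		r := rule{Element: elements[i], AllowAttr: attrs[i]}
		if pattern != "" {
			compiled, err := regexp.Compile(pattern)
			if err != nil {
				// As in the commit, a bad pattern drops only this entry.
				skipped = append(skipped, fmt.Errorf("REGEXP at definition %d: %w", i+1, err))
				continue
			}
			r.Regexp = compiled
		}
		// An empty pattern leaves r.Regexp nil (unconditional allow).
		rules = append(rules, r)
	}
	return rules, skipped, nil
}

func main() {
	rules, skipped, err := buildRules(
		[]string{"span", "div", "span"},
		[]string{"class", "title", "id"},
		[]string{`^(info|warning|error)$`, "", "("},
	)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	for _, r := range rules {
		fmt.Printf("%s[%s] has regexp: %t\n", r.Element, r.AllowAttr, r.Regexp != nil)
	}
	for _, e := range skipped {
		fmt.Println("skipped:", e)
	}
}
```

Collecting per-entry errors instead of aborting matches the commit's choice to `continue` past a bad `REGEXP`, so one typo does not disable every other rule in the section.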
