Commit f6c17dc

Introduce Chroma syntax-generation script.
Hugo uses Chroma for syntax highlighting. We upstreamed a Materialize-specific syntax to ensure our dialect's keywords are recognized as such. Now that the end-to-end change is in prod, let's automate the process of generating updates for the dialect lexer.
1 parent ee7ee5f commit f6c17dc

3 files changed: +108 -0 lines changed

bin/gen-chroma-syntax

+15
@@ -0,0 +1,15 @@
#!/usr/bin/env bash

# Copyright Materialize, Inc. and contributors. All rights reserved.
#
# Use of this software is governed by the Business Source License
# included in the LICENSE file at the root of this repository.
#
# As of the Change Date specified in that file, in accordance with
# the Business Source License, use of this software will be governed
# by the Apache License, Version 2.0.
#
# gen-chroma-syntax -- regenerates a Materialize-dialect Chroma syntax file
# using the currently checked-out Materialize keywords

exec "$(dirname "$0")"/pyactivate -m materialize.cli.gen-chroma-syntax "$@"
+12
@@ -0,0 +1,12 @@
# Generating new Chroma syntax highlights

Chroma is the syntax highlighter used by Hugo, the static site generator that powers Materialize's docs. We have upstreamed a Materialize lexer, which is a slightly modified version of Chroma's Postgres lexer. When new keywords are added to Materialize, we should upstream an update to that lexer.

## Generating a new lexer definition

1. Fork the Chroma repo and clone it locally as a sibling of the `materialize` repo.
2. From the root directory of the `materialize` repo, run the generate script:
   ```shell
   ./bin/gen-chroma-syntax
   ```
3. In the Chroma repo, commit the changes to the Materialize dialect file and submit them as a PR.
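
Before submitting the PR, it can help to sanity-check the generated file. The following is a minimal sketch, not part of this commit: the inline `SAMPLE_LEXER` XML and the `keyword_rule_matches` helper are hypothetical, and the sample only assumes the shape the generate script itself relies on (a rule whose `token` child has `type='Keyword'`). Python's `re` module stands in for Chroma's own regex engine here, so treat a pass as a rough check only.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a generated lexer file.
SAMPLE_LEXER = r"""<lexer>
  <config>
    <name>Materialize SQL dialect</name>
    <alias>materialize</alias>
    <alias>mzsql</alias>
    <mime_type>text/x-materializesql</mime_type>
  </config>
  <rules>
    <state name="root">
      <rule pattern="(SELECT|SUBSCRIBE)\b">
        <token type="Keyword"/>
      </rule>
    </state>
  </rules>
</lexer>"""


def keyword_rule_matches(xml_text: str, keyword: str) -> bool:
    """Return True if the lexer's keyword rule matches `keyword` exactly."""
    root = ET.fromstring(xml_text)
    # Same XPath the generate script uses: the parent of the Keyword token.
    rule = root.find(".//rule/token[@type='Keyword']/..")
    if rule is None:
        return False
    pattern = rule.get("pattern", "")
    return re.fullmatch(pattern, keyword) is not None


print(keyword_rule_matches(SAMPLE_LEXER, "SUBSCRIBE"))  # True
```

To check a real run, the same helper could be given the text of the generated dialect file instead of the sample.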
@@ -0,0 +1,81 @@
#!/usr/bin/env python3

# Copyright Materialize, Inc. and contributors. All rights reserved.
#
# Use of this software is governed by the Business Source License
# included in the LICENSE file at the root of this repository.
#
# As of the Change Date specified in that file, in accordance with
# the Business Source License, use of this software will be governed
# by the Apache License, Version 2.0.

"""Regenerates a Materialize-dialect Chroma syntax file using the local Materialize keywords"""

import argparse
import xml.etree.ElementTree as ET
from pathlib import Path

from materialize import MZ_ROOT

# Config fields that distinguish the Materialize lexer from the Postgres one.
# List values replace every existing element of that name.
CONFIG_FIELDS = {
    "name": "Materialize SQL dialect",
    "alias": ["materialize", "mzsql"],
    "mime_type": "text/x-materializesql",
}


def keyword_pattern() -> str:
    """Build a regex alternation of all keywords, skipping comments and blank lines."""
    keywords_file = MZ_ROOT / "src/sql-lexer/src/keywords.txt"
    keywords = [
        line.upper()
        for line in keywords_file.read_text().splitlines()
        if line.strip() and not line.startswith("#")
    ]
    return f"({'|'.join(keywords)})\\b"


def set_keywords(root: ET.Element) -> None:
    """Replace the pattern of the rule that emits Keyword tokens."""
    rule = root.find(".//rule/token[@type='Keyword']/..")
    # Compare against None explicitly: an Element with no children is falsy.
    if rule is None:
        raise RuntimeError("No keyword rule found")
    rule.set("pattern", keyword_pattern())


def set_config(root: ET.Element) -> None:
    """Overwrite the lexer's config fields with the Materialize values."""
    config = root.find("config")
    if config is None:
        raise RuntimeError("No config found")
    for field_name, field_value in CONFIG_FIELDS.items():
        if isinstance(field_value, list):
            # Drop the existing elements, then add one element per item.
            for element in config.findall(field_name):
                config.remove(element)
            for item in field_value:
                field = ET.SubElement(config, field_name)
                field.text = item
        else:
            field = config.find(field_name)
            if field is None:
                raise RuntimeError(f"No such config field: '{field_name}'")
            field.text = field_value


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--chroma-dir",
        default="../chroma",
    )
    args = parser.parse_args()
    lexer_dir = Path(args.chroma_dir) / "lexers" / "embedded"
    # Start from the Postgres lexer and rewrite its keywords and config.
    tree = ET.parse(lexer_dir / "postgresql_sql_dialect.xml")
    root = tree.getroot()
    if root is None:
        raise RuntimeError("Could not find root element")
    set_keywords(root)
    set_config(root)
    ET.indent(root, " ")
    tree.write(lexer_dir / "materialize_sql_dialect.xml", encoding="unicode")


if __name__ == "__main__":
    main()
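
To illustrate what `keyword_pattern` produces, here is a minimal sketch that applies the same filtering to an in-memory string rather than to `keywords.txt`. The `pattern_from_text` helper and the sample text are hypothetical and only assume what the script itself assumes: one keyword per line, with `#` comment lines and blank lines ignored.

```python
def pattern_from_text(text: str) -> str:
    """Mirror keyword_pattern(), but read keywords from a string."""
    keywords = [
        line.upper()
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    # Trailing \b keeps a keyword from matching a prefix of a longer identifier.
    return f"({'|'.join(keywords)})\\b"


# Hypothetical sample; the real keywords file is not shown in this commit.
SAMPLE_KEYWORDS = "# comment\nSelect\n\nFrom\n"

print(pattern_from_text(SAMPLE_KEYWORDS))  # (SELECT|FROM)\b
```

The keywords are joined without escaping, which is fine as long as they contain only letters, digits, and underscores; wrapping each one in `re.escape` would be a cautious alternative if that ever stops being true.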
