fix(ruby) symbols, string interpolation, class names with underscores #4213

Open
wants to merge 9 commits into
base: main
Changes from 2 commits
7 changes: 7 additions & 0 deletions CHANGES.md
@@ -1,3 +1,10 @@
+## Version 11.11.2
+
+Core Grammars:
+
+- fix(ruby) symbols, string interpolation, class names with underscores
+
+
 ## Version 11.11.1

 - Fixes regression with Rust grammar.
151 changes: 104 additions & 47 deletions src/languages/ruby.js
@@ -11,13 +11,7 @@ export default function(hljs) {
 const regex = hljs.regex;
 const RUBY_METHOD_RE = '([a-zA-Z_]\\w*[!?=]?|[-+~]@|<<|>>|=~|===?|<=>|[<>]=?|\\*\\*|[-/+%^&*~`|]|\\[\\]=?)';
 // TODO: move concepts like CAMEL_CASE into `modes.js`
-const CLASS_NAME_RE = regex.either(
-/\b([A-Z]+[a-z0-9]+)+/,
-// ends in caps
-/\b([A-Z]+[a-z0-9]+)+[A-Z]+/,
-)
-;
-const CLASS_NAME_WITH_NAMESPACE_RE = regex.concat(CLASS_NAME_RE, /(::\w+)*/)
+const CLASS_NAME_RE = /\b([A-Z]+[a-z0-9_]+)+[A-Z]*/;
Comment on lines -20 to +14

@joshgoebel (Member), Mar 5, 2025:

Doesn't this removal break (at least) the inheritance highlighting?

@jimtng (Author), Mar 5, 2025:

Can you give an example of the syntax you mean?

Class::With::Namespace

will work fine, and better, because it no longer highlights the ::, much as GitHub and VS Code don't.
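
To make the claim above concrete, here is a minimal sketch of what the new CLASS_NAME_RE matches on its own. The pattern is copied from the diff; the helper `classNameMatches` is a hypothetical name introduced only for illustration and is not part of highlight.js.

```javascript
// Pattern copied from the diff above (the CLASS_NAME_RE introduced by this PR).
const CLASS_NAME_RE = /\b([A-Z]+[a-z0-9_]+)+[A-Z]*/;

// Hypothetical helper (not part of highlight.js): list every substring the
// pattern matches in a piece of Ruby source.
function classNameMatches(source) {
  const globalRe = new RegExp(CLASS_NAME_RE.source, 'g');
  return Array.from(source.matchAll(globalRe), (m) => m[0]);
}

// Each namespace segment is matched separately; the `::` is never included.
console.log(classNameMatches('Class::With::Namespace'));
// Underscores and trailing capitals are accepted by the new pattern.
console.log(classNameMatches('Class_Name ClassNAME'));
```

Because each segment is matched on its own, the `::` between segments falls outside every match, which is the behaviour the comment describes.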

Member:

class C::E < A::B
end

We have special rules to match syntax where we KNOW an identifier is a class (or class with namespaces).

> it doesn't here in github, or vscode

Striving for exact matches against other highlighting tools isn't a thing we care about here.

In this case I'm sympathetic to not highlighting the ::, but only if it can be done without adding a lot of complexity or advanced parsing to the grammar.

Member:

You really want something like [class](::[class])*, but our multi-match engine can't really handle that with discrete coloring of the different pieces. Maybe it would work if you tried a long sequence in which some items matched zero length, but I'm not sure I've tested multi-matching in that type of scenario before.
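
The `[class](::[class])*` shape can be illustrated outside the grammar engine. The following is only a sketch under stated assumptions: `splitNamespacedClass`, `SEGMENT_RE` and `SEPARATOR_RE` are hypothetical names, the segment pattern is a simplification, and none of this reflects how highlight.js implements multi-matching.

```javascript
// Hypothetical sketch, not highlight.js code: walk a namespaced Ruby constant
// of the shape [class](::[class])* and emit one piece per segment/separator.
const SEGMENT_RE = /[A-Z]\w*/y;  // assumed shape of one class-like segment
const SEPARATOR_RE = /::/y;      // namespace separator, left unscoped

function splitNamespacedClass(text) {
  const pieces = [];
  let pos = 0;
  let expectSegment = true;
  while (pos < text.length) {
    const re = expectSegment ? SEGMENT_RE : SEPARATOR_RE;
    re.lastIndex = pos; // sticky regexes only match exactly at lastIndex
    const m = re.exec(text);
    if (!m) break;
    pieces.push({ text: m[0], scope: expectSegment ? 'title.class' : null });
    pos = re.lastIndex;
    expectSegment = !expectSegment;
  }
  // A trailing `::` with no segment after it is not part of the name.
  if (pieces.length > 0 && pieces[pieces.length - 1].scope === null) pieces.pop();
  return pieces;
}

console.log(splitNamespacedClass('C::E').map((p) => p.text)); // [ 'C', '::', 'E' ]
```

The point of the sketch is that each segment gets its own scope while the separator gets none, which is the "discrete coloring" a single fixed-length match sequence struggles to express.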

Author:

This case would've worked fine, except it didn't recognise single-letter classes. I've adjusted it to do so.

class C::E < A::B
end

[image]
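
A quick check of the single-letter case, as a sketch: `FROM_DIFF` is the pattern shown in the diff above, while `SINGLE_LETTER_OK` is a hypothetical alternative assumed here purely for illustration. The adjusted pattern from the follow-up commit is not shown in this thread, so this should not be read as the author's actual change.

```javascript
// Pattern as shown in the diff at this point in the review.
const FROM_DIFF = /\b([A-Z]+[a-z0-9_]+)+[A-Z]*/g;
// Hypothetical alternative (an assumption for illustration only): any
// identifier that starts with a capital letter, including a single letter.
const SINGLE_LETTER_OK = /\b[A-Z][A-Za-z0-9_]*/g;

const sample = 'class C::E < A::B';

// The diff's pattern needs a lower-case letter, digit or underscore after the
// leading capitals, so single-letter names are not matched at all.
const before = sample.match(FROM_DIFF);
// The hypothetical pattern picks up every single-letter segment.
const after = sample.match(SINGLE_LETTER_OK);

console.log(before); // null
console.log(after);  // [ 'C', 'E', 'A', 'B' ]
```

Note the trade-off in the hypothetical pattern: it would also match ALL_CAPS constants, which the grammar may want to treat differently.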

@joshgoebel (Member), Mar 6, 2025:

Please use "show the structure" as you develop. If you're matching all of those one-off cases with just your class rule, then you're missing the fact that the portion after the < is not just a title.class, it's a title.class.inherited.

This is why the simple multi-match rules exist: to flesh out more detail, as well as to handle cases like class A, where we KNOW A is a class and not a constant (as it would be with no other context).
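
As a rough illustration of why a multi-match rule can carry more detail than a single class pattern, the sketch below emulates the idea in plain JavaScript. It is not highlight.js's engine: `applyMultiMatch` and `CLASS_DEFINITION_SKETCH` are hypothetical names, and the segment patterns are simplified assumptions. Only the `match` and `scope` keys mirror the shape that appears in the diff.

```javascript
// Simplified stand-in for a multi-match rule; this is NOT highlight.js's engine.
// Each entry of `match` becomes one capture group, and `scope` maps the
// 1-based position of an entry to the scope assigned to its text.
const CLASS_DEFINITION_SKETCH = {
  match: [
    /class\s+/,
    /[A-Z]\w*(?:::[A-Z]\w*)*/, // assumed shape of a (namespaced) class name
    /\s+<\s+/,
    /[A-Z]\w*(?:::[A-Z]\w*)*/,
  ],
  scope: { 2: 'title.class', 4: 'title.class.inherited' },
};

// Hypothetical helper: the sub-patterns must not contain capture groups of
// their own, otherwise the group numbering below would be off.
function applyMultiMatch(rule, text) {
  const combined = new RegExp(rule.match.map((re) => `(${re.source})`).join(''));
  const m = combined.exec(text);
  if (!m) return null;
  return rule.match.map((_, i) => ({
    text: m[i + 1],
    scope: rule.scope[i + 1] || null,
  }));
}

console.log(applyMultiMatch(CLASS_DEFINITION_SKETCH, 'class C::E < A::B'));
```

Because the rule knows it is inside a `class ... < ...` construct, the name after `<` can receive a different scope from the name before it, even though both are matched by the same pattern.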

Author:

I wasn't aware of the different handling for title.class.inherited.

Whilst special-casing class A works using a multi-match rule, it wouldn't help for A.method(foo), which will still be matched as a constant. So I'd suggest we simplify and not handle the class A case at all.

In the current version, it doesn't work either: https://highlightjs.org/demo#lang=ruby&v=1&theme=atom-one-dark&code=Y2xhc3MgQQplbmQKCscNOjpCywsgPCBDOjpExxJGb286OkZvbyA8yQvKGsYVyEjEC8VXQSA9ICdYJw%3D%3D

Author:

With the latest commit here, it looks like this:

[image]

Author:

[image]

 // very popular ruby built-ins that one might even assume
 // are actual keywords (despite that not being the case)
 const PSEUDO_KWS = [
@@ -120,19 +114,16 @@ export default function(hljs) {
 className: 'subst',
 begin: /#\{/,
 end: /\}/,
-keywords: RUBY_KEYWORDS
+keywords: RUBY_KEYWORDS,
+relevance: 10
 };
-const STRING = {
+const STRING_INTERPOLABLE = {
 className: 'string',
 contains: [
 hljs.BACKSLASH_ESCAPE,
 SUBST
 ],
 variants: [
-{
-begin: /'/,
-end: /'/
-},
 {
 begin: /"/,
 end: /"/
@@ -142,45 +133,45 @@ export default function(hljs) {
 end: /`/
 },
 {
-begin: /%[qQwWx]?\(/,
-end: /\)/
+begin: /%[QWx]?\(/,
+end: /\)/,
+relevance: 2
 },
 {
-begin: /%[qQwWx]?\[/,
-end: /\]/
+begin: /%[QWx]?\[/,
+end: /\]/,
+relevance: 2
 },
 {
-begin: /%[qQwWx]?\{/,
-end: /\}/
+begin: /%[QWx]?\{/,
+end: /\}/,
+relevance: 2
 },
 {
-begin: /%[qQwWx]?</,
-end: />/
+begin: /%[QWx]?</,
+end: />/,
+relevance: 2
 },
 {
-begin: /%[qQwWx]?\//,
-end: /\//
+begin: /%[QWx]?\//,
+end: /\//,
+relevance: 2
 },
 {
-begin: /%[qQwWx]?%/,
-end: /%/
+begin: /%[QWx]?%/,
+end: /%/,
+relevance: 2
 },
 {
-begin: /%[qQwWx]?-/,
-end: /-/
+begin: /%[QWx]?-/,
+end: /-/,
+relevance: 2
 },
 {
-begin: /%[qQwWx]?\|/,
-end: /\|/
+begin: /%[QWx]?\|/,
+end: /\|/,
+relevance: 2
 },
-// in the following expressions, \B in the beginning suppresses recognition of ?-sequences
-// where ? is the last character of a preceding identifier, as in: `func?4`
-{ begin: /\B\?(\\\d{1,3})/ },
-{ begin: /\B\?(\\x[A-Fa-f0-9]{1,2})/ },
-{ begin: /\B\?(\\u\{?[A-Fa-f0-9]{1,6}\}?)/ },
-{ begin: /\B\?(\\M-\\C-|\\M-\\c|\\c\\M-|\\M-|\\C-\\M-)[\x20-\x7e]/ },
-{ begin: /\B\?\\(c|C-)[\x20-\x7e]/ },
-{ begin: /\B\?\\?\S/ },
 // heredocs
 {
 // this guard makes sure that we have an entire heredoc and not a false
@@ -202,6 +193,63 @@ export default function(hljs) {
 }
 ]
 };
+const STRING_NONINTERPOLABLE = {
+className: 'string',
+variants: [
+{
+begin: /'/,
+end: /'/
+},
+{
+begin: /%[qw]?\(/,
+end: /\)/,
+relevance: 2
+},
+{
+begin: /%[qw]?\[/,
+end: /\]/,
+relevance: 2
+},
+{
+begin: /%[qw]?\{/,
+end: /\}/,
+relevance: 2
+},
+{
+begin: /%[qw]?</,
+end: />/,
+relevance: 2
+},
+{
+begin: /%[qw]?\//,
+end: /\//,
+relevance: 2
+},
+{
+begin: /%[qw]?%/,
+end: /%/,
+relevance: 2
+},
+{
+begin: /%[qw]?-/,
+end: /-/,
+relevance: 2
+},
+{
+begin: /%[qw]?\|/,
+end: /\|/,
+relevance: 2
+},
+// in the following expressions, \B in the beginning suppresses recognition of ?-sequences
+// where ? is the last character of a preceding identifier, as in: `func?4`
+{ begin: /\B\?(\\\d{1,3})/ },
+{ begin: /\B\?(\\x[A-Fa-f0-9]{1,2})/ },
+{ begin: /\B\?(\\u\{?[A-Fa-f0-9]{1,6}\}?)/ },
+{ begin: /\B\?(\\M-\\C-|\\M-\\c|\\c\\M-|\\M-|\\C-\\M-)[\x20-\x7e]/ },
+{ begin: /\B\?\\(c|C-)[\x20-\x7e]/ },
+{ begin: /\B\?\\?\S/ }
+]
+};

 // Ruby syntax is underdocumented, but this grammar seems to be accurate
 // as of version 2.7.2 (confirmed with (irb and `Ripper.sexp(...)`)
@@ -246,7 +294,7 @@ export default function(hljs) {
 const INCLUDE_EXTEND = {
 match: [
 /(include|extend)\s+/,
-CLASS_NAME_WITH_NAMESPACE_RE
+CLASS_NAME_RE
 ],
 scope: {
 2: "title.class"
@@ -259,15 +307,15 @@ export default function(hljs) {
 {
 match: [
 /class\s+/,
-CLASS_NAME_WITH_NAMESPACE_RE,
+CLASS_NAME_RE,
 /\s+<\s+/,
-CLASS_NAME_WITH_NAMESPACE_RE
+CLASS_NAME_RE
 ]
 },
 {
 match: [
 /\b(class|module)\s+/,
-CLASS_NAME_WITH_NAMESPACE_RE
+CLASS_NAME_RE
 ]
 }
 ],
@@ -301,7 +349,7 @@ export default function(hljs) {
 const OBJECT_CREATION = {
 relevance: 0,
 match: [
-CLASS_NAME_WITH_NAMESPACE_RE,
+CLASS_NAME_RE,
 /\.new[. (]/
 ],
 scope: {
@@ -317,7 +365,8 @@ export default function(hljs) {
 };

 const RUBY_DEFAULT_CONTAINS = [
-STRING,
+STRING_INTERPOLABLE,
+STRING_NONINTERPOLABLE,
 CLASS_DEFINITION,
 INCLUDE_EXTEND,
 OBJECT_CREATION,
@@ -326,7 +375,8 @@ export default function(hljs) {
 METHOD_DEFINITION,
 {
 // swallow namespace qualifiers before symbols
-begin: hljs.IDENT_RE + '::' },
+begin: '::'

Member:

Yeah I think this works.

+},
 {
 className: 'symbol',
 begin: hljs.UNDERSCORE_IDENT_RE + '(!|\\?)?:',
@@ -336,10 +386,17 @@ export default function(hljs) {
 className: 'symbol',
 begin: ':(?!\\s)',
 contains: [
-STRING,
-{ begin: RUBY_METHOD_RE }
+{ begin: /'/, end: /'/ },
+{
+begin: /"/, end: /"/,
+contains: [
+hljs.BACKSLASH_ESCAPE,
+SUBST
+]
+},
+{ begin: hljs.UNDERSCORE_IDENT_RE }
 ],
-relevance: 0
+relevance: 1
 },
 NUMBER,
 {
7 changes: 7 additions & 0 deletions test/markup/ruby/classes.expect.txt
@@ -0,0 +1,7 @@
+<span class="hljs-title class_">Class</span>
+<span class="hljs-title class_">ClassName</span>
+<span class="hljs-title class_">Class_Name</span>
+<span class="hljs-title class_">ClassNAME</span>
+<span class="hljs-title class_">ClassName</span>::<span class="hljs-title class_">With</span>::<span class="hljs-title class_">Namespace</span>
+<span class="hljs-title class_">ClassName</span>::<span class="hljs-title class_">With</span>.method
+::<span class="hljs-title class_">TopLevel</span>::<span class="hljs-title class_">Class</span>
7 changes: 7 additions & 0 deletions test/markup/ruby/classes.txt
@@ -0,0 +1,7 @@
+Class
+ClassName
+Class_Name
+ClassNAME
+ClassName::With::Namespace
+ClassName::With.method
+::TopLevel::Class
26 changes: 25 additions & 1 deletion test/markup/ruby/strings.expect.txt
@@ -27,4 +27,28 @@ c = <span class="hljs-string">?\u{00AF09}</span>
 c = <span class="hljs-string">?\u{0AF09}</span>
 c = <span class="hljs-string">?\u{AF9}</span>
 c = <span class="hljs-string">?\u{F9}</span>
-c = <span class="hljs-string">?\u{F}</span>
+c = <span class="hljs-string">?\u{F}</span>
+
+<span class="hljs-comment"># Interpolable Strings</span>
+<span class="hljs-string">&quot;string&quot;</span>
+<span class="hljs-string">&quot;string <span class="hljs-subst">#{var}</span>&quot;</span>
+<span class="hljs-string">`string`</span>
+<span class="hljs-string">`string <span class="hljs-subst">#{var}</span>`</span>
+<span class="hljs-string">%W[foo bar]</span>
+<span class="hljs-string">%W[foo bar <span class="hljs-subst">#{var}</span>]</span>
+<span class="hljs-string">%Q[foo bar]</span>
+<span class="hljs-string">%Q[foo bar <span class="hljs-subst">#{var}</span>]</span>
+<span class="hljs-string">%x[foo]</span>
+<span class="hljs-string">%x[foo <span class="hljs-subst">#{var}</span>]</span>
+<span class="hljs-string">&lt;&lt;~DOC
+Multiline heredoc
+Text <span class="hljs-subst">#{var}</span>
+DOC</span>
+
+<span class="hljs-comment"># Non-interpolable Strings</span>
+<span class="hljs-string">&#x27;string&#x27;</span>
+<span class="hljs-string">&#x27;string #{var}&#x27;</span>
+<span class="hljs-string">%q[foo]</span>
+<span class="hljs-string">%q[foo #{var}]</span>
+<span class="hljs-string">%w[foo]</span>
+<span class="hljs-string">%w[foo #{var}]</span>
26 changes: 25 additions & 1 deletion test/markup/ruby/strings.txt
@@ -27,4 +27,28 @@ c = ?\u{00AF09}
 c = ?\u{0AF09}
 c = ?\u{AF9}
 c = ?\u{F9}
-c = ?\u{F}
+c = ?\u{F}
+
+# Interpolable Strings
+"string"
+"string #{var}"
+`string`
+`string #{var}`
+%W[foo bar]
+%W[foo bar #{var}]
+%Q[foo bar]
+%Q[foo bar #{var}]
+%x[foo]
+%x[foo #{var}]
+<<~DOC
+Multiline heredoc
+Text #{var}
+DOC
+
+# Non-interpolable Strings
+'string'
+'string #{var}'
+%q[foo]
+%q[foo #{var}]
+%w[foo]
+%w[foo #{var}]
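
The split these fixtures exercise can be sketched in a few lines. This is an illustration under stated assumptions, not the grammar itself: `interpolates` and `substitutions` are hypothetical helpers, heredocs are ignored, and the sketch requires the letter for the non-interpolating `%q`/`%w` forms so that the two sets stay disjoint, which is a simplification relative to the patterns in the diff.

```javascript
// Opening delimiters adapted from the diff: ", ` and upper-case %Q/%W/%x
// literals interpolate #{...}; ' and lower-case %q/%w do not.
const INTERPOLABLE_START = /^(?:"|`|%[QWx]?[(\[{<\/%|-])/;
const NON_INTERPOLABLE_START = /^(?:'|%[qw][(\[{<\/%|-])/;

// Hypothetical helper: true/false for a recognised literal, null otherwise.
function interpolates(literal) {
  if (INTERPOLABLE_START.test(literal)) return true;
  if (NON_INTERPOLABLE_START.test(literal)) return false;
  return null;
}

// Hypothetical helper: return the #{...} segments only when they are live.
function substitutions(literal) {
  return interpolates(literal) ? (literal.match(/#\{[^}]*\}/g) || []) : [];
}

console.log(substitutions('%Q[foo bar #{var}]')); // [ '#{var}' ]
console.log(substitutions('%q[foo #{var}]'));     // []
```

This mirrors the expected markup above: `#{var}` is wrapped in a `subst` span only inside the interpolating forms.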
22 changes: 22 additions & 0 deletions test/markup/ruby/symbols.expect.txt
@@ -0,0 +1,22 @@
+<span class="hljs-symbol">:symbol</span>
+<span class="hljs-symbol">:Symbol</span>
+<span class="hljs-symbol">:_leading</span>
+<span class="hljs-symbol">:trailing_</span>
+<span class="hljs-symbol">:contains_underscore</span>
+<span class="hljs-symbol">:symbol_CAPS</span>
+<span class="hljs-symbol">:&quot;string symbol&quot;</span>
+<span class="hljs-symbol">:&quot;interpolated <span class="hljs-subst">#{test}</span>&quot;</span>
+<span class="hljs-symbol">:&#x27;string symbol&#x27;</span>
+<span class="hljs-symbol">:&#x27;not interpolated #{test}&#x27;</span>
+method <span class="hljs-symbol">:symbol</span>
+method(<span class="hljs-symbol">:symbol</span>)
+method(&amp;<span class="hljs-symbol">:symbol</span>)
+assign=<span class="hljs-symbol">:symbol</span>
+assign = <span class="hljs-symbol">:symbol</span>
+<span class="hljs-symbol">:symbol</span>, others
+<span class="hljs-symbol">:</span>1notasymbol
+<span class="hljs-symbol">:</span><span class="hljs-string">%q[notasymbol]</span>
+
+::notsymbol
+
+<span class="hljs-symbol">hash_symbol:</span> value
22 changes: 22 additions & 0 deletions test/markup/ruby/symbols.txt
@@ -0,0 +1,22 @@
+:symbol
+:Symbol
+:_leading
+:trailing_
+:contains_underscore
+:symbol_CAPS
+:"string symbol"
+:"interpolated #{test}"
+:'string symbol'
+:'not interpolated #{test}'
+method :symbol
+method(:symbol)
+method(&:symbol)
+assign=:symbol
+assign = :symbol
+:symbol, others
+:1notasymbol
+:%q[notasymbol]
+
+::notsymbol
+
+hash_symbol: value
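
Why swallowing `::` first keeps `::notsymbol` from being read as a symbol can be shown with a small scanner. This is a sketch only: `findSymbols`, `SYMBOL_SCAN` and `IDENT_SKETCH` are hypothetical names, the identifier pattern is an assumed stand-in for `hljs.UNDERSCORE_IDENT_RE`, the `(?!:)` lookahead is an assumption added for the sketch, and quoted symbols are left out for brevity.

```javascript
// Assumed stand-in for hljs.UNDERSCORE_IDENT_RE; treat the value as an assumption.
const IDENT_SKETCH = '[a-zA-Z_]\\w*';

// Alternation order matters: `::` is tried first and swallowed, so the colon
// that follows it never starts a symbol. The other two alternatives cover
// `name:` hash keys and plain `:name` symbols.
const SYMBOL_SCAN = new RegExp(`::|${IDENT_SKETCH}[!?]?:(?!:)|:${IDENT_SKETCH}`, 'g');

// Hypothetical helper: return the symbol-like tokens found in a line of Ruby.
function findSymbols(source) {
  return Array.from(source.matchAll(SYMBOL_SCAN), (m) => m[0])
    .filter((token) => token !== '::');
}

console.log(findSymbols('method(&:symbol)'));       // [ ':symbol' ]
console.log(findSymbols('::notsymbol'));            // []
console.log(findSymbols('ClassName::With.method')); // []
console.log(findSymbols('hash_symbol: value'));     // [ 'hash_symbol:' ]
```

The sketch does not try to reproduce every fixture line (for example `:1notasymbol`, where the real grammar still marks the lone colon); it only illustrates the ordering between the `::` rule and the symbol rules.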