Skip to content

Support semantic tokens protocol #1721

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
May 29, 2020
Merged

Support semantic tokens protocol #1721

merged 12 commits into from
May 29, 2020

Conversation

sebastiansturm
Copy link
Contributor

this PR adds support for the semantic tokens protocol as defined by LSP 3.16, see #1482.

I've been using commit 9c03646... for about 4 weeks with a trunk build of clangd 11. Today's commits have only received superficial testing but should be relatively unlikely to break things.

The following points have not been addressed yet:

  1. I have assumed that there is at most one language server per buffer to provide semantic tokens. Adding support for multiple providers shouldn't require much effort but also doesn't make much sense if there is no use case. If you know of one, please let me know so I can use that to test an extended implementation.
  2. This PR no longer uses overlays for semantic highlighting (at least not when the :semantic-tokens option is used) but instead suspends regular font-locking, waits for semantic tokens to arrive from the language server, afterwards applies regular font locking and then applies semantic tokens on top of that. If the server is slow to respond, this will delay initial fontification, so if people think this is annoying the first fontification run could be performed immediately. Also, if semantic highlighting faces overlap with regular font-locking, semantic highlighting will take precedence; this could be changed to a true layering approach by replacing put-text-property with add-text-properties. Finally, some language servers might provide sufficiently rich highlighting data to completely replace the font-locking mechanism provided by the underlying major mode. If so, I could add an option to do that.
  3. Unfortunately, clangd doesn't support range-based incremental highlighting (yet?). It does support semantic token edits though I'm not sure how much of a performance win that will be. I plan to implement it anyway and run some benchmarks, though I'd rather do range-based highlighting first. Any recommendations for a language server with robust range-based semanticTokens support?

@yyoncho
Copy link
Member

yyoncho commented May 24, 2020

3. Any recommendations for a language server with robust range-based semanticTokens support?

I guess rust analyzer might support it, @flodiebold?

@sebastiansturm

You have several unused variables which cause CI failure: https://travis-ci.org/github/emacs-lsp/lsp-mode/jobs/690604604#L1473

Copy link
Member

@yyoncho yyoncho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

I would propose merging the PR as it is with semantics disabled. Once it gets
enough testing I propose having it enabled by default and deleting the rest of
the highlighting related code which apparently won't go into the spec.

Up to now the following servers support semantic highlighting.

  1. Rust Analyzer - latest version
  2. clangd - latest version build from master branch
  3. Dart (?) cc @ericdallo

I propose also adding some visual indication that the syntax highlighting is
about to be applied when we are waiting for the server to respond otherwise it
looks a bit broken. If we agree on that, this could be addressed in a separate PR.

PS: AFAICS Rust Analyzer supports range provider

 |-[-] semanticTokensProvider:
 |  |-[X] rangeProvider: t

lsp-mode.el Outdated
(defvar-local lsp--semantic-tokens-teardown nil)

(defun lsp--semantic-tokens-initialize-buffer ()
(message "semantic-tokens-initialize-buffer")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover?

@tschuett
Copy link
Contributor

tschuett commented May 24, 2020

Would you mind adding instructions to the README.md on how to select the different modes? Thanks.

@ericdallo
Copy link
Member

Currently, Dart SDK doesn't have support for semantic tokens protocol yet, but since last year they are migrating for their own server protocol to LSP, I hope they implement this soon, so I opened an issue asking for that support.

@sebastiansturm
Copy link
Contributor Author

Would you mind adding instructions to the README.md on how to select the different modes? Thanks.

would it be enough to mention there's a (documented) option called lsp-semantic-highlighting? I wouldn't want to bog down the otherwise brief overview section with too much text.

@sebastiansturm
Copy link
Contributor Author

Looks good to me.

I would propose merging the PR as it is with semantics disabled. Once it gets
enough testing I propose having it enabled by default and deleting the rest of
the highlighting related code which apparently won't go into the spec.

the sooner we can get rid of the old highlighting code, the better. Though I'd wait at least until clangd 11 is released, as clangd 10 only supports the Theia protocol

Up to now the following servers support semantic highlighting.

1. Rust Analyzer - latest version

2. clangd - latest version build from master branch

3. Dart (?) cc @ericdallo

I propose also adding some visual indication that the syntax highlighting is
about to be applied when we are waiting for the server to respond otherwise it
looks a bit broken. If we agree on that, this could be addressed in a separate PR.

good idea. I think with clangd the delay is short enough to render this unnecessary, but other language servers may not be as quick

PS: AFAICS Rust Analyzer supports range provider

 |-[-] semanticTokensProvider:
 |  |-[X] rangeProvider: t

nice. I'll clone something written in Rust then and try to set up rust-analyzer

@tschuett
Copy link
Contributor

No worries.

@yyoncho
Copy link
Member

yyoncho commented May 25, 2020

but other language servers may not be as quick

From what I saw with rust analyzer initially it may take a lot of time because even that I was using a hello world application to test the delay was visible.

@yyoncho yyoncho merged commit 63105c7 into emacs-lsp:master May 29, 2020
Copy link

@nsemrau nsemrau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note about a twice defined defface.

"Face used for types."
:group 'lsp-faces)

(defface lsp-face-semhl-type
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it intentional to define this defface twice, identical to to one just above it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, wasn't intentional, thanks for the pointer. I assume this doesn't break anything (or does it?), so I'll just fix this as part of the incremental highlighting PR (have rust-analyzer up and running and incremental highlighting seems to work, but it currently doesn't update correctly under all circumstances; plan to work on it on Sunday evening or Monday)

@dgutov
Copy link
Contributor

dgutov commented Jun 1, 2020

Have you tried using the font-lock-face text property? It might make the font-lock removal/re-application dance unnecessary.

@sebastiansturm
Copy link
Contributor Author

Have you tried using the font-lock-face text property? It might make the font-lock removal/re-application dance unnecessary.

no, so far I haven't. Didn't even know about it, so thanks for the hint! As far as I understand, I could just apply font-lock-face properties every time a new token set arrives (presumably after cleaning existing font-lock-face properties within the region to be fontified?) so font-lock would pick up those faces automatically, and otherwise leave font locking as it is? If so, I do like that approach for being much cleaner than what I have now.
On the other hand though, it seems to me I could then no longer suspend refontification until the new tokens have arrived, in which case fontification would be performed twice (once with regular font locking + old tokens, then with regular font locking + new tokens). Is that true, or did I miss something?
Also, I wonder how different nonstandard fontification mechanisms would work together if we were to use font-lock-face. If another mechanism wanted to provide fontification on top of regular font-locking, how would that other mechanism be chained together with lsp's semantic tokens? (Though even if there was no convenient way to do that with font-lock-face, I can only think of treesitter as an alternative mechanism and I assume treesitter would just replace regular font locking wholesale, so that would be a nonissue. Just wanted to bring it up to make sure I'm not missing something here)

@dgutov
Copy link
Contributor

dgutov commented Jun 2, 2020

(presumably after cleaning existing font-lock-face properties within the region to be fontified?)

Yup. You can follow js2-mode's example: https://github.com/mooz/js2-mode/blob/master/js2-mode.el#L6641-L6657. Its highlighting doesn't use font-lock either.

On the other hand though, it seems to me I could then no longer suspend refontification until the new tokens have arrived, in which case fontification would be performed twice (once with regular font locking + old tokens, then with regular font locking + new tokens).

I suppose it is true. On the other hand, you only need to clear existing highlighting when you receive the response from the server. So this approach can work without blinking, or similar visual aggravations.

Also, I wonder how different nonstandard fontification mechanisms would work together if we were to use font-lock-face.

Multiple non-standard ones? I don't know (though there are some things you could try). But your current approach doesn't offer any solutions for that either, does it?

OTOH, the suggested approach is very compatible with whitespace-mode, for instance.

@sebastiansturm
Copy link
Contributor Author

(presumably after cleaning existing font-lock-face properties within the region to be fontified?)

Yup. You can follow js2-mode's example: https://github.com/mooz/js2-mode/blob/master/js2-mode.el#L6641-L6657. Its highlighting doesn't use font-lock either.

thank you, I'll have a look at it. Will put this point (face vs font-lock-face) up for discussion in the ranged semantic tokens PR

On the other hand though, it seems to me I could then no longer suspend refontification until the new tokens have arrived, in which case fontification would be performed twice (once with regular font locking + old tokens, then with regular font locking + new tokens).

I suppose it is true. On the other hand, you only need to clear existing highlighting when you receive the response from the server. So this approach can work without blinking, or similar visual aggravations.

yes, I'm just a bit worried about the runtime cost though I haven't done any fontification benchmarks. Should do that, then we can decide whether fontifying twice incurs a relevant performance penalty or not

Also, I wonder how different nonstandard fontification mechanisms would work together if we were to use font-lock-face.

Multiple non-standard ones? I don't know (though there are some things you could try). But your current approach doesn't offer any solutions for that either, does it?

currently, I'm calling font-lock-fontify-region-function followed by lsp-provided semantic highlighting (or at least that is the plan, I haven't tested this together with a nondefault value of font-lock-fontify-region-function). So I'd expect that if there was another mechanism that defined font-lock-face properties, or if there were one or several other mechanisms that similarly patched up font-lock-fontify-region-function, face properties would still be layered correctly?

OTOH, the suggested approach is very compatible with whitespace-mode, for instance.

does the current implementation of semantic highlighting interfere with whitespace-mode in some way? If so, I didn't know; that would of course be a severe shortcoming

@dgutov
Copy link
Contributor

dgutov commented Jun 2, 2020

worried about the runtime cost though I haven't done any fontification benchmarks

I suppose that could be some concern. Though you probably mean double redisplay. Off the top of my head, I'm not sure which option is going to be faster, but suspect that mine will.

So I'd expect that if there was another mechanism that defined font-lock-face properties, or if there were one or several other mechanisms that similarly patched up font-lock-fontify-region-function, face properties would still be layered correctly?

The former -- probably yes. The latter - I don't know. It's hard to predict the effect of such layering of fontification functions.

does the current implementation of semantic highlighting interfere with whitespace-mode in some way? If so, I didn't know; that would of course be a severe shortcoming

I don't know. whitespace-mode was just a benchmark of the sanity of my approach.

@sebastiansturm
Copy link
Contributor Author

worried about the runtime cost though I haven't done any fontification benchmarks

I suppose that could be some concern. Though you probably mean double redisplay. Off the top of my head, I'm not sure which option is going to be faster, but suspect that mine will.

interesting. Not sure why it would be faster, but if it is, I'll gladly make the change. Will set up some benchmarks on the weekend

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants