-
Notifications
You must be signed in to change notification settings - Fork 856
Support that client and server agree on the word boundaries of a language #937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
And what does a "word boundary configuration" look like? Simply a regex like VSCode's |
Needs to be defined :-) |
but regular expression are usually a good candidate for this. |
Seems like this might be related to #554 if you're looking to get word boundary/pattern from the client. |
This precludes a smart server from choosing a boundary based on a deeper understanding of the language. Or conversely forces the client to do a code analysis as complex as the server does. This is certainly not going to work with my TeX server. A better approach would be a method for the client to ask the server what the "word at point" is. Or, even simpler, the server can specify in its response to a completion request (or any other relevant request) what it is regarding as the "word at point". |
+1 |
@astoff as per #898 (comment)
|
I hope that in light of #898 (comment) it is agreed that the Is this the case? |
@astoff IIUC #898 (comment) does not contradict to:
Here it is the actual completion item: {
"textEdit": {
"newText": "[\"so so\"]",
"range": {
"end": {
"character": 10,
"line": 6
},
"start": {
"character": 9,
"line": 6
}
}
},
"filterText": ".so so",
"insertTextFormat": 2,
"data": {
"entryNames": [
"so so"
],
"offset": 11,
"line": 7,
"file": "/home/kyoncho/Sources/my-first-app/src/main.ts"
},
"commitCharacters": [
".",
",",
"("
],
"sortText": "0",
"kind": 2,
"label": "so so"
} Note that if in this case, the client uses the word at a point which is |
@yyoncho Sure, if you know how the editor works, you can always cook up a response that will make it react in the desired way. But I think the point here is to design a good protocol. |
I'll just add my vote to @astoff . I pretty much agree everything he says. In particular
The edit range meaning is, "the text that is going to be replaced". While this is usually the same as the text also used for filtering, this is just accidental. For example #898 (comment) . The fact that the completion wants to 'eat' the extra '.' in front of the completed word does not mean that the '.' should be considered part of the 'word'. It is more like the '.' just happens to be 'in the way' for the completion to be inserted properly without creating a syntax error. We have similar examples in our yaml language servers. There we want to sometimes delete some white space in front of yaml completions to insert completions at the right level of indentation. The fact that a completion wants to 'eat' some extraneous whitespace in front of its 'filter word', does not mean we want this whitespace to be considered as part of the 'filter word'. |
@astoff @kdvolder I am not saying that the I have also filed several bugs like this: eclipse-jdtls/eclipse.jdt.ls#1348 |
Just noticed this issue here. @dbaeumer the issue is very well formulated. The regexp solution is not bad, but it might be overkill here. One needs to ask when this agreement has to be had. Editors already have some "offline" understand of words, even the one I'm typing this in right now. Those are fine. But it's only when completing (and perhaps some cases like the "rename thing at cursor" scenario) that you need the more sophisticated agreement. Plus, in general, not every language can be split up into "words", many have words inside other expressions inside other expression. And certainly I would doubt that these things can be parsed successfully with regular expressions. Remember the "regexps: now you have two problems" rule-of-thumb :-) and consider that you might be giving every server author those "two problems" I think the |
First. Even if they do, it doesn't mean its the right solution. Indeed there are examples where this method of 'guessing' the prefix ends up getting the wrong prefix and results in incorrect filtering and sorting. I have experienced this myself and there really isn't any workaround for it. As a server author you just have to accept that some completions will not be displayed at all or not displayed in a good order and you cannot change this in any way. Secondly, I may be wrong but lps4e does something much more complex as far as I recall (though they may have changed it since I explored it). They have some fairly complex heuristic that tries to guess the 'prefix' by working backwards from the cursor position and trying to match as much of the document text to a completion as possible. Anyhow... the conclusion that I came to (and it is an opinion for sure not a solid fact) is that this kind of 'guessing' game, should not be necessary. And the simplest way to avoid this guessing game,is simply have the server tell the client what range of the document it considered as the 'input / filter / prefix' text. The server must know what range it considered as the input / filter (for it is part of how it computes the completions in the first place). Just passing this info to client will avoid all this complexity of guessing word boundaries and guessing what part of the document constitutes the 'input / filter' text on the client side. |
@kdvolder note that the text synchronization is an optional feature so the server might not have the document content at all... https://git.eclipse.org/c/lsp4e/lsp4e.git/tree/org.eclipse.lsp4e/src/org/eclipse/lsp4e/operations/completion/LSIncompleteCompletionProposal.java#n375 - here it is the lsp4e code - there is no guessing when the textEdit. Using the textEdit to specify the range is sufficient to handle all cases (I think this was established in the other thread). The guessing happens when there is no textEdit and this issue is about it.
With all the respect this doesn't seems to be true - or at least nobody has mentioned a scenario that cannot be handled with adjusting filterText according to the textEdit range. |
Very well, maybe you are right. It's just not practical to try and reverse-engineer the logic of all the clients and 'hack' it so it does what you wanted. So avoid that mess and just let the server tell the client what the 'input range' is. The server already has to know it anyways, and all the mess and complexity created on the client by the uncertainty of guessing it is avoided. |
Yes, and it's painful to have to note, again, that this is exactly what this issue is about. We're just discussing what is the best way to do it is. On the other hand, if others want to write crazy clients that iterate all the edits, we should let them. |
Okay, well then. Let me be clear. None of the three options proposed by @dbaeumer appeal to me. In fact I think that assuming there is such a thing as 'word boundaries' is already a mistake. It implies that a document is nothing more than a long, flat string of words, which is really not the case for programming languages. The source code is typically much more structured and requires context (words inside of JavaDoc is different from words inside of java code, or inside of strings inside the java code etc.). So what I'm saying is, I vote against all 3 of these and instead there is an option 4: just include the 'input range' (or whatever you want to call that range) as part of the response to a completion request. Basically this is like option 2 (i.e. the server decides/computes the 'boundary'). Except, you don't add some new 'wordBoundary' request to the protocol, but just add some extra info to the completion response. I don't beleave in option 1 and 2 for the same reasons as @dbaeumer stated. I don't beleave in option 3 because this in all likely hood will end up being some complex regexp approach. And I don't beleave regexps are powerful enough to deal with the typical context sensitive nature of word boundaries in programming language text. |
Yes, I agree with everything you say, what I'm saying is that at least this issue is a step in the right direction. Actually, there is one advantage to @dbaeumer 's option 3 (despite all the other disadvantages). The client can know the thing at point without ever contacting the server, so it's very fast. |
Right, but if we are thinking specifically in the context of completion handling. You can't really avoid talking to the server (to ask for completions). So the performance arguments againts option 2 is not really applicable to our option 4 (include 'input range' in completion response). |
That's very true, but in the case of Emacs, it happens to be slightly advantageous because Emacs will may contact many backends, not just LSP. And even if just LSP it can e.g. forego the lenghty I agree it's not a tremendous advantage, though. |
Hmm... I like to invoke completion request even before typing any part of my identifier. In language like Java with strong static typing, just knowing the context is often already enough to generate a good and focussed set of completions. So I would question whether assuming that when the 'contact region' is 0 chars it is never meaningful to show/compute completions is really a sane/good assumption. |
Yes, you're right, I wasn't thinking well. But Emacs does have internal documentation requesting each completion backend to report the boundaries as fast as possible (it's off-topic here, I can point to it offline, just mail me). |
There are other possible applications of having a "local" way to calculate symbol boundaries (for example, we have optional sorting that searches for the nearby occurrences of the symbol in the current file, it wouldn't be too great to issue a network call for each occurrence). So 3ii sounds the best to me, as long as that regexp can be converted to an Emacs regexp (we don't have lookahead or lookbehind, as one possible difficulty). |
@dgutov notice that @dbaeumer's proposal is that the client come up with this regular expression and tell it to the server. So the regexp-parsing limitations don't apply here. It could be the other way around though (in fact at first sight I thought that was the idea). Anyway, there has to be agreement. @dbaeumer maybe you are aware of it by now, but I'll state it anyway. Somehow achieving this agreement would benefit not only |
I see. Well, yes, the other way around seems like can be more powerful. As long as we standardize on a subset of the regexp syntax that everyone can support.
Indeed. |
Filtering of completion items in the simple insert text case is based on word boundaries of the language (see #898 (comment)).
Since completion items are generated by the server they both must have the same understanding of what a word boundary is. I see different possibilities to address this:
I am actually in favor of the 3.ii.
The text was updated successfully, but these errors were encountered: